cljinit woes
Earlier this week it was pointed out to me that the latest JDK updates (8u202/11.0.2) has a rather large startup/performance issue when running Clojure in certain modes.
@cl4es https://t.co/3sGSA1PmFF I have a potential fix in Clojure but oooof!
— Ghadi Shayban (@smashthepast) 18 februari 2019
Others were reporting similar issues, and all seemed to be related to Clojure and startup. We tried narrowing down the slowdown to something simpler with some success, and identified a few bottlenecks.
The root issue is a security-sensitive class initialization bug fix that ensures static methods aren’t marked as resolved until the class has been fully initialized.
The manifestation of the issue in clojure seems more complex than our simple reproducers suggested, but with some assistance from Ghadi Shayban along with ample help from David Holmes and Vladimir Ivanov we now have a pretty good picture of what’s going on.
I’ll reason about the issues using Java code. Sorry. :-)
Baseline
Let’s start with a no-op baseline where we do nothing a million times. Typically such code would be eliminated by the JIT pretty fast:
public class Baseline {
static {
long start = System.nanoTime();
for (int i = 0; i < 1_000_000; i++) {
}
long end = System.nanoTime();
System.out.println("Elapsed: " + (end - start) + " ns");
}
public static void main(String... args) {}
}
Result:
$ ~/jdks/11.0.1/bin/java Baseline.java
Elapsed: 4964249 ns
$ ~/jdks/11.0.2/bin/java Baseline.java
Elapsed: 4946020 ns
It takes a few milliseconds for the JIT to wake up and eliminate
the loop, which is what I expected. There’s also no regression
in 11.0.2. So now we at least know the issue isn’t just a
result of doing a lot of work in a <clinit>
. Phew!
Bad
public class Bad {
static void foo() {}
static {
long start = System.nanoTime();
for (int i = 0; i < 1_000_000; i++) {
foo();
}
long end = System.nanoTime();
System.out.println("Elapsed: " + (end - start) + " ns");
}
public static void main(String... args) {}
}
When Bad
initializes, it will call static method foo
a million
times.
$ ~/jdks/11.0.1/bin/java Bad.java
Elapsed: 14956790 ns
$ ~/jdks/11.0.2/bin/java Bad.java
Elapsed: 1596766461 ns
Two orders of magnitude worse in 11.0.2: Oooof!
Profiling we see a lot of time spent resolving static methods, with massive overheads in the interpreter and in the JIT compiler code.
In essence foo
will be resolved and linked every time it’s called,
a result of the bug fix that went into 8u202, 11.0.2, etc. Resolving
a method is relatively expensive, and aggressively cached - but now
with the caveat that the method resolution won’t be cached until its
holder class has been fully initialized.
This particular case is accidentally improved in 13-b07 by JDK-8188133, mostly by alleviating and avoiding work in the JIT compiler. This doesn’t seem to do much for the clojure case, though.
Good
As a workaround we can move static utility methods to a utility class that is allowed to initialize completely before calling into it.
public class Good {
public static class StartupsLittleHelper {
static void foo() {}
}
static {
long start = System.nanoTime();
for (int i = 0; i < 1_000_000; i++) {
StartupsLittleHelper.foo();
}
long end = System.nanoTime();
System.out.println("Elapsed: " + (end - start) + " ns");
}
public static void main(String... args) {}
}
Good
is equivalent to Bad
, just that foo
has been moved to StartupsLittleHelper
.
Result:
$ ~/jdks/11.0.1/bin/java Good.java
Elapsed: 9130700 ns
$ ~/jdks/11.0.2/bin/java Good.java
Elapsed: 8938148 ns
No regression in 11.0.2, and in fact it’s faster than Bad
in
both 11.0.1 and 11.0.2. This is likely because the compiler can be more
aggressive when dealing with fully initialized classes.
AlsoBad
Just a caution: Refactoring out static methods to helper classes is likely not going to be enough if all they do is call back into the not-fully-initialized class:
public class AlsoBad {
static void foo() {}
public static class StartupsLittleHelper {
static void foo() { AlsoBad.foo(); }
}
static {
long start = System.nanoTime();
for (int i = 0; i < 1_000_000; i++) {
StartupsLittleHelper.foo();
}
long end = System.nanoTime();
System.out.println("Elapsed: " + (end - start) + " ns");
}
public static void main(String... args) {}
}
$ ~/jdks/11.0.1/bin/java AlsoBad.java
Elapsed: 18997877 ns
$ ~/jdks/11.0.2/bin/java AlsoBad.java
Elapsed: 1694381527 ns
Based on the profiles I’ve collected running the clojure reproducers, it seems like
there’s a mix of Bad
and AlsoBad
style calls happening.
What now?
We’ll work to resolve some of the corner cases exposed in the OpenJDK here. There are no guarantees of how much we’ll be able to recuperate. At the very least it will take some time before a fix can be delivered.
Working around the performance drop by refactoring heavy-lifting to utility classes should have lasting benefits.