A story about starting up...
Something I’ve gotten increasingly involved in over the years is keeping startup of the OpenJDK in check. It’s been a meandering journey, and this is a post about that journey. You’ve been warned!
2011-2013: Joining Oracle
I joined the Java SE Performance team at Oracle late 2011. Initially I worked on a few internal projects, mostly proprietary features and various support systems. A lot of time spent running benchmarks, setting up labs, doing some automation… that sort of stuff.
While improving and automating our regression tracking, we observed there was a slight but steady increase in startup time, and most of the time we couldn’t really pin it down to anything in particular. A “Death by a thousand cuts”.
So I spent some time learning what the JVM does during startup, learning various profiling and diagnostic techniques (callgrind and kcachegrind, perf stat, etc.), and eventually wrote some internal tools to help zoom in on what happens early on in the lifetime of the JVM.
Early, now-defunct versions of my tools were based on JFR, which missed a few spots during the earliest bootstrap but was still good enough for a few things. Slowly but surely I started pinning down the exact causes of a few tiny regressions.
2014: Hello OpenJDK!
JDK 8 had been released, and JDK 9 development was ramping up. Around this time I decided to try and get more involved in actual OpenJDK development. I had made a couple of rare appearances on the OpenJDK lists before, but hadn’t really contributed anything substantial.
But now I went looking for some simple improvements I could work on by myself. I didn’t really have any set goals at this point except getting more comfortable with the process and attaining the Committer role so I could work more independently.
In hindsight I’m especially thankful to Mike Duigou for helping out sponsoring a few patches and mentoring me in these early days. And it’s quite fitting to this story that the first patch I contributed this time around fixed a really tiny startup inefficiency.
Everything counts, right?
By the end of the year I was fixing quite a few small inefficiencies, and some that actually had some impact.
2015: Learning the ropes
I started finding more and more improvements to work on, and perhaps got a bit too excited:
- A complicated rewrite of ZipEntry to address startup time, which might have caused a few bugs down the line...
- A G1 optimization! What was I ever doing in GC code?! :)
Still, most of the things I did were small micro-optimizations, and as such were met mostly with apathy. We should be focusing more on big ticket features! Mind the process overheads, the cost of testing! And the long release cycles…
Yes, I knew most of the improvements I’d made so far were insignificant in isolation. But I also knew that it’s the eventual accumulation of improvements (or regressions) that makes or breaks a product. So I ignored the negativity and kept finding myself small things to work on as side projects, hoping that it’d eventually add up.
Some time during all this, people noticed that the bootstrap overheads of lambdas were non-trivial. Aleksey Shipilëv filed a bug that was bounced around for a bit before finally ending up on my desk. I guess I was becoming that guy…
2016: Jigsaw is coming.
Except for a few tiny improvements, I didn’t make much early progress on making lambdas bootstrap faster. Instead I got pulled into the Jigsaw project.
Jigsaw had been going on for a while, and the current builds were - dare I say it - slow. Running Hello World could take upwards of 400ms on my workstation, compared to the ~100ms startup times of JDK 8 on the same machine. Ouch!
Alas, everything is connected.
While there were definitely other things to optimize in the Jigsaw implementation itself, a large part of the reason anything ran slowly on these Jigsaw builds was the early use of lambdas. So while we agreed to remove any use of lambdas in the bootstrap sequence for the time being – which cut the startup times roughly in half – I also found time to work on some new ideas for bootstrapping lambdas faster that came out of the Jigsaw project itself.
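To illustrate the kind of change involved, here is a minimal sketch (not the actual JDK patch; the class and method names are made up for illustration). A lambda is linked on first use through an `invokedynamic` call site, which spins up supporting classes at run time, whereas an anonymous inner class is compiled to an ordinary class file by javac and simply loaded. Both forms behave the same from the caller's point of view:

```java
import java.util.function.Supplier;

public class LambdaBootstrapSketch {
    // Lambda form: the first call links an invokedynamic call site at run time.
    public static Supplier<String> viaLambda() {
        return () -> "hello";
    }

    // Anonymous inner class form: javac emits a regular class file for it,
    // so no lambda bootstrap machinery is needed when it is first used.
    public static Supplier<String> viaAnonymousClass() {
        return new Supplier<String>() {
            @Override
            public String get() {
                return "hello";
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(viaLambda().get());
        System.out.println(viaAnonymousClass().get());
    }
}
```

The trade-off is readability: the anonymous class is more verbose, which is presumably why this was only meant as a temporary measure for code that runs during bootstrap.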
One such idea was to use a jlink plugin to pre-generate, at link time, some of the classes that lambda usage would otherwise generate at run time. This proved somewhat successful, and the overheads of indified string concatenation (ISC) and lambda usage kept going down as I added similar things.
Optimizing things during link time was a cool, new thing for the OpenJDK, so I even gave a talk about it at JVMLS.
2017: Life in a post-Jigsaw world
Thanks to a huge amount of effort from Mandy Chung, Alan Bateman and others, myself included, we were able to trim things down substantially, while not letting implementation details stand in the way of the project goals.
Still, JDK 9 shipped with some startup regressions. Maybe 10-25ms longer to run a Hello World compared to JDK 8, depending on machine and OS. However, this was the first release in the more rapid release cadence. As it was not destined to become an LTS release, we saw an opportunity to deal with a few of these regressions. It felt like the attitude towards small, incremental improvements became more open and encouraging.
So we kept at it. Alan found a way to improve the module resolution, cutting away 10-15ms in one big chunk. I kept finding minor things to micro-optimize, both in the JDK libraries and inside the VM (turns out Jigsaw wasn’t the only thing to blame for those regressions in JDK 9!).
When we wrapped up development of JDK 10, startup numbers were quite evenly matched with JDK 8 (on most machines).
Around this time I found time to release a simple tool I had written to help investigate JVM startup: bytestacks. Maybe someone could use it to find something that we’d missed.
2018: Crouching Lambdas, Hidden LambdaForms
I was looking at a small startup regression due to the introduction of condy when it dawned on me that much of the initial work done bootstrapping lambdas was actually unnecessary, and that there was an easy (but hacky) fix that reduced the overhead by 75%.
I found a hack which reduce the one-off bootstrap overhead of using lambdas by ~75% (15ms or so): https://t.co/MdStfKDe8D
- redestad (@cl4es), 21 February 2018
How had I missed this for years?!
This became a good talking point for another JVMLS talk, in which I tried to outline some current challenges.
I guess I’ve delivered on some of the ideas I discussed in that talk, but I’ve also gone off in entirely different directions. And of course: for everything you fix, at least two new challenges appear.
Most recently I finished up a number of optimizations targeting the startup overheads of ISC, including some MethodHandle API performance improvements that reduce the setup costs (number of classes generated, etc.) of certain shapes. JDK 12 will need about half the amount of time and resources to bootstrap ISC call sites compared to JDK 11, while retaining peak performance characteristics.
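For readers unfamiliar with ISC (indified string concatenation): since JDK 9, javac by default compiles string `+` expressions into an `invokedynamic` call that is bootstrapped by `java.lang.invoke.StringConcatFactory`. The sketch below drives that bootstrap method by hand to show roughly what happens the first time such a call site runs. The class and method names are made up for illustration, and it needs JDK 9 or later:

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.StringConcatFactory;

public class IscSketch {
    // Roughly what a call site for ("Hello " + name + ", you are " + age) links to.
    public static String greet(String name, int age) {
        try {
            CallSite site = StringConcatFactory.makeConcatWithConstants(
                    MethodHandles.lookup(),
                    "greet", // the name is arbitrary here
                    // The "shape" of the concatenation: (String, int) -> String
                    MethodType.methodType(String.class, String.class, int.class),
                    // Recipe: \1 marks where each dynamic argument is inserted
                    "Hello \1, you are \1");
            MethodHandle concat = site.dynamicInvoker();
            return (String) concat.invokeExact(name, age);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(greet("Duke", 25));
    }
}
```

Unlike this sketch, a real call site is bootstrapped only once and the resulting method handle is reused on later calls, which is why the cost shows up as a one-off startup overhead per call site.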
Overall things are set to be quite a bit snappier. And I seem to keep finding things to do and learn about. Great, huh?
I intend to blog more in-depth about a few of the things outlined above in the weeks to come. Some of it might even get interesting.