This is the first article in the series Boost startup time on JVM, where I explore different options to reduce startup time for non-trivial JVM-based web applications.
JVM-based applications have traditionally struggled with slow startup times. However, recent improvements in the JDK and better support from frameworks like Spring Boot have made it much easier to speed up startup.
We can now achieve a blazing fast startup time with limited effort!
In this article, the first of the series, I’ll share my experience with a first option:
- Why is it so important to be fast at startup?
- How fast can we go?
- What happens at startup in a HotSpot JVM
- Avoid the trap of oversimplified applications!
- A cheap option for fast startup time: project CRaC
- Boost Live Kitchen startup time!
- Guidelines and attention points
- Wrap up
Why is it so important to be fast at startup?
A fast startup time brings the following benefits, mainly (but not only) in cloud environments:
- 💸 Cost saving: only the servers needed to serve current traffic are running
- 💪 Resilience: the system can react to unexpected traffic peaks without over-provisioning
- 💰 Unlock “Scale to zero”: no server is active when there is no need, because one can be started very quickly
This means we can significantly reduce cloud infrastructure costs while keeping the same or a better level of resilience 🤑
How fast can we go?
I assume you’re eager to know: ‘What improvements have you made? Show me the numbers!’
I was able to cut startup time from 3-4 seconds to about 160 ms 🚀 on two pseudo-realistic apps: Live Kitchen and Spring PetClinic
That is a reduction of more than 90%! …not bad! 😎
Using an approach with
- almost no changes to the application code 🎉
- some additional infrastructure work, needed only once 🎉
But also with
- some additional concerns to consider
- some additional problems to solve
⚠ Note: the figures above come from two pseudo-realistic (but still not real production) applications, so take them as an indicator of what is possible to achieve.
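To make the "more than 90%" figure concrete, here is a minimal sketch of the arithmetic, using only the numbers quoted above (3 to 4 seconds before, about 160 ms after). The class and method names are mine, chosen for illustration.

```java
import java.util.Locale;

// Startup-time gain computed from the figures quoted in this article.
final class StartupGain {

    /** Percentage of startup time removed when going from beforeMs to afterMs. */
    static double reductionPercent(double beforeMs, double afterMs) {
        return (beforeMs - afterMs) / beforeMs * 100.0;
    }

    /** How many times faster the new startup is compared to the old one. */
    static double speedupFactor(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    public static void main(String[] args) {
        double afterMs = 160;
        // Lower and upper bound of the "3-4 seconds" cold start.
        for (double beforeMs : new double[] {3000, 4000}) {
            System.out.printf(Locale.ROOT,
                    "%.0f ms -> %.0f ms: %.1f%% less time, %.1fx faster%n",
                    beforeMs, afterMs,
                    reductionPercent(beforeMs, afterMs),
                    speedupFactor(beforeMs, afterMs));
        }
    }
}
```

With these inputs the reduction lands between roughly 94.7% and 96%, which is where the "more than 90%" comes from.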
I hope your curiosity is now triggered, so let’s look at one of the options to achieve fast startup time on the JVM.
What happens at startup in a HotSpot JVM
A Java Virtual Machine (JVM) is a software machine that simulates what a real machine does. Like a real machine, it has an instruction set, a virtual computer architecture and an execution model.
Java source code is compiled into bytecode that is executed by the HotSpot JVM on a virtual stack machine that handles different instructions. Each instruction is identified by an 8-bit numerical opcode; hence the name bytecode.
The HotSpot JVM contains performance-boosting just-in-time (JIT) compilation technology that profiles your program’s execution and selectively optimizes “hot spots”, the parts it decides will benefit the most, by compiling and caching them into native code on the fly using knowledge of the underlying system architecture.
Implications:
- ✅ dynamic features (reflection, dynamic class loading) make programs more expressive
- ✅ dynamic compilation means more information to make better optimization decisions
- ❌ the cost of this dynamism is slower startup time
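The warm-up cost described above can be observed with a small experiment. This is a minimal sketch, not a rigorous benchmark: it runs the same CPU-bound method several times and prints how long each round takes. The class name and the workload are made up for illustration, and the timings depend entirely on your machine and JDK.

```java
// Warm-up sketch: the same work is timed over several rounds in one JVM.
final class WarmupDemo {

    /** Deliberately simple CPU-bound work: sum of (i * i) % 7 for i in [0, n). */
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            long result = work(2_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            // The result is printed so the loop cannot be optimized away entirely.
            System.out.println("round " + round + ": " + micros + " us (result " + result + ")");
        }
    }
}
```

On a typical HotSpot run the first rounds tend to be the slowest, because the loop starts in the interpreter and is compiled to native code only once it is detected as hot. Running the same class with the `-Xint` flag, which disables JIT compilation, is a simple way to compare.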
We will see how to keep all the benefits of this dynamism and improve startup time with CRaC (Coordinated Restore at Checkpoint), but before that let’s avoid wasting time on some common traps.
Avoid the trap of oversimplified applications!
Working with overly simplistic applications—such as those relying on in-memory databases, lacking HTTP calls, or avoiding serialization—can give a misleading sense of mastery over concepts and technologies 😮
However, when we transition to real-world production environments, we encounter complex, real-life challenges that are much harder to resolve 😓
How I fell into the trap again: the PetClinic and in-memory database case
It happened to me again! 😓 I was trying to improve the startup time of the PetClinic application using CRaC. Following some tutorials, I got it working—and it was so easy! But then, I tried connecting it to a remote MySQL database, and that’s when the struggle began 😢.
You definitely don’t want this to happen in a production environment!
If you’re interested, I’ve shared a separate post where I explain how I forked the PetClinic application and successfully applied CRaC concepts to make it work.
A pseudo-realistic app to the rescue: Live Kitchen
To work with a realistic application I built Live Kitchen. The stack used is:
- JDK 21 / Kotlin / Spring Boot 3.3 / multi-module
This application has:
- non-trivial domain logic: elaboration of an optimized execution plan for recipe steps
- a remote MySQL database
- parallel HTTP calls to an external service with virtual threads, which means dealing with data serialization
- the Thymeleaf template engine, which uses reflection
- multi-module code to decouple the domain from the infrastructure
A cheap option for fast startup time: project CRaC
CRaC is based on Checkpoint & Restore In Userspace (CRIU), a project to implement checkpoint and restore functionality for Linux. CRIU allows freezing a container or an individual application and restoring it from the saved checkpoint files.
Restoring an application from a previous checkpoint reuses most of the work the JVM has already done, which allows a much faster startup 🚀
Requirements:
- Linux, where CRIU is available
- A JDK distribution with built-in support for CRaC, like Azul Zulu
- Spring 6.1 and Spring Boot 3.2 (available since 2023), or Micronaut / Quarkus
- The presence of the org.crac:crac library in the classpath
- Specifying the required java command-line parameters, like -XX:CRaCCheckpointTo=PATH or -XX:CRaCRestoreFrom=PATH
- CRaC requires that all connections are closed before the checkpoint and can be re-established on restore. What does this mean?
This is quite a cheap solution in terms of development time! 😎 Developers keep working on the actual application with minor changes (see the last three requirements above). The additional work to create a Docker image that includes the checkpoint is needed only once!
Boost Live Kitchen startup time!
The Live Kitchen application uses Spring Data JDBC to connect to MySQL.
- On Linux, make sure you have the permissions to run CRIU
sudo chown root:root $JAVA_HOME/lib/criu
sudo chmod u+s $JAVA_HOME/lib/criu
- Add the CRaC dependency
<dependency>
    <groupId>org.crac</groupId>
    <artifactId>crac</artifactId>
    <version>1.4.0</version>
</dependency>
- Enable Hikari pool suspension
spring.datasource.hikari.allow-pool-suspension=true
- Start the application with a flag that indicates where to store the checkpoint
java -XX:CRaCCheckpointTo=./tmp_manual_checkpoint -jar application/target/*.jar
output
Bootstrapping Spring Data JDBC repositories in DEFAULT mode.
...
...
Started LiveKitchenApplicationKt in 2.616 seconds (process running for 2.95)
- Trigger a checkpoint dump with the following command, passing the PID of the process started in the previous step
jcmd $1 JDK.checkpoint
output
Starting checkpoint
Suspending Hikari pool
Evicting Hikari connections
- Restore the application from the same checkpoint 🏃
java -XX:CRaCRestoreFrom=./tmp_manual_checkpoint
output
Resuming Hikari pool
Tomcat started on port 8080 (http) with context path '/'
Spring-managed lifecycle restart completed (restored JVM running for 165 ms)
So… the 💪 enhanced Live Kitchen application started in 165 ms instead of about 3 seconds! 🎉 😎
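If you want to script the three commands used above (start with a checkpoint directory, trigger the checkpoint, restore), they can be assembled like this. It is a minimal sketch: the flags and the `jcmd` command are the ones shown in this article, while the class name, the jar path and the PID are placeholders I made up. The `main` method only prints the commands, so it runs on any JDK; actually executing them needs a CRaC-enabled JDK on Linux.

```java
import java.util.List;

// Builds the command lines of the checkpoint/restore flow described in this article.
final class CracCommands {

    /** Step 1: start the application and tell the JVM where to store the checkpoint. */
    static List<String> startForCheckpoint(String checkpointDir, String jarPath) {
        return List.of("java", "-XX:CRaCCheckpointTo=" + checkpointDir, "-jar", jarPath);
    }

    /** Step 2: ask the running JVM identified by pid to dump a checkpoint. */
    static List<String> triggerCheckpoint(long pid) {
        return List.of("jcmd", Long.toString(pid), "JDK.checkpoint");
    }

    /** Step 3: restore the application from the stored checkpoint. */
    static List<String> restore(String checkpointDir) {
        return List.of("java", "-XX:CRaCRestoreFrom=" + checkpointDir);
    }

    public static void main(String[] args) {
        String dir = "./tmp_manual_checkpoint";
        // Placeholder jar path and PID, for illustration only.
        System.out.println(String.join(" ", startForCheckpoint(dir, "application/target/app.jar")));
        System.out.println(String.join(" ", triggerCheckpoint(12345L)));
        System.out.println(String.join(" ", restore(dir)));
    }
}
```

Each list can be handed to `new ProcessBuilder(command).inheritIO().start()` to run it. Note that `ProcessBuilder` does not expand shell globs, so a path like `application/target/*.jar` has to be resolved to a concrete file first.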
Feel free to check it out and give me feedback!
Guidelines and attention points
Azul CRaC guidelines
The creators of the CRaC specification recommend identifying all classes in your code that are considered “resources” and making them react properly to checkpoint creation and restore events.
Those resources should implement the org.crac.Resource interface, which forces us to specify what to do before a checkpoint and after a restore through beforeCheckpoint() and afterRestore(), callbacks invoked by the JVM.
Example of resource implementation
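import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class MyResource implements Resource {
    public MyResource() {
        // Register this instance so it is notified of checkpoint and restore events
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        /* ... */
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        /* ... */
    }
}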
Also, the following flags are needed to define the checkpoint location: -XX:CRaCCheckpointTo=./tmp_auto_checkpoint and -XX:CRaCRestoreFrom=./tmp_manual_checkpoint
More info can be found here.
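To show what "closing before the checkpoint and reopening after the restore" looks like in practice, here is a minimal, self-contained simulation. It does not use the real library: `CheckpointAware` is a local stand-in for the two `org.crac.Resource` callbacks (with simplified signatures), `CheckpointSimulator` plays the registry role of `Core.getGlobalContext()`, and `RemoteClient` is a hypothetical resource. The notification ordering rules of the real context are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Local stand-in for the org.crac.Resource callbacks (assumption: simplified signatures). */
interface CheckpointAware {
    void beforeCheckpoint();
    void afterRestore();
}

/** Hypothetical client whose connection must not be open while the checkpoint is taken. */
final class RemoteClient implements CheckpointAware {
    private boolean connected;

    RemoteClient() {
        connect();
    }

    private void connect() {
        connected = true; // a real client would open its socket here
    }

    boolean isConnected() {
        return connected;
    }

    @Override
    public void beforeCheckpoint() {
        connected = false; // a real client would close its socket here
    }

    @Override
    public void afterRestore() {
        connect(); // re-establish the connection in the restored process
    }
}

/** Tiny simulator of the registry that notifies resources around a checkpoint. */
final class CheckpointSimulator {
    private final List<CheckpointAware> resources = new ArrayList<>();

    void register(CheckpointAware resource) {
        resources.add(resource);
    }

    void checkpoint() {
        resources.forEach(CheckpointAware::beforeCheckpoint);
    }

    void restore() {
        resources.forEach(CheckpointAware::afterRestore);
    }

    public static void main(String[] args) {
        CheckpointSimulator simulator = new CheckpointSimulator();
        RemoteClient client = new RemoteClient();
        simulator.register(client);

        simulator.checkpoint();
        System.out.println("connected after checkpoint: " + client.isConnected());
        simulator.restore();
        System.out.println("connected after restore: " + client.isConnected());
    }
}
```

In a real application the same two callbacks would live in a class implementing `org.crac.Resource`, as in the example above, and the JVM would call them instead of the simulator.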
Spring simplifies the adoption of the CRaC specification
Spring lets you create a checkpoint in two ways:
1. Automatic checkpoint/restore at startup
When -Dspring.context.checkpoint=onRefresh is set, a checkpoint is created automatically at startup during the LifecycleProcessor.onRefresh phase.
2. On-demand checkpoint/restore of a running application
A checkpoint can be created on demand, for example using a command like
jcmd application.jar JDK.checkpoint
Before the creation of the checkpoint, Spring stops all the running beans, giving them a chance to close resources if needed by implementing Lifecycle.stop.
More info can be found here.
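As an illustration of a bean that closes its resources in stop, here is a minimal sketch of a component that keeps a file open and releases it when stopped. To stay self-contained it does not depend on Spring: `SimpleLifecycle` is a local interface that mirrors the start/stop/isRunning contract of Spring's `Lifecycle`, and `AuditLog` is a hypothetical bean invented for this example. My understanding of the Spring documentation is that the stopped beans are started again after the restore, which is what the second `start()` call imitates.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Local mirror of the start/stop/isRunning contract (stand-in for Spring's Lifecycle). */
interface SimpleLifecycle {
    void start();
    void stop();
    boolean isRunning();
}

/** Hypothetical bean that keeps a file open while running and closes it when stopped. */
final class AuditLog implements SimpleLifecycle {
    private final Path file;
    private BufferedWriter writer; // null while the bean is stopped

    AuditLog(Path file) {
        this.file = file;
    }

    /** Convenience factory for the demo: writes to a fresh temporary file. */
    static AuditLog onTempFile() {
        try {
            return new AuditLog(Files.createTempFile("audit", ".log"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public synchronized void start() {
        if (writer != null) {
            return; // already running
        }
        try {
            writer = Files.newBufferedWriter(file, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public synchronized void stop() {
        if (writer == null) {
            return; // already stopped
        }
        try {
            writer.close(); // no open file handle is left for the checkpoint
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer = null;
        }
    }

    @Override
    public synchronized boolean isRunning() {
        return writer != null;
    }

    synchronized void record(String line) {
        if (writer == null) {
            throw new IllegalStateException("AuditLog is stopped");
        }
        try {
            writer.write(line);
            writer.newLine();
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        AuditLog log = AuditLog.onTempFile();
        log.start();
        log.record("before checkpoint");
        log.stop();  // what would happen right before the checkpoint
        log.start(); // what would happen right after the restore
        log.record("after restore");
        log.stop();
        System.out.println("running at the end: " + log.isRunning());
    }
}
```

With Spring on the classpath, the same three methods would be implemented against the framework's own interface, and Spring would call them around the checkpoint instead of `main`.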
Wrap up
Pros:
- A limited development effort is needed
- It leverages the dynamism of the JVM almost without limitations
- Amazing improvement on startup time
Cons:
- Limited to Linux
- More advanced deployment pipelines are required
- Additional security topics to solve
- It may require additional work to close / suspend / resume resources
I applied a similar process to speed up PetClinic; you can check it in this follow-up post, which also includes the full code.
In future posts of the Boost startup time on JVM series I’ll explore more ways to optimize JVM startup time.
Stay tuned!