Boost startup time on JVM - CRaC - part 1

This is the first article in the series Boost startup time on JVM, where I explore different options to reduce startup time for non-trivial JVM-based web applications.

JVM-based applications have traditionally struggled with slow startup times. However, recent improvements in the JDK and better support from frameworks like Spring Boot have made it much easier to speed up startup.

We can now achieve a blazing fast startup time with limited effort!

In this first article of the series, I’ll share my experience with the first option: CRaC.

Why is it so important to be fast at startup?
How fast can we go?
What happens at startup in a HotSpot JVM
Avoid the trap of oversimplified applications!
A cheap option for fast startup time: project CRaC
Boost Live Kitchen startup time!
Guidelines and attention points
Wrap up

Why is it so important to be fast at startup?

A fast startup time brings the following benefits, mainly but not only in cloud environments:

💸 Cost saving: only the servers needed to serve current traffic are running
💪 Resilience: the system can react to unexpected traffic peaks without the need for over-provisioning
💰 Unlock “scale to zero”: no server needs to be active when there is no traffic, because servers can be started very quickly

This means we can significantly reduce cloud infrastructure costs while keeping the same or a better level of resilience 🤑

How fast can we go?

I assume you’re eager to know: ‘What improvements have you made? Show me the numbers!’

I was able to cut startup time from 3-4 seconds to about 160 ms 🚀 on two pseudo-realistic apps: Live Kitchen and Spring PetClinic.

That is a reduction of more than 90% in startup time (going from 3 s to 160 ms is roughly 95%)... not bad! 😎

Using an approach with

almost no changes to application code 🎉
some additional infrastructure work, needed only once 🎉

But also with

some additional concerns to consider
some additional problems to solve

⚠ Note: the figures above come from two pseudo-realistic (but still not real production) applications, so take them as an indication of what is achievable.

I hope your curiosity is now triggered, so let’s look at one of the options for achieving fast startup time on the JVM.

What happens at startup in a HotSpot JVM

A Java Virtual Machine (JVM) is a software machine that simulates what a real machine does. Like a real machine, it has an instruction set, a virtual computer architecture and an execution model.

Java source code is compiled into bytecode, which the HotSpot JVM executes on a virtual stack machine that handles different instructions. Each instruction is identified by an 8-bit numerical opcode; hence the name bytecode.
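To make the idea of a stack machine driven by one-byte opcodes more concrete, here is a minimal sketch of a toy interpreter. The opcode values are the real ones from the JVM instruction set, but the class ToyStackMachine and its run method are made up for illustration and have nothing to do with how HotSpot is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy interpreter for a handful of JVM opcodes (illustration only, not HotSpot). */
final class ToyStackMachine {
    // Real JVM opcode values; each one fits in a single unsigned byte.
    static final int ICONST_2 = 0x05; // push the int constant 2
    static final int BIPUSH   = 0x10; // push the next byte as an int
    static final int IADD     = 0x60; // pop two ints, push their sum
    static final int IMUL     = 0x68; // pop two ints, push their product
    static final int IRETURN  = 0xAC; // return the int on top of the stack

    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // program counter: index of the next byte to read
        while (pc < code.length) {
            int opcode = code[pc++] & 0xFF; // read one unsigned byte
            switch (opcode) {
                case ICONST_2: stack.push(2); break;
                case BIPUSH:   stack.push((int) code[pc++]); break; // operand is a signed byte
                case IADD:     stack.push(stack.pop() + stack.pop()); break;
                case IMUL:     stack.push(stack.pop() * stack.pop()); break;
                case IRETURN:  return stack.pop();
                default: throw new IllegalArgumentException("Unsupported opcode: " + opcode);
            }
        }
        throw new IllegalStateException("Bytecode ended without ireturn");
    }

    public static void main(String[] args) {
        // Hand-written bytecode for the expression (2 + 40) * 2
        byte[] code = {ICONST_2, BIPUSH, 40, IADD, ICONST_2, IMUL, (byte) IRETURN};
        System.out.println(run(code)); // prints 84
    }
}
```

Interpreting every instruction in a loop like this is simple but slow, which is why a real JVM also compiles frequently executed code to native code.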

A HotSpot JVM contains performance-boosting just-in-time (JIT) compilation technology that profiles your program’s execution and selectively optimizes “hot spots”, the parts it decides will benefit the most, by compiling and caching them into native code on the fly using knowledge of the underlying system architecture.

Implications:

✅ dynamic features (reflection, dynamic class loading) make programs more expressive
✅ dynamic compilation means more runtime information is available to make better optimization decisions
❌ the cost of this dynamism is a slower startup time
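As a small illustration of the dynamic features mentioned above, the following sketch loads a class by name and calls one of its methods through reflection. The class DynamicLoadingDemo and the method createAndFill are hypothetical names chosen for this example. Because the type is known only as a string at runtime, the JVM cannot resolve it ahead of time and has to do the loading, linking and lookup work while the program runs.

```java
import java.lang.reflect.Method;
import java.util.List;

final class DynamicLoadingDemo {
    /** Loads a List implementation by name at runtime and adds one element via reflection. */
    static List<?> createAndFill(String className, Object element) {
        try {
            Class<?> type = Class.forName(className);                      // dynamic class loading
            Object instance = type.getDeclaredConstructor().newInstance(); // reflective instantiation
            Method add = type.getMethod("add", Object.class);              // method looked up by name
            add.invoke(instance, element);                                 // reflective call
            return (List<?>) instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name is just a string here; it could just as well come from configuration.
        System.out.println(createAndFill("java.util.ArrayList", "hello")); // prints [hello]
    }
}
```

Frameworks rely heavily on this kind of lookup for dependency injection and templating, which is part of the startup work a checkpoint lets us skip.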

We will see how to keep all the benefits of this dynamism and still improve startup time with CRaC (Coordinated Restore at Checkpoint), but before that let’s avoid wasting time on some common traps.

Avoid the trap of oversimplified applications!

Working with overly simplistic applications—such as those relying on in-memory databases, lacking HTTP calls, or avoiding serialization—can give a misleading sense of mastery over concepts and technologies 😮

However, when we transition to real-world production environments, we encounter complex, real-life challenges that are much harder to resolve 😓

How I fell into the trap again: the PetClinic and in-memory database case

It happened to me again! 😓 I was trying to improve the startup time of the PetClinic application using CRaC. Following some tutorials, I got it working—and it was so easy! But then, I tried connecting it to a remote MySQL database, and that’s when the struggle began 😢.

You definitely don’t want this to happen in a production environment!

If you’re interested, I’ve shared a separate post where I explain how I forked the PetClinic application and successfully applied CRaC concepts to make it work.

A pseudo-realistic app to the rescue: Live Kitchen

To work with a realistic application, I built Live Kitchen. The stack used is:

JDK 21 / Kotlin / Spring Boot 3.3 / multi-module

This application has:

non-trivial domain logic: computing an optimized execution plan for recipe steps
a remote MySQL database
parallel HTTP calls to an external service with virtual threads, which means dealing with data serialization
the Thymeleaf template engine, which uses reflection
multi-module code to decouple domain and infrastructure

A cheap option for fast startup time: project CRaC

CRaC is based on Checkpoint & Restore In Userspace (CRIU), a project to implement checkpoint and restore functionality for Linux. CRIU allows freezing a container or an individual application and restoring it from the saved checkpoint files.

Restoring an application from a previous checkpoint means reusing most of the work the JVM has already done, which allows a much faster startup 🚀

Requirements:

1. Linux, where CRIU is available
2. A JDK distribution with built-in support for CRaC, such as Azul Zulu
3. Spring Framework 6.1 and Spring Boot 3.2 or later (available since 2023), or another framework with CRaC support such as Micronaut or Quarkus
4. The org.crac:crac library on the classpath
5. The required java command-line parameters, such as -XX:CRaCCheckpointTo=PATH or -XX:CRaCRestoreFrom=PATH
6. All connections closed before the checkpoint and reopened after the restore (the Hikari pool suspension used later in this article is one example)

This is quite a cheap solution in terms of development time! 😎 Developers keep working on the existing application with only minor changes (see points 4, 5 and 6 above). The additional work of creating a Docker image that includes the checkpoint is needed only once!

Boost Live Kitchen startup time!

The Live Kitchen application uses Spring Data JDBC to connect to MySQL.

On Linux, make sure you have the permissions to run CRIU:

 sudo chown root:root $JAVA_HOME/lib/criu
 sudo chmod u+s $JAVA_HOME/lib/criu

Add the CRaC dependency:

<dependency>
   <groupId>org.crac</groupId>
   <artifactId>crac</artifactId>
   <version>1.4.0</version>
</dependency>

Enable Hikari pool suspension:

 spring.datasource.hikari.allow-pool-suspension=true

Start the application with a flag that indicates where to store the checkpoint:

java -XX:CRaCCheckpointTo=./tmp_manual_checkpoint -jar application/target/*.jar

output

Bootstrapping Spring Data JDBC repositories in DEFAULT mode.
...
...
Started LiveKitchenApplicationKt in 2.616 seconds (process running for 2.95)

Trigger a checkpoint dump with the following command, where $1 is the PID of the process started in the previous step:

jcmd $1 JDK.checkpoint

output

Starting checkpoint
Suspending Hikari pool
Evicting Hikari connections

Restore the application from the same checkpoint 🏃

 java -XX:CRaCRestoreFrom=./tmp_manual_checkpoint

output

Resuming Hikari pool
Tomcat started on port 8080 (http) with context path '/'
Spring-managed lifecycle restart completed (restored JVM running for 165 ms)

So the 💪 enhanced Live Kitchen application started in 165 ms instead of about 3 seconds! 🎉 😎

Feel free to check it out and give me feedback!

Guidelines and attention points

Azul CRaC guidelines

The creators of the CRaC specification recommend identifying all classes in your code that are considered “resources” and making them react properly to checkpoint and restore events. Those classes should implement the org.crac.Resource interface, which forces us to specify what to do before a checkpoint and after a restore through beforeCheckpoint() and afterRestore(), two callbacks invoked by the JVM.

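Example of a resource implementation:

import org.crac.Context;
import org.crac.Core;
import org.crac.Resource;

public class MyResource implements Resource {

    public MyResource() {
        // Register with the global context so the JVM calls back on checkpoint and restore
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) {
        // Close or release resources here
        /* ... */
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) {
        // Reopen or reacquire resources here
        /* ... */
    }
}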

The checkpoint location is still defined with the flags -XX:CRaCCheckpointTo=PATH and -XX:CRaCRestoreFrom=PATH (for example ./tmp_manual_checkpoint). The restore flag must point to the directory the checkpoint was written to.

more info can be found here
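To show what the two callbacks typically contain, here is a minimal sketch of a resource that releases a file handle before the checkpoint and reopens it after the restore. To keep it runnable without the org.crac dependency, it declares a stand-in interface, CheckpointAware, with the same two callbacks (the real org.crac.Resource methods also receive a Context argument). AuditLog and CheckpointSimulation are hypothetical names, and the callbacks are invoked by hand here, whereas with CRaC the JVM would invoke them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** Stand-in for org.crac.Resource, so the sketch compiles without the library. */
interface CheckpointAware {
    void beforeCheckpoint() throws Exception;
    void afterRestore() throws Exception;
}

/** Hypothetical component that keeps a file open while the application runs. */
final class AuditLog implements CheckpointAware {
    private final Path file;
    private Writer writer; // null while the handle is released

    AuditLog(Path file) throws IOException {
        this.file = file;
        this.writer = open();
    }

    private Writer open() throws IOException {
        return Files.newBufferedWriter(file, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    void log(String line) throws IOException {
        writer.write(line + System.lineSeparator());
        writer.flush();
    }

    @Override
    public void beforeCheckpoint() throws IOException {
        writer.close(); // flush and release the file descriptor
        writer = null;
    }

    @Override
    public void afterRestore() throws IOException {
        writer = open(); // reacquire the handle in the restored process
    }
}

final class CheckpointSimulation {
    /** Calls the callbacks in the order a checkpoint and a restore would trigger them. */
    static List<String> run() {
        try {
            Path file = Files.createTempFile("audit", ".log");
            AuditLog log = new AuditLog(file);
            log.log("before checkpoint");
            log.beforeCheckpoint();
            // ... the checkpoint image would be written here, and restored later ...
            log.afterRestore();
            log.log("after restore");
            log.beforeCheckpoint(); // reuse the callback to close the file
            List<String> lines = Files.readAllLines(file);
            Files.delete(file);
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [before checkpoint, after restore]
    }
}
```

The same close-then-reopen pattern applies to sockets and database connections, which is what the Hikari pool suspension does for us in Live Kitchen.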

Spring simplifies the adoption of the CRaC specification

Spring allows creating a checkpoint in two ways:

1. Automatic checkpoint/restore at startup

When the -Dspring.context.checkpoint=onRefresh JVM system property is set, a checkpoint is created automatically at startup during the LifecycleProcessor.onRefresh phase.

2. On-demand checkpoint/restore of a running application

A checkpoint can be created on demand, for example using a command like:

jcmd application.jar JDK.checkpoint

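Before the creation of the checkpoint, Spring stops all the running beans, giving them a chance to close resources if needed by implementing Lifecycle.stop. After the restore, the same beans are restarted through Lifecycle.start, where they can reopen those resources.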
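The stop and restart sequence can be sketched with a bean that owns a background thread. To stay runnable without Spring on the classpath, the example declares a stand-in interface, LifecycleLike, mirroring the three methods of Spring's Lifecycle interface (start, stop, isRunning). PollingBean and LifecycleSimulation are hypothetical names, and the calls Spring would make around a checkpoint are made by hand.

```java
import java.util.Arrays;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in mirroring the methods of org.springframework.context.Lifecycle. */
interface LifecycleLike {
    void start();
    void stop();
    boolean isRunning();
}

/** Hypothetical bean that polls something on a background thread. */
final class PollingBean implements LifecycleLike {
    final AtomicInteger polls = new AtomicInteger();
    private ScheduledExecutorService scheduler; // null while stopped

    @Override
    public synchronized void start() {
        if (scheduler != null) return; // already running
        scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(polls::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
    }

    @Override
    public synchronized void stop() {
        if (scheduler == null) return; // already stopped
        scheduler.shutdownNow(); // release the thread before the checkpoint is taken
        scheduler = null;
    }

    @Override
    public synchronized boolean isRunning() {
        return scheduler != null;
    }
}

final class LifecycleSimulation {
    /** Plays the sequence driven around a checkpoint: stop, (checkpoint), start. */
    static boolean[] run() {
        PollingBean bean = new PollingBean();
        bean.start();
        boolean atStartup = bean.isRunning();
        bean.stop();                          // before the checkpoint
        boolean atCheckpoint = bean.isRunning();
        bean.start();                         // after the restore
        boolean afterRestore = bean.isRunning();
        bean.stop();                          // let the JVM exit cleanly
        return new boolean[] {atStartup, atCheckpoint, afterRestore};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run())); // prints [true, false, true]
    }
}
```

In a real Spring application the bean would implement Lifecycle (or SmartLifecycle) directly, and no manual calls would be needed.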

more info can be found here

Warning

Using this feature should be done with the assumption that any value “seen” by the JVM, such as configuration properties coming from the environment, will be stored in those CRaC files. As a consequence, the security implications of where and how those files are generated, stored, and accessed should be carefully assessed.

Wrap up

Pros:

Only a limited development effort is needed
It leverages the dynamism of the JVM almost without limitations
Amazing improvement in startup time

Cons:

It is limited to Linux
More advanced deployment pipelines are required
Additional security concerns need to be addressed
It may require additional work to close / suspend / resume resources

I applied a similar process to speed up PetClinic; you can check it in this follow-up post, which also includes the full code.

In future posts of the Boost startup time on JVM series, I’ll explore more ways to optimize JVM startup time.

Stay tuned!