Why ORM and Data frameworks are not your best option

As you all know, the relational world of databases is quite different from the object-oriented one. Both can store the same data, but they do it differently, which makes it a challenge to move data between an object-oriented language like Java and a SQL-based database.

In principle, frameworks like ORMs (Hibernate to name the most popular one) and ‘data’ frameworks, like Spring Data or the new Jakarta Data, manage to handle the conversion well, but only for simple cases.

This blog is mainly about the SQL case, but it also applies to the NoSQL frameworks. When you try to abstract away the communication with the underlying system, you lose performance, functionality, and flexibility.

The JDBC connectivity

From the early days of Java, the Java Database Connectivity (JDBC) API made it possible to interact with SQL databases. It allowed the Java developer to execute SQL statements and retrieve data for the application.

In those days, it was normal that every developer knew SQL very well and could write the most complex queries to retrieve the data required for the advanced use cases of their applications. Using joins, sub-queries, and grouping is not that hard and can easily be mastered within a week.

The real difficulty lies in using the JDBC API itself. Like many APIs from those early days, it leaves a lot of work to the developer, which leads to boilerplate code that is required to close statements, result sets, and connections. If you fail to do this properly, it results in resource leakage and failures in your application over time.

But there are other challenges. You need to manually assign each column value from a JDBC ResultSet to the properties of your objects. This is not the most rewarding piece of code that you as a developer write on a project.
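To make the boilerplate and the manual mapping concrete, here is a minimal sketch using plain JDBC. The `Person` record, the `person` table, and its `id` and `name` columns are assumptions invented for this illustration. The sketch relies on try-with-resources (Java 7 and later) to close the resources; in the early days this had to be done by hand in `finally` blocks.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain object, not taken from any real project.
record Person(long id, String name) {}

final class PersonJdbcDao {

    // The SQL is just a String: a renamed column is only detected at runtime.
    static final String FIND_BY_NAME = "SELECT id, name FROM person WHERE name = ?";

    // Manual mapping: every column is copied to a property by hand.
    static Person mapRow(ResultSet rs) throws SQLException {
        return new Person(rs.getLong("id"), rs.getString("name"));
    }

    static List<Person> findByName(Connection connection, String name) throws SQLException {
        List<Person> result = new ArrayList<>();
        // try-with-resources closes the statement and the result set, even on failure.
        try (PreparedStatement statement = connection.prepareStatement(FIND_BY_NAME)) {
            statement.setString(1, name);
            try (ResultSet rs = statement.executeQuery()) {
                while (rs.next()) {
                    result.add(mapRow(rs));
                }
            }
        }
        return result;
    }
}
```

Even for a two-column table, most of the code is plumbing rather than application logic, and closing the `Connection` itself is still left to the caller.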

Refactorings where database fields change can also be a challenge: the SQL statements you execute are just Strings, so they are not validated when your application is compiled.

The benefits of the ORM

All the challenges described in the previous section are addressed by ORM tools like Hibernate. They simplify database interactions and bridge the gap between object-oriented programming languages and databases.

Firstly, ORM tools eliminate the need for developers to write repetitive and error-prone boilerplate code. That is hidden away in the framework code.

Another significant benefit of ORM tools is their ability to provide type-safe mappings between database tables and object-oriented classes. Initially defined through XML files and later on through the annotations available from Java 5 onwards, the tool handles the translation of data between the database and objects. Besides the mappings of single fields, it allows you to represent foreign key relations from the database. It can also represent concepts that do not exist in the database, such as the OneToMany relation that gives you all children of each master record.

Although the SQL language is standardised, there are several versions of the standard, like SQL-92, SQL:2003, and SQL:2023, which introduced the JSON data type. Not all databases support the same version, and every database uses its own, slightly different variant. This is handled by the introduction of the ORM Query Language (Hibernate Query Language, JPA Query Language, etc.) and a Dialect for each database that converts these queries to a format that is supported by the database.
This ORM Query Language therefore covers only a subset of the features found in the databases, so you cannot use the full power of your database.

Additionally, ORM tools offer a range of features such as support for database schema migrations, caching, and query optimization. They enable developers to work with databases more abstractly and intuitively, freeing them from having to think in terms of SQL statements and database-specific details. This abstraction allows for greater flexibility in choosing the underlying database system, but it also results in lower performance and less functionality, as it adds an additional, rather complex layer.

After the ORM tools had been in use for several years, people saw that querying a table and filtering on one or a few fields always results in a handful of very similar statements.

This led to the creation of data projects like Spring Data. Instead of writing these few statements each time, they are derived from the method names of specially indicated interfaces.

List findByName(String name)

This kind of method merely replaces about 3 lines of code that you would otherwise write with the ORM tool.

The problems of the ORM

In the previous section, you can find many useful improvements in accessing the database by using an ORM tool. But it has its own challenges and problems.

Probably the most common problem is the Lazy and Eager loading strategies and the ‘N+1 select issue’.
In almost all cases, you don’t need to retrieve the records of a table in isolation, but you also need to take into account the relations with other tables. This can be needed to have all the fields for filtering, or additional data for displaying on the screen.
Within the ORM tool, they are represented by the ManyToOne, OneToMany, or ManyToMany relation.

The ORM can load the information of these related tables eagerly, by already including the table in the query using a JOIN clause, or lazily, by issuing additional queries to load the details after the main query is executed.

But this lazy case introduces the ‘N+1 select issue’ since after retrieving the results for the main query which has N rows, the ORM tool launches N queries to retrieve the detail collection of each row.
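The arithmetic behind the name can be shown with a small simulation. The sketch below does not use an ORM at all: it is an in-memory stand-in for a database that only counts round trips, and every name and value in it is made up for the illustration.

```java
import java.util.List;
import java.util.Map;

// In-memory stand-in for a database that counts the queries it receives.
final class NPlusOneDemo {

    static int queryCount = 0;

    static final List<Long> ORDER_IDS = List.of(1L, 2L, 3L);
    static final Map<Long, List<String>> LINES_BY_ORDER = Map.of(
            1L, List.of("pen", "ink"),
            2L, List.of("paper"),
            3L, List.<String>of());

    // Main query: one round trip that returns the N master rows.
    static List<Long> findAllOrderIds() {
        queryCount++;
        return ORDER_IDS;
    }

    // Detail query: one round trip per master row when loaded lazily.
    static List<String> findLinesForOrder(long orderId) {
        queryCount++;
        return LINES_BY_ORDER.get(orderId);
    }

    // A single joined query returns masters and details together.
    static Map<Long, List<String>> findAllOrdersWithLines() {
        queryCount++;
        return LINES_BY_ORDER;
    }

    // Lazy access pattern: 1 main query + N detail queries.
    static int countQueriesLazy() {
        queryCount = 0;
        for (long orderId : findAllOrderIds()) {
            findLinesForOrder(orderId);
        }
        return queryCount;
    }

    // Joined access pattern: 1 query in total.
    static int countQueriesJoined() {
        queryCount = 0;
        findAllOrdersWithLines();
        return queryCount;
    }
}
```

With 3 master rows the lazy pattern needs 4 round trips and the joined one needs 1. With thousands of rows, the latency of each round trip dominates the total time.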

So is eager loading preferred over lazy loading then? No, not at all. Since many tables are connected, eager loading in most cases retrieves information from tables that are not needed, making queries complex and slow. You must decide on an individual basis whether data is required or not.

Some real-world cases and best practices

In my 20+ year career as a Java Developer, I was called in on many projects that were already in production and experienced some issues or needed some advanced functionality.
In almost all cases, the issues could be retraced to how the ORM mapping and tool were used in the project.

I’ll briefly discuss some cases and explain the solutions that I applied to the problems.

One case is about the eager loading and the lack of a proper design for the Entity layer.
They called me in on a project where there was a performance problem on the main page of the application that showed some kind of overview, minimal dashboard, for the user. The page showed about 10 values, so not much information but it actually took about 45 seconds to load.

The reason was quickly found when I activated the SQL tracing to see what queries were sent to the database. To get the data for the main page, there were 1329 queries executed. The developer only issued 5 queries, but the development team used the lazy loading configuration and did not specify any FETCHING strategy on the queries themselves.

Since all tables are connected, which is the common case, queries that touch many tables or collect data from detail collections perform poorly when you rely on the standard lazy behavior of the ORM tool.

In this case, the usage of eager loading would not solve the problem as the 5 queries would become very large, touching many tables within the database. These queries are also very slow, and thus not a solution.

The solution was actually very simple: create 5 ‘native queries’ that retrieve the required values very efficiently from the database. The ORM tool can execute a native query, and you can still make use of the default mapping, or retrieve a collection of values when the query does not return entire table rows but only some values. You bypass the conversion of the ORM Query Language to SQL, which already saves time, and you submit the ideal query to the database, which is the fastest option.

So, if you haven’t done it already, take that intermediate or advanced SQL course so that you become a pro in writing complex queries. Use native queries if you touch 3 or more tables as you can write them more efficiently than the ORM which is designed for simple cases. Use the Lazy fetching strategy but define in the query if you need the data or not by using a JOIN FETCH clause for your simple queries.
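As a rough illustration of these two recommendations, the sketch below shows a JPQL query with a JOIN FETCH clause next to a native SQL query. The `Invoice` entity, its lazy `customer` relation, and the `invoice`, `customer`, and `invoice_line` tables with their columns are all hypothetical names chosen for this example.

```java
// Hypothetical entities and tables, used only to show the shape of the queries.
final class InvoiceQueries {

    // JPQL for a simple case: the relation stays LAZY in the mapping,
    // and this particular query opts in to loading it with JOIN FETCH.
    static final String INVOICES_WITH_CUSTOMER = """
            SELECT i FROM Invoice i
            JOIN FETCH i.customer
            WHERE i.status = :status
            """;

    // Native SQL for a dashboard-style value that touches 3 tables;
    // it is sent to the database exactly as written.
    static final String OPEN_AMOUNT_PER_COUNTRY = """
            SELECT c.country, SUM(l.quantity * l.unit_price) AS open_amount
            FROM invoice i
            JOIN customer c ON c.id = i.customer_id
            JOIN invoice_line l ON l.invoice_id = i.id
            WHERE i.status = 'OPEN'
            GROUP BY c.country
            """;
}
```

With JPA, the first string would be passed to `EntityManager.createQuery` and the second to `EntityManager.createNativeQuery`.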

The second case I want to discuss is a project where they relied heavily on Spring Data. They used the method name convention to define the query or in several cases used the @Query annotation to instruct what should be executed.

During development, everything went smoothly, but performance issues quickly arose when running in production. The difference is that in production there are not 10 or 20 records but 10,000 and more.

Since the development team used Spring Data, they included many eager relations and also used ToMany relations extensively to get the info they needed ‘automagically’. Since these options result in joining many tables, in many cases not needed for the situation, the queries became slow when using more records.

The solution in this case was again relying on native queries that could optimally retrieve the data, reduce the usage of Spring Data interface methods, and define query and fetch strategies within queries.

In general, avoid the ToMany relations, or simply don’t use them at all, as they are not available ‘naturally’ in databases and require complex or additional queries.
Avoid the usage of the Data frameworks when querying more than 1 table or don’t use them at all since writing 3 statements is not a problem. Or is it?
Avoid the eager loading definition and specify the JOIN FETCH clause when you need the info.

This is of course a quick and limited overview. For example, did you know that you can’t rely on database pagination when you use JOIN FETCH on a collection, because Hibernate then applies the pagination in memory? That you can’t use a sub-query in a JOIN in the ORM Query language?

Conclusion

The ORM tools solve a few important issues: the boilerplate code required with the JDBC API and the mapping of database field values to Java object properties. But they can introduce a lot of trouble in your projects regarding performance, which is mostly only discovered when running in production with larger datasets.

So as a rule of thumb, do not use eager loading and use JOIN FETCH clauses when needed, do not use any ToMany relation as they make your queries complex, and write all your queries, other than the very simple one table ones, yourself using native queries.

Since we should only use a very limited set of functionality of the ORM tools like Hibernate, why don’t we drop it altogether and just use a tool that reduces the boilerplate code and solves the mapping issue (type-safe column names and values)? Tools like jOOQ or the Exposed framework written in Kotlin are 2 examples of tools that implement the best practices I described here without the overhead of an ORM.
Interested in an introduction to the Exposed framework? I’ll give an overview in my next blog.

Training and support

Interested in a training about efficient ORM tools or Hibernate usage in your project? In need of an expert to help you solve a problem? Feel free to contact me.

Do you need a specific training session on Jakarta EE, Quarkus, Kotlin or MicroProfile? Have a look at the training support that I provide on the page https://www.atbash.be/training/ and contact me for more information.

Time is Code, my Friend

For several years, the fast startup time of an application or service in Java has been a highly discussed topic. According to the authors of numerous articles, it seems if you don’t apply some techniques for a fast startup, your project is doomed.

This is of course nonsense, and I hope that everyone is sensible enough to realise that this is a hype and a sales and marketing trick to sell you some product. Because unless you are working on some project where start-up time is crucial, like financial transactions, the start-up time of 99.9% of the applications is not important at all (within limits).

Moreover, this startup time and topics like microservices and serverless, only focus on architecture and non-functional requirements. And, if the marketing hype is to be believed, it seems that functionality for the user is not important at all anymore – as long as you have an application that uses at least 5 of the current hypes.

Today, I want to add another interesting finding about applications or services focused on fast startup times. As a general rule, the faster your application starts, the less functionality it has.

The test

X starts up faster than Y. When applying Z you can achieve a 30% reduction in start-up time. Some frameworks even based their core marketing tagline on this. Let us look at 10 solutions and compare them to see what we can learn from the results.

The idea of this test is to develop an application, slightly modified for the requirements of each framework, that has an endpoint capable of processing JSON. As a performance indicator, we measure the time when the first successful response is served by the runtime. I also captured the RSS of the process (for a description see the bottom of the text), to have an idea of the memory usage. And the third parameter I note is the size of the runtime.

Here is a listing of the runtimes:

Spring Boot 3.0. Project created using Spring Initializr using Spring Reactive web to make use of Netty.

Quarkus 2.16. Project created using code.quarkus.io using RESTEasy Reactive and RESTEasy Reactive JSONB.

Ktor 2.2. Project created using IntelliJ project wizard using Netty server, Content negotiation and Kotlin JSON serialisation.

HttpServer included in JVM with the addition of Atbash Octopus JSON 1.1.1 for the JSON Binding support. Custom minimal framework for handling requests and sending responses.

Payara 6. Jakarta EE 10 application running on Payara Micro 6.

OpenLiberty. Jakarta EE 8 application running on OpenLiberty.

WildFly 27. Jakarta EE 10 application running on WildFly 27.

Helidon SE 3.1. The project was created using the Helidon Starter and includes JSON-P and JSON-B support.

Piranha. Jakarta EE 10 Core Profile application running on the Core Profile distribution.

Micronaut 3.8. The project was created using Micronaut Launch and has support for JSON handling.

Why these 10? Because it is a wide range of frameworks in Java or Kotlin with various features and because it is the list of runtimes that I know well enough to create backend applications in.

The data is gathered when running JDK 17 (Zulu 17.30) on a machine with 16 GB of memory and an 8-core Intel i9 processor at 2.3 GHz. I mention these details of the test environment in case you want to repeat the tests.

The determination of the time for the first response is performed by a Python 3 script to make sure we can measure very fast startup times.
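The measurement idea itself is simple: poll the endpoint until it answers successfully and note how long that took. The sketch below expresses it in Java with the JDK's `HttpClient`. It is not the Python script used for the results; the class name, the polling interval, and the timeouts are assumptions. Keep in mind that the clock here starts when the method is called, so the runtime under test would have to be launched right before, and that a JVM-based client adds warm-up overhead of its own.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class FirstResponseTimer {

    // Polls the URI until it answers with HTTP 200 and returns the elapsed
    // milliseconds, or -1 when the timeout expires (or the thread is interrupted) first.
    static long millisUntilFirstResponse(URI uri, Duration timeout) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(500))
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(1))
                .GET()
                .build();
        long start = System.nanoTime();
        long deadline = start + timeout.toNanos();
        while (System.nanoTime() < deadline) {
            try {
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() == 200) {
                    return (System.nanoTime() - start) / 1_000_000;
                }
                Thread.sleep(5); // answered, but not successfully yet
            } catch (IOException notListeningYet) {
                // Connection refused or timed out: the runtime is still starting, retry.
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }
}
```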

All the information and code can be found in the Project FF GitHub repository.

The Results 

I will not keep you in suspense any longer, and present you the test results in the table below. The table is sorted according to the first request time from low to high.

| Framework | First Request | RSS | Artefact size |
|---|---|---|---|
| Pure Java | 158 ms | 41 MB | 11 KB |
| Helidon | 632 ms | 94 MB | 5.7 MB |
| Quarkus | 682 ms | 111 MB | 14.1 MB |
| Ktor | 757 ms | 111 MB | 15.3 MB |
| Micronaut | 956 ms | 116 MB | 12.2 MB |
| Piranha | 1060 ms | 140 MB | 8.7 MB |
| Spring Boot | 1859 ms | 184 MB | 21.3 MB |
| WildFly | 4903 ms | 313 MB | 63.1 MB |
| Open Liberty | 6268 ms | 356 MB | 47.6 MB |
| Payara Micro | 10018 ms | 438 MB | 90.1 MB |

The placement of the frameworks is sometimes surprising, while other known facts are confirmed in this test.

But before I discuss some individual results, I want to highlight another aspect that you can see within the data, one that is actually more important than which framework is a few hundred milliseconds faster.

[Graph: startup time versus artefact size]

If we plot out the artefact size of the runtime against the first response time, we see a pretty good, almost linear relationship. And that is not really a surprise if you think about it. 
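The relationship can be checked with a few lines of code. The sketch below runs an ordinary least-squares fit and computes Pearson's correlation coefficient on the ten rows of the results table. It is my own calculation for illustration and not necessarily the trend line drawn in the graph; the 11 Kb artefact is taken as 0.011 MB.

```java
final class StartupVersusSize {

    // Artefact size (MB) and first-request time (ms) from the results table, in table order:
    // Pure Java, Helidon, Quarkus, Ktor, Micronaut, Piranha, Spring Boot, WildFly, Open Liberty, Payara Micro.
    static final double[] SIZE_MB = {0.011, 5.7, 14.1, 15.3, 12.2, 8.7, 21.3, 63.1, 47.6, 90.1};
    static final double[] FIRST_REQUEST_MS = {158, 632, 682, 757, 956, 1060, 1859, 4903, 6268, 10018};

    // Ordinary least-squares fit y = slope * x + intercept, plus Pearson's r.
    // Returns {slope, intercept, r}.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i] / n;
            meanY += y[i] / n;
        }
        double sxx = 0;
        double syy = 0;
        double sxy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX, sxy / Math.sqrt(sxx * syy)};
    }

    public static void main(String[] args) {
        double[] f = fit(SIZE_MB, FIRST_REQUEST_MS);
        System.out.printf("slope=%.1f ms/MB, intercept=%.0f ms, r=%.3f%n", f[0], f[1], f[2]);
    }
}
```

On these ten points the correlation coefficient comes out above 0.9, which supports the "almost linear" reading, although ten data points are of course a small sample.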

No Functionality is a Fast Startup

Looking at the table and the graph in the previous section, it is clear that we can achieve a fast first response when we have little to no functionality available in our application and runtime.

The evidence for that is shown in the example where a simple endpoint is created with only the HttpServer class of the JVM itself and a simple JSON handling library with only the basic functionality. It starts up in an astonishing 158 milliseconds. That is about 25% of the time required by Quarkus, for example.
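To give an idea of how little code such a solution needs, here is a minimal sketch built on the JDK's `com.sun.net.httpserver.HttpServer`. It is not the code from the test: the `/hello` path, the port, and the hand-written JSON (instead of a JSON binding library) are assumptions, and it only returns JSON rather than also reading it.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

final class MinimalJsonServer {

    // Builds the JSON by hand, escaping backslashes and double quotes in the name.
    static String greetingJson(String name) {
        String escaped = name.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"message\":\"Hello " + escaped + "\"}";
    }

    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/hello", exchange -> {
            byte[] body = greetingJson("world").getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        start(8080); // hypothetical port
    }
}
```

There is no dependency injection, no configuration system, and no observability here, which is exactly why it starts so quickly.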

The trend that little code and functionality results in a fast startup can also be seen in the results of other frameworks. Really modular frameworks, like Helidon SE, Quarkus, and Ktor, are much faster than less modular ones like OpenLiberty and WildFly, or Payara Micro, which is not modular.

Hence the title “Time is Code, my Friend”: you can have a faster startup by sacrificing functionality, like observability, within your application and the runtime. And yes, native compilation can help here but as said, for 99.9% of applications, startup time is irrelevant. As long as it is reasonably fast, it doesn’t matter if it is 1 or 5 seconds.

But actually, this means that the above results, and the majority of all those other blogs you can read about fast startup times, are completely useless. Do you have an application that just takes in a JSON object and returns a response? Without interaction with any external system like a database or any other data storage, your application will start up fast but will not be very useful.

And another reason why the above results are useless is the environment in which they are executed: a machine with an 8-core CPU. If you allocate that configuration at your cloud provider, you will spend quite a lot of money each month. People hesitate to assign more than 0.5 or 1 CPU to their process, which means that the startup times measured here must be multiplied by some factor.

So we should perform this kind of performance test in an environment with low CPU power and where the application is accessing databases, where latency is also a factor that you should not ignore.

Individual framework discussion

The first observation we can make is that when you really need high performance, do not use any framework or runtime but instead, create your own specialized solution. All the frameworks and runtimes must be generic to support most scenarios developer teams have. This comes with a cost, of course, since developing a specialised solution requires that you have to do much more. It is always a trade-off.

Secondly, Quarkus is less performant in startup time than I expected. They have a very elaborate and complex system to apply build time improvements so that the application can start up really fast. Yet, they reach only the same response times as Helidon SE and Ktor, which don’t have such pre-processing. These build time improvements make Quarkus much more complex, certainly under the hood, which will make maintenance of the framework much harder in the future and requires you to use a Quarkus extension for any kind of library you want to use in combination with Quarkus. There are currently more than 400 extensions because you need a specific extension to make use of the build time enhancements.

Some frameworks start up faster or slower than expected based on the artefact size. Piranha only starts at 1060 ms whereas the trend line says it should start at about 750 ms, so it takes about 40% longer than expected. Also, OpenLiberty is slower than expected. On the other hand, WildFly, Quarkus, and Ktor are faster.

Atbash Runtime?

I didn’t include the results of Atbash Runtime in the table and the graph, for the simple reason that it is my pet project to learn about many aspects. But here are the results:

| Framework | First Request | RSS | Artefact size |
|---|---|---|---|
| Atbash Runtime | 1537 ms | 168 MB | 8.2 MB |

And I’m very proud of the result. Without any efforts focused on the startup time, just trying to create a modular Jakarta EE Core Profile runtime, I succeeded in beating the startup time of Spring Boot. And among the frameworks, only Helidon SE has a smaller artefact.

Conclusion

With the comparison of a simple application looking at the first response time and artefact size using 10 Java and Kotlin frameworks, I want to stress a few important factors:

– All the blogs and articles about startup time are useless since they are based on running a useless application in an unrealistic environment. So we should stop focusing on all these architectural aspects and non-functional requirements; instead, development teams must focus on the things that matter to the end users of the applications!

– Modular runtimes and frameworks help you in removing functionality you do not need. This can help you to keep the startup time fast enough but also introduce some complexity in building and maintaining the artefacts for your environment.

– The build time enhancements of Quarkus make it faster than expected according to the artefact size. But they introduce complexity that can become a problem for the future of the framework. Other solutions like Helidon SE and Ktor achieve the same startup time without these tricks.

– Native compilation (AOT, GraalVM, etc.) can reduce startup time and memory usage but is useless for 99.9% of the applications as they don’t benefit from those gains. It only introduces more complexity.

(I would like to thank Debbie for reviewing and commenting on the article.)

Notes

The RSS (Resident Set Size) of a process is the portion of its memory that is held in RAM (Random Access Memory) and is used by the operating system to keep track of the memory usage of the process. It includes both the data and code segments of the process, as well as any shared libraries it is linked against.

The tests are performed with 10 times more free memory on the machine to make sure that the process is only using RAM.

Project FF, the Framework Frenzy, Framework Face-off, Framework Fireworks, or … the comparison and similarities between Jakarta EE Web Profile, Spring Boot, Quarkus, Kotlin Ktor, and a few others regarding creating backend applications with REST endpoints. It will host an extensive comparison between many frameworks regarding the functionality that you can find within Jakarta EE Web Profile, MicroProfile, and some other commonly used functionality in today’s back-end applications.
