Persistence – Atbash blog

As you all know, the relational world of databases is quite different from the object-oriented one. Both can store the same data but they do it differently. Making it a challenge to retrieve and store it from an object-oriented language like Java to a SQL-based database.

In principle, frameworks like ORMs (Hibernate to name the most popular one) and ‘data’ frameworks, like Spring Data or the new Jakarta Data, manage to handle the conversation well. But only for simple cases.

This blog is mainly about the SQL case but also can be applied to the NoSQL frameworks. When you try to abstract away the underlying communication to the underlying system, you lose performance, functionality, and flexibility.

The JDBC connectivity

From the early days of Java, The Java Database Connectivity API made it possible to interact with the SQL databases. It allowed the Java developer to execute SQL statements and retrieve data for the application.

In those days, it was normal that every developer knew SQL very well and could write the most complex queries to retrieve the data required for the advanced use cases of their applications. Using joins, sub-queries, and grouping is not that hard and can easily be mastered within a week.

The challenge and difficulty is the usage of the JDBC API. In those early days, many APIs were designed so that the developer still needed to perform many actions. That leads to a lot of boilerplate code that is required to close statements, result sets, and connections. If you fail to do this properly, it results in resource leakage and failures in your application over time.

But there are other challenges. You need to manually assign each column value from a JDBC ResultSet to the properties of your objects. This is not the most rewarding piece of code that you as a developer write on a project.

Refactorings, where database fields change can also be a challenge since the SQLs you execute are actually just Strings, and thus at compile time of your application, they are not validated.

The benefits of the ORM

All the challenges described in the previous section, are addressed by the ORM tools like Hibernate. They simplify database interactions and bridge the gap between object-oriented programming languages and databases.

Firstly, ORM tools eliminate the need for developers to write repetitive and error-prone boilerplate code. That is hidden away in the framework code.

Another significant benefit of ORM tools is their ability to provide type-safe mappings between database tables and object-oriented classes. Initially defined through XML files and later on through the annotations available from Java 5 onwards, the tool handles the translation of data between the database and objects. Besides the mappings of single fields, it allows you to represent foreign key relations from the database. But also to represent concepts that do not exist in the database such as the OneToMany relation to have all children of each master record.

Although the SQL language is standardised, there are several versions like SQL-92, SQL-2003, and SQL-2023 where the JSON datatype is introduced. Not all databases support the same version and all databases use a custom, slightly different version. This is handled by the introduction of the ORM Query Language (Hibernate Query Language, JPA Query language, etc.) and a Dialect for each database that converts these queries to a format that is supported by the database.
This ORM Query Language is thus a subset of all possible features found in the databases and thus you cannot use the full power of the database.

Additionally, ORM tools offer a range of features such as support for database schema migrations, caching, and query optimization. They enable developers to work with databases more abstractly and intuitively, freeing them from having to think in terms of SQL statements and database-specific details. This abstraction allows for greater flexibility in choosing the underlying database system, but also in lower performance as it adds an additional rather complex layer and less functionality.

After the ORM tools were in use for several years, people saw that querying a table and filtering on one or a few fields, results in a handful of very similar statements.

This led to the creation of the data project like Spring Data. Instead of writing these few statements each time, these are derived from the method name of specially indicated interfaces.

List findByName(String name)

This kind of method just replaces 3 lines of code with the ORM tool.

The problems of the ORM

In the previous section, you can find many useful improvements in accessing the database by using an OM tool. But it has its own challenges and problems.

Probably the most common problem is the Lazy and Eager loading strategies and the ‘N+1 select issue’.
In almost all cases, you don’t need to retrieve the records of a table in isolation, but you also need to take into account the relations with other tables. This can be needed to have all the fields for filtering, or additional data for displaying on the screen.
Within the ORM tool, they are represented by the ManyToOne, OneToMany, or ManyToMany relation.

The ORM can decide to load the information of these related tables eagerly by including the table already in the query using a JOIN clause. Or after the main query is executed, issuing additional queries to load the details in the lazy case.

But this lazy case introduces the ‘N+1 select issue’ since after retrieving the results for the main query which has N rows, the ORM tool launches N queries to retrieve the detail collection of each row.

So is eagerly preferred over lazy loading then? No, not at all. Since many tables are connected, eagerly loading retrieves in most cases information from tables that are not needed, making queries complex and slow. You must decide on an individual basis if data is required or not.

Some real-world cases and best practices

In my 20+ years career as a Java Developer, I was called in on many projects that were already in production and experienced some issues or needed some advanced functionality.
In almost all cases, the issues could be retraced to how the ORM mapping and tool were used in the project.

I’ll briefly discuss some cases and explain the solutions that I applied to the problems.

One case is about the eager loading and the lack of a proper design for the Entity layer.
They called me in on a project where there was a performance problem on the main page of the application that showed some kind of overview, minimal dashboard, for the user. The page showed about 10 values, so not much information but it actually took about 45 seconds to load.

The reason was quickly found when I activated the SQL tracing to see what queries were sent to the database. To get the data for the main page, there were 1329 queries executed. The developer only issued 5 queries, but the development team used the lazy loading configuration and did not specify any FETCHING strategy on the queries themselves.

Since all tables are connected, which is the common case, some queries that touch many tables or collect data from some detail collections are not performant when relying on standard ORM tool behavior due to the lazy option.

In this case, the usage of eager loading would not solve the problem as the 5 queries would become very large, touching many tables within the database. These queries are also very slow, and thus not a solution.

The solution was actually very simple, create 5 ‘native queries’ that can retrieve the required values very efficiently from the database. The ORM tool can execute a native query. You still can make use of the default mapping, or retrieve a collection of values when the query is not returning entire table rows but only some values. But you bypass the ORM Query language conversion to SQL, so faster anyway, and you submit the ideal query to the database. Again the fastest option.

So, if you haven’t done it already, take that intermediate or advanced SQL course so that you become a pro in writing complex queries. Use native queries if you touch 3 or more tables as you can write them more efficiently than the ORM which is designed for simple cases. Use the Lazy fetching strategy but define in the query if you need the data or not by using a JOIN FETCH clause for your simple queries.

The second case I want to discuss is a project where they relied heavily on Spring Data. They used the method name convention to define the query or in several cases used the @Query annotation to instruct what should be executed.

During development, everything went smoothly but quickly some performance issues arose when running in production. The difference, in production there are not 10 or 20 records but 10,000 and more.

Since the development team used Spring Data, they included many eager relations and also used ToMany relations extensively to get the info they needed ‘automagically’. Since these options result in joining many tables, in many cases not needed for the situation, the queries became slow when using more records.

The solution in this case was again relying on native queries that could optimally retrieve the data, reduce the usage of Spring Data interface methods, and define query and fetch strategies within queries.

In general, avoid, or simply don’t use at all, the usage of the ToMany relations as they are not available ‘naturally’ in databases and require complex or additional queries.
Avoid the usage of the Data frameworks when querying more than 1 table or don’t use them at all since writing 3 statements is not a problem. Or is it?
Avoid the eager loading definition and specify the JOIN FETCH clause when you need the info.

This is of course a quick and limited overview. For example, did you know that you can’t use database pagination when you use JOIN FETCH? That you can’t use a sub-query in a JOIN in the ORM Query language?

Conclusion

The ORM tools solve a few important issues, the boilerplate code required with the JDBC API and the mapping of database field values to Java object properties. But it can introduce a lot of troubles in your projects regarding performance which is mostly only discovered when running in production with larger datasets.

So as a rule of thumb, do not use eager loading and use JOIN FETCH clauses when needed, do not use any ToMany relation as they make your queries complex, and write all your queries, other than the very simple one table ones, yourself using native queries.

Since we only should use a very limited set of functionality of the ORM tools like Hibernate, why don’t we drop it altogether and just use a tool that reduces the boilerplate code and solves the mapping issue (type-safe column names and values)? Tools like JOOQ or the Expose framework written in Kotlin, are 2 examples of tools that implement the best practices I described here without the overhead of an ORM.
Interested in an introduction to the Expose framework, I’ll give an overview in my next blog.

Training and support

Interested in a training about efficient ORM tools or Hibernate usage in your project? In need of an expert to help you solve a problem, feel free to contact me.

Do you need a specific training session on Jakarta EE, Quarkus, Kotlin or MicroProfile? Have a look at the training support that I provide on the page https://www.atbash.be/training/ and contact me for more information.