
SQL- and ORM-like Database Access with Kotlin Exposed

Introduction

Almost all access to SQL-based databases on the JVM is performed through the JDBC API. It provides a standardized way for Java programs to interact with databases and acts as an intermediary layer, shielding the application from the underlying complexities of the different database systems.
It also makes use of the SQL language, a standard that unifies the interactions with the database: inserting, updating, querying, and deleting data.

This SQL language, together with the JDBC API, abstracts away the differences between database systems. The API also contains features that improve performance, like prepared statements, which additionally help secure your application because values are passed as parameters instead of being 'pasted' into the SQL query.

The API itself is rather old and still relies on a lot of manual work from the developer. Resources like result sets, statements, and connections need to be closed once they are no longer needed, which results in a lot of boilerplate code.

The SQL language itself is also seen as problematic by many developers. SQL is completely different from Java, and not everyone is keen on learning another language, especially since there are frameworks these days that offer a solution that works completely within Java.

But traditional ORM frameworks like Hibernate bring a lot of other challenges and problems, as I described in the previous blog, Why ORM and Data frameworks are not your best option. So be careful about which features and functionality you use in your application.

In this blog, we look at the Exposed library written in Kotlin. It is a lightweight SQL library on top of the JDBC driver for the Kotlin language. Exposed has two flavours of database access: a typesafe SQL-wrapping DSL and lightweight Data Access Objects (DAO).

It is an alternative to Java-based solutions like Hibernate when you are working with Kotlin. JetBrains also recently announced that it will increase its investment in the framework; see this blog.

Installation

Since the Exposed framework has two flavours, the dependencies you should add are split over different artifacts.

The core dependency contains all the basic code and functionality that is independent of the JDBC or DAO flavour. On top of that, add the dependency for the flavour you want to use.

When using Gradle as the build system, you can add the following dependencies.

dependencies {
    implementation "org.jetbrains.exposed:exposed-core:$exposedVersion"
    implementation "org.jetbrains.exposed:exposed-dao:$exposedVersion"
    implementation "org.jetbrains.exposed:exposed-jdbc:$exposedVersion"
}

In the case of Maven, the following snippet gives you all the information you need.

    <dependency>
        <groupId>org.jetbrains.exposed</groupId>
        <artifactId>exposed-core</artifactId>
        <version>${exposed.version}</version>
    </dependency>
    <dependency>
        <groupId>org.jetbrains.exposed</groupId>
        <artifactId>exposed-dao</artifactId>
        <version>${exposed.version}</version>
    </dependency>
    <dependency>
        <groupId>org.jetbrains.exposed</groupId>
        <artifactId>exposed-jdbc</artifactId>
        <version>${exposed.version}</version>
    </dependency>

Defining the DB connection, which can be based on a standard DataSource object or on connection pooling provided by a third-party library like HikariCP, just requires calling the org.jetbrains.exposed.sql.Database.Companion#connect() function. This makes sure that all parts of the framework use the DB connection from this setting. For example, you can create the following object:

import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import org.jetbrains.exposed.sql.Database

object DBSettings {

    private val config = HikariConfig().apply {
        jdbcUrl = "jdbc:postgresql://localhost:5432/postgres"
        driverClassName = "org.postgresql.Driver"
        username = "postgres"
        password = "mysecretpassword"
        maximumPoolSize = 10
    }

    val dataSource by lazy {
        HikariDataSource(config)
    }

    val db by lazy {
        Database.connect(dataSource)
    }
}

In a real application, you would of course retrieve the connection parameters from an external location instead of hardcoding them as in the example above.

Typesafe access

One of the challenges when working with databases is creating SQL statements and mapping the results to instances. We can either execute a SQL query, which is basically just a String, and map each field by hand to the properties of our objects, which can be very tedious. Or we define some 'entities' as in ORM frameworks and let the framework generate the SQL query, but that can lead to unoptimised queries, especially when you need very complex ones.

The Exposed framework, like some other frameworks in Java and Kotlin, approaches this in a very lightweight way: it handles the mapping for you so that your code is more type-safe (is a certain database field used as a String or a Number?) and name changes in database columns and code can be handled transparently.

Within the JDBC feature of Exposed, you define singletons in Kotlin that extend the Table class and describe the correspondence between the table and your code. Due to the expressiveness of Kotlin, these definitions are highly readable. For example, an object that maps to a table 'Users' with some fields and a foreign key to the table 'Cities' can be defined as

object Users : Table() {
    val id = integer("id").autoIncrement()  // Column<Int>
    val name = varchar("name", length = 50) // Column<String>
    val cityId = (integer("city_id").references(Cities.id)).nullable() // Column<Int?>

    override val primaryKey = PrimaryKey(id, name = "PK_User_ID") // name is optional here
}
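The referenced Cities table is not shown in the snippet above; its full definition is in the example project, but based on the queries below it would look roughly like this minimal sketch:

object Cities : Table() {
    val id = integer("id").autoIncrement()  // Column<Int>
    val name = varchar("name", length = 50) // Column<String>

    override val primaryKey = PrimaryKey(id)
}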

Because it 'inherits' from Table, each singleton defined as in the example gets a lot of functions to perform actions on the table.

For example, you can insert into the table and retrieve the newly assigned id, generated by the database's auto-increment functionality, on the fly.

    val amsterdamId = Cities.insert {
        it[name] = "Amsterdam"
    } get Cities.id

Or loop over each record in the table

    for (city in Cities.selectAll()) {
        println("${city[Cities.id]}: ${city[Cities.name]}")
    }
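Note that Exposed executes every statement inside a transaction, so snippets like the ones above are assumed to run inside a transaction { } block (from org.jetbrains.exposed.sql.transactions). The block uses the database registered with Database.connect(), or the one you pass explicitly, for example the DBSettings.db property defined earlier:

    transaction(DBSettings.db) {
        val amsterdamId = Cities.insert {
            it[name] = "Amsterdam"
        } get Cities.id

        println("Amsterdam has id $amsterdamId")
    }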

You can see that in each of these use cases, we only refer to the properties of the table objects, and thus the actual column name is only defined once, in the definition of the table itself.

But the JDBC feature also allows you to create more complex queries, for example

    (Users innerJoin Cities).slice(Users.name, Cities.name).select { Users.name.eq("Emma") }.forEach {
        println("${it[Users.name]} works in ${it[Cities.name]}")
    }

This joins the two tables based on the information in the definition (look at .references(Cities.id) earlier on), selects the two name columns, adds a where clause on the user name, and prints out the result.

So the JDBC feature of the Exposed framework solves a few important issues of working with the JDBC API, namely the mapping and the boilerplate code that is required when using the JDBC API directly.
Note, however, that the mapping of the result sets is done through a Map-like structure where the key is the column definition and the value is the database field value. People who are used to working with ORM frameworks may find this a bit awkward, but Kotlin gives you a highly readable syntax for it. Most of all, you as the developer are still in control of the queries that are generated, and thus performance issues are less likely.

The DAO or ORM solution

The DAO feature of the Exposed framework takes a slightly different approach to the same problem space by making use of 'Entities'. You define your table mappings, and run your queries, in a different way. This means that the two features are not really compatible, or at least, combining both approaches did not work well for me. They can be used in the same application, but the table definition is slightly different and an Entity class is also needed.

object Users : IntIdTable() {
    val name = varchar("name", length = 50) // Column<String>
    val city = reference("city", Cities)
}


class User(id: EntityID<Int>) : IntEntity(id) {
    companion object : IntEntityClass<User>(Users)

    var name by Users.name
    var city by City referencedOn Users.city
}

Here Users contains the definition of the table; extending IntIdTable gives it an auto-increment primary key column id. The User class represents a single record, where we even get a City object and not just the foreign key value.
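The City side is defined in the same way. It is not part of the original snippet, but based on the usage below it looks roughly like this in the DAO flavour:

object Cities : IntIdTable() {
    val name = varchar("name", length = 50) // Column<String>
}

class City(id: EntityID<Int>) : IntEntity(id) {
    companion object : IntEntityClass<City>(Cities)

    var name by Cities.name
}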

An insert statement can be written as

    val amsterdam = City.new {
        name = "Amsterdam"
    }

which results in an INSERT SQL statement at the moment we access the database again or at the end of the transaction. So it is not flushed to the database immediately.

Retrieving data is also done differently than in the JDBC feature case. An important fact to know is that references are resolved lazily, which might not be optimal for some of your queries. For example, when retrieving the record for 'Emma' and the city she works in, the City record is fetched separately, after the retrieval of the User record itself.

    User.find { Users.name.eq("Emma") }.forEach {
        println("${it.name} works in ${it.city.name}")
    }

As you can see, here again the inheritance (this time through the companion object) brings some useful methods, like find, into the entity classes.

Of course, an eager-loading way of working is possible by explicitly defining the joins that need to be made. This again is a positive point for this framework concerning the eager and lazy loading issues you can have with classic ORM tools like Hibernate. The next snippet shows the same query, but now joining the tables eagerly.

    // Eager loading
    val query = Users.innerJoin(Cities).select(
        Users.name.eq("Emma")
    )

    User.wrapRows(query).forEach {
        println("${it.name} works in ${it.city.name}")
    }

Code examples

You can find the complete example project at this location of my Kotlin Project on GitHub.

If you want to learn more about the Exposed framework itself, this is the main page of the Wiki for the project.

Conclusion

The Exposed framework solves the main issues of working with the JDBC API in a completely different way than other frameworks like Hibernate. The lightweight approach of avoiding manual mapping and providing typesafe access is not unique, but due to the expressiveness of the Kotlin language it results in a DSL-like solution that is highly readable and understandable.

Because the JDBC feature of the framework tries to stay as close as possible to the JDBC API itself, it uses a Map-like structure for the transferred data. If you prefer real objects for records or result set items, you can use the DAO functionality of the framework. This resembles the ORM frameworks but is much easier to use and performs better in execution.

Training and Support

Interested in training about efficient ORM tool or Hibernate usage in your project? In need of an expert to help you solve a problem? Feel free to contact me.

Do you need a specific training session on Jakarta EE, Quarkus, Kotlin or MicroProfile? Have a look at the training support that I provide on the page https://www.atbash.be/training/ and contact me for more information.


Why ORM and Data frameworks are not your best option

As you all know, the relational world of databases is quite different from the object-oriented one. Both can store the same data, but they do it differently, which makes it a challenge to store and retrieve that data from an object-oriented language like Java in a SQL-based database.

In principle, frameworks like ORMs (Hibernate, to name the most popular one) and 'data' frameworks, like Spring Data or the new Jakarta Data, handle this conversion well. But only for simple cases.

This blog is mainly about the SQL case, but it also applies to NoSQL frameworks. When you try to abstract away the communication with the underlying system, you lose performance, functionality, and flexibility.

The JDBC connectivity

From the early days of Java, the Java Database Connectivity (JDBC) API made it possible to interact with SQL databases. It allowed the Java developer to execute SQL statements and retrieve data for the application.

In those days, it was normal that every developer knew SQL very well and could write the most complex queries to retrieve the data required for the advanced use cases of their applications. Using joins, sub-queries, and grouping is not that hard and can easily be mastered within a week.

The challenge and difficulty lie in the usage of the JDBC API itself. Like many APIs designed in those early days, it requires the developer to perform many actions manually. That leads to a lot of boilerplate code to close statements, result sets, and connections. If you fail to do this properly, it results in resource leaks and failures in your application over time.

But there are other challenges. You need to manually assign each column value from a JDBC ResultSet to the properties of your objects. This is not the most rewarding piece of code that you as a developer write on a project.

Refactorings where database fields change can also be a challenge, since the SQL statements you execute are actually just Strings and thus are not validated at compile time.
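To make this concrete, here is a rough sketch of what such manual JDBC code looks like in Kotlin; the users table, its columns, and the connection details are just placeholders for the example:

import java.sql.DriverManager

data class UserRow(val id: Int, val name: String)

fun findUsersByName(name: String): List<UserRow> {
    val result = mutableListOf<UserRow>()
    // Every resource has to be closed again; 'use' is Kotlin's try-with-resources.
    DriverManager.getConnection("jdbc:postgresql://localhost:5432/postgres", "postgres", "secret").use { connection ->
        // A prepared statement with a parameter instead of 'pasting' the value into the SQL String.
        connection.prepareStatement("SELECT id, name FROM users WHERE name = ?").use { statement ->
            statement.setString(1, name)
            statement.executeQuery().use { resultSet ->
                while (resultSet.next()) {
                    // Manual mapping of every column to a property of the object.
                    result.add(UserRow(resultSet.getInt("id"), resultSet.getString("name")))
                }
            }
        }
    }
    return result
}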

The benefits of the ORM

All the challenges described in the previous section are addressed by ORM tools like Hibernate. They simplify database interactions and bridge the gap between object-oriented programming languages and databases.

Firstly, ORM tools eliminate the need for developers to write repetitive and error-prone boilerplate code. That is hidden away in the framework code.

Another significant benefit of ORM tools is their ability to provide type-safe mappings between database tables and object-oriented classes. Initially defined through XML files, and later on through the annotations available from Java 5 onwards, these mappings let the tool handle the translation of data between the database and your objects. Besides the mapping of single fields, they allow you to represent foreign key relations from the database, but also concepts that do not exist in the database, such as the OneToMany relation that holds all children of a master record.
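As an illustration, a hypothetical mapping with Jakarta Persistence annotations could look like this (shown here in Kotlin with made-up table and column names; a real project would use the kotlin-jpa plugin to make the classes open for proxying). Note that the OneToMany collection on City has no direct counterpart in the database schema:

import jakarta.persistence.Entity
import jakarta.persistence.FetchType
import jakarta.persistence.GeneratedValue
import jakarta.persistence.Id
import jakarta.persistence.JoinColumn
import jakarta.persistence.ManyToOne
import jakarta.persistence.OneToMany

@Entity
class User(
    @Id @GeneratedValue
    var id: Long? = null,

    var name: String = "",

    // Foreign key relation as it exists in the database.
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "city_id")
    var city: City? = null
)

@Entity
class City(
    @Id @GeneratedValue
    var id: Long? = null,

    var name: String = "",

    // Concept that does not exist as such in the database: the collection of children of this record.
    @OneToMany(mappedBy = "city")
    var users: MutableList<User> = mutableListOf()
)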

Although the SQL language is standardised, there are several versions, like SQL-92, SQL-2003, and SQL-2023, which introduces the JSON datatype. Not all databases support the same version, and every database uses its own, slightly different dialect. This is handled by the introduction of an ORM query language (Hibernate Query Language, the JPA query language, etc.) and a Dialect for each database that converts these queries to a format supported by that database.
This ORM query language is thus a subset of all possible features found in the databases, and so you cannot use the full power of your database.

Additionally, ORM tools offer a range of features such as support for database schema migrations, caching, and query optimization. They enable developers to work with databases more abstractly and intuitively, freeing them from having to think in terms of SQL statements and database-specific details. This abstraction allows for greater flexibility in choosing the underlying database system, but it also results in lower performance, as it adds an additional and rather complex layer, and in reduced functionality.

After the ORM tools had been in use for several years, people saw that querying a table and filtering on one or a few fields results in a handful of very similar statements.

This led to the creation of the data projects like Spring Data. Instead of writing these few statements each time, they are derived from the method names of specially indicated interfaces.

List findByName(String name)

Such a method just replaces the three lines of code you would otherwise write with the ORM tool.
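In a Spring Data JPA application, such an interface would look roughly like this sketch, reusing the hypothetical User entity from above:

import org.springframework.data.jpa.repository.JpaRepository

// The implementation is generated by Spring Data; the query is derived from the method name.
interface UserRepository : JpaRepository<User, Long> {

    fun findByName(name: String): List<User>
}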

The problems of the ORM

The previous section lists many useful improvements that an ORM tool brings to database access. But the approach has its own challenges and problems.

Probably the most common problem is the lazy and eager loading strategies and the 'N+1 select issue'.
In almost all cases, you don't retrieve the records of a table in isolation; you also need to take the relations with other tables into account. This can be needed to have all the fields for filtering, or to have additional data for displaying on the screen.
Within the ORM tool, these relations are represented by the ManyToOne, OneToMany, or ManyToMany annotations.

The ORM can load the information from these related tables eagerly, by including the table in the query using a JOIN clause, or lazily, by issuing additional queries after the main query is executed to load the details.

But this lazy case introduces the 'N+1 select issue': after retrieving the results of the main query, which has N rows, the ORM tool launches N additional queries to retrieve the detail collection of each row.
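To illustrate, using the hypothetical User/City mapping from above and an injected jakarta.persistence.EntityManager, the innocent-looking loop below fires one query for the cities and then one extra query per city for its lazy users collection:

// One query to load the N cities ...
val cities = entityManager
    .createQuery("SELECT c FROM City c", City::class.java)
    .resultList

// ... and one additional query per city to load its lazy detail collection: N+1 queries in total.
for (city in cities) {
    println("${city.name} has ${city.users.size} users")
}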

So is eager loading preferred over lazy loading then? No, not at all. Since many tables are connected, eager loading in most cases retrieves information from tables that are not needed, making queries complex and slow. You must decide on a case-by-case basis whether the data is required or not.

Some real-world cases and best practices

In my 20+ year career as a Java developer, I have been called in on many projects that were already in production and experienced issues or needed some advanced functionality.
In almost all cases, the issues could be traced back to how the ORM mapping and tool were used in the project.

I’ll briefly discuss some cases and explain the solutions that I applied to the problems.

One case is about eager loading and the lack of a proper design for the entity layer.
I was called in on a project with a performance problem on the main page of the application, which showed some kind of overview, a minimal dashboard, for the user. The page showed about 10 values, so not much information, but it took about 45 seconds to load.

The reason was quickly found when I activated SQL tracing to see which queries were sent to the database. To get the data for the main page, 1329 queries were executed. The developer only issued 5 queries, but the team used the lazy loading configuration and did not specify any fetching strategy on the queries themselves.

Since all tables are connected, which is the common case, queries that touch many tables or collect data from detail collections are not performant when relying on the standard, lazy ORM tool behaviour.

In this case, using eager loading would not have solved the problem either, as the 5 queries would have become very large, touching many tables within the database. Such queries are also very slow, and thus not a solution.

The solution was actually very simple: create 5 'native queries' that retrieve the required values very efficiently from the database. The ORM tool can execute a native query. You can still make use of the default mapping, or retrieve a collection of values when the query does not return entire table rows but only some values. But you bypass the conversion of the ORM query language to SQL, which is faster anyway, and you submit the ideal query to the database. Again the fastest option.
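A sketch of what such a native query looks like when executed through JPA (the SQL, table, and column names are just an example): the hand-written SQL returns only the handful of values needed for the dashboard.

// Hand-written SQL, sent as-is to the database; only the required values are returned.
val valuesPerCity = entityManager
    .createNativeQuery(
        """
        SELECT c.name, COUNT(u.id)
        FROM cities c
        JOIN users u ON u.city_id = c.id
        GROUP BY c.name
        """
    )
    .resultList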

So, if you haven't done so already, take that intermediate or advanced SQL course so that you become a pro at writing complex queries. Use native queries when you touch 3 or more tables, as you can write them more efficiently than the ORM, which is designed for simple cases. Use the lazy fetching strategy, but define in each query whether you need the related data or not by using a JOIN FETCH clause in your simple queries.

The second case I want to discuss is a project that relied heavily on Spring Data. They used the method name convention to define the queries, or in several cases the @Query annotation to specify what should be executed.

During development everything went smoothly, but performance issues quickly arose in production. The difference: in production there are not 10 or 20 records, but 10,000 and more.

Because the development team used Spring Data, they included many eager relations and also used ToMany relations extensively to get the info they needed 'automagically'. Since these options result in joining many tables, often not needed for the situation at hand, the queries became slow once more records were involved.

The solution in this case was again to rely on native queries that retrieve the data optimally, to reduce the usage of Spring Data interface methods, and to define the fetch strategy within the queries themselves.

In general, avoid, or simply don't use at all, the ToMany relations, as they do not exist 'naturally' in databases and require complex or additional queries.
Avoid the data frameworks when querying more than one table, or don't use them at all, since writing three statements is not a problem. Or is it?
Avoid the eager loading definition and specify a JOIN FETCH clause when you need the related data, as in the sketch below.
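A sketch of that last point, again with the hypothetical mapping from earlier: the relation stays lazy in the mapping, and only the queries that need the related data fetch it eagerly.

// The mapping keeps fetch = LAZY; this specific query loads the city in the same SELECT.
val usersWithCity = entityManager
    .createQuery("SELECT u FROM User u JOIN FETCH u.city WHERE u.name = :name", User::class.java)
    .setParameter("name", "Emma")
    .resultList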

This is of course a quick and limited overview. For example, did you know that you can't use database pagination when you use JOIN FETCH on a collection? Or that you can't use a sub-query in a JOIN in the ORM query language?

Conclusion

The ORM tools solve a few important issues: the boilerplate code required by the JDBC API and the mapping of database field values to Java object properties. But they can introduce a lot of performance trouble in your projects, which is mostly only discovered when running in production with larger datasets.

So, as a rule of thumb: do not use eager loading but add JOIN FETCH clauses when needed, do not use any ToMany relation as they make your queries complex, and write all your queries, other than the very simple single-table ones, yourself using native queries.

Since we should only use a very limited set of the functionality of ORM tools like Hibernate, why don't we drop them altogether and just use a tool that reduces the boilerplate code and solves the mapping issue (type-safe column names and values)? Tools like jOOQ or the Exposed framework written in Kotlin are two examples that implement the best practices described here without the overhead of an ORM.
Interested in an introduction to the Exposed framework? I give an overview in my next blog.

Training and support

Interested in training about efficient ORM tool or Hibernate usage in your project? In need of an expert to help you solve a problem? Feel free to contact me.

Do you need a specific training session on Jakarta EE, Quarkus, Kotlin or MicroProfile? Have a look at the training support that I provide on the page https://www.atbash.be/training/ and contact me for more information.
