Nowadays, ORM plays an important role in object-oriented programming, and JPA is considered the industry-standard approach to ORM in Java. In this post, I summarize several behaviors that violated my intuition and are prone to error.
As JPA itself is just a specification, there are various implementations. In this post, we focus only on Hibernate. In fact, I have not used or tested any other implementation so far, so some of these problems may not be reproducible with other JPA implementations.
As in the post Common Pitfalls of Declarative Transaction Management in Spring, all the samples are written in Kotlin, and Spring Data JPA is used for convenience. The full source code can be found at common-pitfalls-in-jpa-hibernate.
You may already know that there are several contracts we have to obey when implementing the `equals` and `hashCode` methods, namely reflexivity, symmetry, transitivity, consistency and "non-nullity". When it comes to a JPA entity, things become even more difficult since entity state transitions must be taken into account. In other words, `equals` and `hashCode` must behave consistently across all entity state transitions. Thus, we can immediately conclude that the surrogate key (usually generated automatically the first time the entity is persisted) should not be taken into consideration.
`AbstractPersistable` from the Spring Data JPA library is a perfect counterexample: it implements `equals` and `hashCode` based on the auto-generated id. The following code demonstrates its flaw:
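A minimal plain-Kotlin simulation of the flaw, runnable without any JPA provider. `IdBasedEntity` and `setFindsEntityAfterPersist` are hypothetical names; the `hashCode` shape is loosely modelled on `AbstractPersistable` (a constant while the id is null, derived from the id afterwards), and assigning `id` by hand stands in for the first persist.

```kotlin
// Hypothetical stand-in for an entity whose equality depends on a generated id.
// The hashCode shape is loosely modelled on Spring Data's AbstractPersistable.
class IdBasedEntity(val name: String) {
    var id: Long? = null // null while transient; assigned on the first "persist"

    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is IdBasedEntity) return false
        val thisId = id ?: return false // a transient entity equals only itself
        return thisId == other.id
    }

    override fun hashCode(): Int = 17 + (id?.hashCode() ?: 0) * 31
}

// Adds a transient entity to a HashSet, "persists" it, then looks it up again.
fun setFindsEntityAfterPersist(): Boolean {
    val entity = IdBasedEntity("foo")
    val set = HashSet<IdBasedEntity>()
    set.add(entity) // stored under hash code 17
    entity.id = 1L  // simulate the id generated on first persist
    return set.contains(entity) // now searched under hash code 48
}
```

`setFindsEntityAfterPersist()` returns `false`: the entity is still in the set, but under a bucket chosen by its old hash code.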
The `HashSet` fails to recognize the same entity because its hash code changes once the entity is persisted. Certainly, this is error-prone. For a similar reason, the default `equals` and `hashCode` inherited from `java.lang.Object` are not suitable for JPA entities either. The code below shows that a merged entity isn't equal to the original instance, because `entityManager.merge` may return a different object reference.
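The sketch below simulates the merge case in plain Kotlin. `FakeEntityManager` is a hypothetical, heavily simplified stand-in that follows the JPA rule that `merge` copies the state of a detached instance onto a managed instance and returns the managed one; `PlainEntity` keeps the identity-based `equals` inherited from `Object`.

```kotlin
// Entity that keeps the identity-based equals/hashCode inherited from Object.
class PlainEntity(var id: Long?, var name: String)

// Hypothetical, heavily simplified persistence context. Like JPA's merge for a
// detached instance, it copies the state onto a managed instance and returns that
// managed instance; the argument itself is left untouched and detached.
class FakeEntityManager {
    private val managed = HashMap<Long, PlainEntity>()

    fun merge(entity: PlainEntity): PlainEntity {
        val id = requireNotNull(entity.id) { "this sketch only merges entities with an id" }
        val target = managed.getOrPut(id) { PlainEntity(id, entity.name) }
        target.name = entity.name // copy state onto the managed instance
        return target
    }
}

// Merges a detached instance and compares the result with the original.
fun mergedEqualsOriginal(): Boolean {
    val em = FakeEntityManager()
    val detached = PlainEntity(1L, "foo")
    val merged = em.merge(detached)
    return merged == detached // Object.equals: plain reference comparison
}
```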
Now, the only option left is to implement `equals` and `hashCode` based on some business key, and never change that key after the entity is created. However, in practice you cannot always find such a key. In such cases, whichever way we choose to implement the methods, the best we can do is to be aware of its shortcomings and document them clearly.
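As a rough sketch, assuming a hypothetical `Child` entity whose `name` is unique and never changes (JPA annotations omitted so it runs as plain Kotlin), a business-key implementation keeps the hash code stable when the id is assigned:

```kotlin
// Sketch of equals/hashCode based on an immutable business key.
// Assumption for illustration: `name` is unique and never changes after creation.
class Child(val name: String) {
    var id: Long? = null // the generated id plays no part in equality

    override fun equals(other: Any?): Boolean {
        if (this === other) return true
        if (other !is Child) return false
        return name == other.name
    }

    override fun hashCode(): Int = name.hashCode()
}

// The set still finds the child after its id has been "generated".
fun setFindsChildAfterPersist(): Boolean {
    val child = Child("foo")
    val set = hashSetOf(child)
    child.id = 1L // simulate id generation on persist; the hash code is unaffected
    return set.contains(child)
}

// A separately created copy of the same logical row compares equal.
fun copiesAreEqual(): Boolean = Child("foo").also { it.id = 1L } == Child("foo")
```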
- A beginner’s guide to entity state transitions with JPA and Hibernate
- How to implement Equals and HashCode for JPA entities
- How to implement equals and hashCode using the JPA entity identifier (Primary Key)
In JPA, a collection can be fetched lazily or eagerly. Eager fetching is generally considered an anti-pattern, so we are not going to talk about it. Lazy fetching is preferred since it improves performance by avoiding unnecessary queries. In Hibernate, we can make a lazy collection even "lazier" by putting a `@LazyCollection(LazyCollectionOption.EXTRA)` annotation on it. After doing so, calling `collection.size` makes Hibernate fire a `select count` query instead of loading the whole collection. This sounds very practical, especially since it takes just one extra line of code. The following code shows what an "extra lazy" collection looks like:
The problem is that it not only optimizes the `collection.size` call, but also changes the behavior of `collection.contains`. If the collection is fully loaded, `contains` behaves like that of an ordinary collection, which means the result depends on the `equals` and `hashCode` methods of its elements. Otherwise, instead of loading the whole collection, it sends a SQL query based on the primary key of the given element. This is why the collection can behave unpredictably. To make this concrete, let's say our child entity has a unique business key, `name`, and we implement `equals` and `hashCode` based on that key as recommended above.
Then we may run into the following awkward situation: `children.contains(child)` returns inconsistent results depending on whether the collection is loaded.
Since `collection.contains()` behaves unpredictably, the uniqueness contract of set elements can also be broken. You should be fully aware of these consequences before deciding to make a collection "extra" lazy.
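To illustrate the inconsistency without a database, the sketch below models an extra-lazy set in plain Kotlin. `ExtraLazySet`, its query log and `NamedChild` are hypothetical; the model simply encodes the behavior described above (a count query for `size`, a primary-key lookup for `contains` until the collection is loaded) and is not Hibernate's implementation.

```kotlin
// Child whose equals/hashCode use the business key `name`, as recommended earlier.
class NamedChild(val name: String, var id: Long? = null) {
    override fun equals(other: Any?): Boolean = other is NamedChild && other.name == name
    override fun hashCode(): Int = name.hashCode()
}

// Hypothetical model of an "extra lazy" set: before initialization, size() and
// contains() are answered by simulated queries, and contains() matches on primary
// key only; after initialization it delegates to an ordinary HashSet.
class ExtraLazySet(private val rows: List<NamedChild>) {
    private var loaded: HashSet<NamedChild>? = null
    val queries = mutableListOf<String>() // log of simulated SQL round trips

    fun size(): Int = loaded?.size ?: run {
        queries += "count query"
        rows.size
    }

    fun contains(element: NamedChild): Boolean = loaded?.contains(element) ?: run {
        queries += "primary-key lookup"
        val id = element.id
        id != null && rows.any { it.id == id }
    }

    fun initialize() {
        queries += "full load"
        loaded = HashSet(rows)
    }
}

// Asks the same question before and after the collection is loaded.
fun containsBeforeAndAfterLoad(): Pair<Boolean, Boolean> {
    val children = ExtraLazySet(listOf(NamedChild("foo", id = 1L)))
    val probe = NamedChild("foo") // transient instance, equal to the row by business key
    val before = children.contains(probe) // primary-key lookup: the probe has no id
    children.initialize()
    val after = children.contains(probe)  // HashSet lookup via equals/hashCode
    return before to after
}
```

In this model, `containsBeforeAndAfterLoad()` returns `(false, true)` for the very same probe.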
Let's say we have the same model as in the preceding example. To focus on this problem, we remove the `@LazyCollection` annotation from the children collection and give the child entity a composite unique key, so that child names only need to be unique within the same parent.
Now, suppose we need to update the children of a persisted parent. We may end up writing code like this:
This code works fine in most cases. However, if the parent already has a child called "foo", we'll encounter a `ConstraintViolationException`, even though we've already removed all the existing children before adding the new ones. The reason is that, during flushing, Hibernate doesn't execute statements in the order in which the code was written. Instead, it follows its own predefined order, in which inserts always happen before deletes. So the same error can also be reproduced in another way:
As the code comments point out, we have to call an additional `entityManager.flush()` after performing the delete. Certainly, this violates our intuition.
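The following plain-Kotlin sketch models the flush ordering described above. `ChildTable` is a hypothetical in-memory table with a unique constraint on `(parentId, name)`, mirroring the composite key, and `QueuingSession` is a hypothetical session that runs all queued inserts before all queued deletes; neither is Hibernate's API.

```kotlin
class ConstraintViolation(message: String) : RuntimeException(message)

data class ChildRow(val id: Long, val parentId: Long, val name: String)

// Hypothetical in-memory table enforcing uniqueness of (parentId, name).
class ChildTable {
    private val rows = mutableListOf<ChildRow>()

    fun insert(row: ChildRow) {
        if (rows.any { it.parentId == row.parentId && it.name == row.name })
            throw ConstraintViolation("duplicate (${row.parentId}, ${row.name})")
        rows += row
    }

    fun delete(id: Long) { rows.removeAll { it.id == id } }

    fun names(parentId: Long) = rows.filter { it.parentId == parentId }.map { it.name }
}

// Hypothetical session: writes are queued, and flush() runs every queued insert
// before every queued delete, regardless of the order they were requested in.
class QueuingSession(private val table: ChildTable) {
    private val inserts = mutableListOf<ChildRow>()
    private val deletes = mutableListOf<Long>()

    fun persist(row: ChildRow) { inserts += row }
    fun remove(id: Long) { deletes += id }

    fun flush() {
        try {
            inserts.forEach(table::insert)
            deletes.forEach(table::delete)
        } finally {
            inserts.clear()
            deletes.clear()
        }
    }
}

// Replaces child "foo" (id 1) of parent 10 with a new child also named "foo".
fun replaceChild(flushAfterDelete: Boolean): List<String> {
    val table = ChildTable().apply { insert(ChildRow(1, 10, "foo")) }
    val session = QueuingSession(table)
    session.remove(1)
    if (flushAfterDelete) session.flush() // the extra flush pushes the delete out first
    session.persist(ChildRow(2, 10, "foo"))
    session.flush()
    return table.names(10)
}
```

With `flushAfterDelete = false`, the new "foo" is inserted while the old row still exists, so the sketch throws `ConstraintViolation`; with the extra flush, the delete reaches the table first and the replacement succeeds.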
This time, we set up a `preUpdate` hook function that automatically records how many times an instance has been changed since it was persisted, and we also make the `children` property mutable (a `var`) for demonstration's sake.
Then you'll see that mutating the children collection in place won't trigger the hook, whereas reassigning the `children` property will, because the dirty-checking mechanism doesn't detect the collection mutation.
From the framework's perspective, this makes sense in a way, because detecting modifications to the elements of a collection might be too expensive here. But from a user's perspective, you may want `updateTimes` to increase whenever a model is updated.
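A toy model of this behavior in plain Kotlin: `TrackedParent` and `SnapshotTracker` are hypothetical, and the tracker compares each property with its snapshot by reference. That is enough to reproduce what the text describes, but it is not Hibernate's dirty-checking algorithm.

```kotlin
// Parent with a reassignable collection and a counter bumped by a preUpdate-style hook.
class TrackedParent(var children: MutableList<String>) {
    var updateTimes = 0
    fun preUpdate() { updateTimes++ }
}

// Toy snapshot-based dirty check: at flush, each property value is compared by
// reference with the snapshot, and the hook runs only if some property differs.
class SnapshotTracker(private val entity: TrackedParent) {
    private var snapshot: List<Any> = capture()

    // Captures property references only; the collection's contents are not copied.
    private fun capture(): List<Any> = listOf(entity.children)

    fun flush() {
        val current = capture()
        val dirty = current.indices.any { current[it] !== snapshot[it] }
        if (dirty) entity.preUpdate()
        snapshot = current
    }
}

// Changes the children either in place or by reassignment, then flushes once.
fun updateTimesAfter(mutateInPlace: Boolean): Int {
    val parent = TrackedParent(mutableListOf("foo"))
    val tracker = SnapshotTracker(parent)
    if (mutateInPlace) {
        parent.children.add("bar")                    // same list instance
    } else {
        parent.children = mutableListOf("foo", "bar") // new list instance
    }
    tracker.flush()
    return parent.updateTimes
}
```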
Again, take the same parent-child model. If you try to remove a child via the `entityManager.remove` method, it simply doesn't work. The following code exhibits this phenomenon.
The reason is that we have a cascade relationship between the parent and child entities. Either we remove the `cascade` attribute from the `@OneToMany` annotation, or we have to make sure the entity has already been removed from the `parent.children` collection. Otherwise, the delete action is silently discarded, as if `entityManager.remove(child)` had never been called.
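One way to picture the mechanism, assuming the `@OneToMany` mapping cascades persist (for example with `CascadeType.ALL`): in the hypothetical `CascadingSession` below, `remove` only schedules a delete, and at flush time the cascade from `parent.children` cancels it for any child the parent still references. It is a plain-Kotlin model, not Hibernate's implementation.

```kotlin
// Entities compared by identity, as with the default Object.equals.
class ChildEntity(val name: String)
class ParentEntity { val children = mutableSetOf<ChildEntity>() }

// Toy model: remove() only schedules a delete; at flush, persist cascades from each
// parent to its children, and any child still reachable that way has its delete dropped.
class CascadingSession(private val parents: List<ParentEntity>) {
    private val pendingDeletes = mutableSetOf<ChildEntity>()
    val deleted = mutableListOf<ChildEntity>() // deletes actually "executed"

    fun remove(child: ChildEntity) { pendingDeletes += child }

    fun flush() {
        // The cascade from parent.children silently un-schedules the pending delete.
        parents.flatMap { it.children }.forEach { pendingDeletes -= it }
        deleted += pendingDeletes
        pendingDeletes.clear()
    }
}

// Removes a child, optionally taking it out of parent.children first.
fun deletesExecuted(removeFromCollectionFirst: Boolean): Int {
    val child = ChildEntity("foo")
    val parent = ParentEntity().apply { children += child }
    val session = CascadingSession(listOf(parent))
    if (removeFromCollectionFirst) parent.children -= child
    session.remove(child)
    session.flush()
    return session.deleted.size
}
```

In this model the delete is executed only when the child is first removed from `parent.children`, matching the workaround described above.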
JPA does a great job of saving programmers from writing SQL by hand, but writing correct, high-performance code with JPA may not be as easy as you think. Hopefully, this post can save you from making the same mistakes I made.