Some casual writing to practice my English.

Recently, I was asked to enable soft delete for all the tables I had created. It sounded like a breeze, and as an experienced noob, I "finished" it immediately without even thinking about it. Here is what I did: I added a boolean column named "deleted" to each table, then replaced every unique index with one that also includes the "deleted" column. Done! It turns out I was too naive.

What's wrong with my naive solution?

Imagine that we have a user table:
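A minimal sketch of such a table (MySQL-style DDL; the exact columns are my assumptions — the point is the unique index that now includes `deleted`):

```sql
create table user (
    id       bigint      primary key auto_increment,
    username varchar(64) not null,
    deleted  tinyint(1)  not null default 0,
    -- the "fixed" unique index from my naive solution
    unique key uk_username_deleted (username, deleted)
);
```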

Whenever we need to "soft delete" a user, we set the value of the "deleted" column to 1. What could possibly go wrong?
Now, let's say we have a user "John Snow". We deleted the corresponding record with the following command after he was killed in GOT season 5.
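The soft delete is just an update of the flag; in MyBatis-style SQL:

```sql
update user set deleted = 1 where id = #{id};
```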

Then we insert it again after they bring him back in season 6.
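The re-insert is a plain insert, with `deleted` defaulting to 0 (the username value is illustrative):

```sql
insert into user (username) values ('john_snow');
```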

Everything works smoothly so far, except that we won't be able to delete him again. This time, update user set deleted = 1 where id = #{id}; will raise a duplicate record error. That is exactly what the unique constraint is supposed to do, but it clearly violates our intention.
The problem is that we only want the username to be unique while the user is active; we don't care whether multiple deleted users share a username. In other words, we want a partial unique constraint that restricts only the active users.
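Some databases support exactly this. In PostgreSQL, for example, a partial index applies the unique constraint only to rows matching its predicate (the index name and the boolean form of `deleted` are my assumptions; MySQL has no partial indexes, so a different workaround is needed there):

```sql
-- At most one *active* row per username; soft-deleted rows are ignored by
-- the index, so a user can be deleted and re-inserted any number of times.
create unique index uk_active_username on "user" (username) where not deleted;
```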

ACID

Consistency

Consistency in ACID means that a transaction only ever moves the data from one "correct" state to another "correct" state. For example, suppose we have a banking system in which every transaction only transfers money from one account to another. Then, no matter how many transfer transactions are executed, the balances of all the accounts in the system are guaranteed to remain "correct": their total never changes.
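The invariant can be sketched in a few lines (a toy in-memory `Bank`, not a real transactional system — the class and method names are mine):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the invariant: a transfer only moves money between
// accounts, so the total balance is identical before and after every transfer.
class Bank {
    final Map<String, Long> balances = new HashMap<>();

    void transfer(String from, String to, long amount) {
        // In a real database, this pair of updates would run in one transaction.
        balances.merge(from, -amount, Long::sum);
        balances.merge(to, amount, Long::sum);
    }

    long total() {
        return balances.values().stream().mapToLong(Long::longValue).sum();
    }
}

class ConsistencyDemo {
    public static void main(String[] args) {
        Bank bank = new Bank();
        bank.balances.put("alice", 100L);
        bank.balances.put("bob", 50L);
        bank.transfer("alice", "bob", 30L);
        System.out.println(bank.total()); // still 150: the invariant holds
    }
}
```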

Background

Martin Kleppmann is a distributed-systems researcher at the University of Cambridge and the author of Designing Data-Intensive Applications. On his personal blog he published How to do distributed locking, which raised a number of doubts about the safety of the Redlock algorithm. Salvatore Sanfilippo (the creator of Redis and the author of the Redlock algorithm) soon responded with Is Redlock safe?. This article summarizes the key points of those two posts, along with my own thoughts on the questions they raise.

Terms and Conventions

• The safety property
Put simply, safety guarantees that nothing bad ever happens. When a safety property is violated, we can usually point to the exact moment of the violation. For example, the uniqueness of set elements is a safety property: if a duplicate element is inserted into a set, uniqueness is violated at the moment of that insertion. (Be careful not to confuse this safety property with the word "safe" in the article titles.)
• The liveness property
Put simply, liveness guarantees that something good eventually happens. For example, eventual consistency is a liveness property (the definition of a liveness property usually contains the word "eventually").

"Intuitively, a safety property describes what is allowed to happen, and a liveness property describes what must happen."

• Resource service: the resource protected by the lock.
• Lock service: the role played by the Redlock algorithm in this article.
• Lock client: the client that acquires and releases the lock. (Sometimes shortened to "client" below.)

1. Efficiency: the lock is used to avoid repeating the same work, computing the same result multiple times, and so on. In this scenario, it is tolerable if several clients occasionally hold the lock at the same time and interact with the resource service concurrently.
2. Correctness: the "safety" in the article title. We want the resource service to do the "correct" thing under the protection of the lock. More precisely, we want at most one client to be able to access the resource service at any moment, and even if the lock expires while its holder is still interacting with the resource service, the consistency of the resource service must not be broken.

1. Mutual exclusion (a safety property): at any moment, at most one client holds the lock.
2. Deadlock freedom (a liveness property): every lock has an expiry time, after which it is released automatically. Without such an automatic release mechanism, a lock holder that crashes or becomes unreachable would keep the resource locked until the failure is repaired, which is unacceptable in most scenarios.
3. Fault tolerance (a liveness property): no single point of failure; as long as the majority of the lock-service nodes work normally, clients can acquire and release locks.
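The first two properties can be illustrated with a toy, single-node sketch (this `ExpiringLock` is hypothetical, not how Redlock works, and it deliberately says nothing about property 3, which requires multiple nodes):

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy single-node sketch of properties 1 and 2: mutual exclusion plus an
// expiry time, so a crashed holder cannot keep the resource locked forever.
class ExpiringLock {
    private static final class Holder {
        final String owner;
        final long expiresAtMillis;
        Holder(String owner, long expiresAtMillis) {
            this.owner = owner;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final AtomicReference<Holder> holder = new AtomicReference<>();

    // The caller passes in the clock so that expiry is easy to demonstrate.
    boolean tryAcquire(String owner, long ttlMillis, long nowMillis) {
        Holder h = holder.get();
        if (h != null && h.expiresAtMillis > nowMillis) {
            return false;  // mutual exclusion: the lock is held and not expired
        }
        return holder.compareAndSet(h, new Holder(owner, nowMillis + ttlMillis));
    }

    boolean release(String owner) {
        Holder h = holder.get();
        if (h == null || !h.owner.equals(owner)) {
            return false;  // only the current holder may release the lock
        }
        return holder.compareAndSet(h, null);
    }
}
```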

(Cover image from Consistency Models)

Basic Concepts

• Systems
A distributed system is a kind of concurrent system, so much of the research on concurrency control applies directly to distributed systems. However, most of the concepts we are about to discuss were originally designed for single-node concurrent systems, and there are some differences between the two in availability and performance.
The logical state of a system changes over time. For example, a single integer variable can be a simple system, with states such as 0, 3, and 42. A mutex system has two states: locked and unlocked.
• Operations
An operation is a transition of a system from one state to another. For example, a single-variable system might have read and write operations, which get and set the value of the variable. A counter might have increment, decrement, and read operations.
• Histories
A history is a collection of operations, including their concurrent structure. Here we represent it as an ordered list of invocation and completion operations.
• Consistency Models
A consistency model is a set of histories. We use consistency models to define which histories are "good", or "legal", in a system. When we say a history violates serializability, or is not serializable, we mean that the history is not in the set of histories allowed by the serializable consistency model.
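These definitions can be made concrete with a toy model (everything below is mine for illustration; the hypothetical model is "every read of a single register returns the most recent write"):

```java
import java.util.List;

// Toy illustration: a history is an ordered list of operations, and a
// consistency model is just a predicate saying which histories are legal.
class Op {
    final String type;  // "read" or "write"
    final int value;
    Op(String type, int value) { this.type = type; this.value = value; }
}

class ModelDemo {
    static boolean isLegal(List<Op> history) {
        int last = 0;  // assume the register starts at 0
        for (Op op : history) {
            if (op.type.equals("write")) {
                last = op.value;
            } else if (op.value != last) {
                return false;  // this read observed a stale value
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Op> good = List.of(new Op("write", 3), new Op("read", 3));
        List<Op> bad  = List.of(new Op("write", 3), new Op("read", 0));
        System.out.println(isLegal(good)); // true
        System.out.println(isLegal(bad));  // false
    }
}
```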

Nowadays, ORM techniques play an important role in object-oriented programming, and JPA is considered the standard approach to ORM in the Java industry. In this post, I summarize several phenomena that violate my intuition and are prone to error.

As JPA itself is just a specification, there are various underlying implementations. In this post, we focus only on the Hibernate implementation. In fact, I've never used or tested any other implementation so far, which means a problem may not be reproducible in other JPA implementations.

Prerequisites

As in the post Common Pitfalls of Declarative Transaction Management in Spring, all the samples are written in Kotlin, and the Spring Data JPA framework is used for the sake of convenience. The full source code can be found at common-pitfalls-in-jpa-hibernate.

Pitfall 1: Don't be fooled by the equals and hashCode methods

You may already know that there are several contracts we have to obey when implementing the equals and hashCode methods, namely reflexivity, symmetry, transitivity, consistency, and non-nullity. When it comes to a JPA entity, things become even more difficult, since entity state transitions must be taken into account. In other words, equals and hashCode must behave consistently across all entity state transitions. Thus, we can immediately conclude that the generated id (usually assigned the first time the entity is persisted) should not be taken into consideration. AbstractPersistable from the Spring Data JPA library is a perfect counterexample: it implements equals and hashCode based on the auto-generated id. The following code demonstrates its flaw:
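A minimal plain-Java sketch of the problem (the post's own samples are Kotlin; `UserEntity` here is a hypothetical stand-in, with no JPA involved, for an entity whose id is generated on first persist):

```java
import java.util.HashSet;
import java.util.Set;

// Like AbstractPersistable, equals/hashCode are based on the id, but the id
// is only assigned later, simulating id generation on the first persist.
class UserEntity {
    Long id;  // null until "persisted"
    final String name;

    UserEntity(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof UserEntity
                && id != null
                && id.equals(((UserEntity) o).id);
    }

    @Override public int hashCode() {
        return id == null ? 0 : id.hashCode();
    }
}

class EqualsPitfall {
    public static void main(String[] args) {
        UserEntity user = new UserEntity("John Snow");
        Set<UserEntity> set = new HashSet<>();
        set.add(user);  // hashed with id == null, i.e. hashCode() == 0
        user.id = 42L;  // the id a database would generate at persist time
        System.out.println(set.contains(user)); // false: wrong bucket now
    }
}
```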

The HashSet fails to recognize the same entity because its hashCode changed after it was persisted. This is certainly error-prone. For a similar reason, the default equals and hashCode inherited from java.lang.Object are not suitable for a JPA entity either. The code below shows that a merged entity isn't equal to itself, because entityManager.merge may return a different object reference.
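Again a plain-Java sketch: `ToyEntityManager` below is a hypothetical stand-in for the relevant behavior of `entityManager.merge` (it returns a managed copy rather than the passed-in instance), while `Article` keeps the default identity-based equals and hashCode:

```java
import java.util.HashMap;
import java.util.Map;

// Entity relying on the default identity-based equals/hashCode from Object.
class Article {
    final Long id;
    final String title;
    Article(Long id, String title) { this.id = id; this.title = title; }
}

// Toy stand-in for an EntityManager: merge() copies the detached state into a
// managed instance and returns that copy, not the object you passed in.
class ToyEntityManager {
    private final Map<Long, Article> managed = new HashMap<>();
    Article merge(Article detached) {
        return managed.computeIfAbsent(detached.id,
                id -> new Article(id, detached.title));
    }
}

class MergePitfall {
    public static void main(String[] args) {
        ToyEntityManager em = new ToyEntityManager();
        Article detached = new Article(1L, "A Song of Ice and Fire");
        Article merged = em.merge(detached);
        System.out.println(detached.equals(merged)); // false: different references
    }
}
```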

Now, the only option left is to implement equals and hashCode based on some business key, and never change that key after the entity is created. However, you cannot always find such a key in practice. In that case, whichever way we choose to implement the methods, the best we can do is be aware of its shortcomings and document them clearly.
