When "Soft Delete" Meets "Unique Index"

Some casual writing to practice my English.

Recently, I was asked to enable soft delete for all the tables I created. It sounded like a breeze, so, as an experienced noob, I "finished" it immediately without even thinking about it. Here is what I did: add a boolean column named "deleted" to each table, then extend every unique index to include the "deleted" column. Done! It turns out I was too naive.

What's wrong with my naive solution?

Imagine that we have a user table:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `username` varchar(50) NOT NULL,
  `deleted` tinyint(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uq_user` (`username`,`deleted`)
);

Whenever we need to soft delete a user, we set the "deleted" column to 1. What could possibly go wrong?
Now, let's say we have a user "John Snow". We deleted the corresponding record with the following statement after he was killed in GOT season 5:

update user set deleted = 1 where id = #{id};

Then we insert it again after they bring him back in season 6:

insert into user(username) values ('John Snow');

Everything works smoothly so far, except that we won't be able to delete him again: this time, update user set deleted = 1 where id = #{id}; raises a duplicate entry error. That is exactly what the unique constraint is supposed to do, but it clearly violates our intention.
The problem is that we only want the username to be unique while the user is active; we don't care whether multiple deleted users share a username. In other words, we want a partial unique constraint that restricts only active users.
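To see the failure end to end, here is a small replay in Python with SQLite. SQLite and PostgreSQL support exactly the partial unique index we want, via a WHERE clause on the index; MySQL does not. This is an illustration of the idea, not the production schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL,
        deleted INTEGER NOT NULL DEFAULT 0
    );
    CREATE UNIQUE INDEX uq_user ON user(username, deleted);
""")

# Season 5: soft delete John Snow; season 6: insert him again.
conn.execute("INSERT INTO user(username) VALUES ('John Snow')")
conn.execute("UPDATE user SET deleted = 1 WHERE username = 'John Snow' AND deleted = 0")
conn.execute("INSERT INTO user(username) VALUES ('John Snow')")

# Soft deleting him a second time collides with the already-deleted row.
try:
    conn.execute("UPDATE user SET deleted = 1 WHERE username = 'John Snow' AND deleted = 0")
    print("second delete succeeded")
except sqlite3.IntegrityError as e:
    print("second delete failed:", e)

# A partial unique index only constrains active rows, so the same
# sequence works: deleted rows may share a username freely.
conn.executescript("""
    DROP INDEX uq_user;
    DELETE FROM user;
    CREATE UNIQUE INDEX uq_user ON user(username) WHERE deleted = 0;
""")
conn.execute("INSERT INTO user(username) VALUES ('John Snow')")
conn.execute("UPDATE user SET deleted = 1 WHERE username = 'John Snow' AND deleted = 0")
conn.execute("INSERT INTO user(username) VALUES ('John Snow')")
conn.execute("UPDATE user SET deleted = 1 WHERE username = 'John Snow' AND deleted = 0")
print("partial index allows repeated soft deletes")
```

With the partial index, the table ends up holding two deleted "John Snow" rows and no active one, which is precisely the intent.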


The Good Old Transaction

Some casual notes on transactions.

ACID

Atomicity

Atomicity here differs subtly from atomicity in multithreaded programming. In a multithreaded context, if a method is atomic, other threads cannot observe any intermediate state of that method, but it does not guarantee that the statements inside the method either all take effect or none do (all or nothing). ACID atomicity, by contrast, guarantees all or nothing, but says nothing about whether other transactions can see the transaction's intermediate states; in ACID, that is guaranteed by isolation. Consider the following program:

var counter: Int = 0

@Synchronized
fun increase() {
    counter++
    if (ThreadLocalRandom.current().nextBoolean())
        throw Exception("Oops!")
    counter++
}

@Synchronized
fun printCurrent() {
    println(counter)
}

Here @Synchronized makes increase atomic in the multithreaded sense: as long as no exception occurs, printCurrent can never print an odd number. But if an exception does occur, the first increment of counter is not rolled back, i.e., that call to increase bumps counter by only 1. Contrast this with the following SQL:

begin transaction;
update counter set value = value + 1 where id = 1;
if ROUND(RAND(), 0) = 1
begin
    THROW 50000, 'Oops!', 1;
end;
update counter set value = value + 1 where id = 1;
commit;

Even without an exception, if there is no isolation (or the isolation level is READ_UNCOMMITTED), other transactions can observe this code's intermediate state; but if an exception does occur, the first increment is rolled back.
From this angle, the Atomicity in ACID is mostly about automatically undoing earlier changes when an error occurs; perhaps the "A" is better read as Abortability.
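The contrast can be sketched in Python with SQLite (an illustration of the same idea, not the T-SQL above): when the error is raised mid-transaction, rolling back undoes the first increment too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, value INTEGER)")
conn.execute("INSERT INTO counter VALUES (1, 0)")
conn.commit()

def increase(fail: bool) -> None:
    """Increment twice inside one transaction; abort midway if fail is set."""
    try:
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
        if fail:
            raise RuntimeError("Oops!")
        conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
        conn.commit()
    except RuntimeError:
        conn.rollback()  # abortability: the first increment is undone too

increase(fail=True)
value = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()[0]
print(value)  # still 0 -- the aborted transaction left no trace

increase(fail=False)
value = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()[0]
print(value)  # 2
```

Unlike the Kotlin version, an aborted call leaves the counter exactly where it was; the value can only ever be even.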

Consistency

Consistency in ACID means a transaction only ever moves the data from one "correct" state to another "correct" state. For example, if every transaction in a payment system only moves money from one account to another, then no matter how many transfers run, the balances of all accounts in the system stay "correct".
"Correct" is in quotes because it is defined by the application: beyond a few foreign key and unique constraints, the database has no way to know whether the current data matches your definition of "correct".
In other words, atomicity, isolation, and durability are properties of the database, while consistency is better viewed as a property of the application: the application relies on the atomicity and isolation the database provides to keep its data consistent. So the "C" doesn't really belong in "ACID" (it was said that the C in ACID was "tossed in to make the acronym work").
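To make the transfer example concrete, here is a minimal sketch (Python with SQLite; the schema and CHECK constraint are my own illustration). The application-defined invariant is that the total balance never changes, and the database's atomicity preserves it even when a transfer fails halfway:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "  id INTEGER PRIMARY KEY,"
    "  balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(src: int, dst: int, amount: int) -> bool:
    """Move money between accounts; both updates commit or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.commit()
        return True
    except sqlite3.IntegrityError:  # CHECK failed: insufficient funds
        conn.rollback()
        return False

def total() -> int:
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

transfer(1, 2, 30)    # succeeds
transfer(1, 2, 1000)  # violates the CHECK constraint, rolled back
print(total())  # 150 -- the invariant holds either way
```

The CHECK constraint is one of the few "correctness" rules the database itself can enforce; the rest of the invariant lives in the application's transfer logic.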


Are Distributed Locks Really "Safe"?

I recently stumbled on the discussion between Martin Kleppmann and Salvatore Sanfilippo about whether the Redlock algorithm is "safe", and found it quite enlightening, so I want to write down my current thinking. Since this post is fairly long, here is the conclusion up front: every distributed lock with an expiry is inherently "unsafe"; there are only "safe" resource services, never "safe" distributed locks.

Background

Martin Kleppmann is a distributed systems researcher at the University of Cambridge and the author of Designing Data-Intensive Applications. On his blog he published How to do distributed locking, which raised many doubts about the safety of the Redlock algorithm. Salvatore Sanfilippo (the creator of Redis and the author of the Redlock algorithm) then published Is Redlock safe? in response. This post summarizes the main points of those two articles and my own thoughts on the questions they raise.

Terminology and Conventions

As in earlier translated posts, some technical terms are harder to understand once translated, so let me explain them up front.

  • safety property
    Simply put, safety guarantees that nothing bad happens. When a safety property is violated, we can usually point to the exact moment of the violation. For example, the uniqueness of elements in a set is a safety property: if a duplicate element is inserted, uniqueness is violated at the moment of that insertion. (Don't confuse this safety property with the word "safe" in the title of this post.)
  • liveness property
    Simply put, liveness guarantees that something good eventually happens. For example, eventual consistency is a liveness property (the definition of a liveness property usually contains the word "eventually").

"Intuitively, a safety property describes what is allowed to happen, and a liveness property describes what must happen."

To describe the problem more precisely, let's define three roles:

  • Resource service: the resource that the lock protects.
  • Lock service: the role the Redlock algorithm plays in this post.
  • Lock client: the client that acquires and releases the lock (referred to simply as the client below).

There are two main reasons to use a distributed lock:

  1. Efficiency: the lock keeps us from doing the same work or computing the same result multiple times. In this scenario, it is tolerable if several clients occasionally hold the lock at the same time and interact with the resource service concurrently.
  2. Correctness: the "safety" in the title of this post. We want the resource service to do the "right" thing under the protection of the lock. More rigorously, at any moment at most one client may access the resource service, and even if the lock expires while that client is mid-interaction with the resource service, the consistency of the resource service must not be broken.

Whatever the purpose, from the perspective of the distributed lock service itself we want the following properties (referred to below as properties 1, 2, and 3):

  1. Mutual exclusion (a safety property): at any moment, only one client holds the lock.
  2. Deadlock freedom (a liveness property): every lock has an expiry, after which it is released automatically. Without such auto-release, a lock holder that crashes or becomes unreachable would keep the resource locked until the holder is repaired, which is unacceptable in most scenarios.
  3. Fault tolerance (a liveness property): no single point of failure; as long as a majority of the lock service nodes are working, clients can acquire and release locks.
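To see how properties 1 and 2 interact, here is a toy single-process lock with an expiry (purely illustrative; it has nothing to do with Redlock's multi-node design). Once the expiry passes, the lock can be handed to another client even though the original holder never released it:

```python
import threading
import time

class ExpiringLock:
    """A toy lock with an expiry (property 2): it auto-releases after ttl
    seconds, so a crashed holder cannot block others forever."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._holder = None
        self._expires_at = 0.0

    def acquire(self, client: str, ttl: float) -> bool:
        with self._mutex:
            now = time.monotonic()
            if self._holder is None or now >= self._expires_at:
                self._holder, self._expires_at = client, now + ttl
                return True
            return False

    def release(self, client: str) -> bool:
        with self._mutex:
            if self._holder == client and time.monotonic() < self._expires_at:
                self._holder = None
                return True
            return False  # expired or not the holder: nothing to release

lock = ExpiringLock()
print(lock.acquire("A", ttl=0.1))  # True
print(lock.acquire("B", ttl=0.1))  # False: A holds it
time.sleep(0.15)                   # A "pauses" past its expiry...
print(lock.acquire("B", ttl=10))   # True: the lock auto-released
print(lock.release("A"))           # False: A no longer holds it
```

Note the tension already visible here: A still believes it holds the lock while B actually does, which is exactly the scenario the two articles argue about.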

The main problem the Redlock algorithm discussed below tries to solve is that a single Redis node acting as a lock service cannot satisfy property 3. Let's first walk through the algorithm.


(Translation) Strong Consistency Models

(Cover image from Consistency Models)

I've recently decided to try my hand at translation. Since my English is roughly high-school level, I won't follow the original text strictly, and I like to mix in my own understanding (my ability is limited, so my interpretation probably isn't worth much as a reference). If you have any English background, I recommend reading the original: Strong Consistency Models.

Basic Concepts

Technical terms often become harder to understand once translated, so I will leave them in English. Below is a rough explanation of the terms used most in this post; the definitions come from the article Consistency Models.

  • Systems
    A distributed system is a kind of concurrent system, and much of the research on concurrency control applies directly to distributed systems. However, most of the concepts we will discuss were originally designed for single-node concurrent systems, and the two differ somewhat in availability and performance.
    A system's logical state changes over time. For example, a single integer variable can be a simple system, with states like 0, 3, or 42. A mutex system has two states: locked and unlocked.
  • Operations
    An operation is a transition of the system from one state to another. For example, a single-variable system might have read and write operations, which get and set the variable's value. A counter might have increment, decrement, and read operations.
  • Histories
    A history is a collection of operations, including their concurrent structure. Here it is represented as an ordered list of invocation and completion operations.
  • Consistency Models
    A consistency model is a set of histories. We use consistency models to define which histories are "good" or "legal" in a system. When we say a history violates serializability, or is not serializable, we mean the history is not in the set of histories allowed by the serializable consistency model.
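These four definitions can be made concrete with a toy single-variable system; the sketch below (my own illustration, not from the original article) records a history as an ordered list of invocation and completion events:

```python
# A single-variable "system" with read/write operations, and a history
# recorded as an ordered list of invocation and completion events.
class Register:
    def __init__(self) -> None:
        self.state = 0     # the system's state, e.g. 0, 3, 42
        self.history = []  # ordered invocations and completions

    def write(self, value, process="p0"):
        self.history.append((process, "invoke", "write", value))
        self.state = value  # the operation: one state to another
        self.history.append((process, "complete", "write", value))

    def read(self, process="p0"):
        self.history.append((process, "invoke", "read", None))
        value = self.state
        self.history.append((process, "complete", "read", value))
        return value

r = Register()
r.write(3)
assert r.read() == 3
# A consistency model is a set of allowed histories; this sequential
# history (invoke/complete pairs that never overlap) is allowed by
# essentially every model.
print(r.history)
```

With concurrent processes, invocations and completions interleave, and deciding whether the resulting history belongs to a given model is exactly what the rest of the article is about.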


Common Pitfalls in JPA(Hibernate)

Nowadays, ORM techniques play an important role in object-oriented programming, and JPA is considered the standard approach to ORM in the Java industry. In this post, I summarize several behaviors that violate my intuition and are prone to error.

As JPA itself is just a specification, there are various underlying implementations. In this post, we focus only on the Hibernate implementation. In fact, I've never used or tested any other implementation, which means some problems may not be reproducible in other JPA implementations.

Prerequisites

As in the post Common Pitfalls of Declarative Transaction Management in Spring, all the samples are written in Kotlin, and the Spring Data JPA framework is used for the sake of convenience. The full source code can be found at common-pitfalls-in-jpa-hibernate.

Pitfall 1: Don't be fooled by equals and hashCode methods

You may already know that there are several contracts we must obey when implementing the equals and hashCode methods, namely Reflexivity, Symmetry, Transitivity, Consistency, and "Non-nullity". For a JPA entity, things get even harder because entity state transitions must be taken into account: equals and hashCode must behave consistently across all of them. We can immediately conclude that the generated key (usually assigned the first time the entity is persisted) should not be taken into consideration. AbstractPersistable from the Spring Data JPA library is a perfect counterexample: it implements equals and hashCode based on the auto-generated id. The following code demonstrates its flaw:

@Entity
class Demo1 : AbstractPersistable<Int>()

@Test
fun test() {
    val demo = Demo1()
    val set = hashSetOf(demo)
    set.contains(demo) // true
    entityManager.persist(demo)
    entityManager.flush()
    set.contains(demo) // false
}

The HashSet fails to recognize the same entity because its hash code changed after it was persisted, which is clearly error-prone. For a similar reason, the default equals and hashCode inherited from java.lang.Object are not suitable for a JPA entity either. The code below shows that a merged entity isn't equal to itself, because entityManager.merge may return a different object reference:

@Entity
class Demo2 : Persistable<Int> {
    @Id
    @GeneratedValue
    private var id: Int? = null

    override fun getId(): Int? {
        return id
    }

    override fun isNew(): Boolean {
        return id == null
    }

    // inherit equals and hashCode from Object
}

@Test
fun test() {
    val demo = Demo2()
    val set = hashSetOf(demo)
    set.contains(demo) // true
    entityManager.persist(demo)
    entityManager.flush()
    entityManager.detach(demo)
    val managed = entityManager.merge(demo)
    set.contains(managed) // false
}

Now the only option left is to implement equals and hashCode based on some business key, and never change that key after the entity is created. However, you cannot always find such a key in practice. In those cases, the best we can do, whichever way we implement the methods, is to be aware of the shortcomings and document them clearly.
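The underlying rule is language-agnostic: a hash-based collection needs the entity's hash to stay stable across state transitions. A quick Python rendering of the business-key approach (a hypothetical User entity whose business key is the username):

```python
class User:
    """Entity whose identity is its business key (username), not the
    database-generated id, so its hash stays stable across persistence."""

    def __init__(self, username: str) -> None:
        self.username = username
        self.id = None  # assigned by the database on first persist

    def __eq__(self, other: object) -> bool:
        return isinstance(other, User) and other.username == self.username

    def __hash__(self) -> int:
        return hash(self.username)

u = User("john")
s = {u}
u.id = 42              # simulate what persisting does to the generated id
print(u in s)          # True: the hash did not change

merged = User("john")  # a merge may hand back a different instance
merged.id = 42
print(merged in s)     # True: equality is by business key
```

Both lookups succeed, which is exactly what the id-based and Object-based implementations above fail to deliver.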

