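Are Distributed Locks Really "Safe"?

Today I happened to read the discussion between Martin Kleppmann and Salvatore Sanfilippo about whether the Redlock algorithm is "safe". I found it enlightening, so I decided to write down my current thinking. Since this post is rather long, here is my conclusion up front: every distributed lock with an expiry time is inherently not "safe"; there are only "safe" resource services, never "safe" distributed locks.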

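Background

Martin Kleppmann is a distributed systems researcher at the University of Cambridge and the author of Designing Data-Intensive Applications (the most worthwhile book I have read in IT so far, bar none). He published an article on his personal blog, How to do distributed locking, which raises many doubts about the safety of the Redlock algorithm. Salvatore Sanfilippo (the creator of Redis and the author of Redlock) then published Is Redlock safe? in response. This post summarizes the key points of the two articles and my own thoughts on these questions.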

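Terminology and Conventions

As in my earlier translated posts, some technical terms are harder to understand when translated into Chinese, so I keep them in English and explain them up front.

  • safety property
    Put simply, safety guarantees that nothing bad ever happens. If a safety property is violated, we can generally pinpoint the exact moment at which it was violated. Uniqueness of set elements, for example, is a safety property: if a duplicate element is inserted into a set, uniqueness is violated at the moment of insertion. (Do not confuse the safety property here with the word "safe" in the title.)
  • liveness property
    Put simply, liveness guarantees that something good eventually happens. Eventual consistency, for example, is a liveness property (definitions of liveness properties usually contain the word "eventually").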

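To describe the problem more clearly, we first define three roles:

  • Resource service: the resource that needs to be protected by the lock.
  • Lock service: the role played by the Redlock algorithm in this post.
  • Lock user: a client that acquires and releases locks (sometimes shortened to "user" below).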

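Distributed locks are used mainly for two purposes:

  1. Efficiency: the lock avoids doing the same work or computing the same result more than once. In this scenario it is tolerable if several users occasionally hold the lock at the same time and interact with the resource service concurrently.
  2. Correctness: this is the "safe" of the title. We want the resource service to do the "right" thing under the lock's protection. More rigorously, we want at most one user to access the resource service at any moment, and even if the lock expires while that user is in the middle of interacting with the resource service, the consistency of the resource service must not be broken.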

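Whatever the purpose, from the lock service's own point of view we want it to have the following properties (referred to below as Property 1, Property 2, and Property 3):

  1. Mutual exclusion (safety property): at any moment, only one user can hold the lock.
  2. Deadlock freedom (liveness property): every lock has a validity period and is released automatically once it expires. Without such automatic release, a user that crashes or becomes unreachable while holding the lock would keep the resource locked until that user is repaired, which is unacceptable in most scenarios.
  3. Fault tolerance (liveness property): there is no single point of failure; as long as a majority of lock service nodes are working, users can acquire and release locks.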
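To make Property 1 and Property 2 more tangible, here is a minimal in-memory sketch of a lock service with expiring locks. It is not Redlock and it is not fault tolerant (a single instance, so Property 3 is not addressed). The class name ExpiringLockService, its methods, and the injected clock are all hypothetical names chosen for illustration.

```kotlin
// Hypothetical single-node lock service with expiring locks (NOT Redlock).
// The clock is injected so that expiry can be simulated without sleeping.
class ExpiringLockService(private val now: () -> Long) {
    private data class Lease(val owner: String, val expiresAt: Long)

    private val leases = HashMap<String, Lease>()

    // Property 1: refuse the lock while another lease is still valid.
    // Property 2: an expired lease no longer blocks new acquisitions.
    @Synchronized
    fun tryAcquire(resource: String, owner: String, ttlMillis: Long): Boolean {
        val current = leases[resource]
        if (current != null && current.expiresAt > now()) return false
        leases[resource] = Lease(owner, now() + ttlMillis)
        return true
    }

    // Releases the lock only if the caller still holds an unexpired lease.
    @Synchronized
    fun release(resource: String, owner: String): Boolean {
        val current = leases[resource] ?: return false
        if (current.owner != owner || current.expiresAt <= now()) return false
        leases.remove(resource)
        return true
    }
}

fun main() {
    var clock = 0L
    val locks = ExpiringLockService { clock }

    println(locks.tryAcquire("order-42", "userA", ttlMillis = 100)) // true
    println(locks.tryAcquire("order-42", "userB", ttlMillis = 100)) // false
    clock = 150 // userA's lease has expired
    println(locks.tryAcquire("order-42", "userB", ttlMillis = 100)) // true
    println(locks.release("order-42", "userA"))                     // false
}
```

Note what the last two lines show: after the lease expires, userA may still believe it holds the lock while userB has already acquired it. The lock service stayed mutually exclusive, yet nothing stops userA from continuing to talk to the resource service.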

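The main problem the Redlock algorithm discussed below sets out to solve is that a single Redis node used as a distributed lock service cannot satisfy Property 3. Let's first take a look at the algorithm.

Read the full article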

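(Translation) Strong Consistency Models

(Cover image from Consistency Models)

I have recently decided to try my hand at translation. My English is basically stuck at a high-school level, so I will not translate strictly according to the original, and I also like to add my own understanding (my ability is limited, so my understanding is probably not worth much as a reference). Readers with a reasonable command of English are therefore advised to read the original: Strong Consistency Models

Basic Concepts

Some technical terms become even harder to understand once translated into Chinese, so I leave them untranslated. Below I briefly explain the terms used most often in this post; the definitions come from the article Consistency Models, and this is only a rough translation.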

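  • Systems
    A distributed system is a kind of concurrent system, and much of the research on concurrency control applies directly to distributed systems. In fact, most of the concepts we are going to discuss were originally designed for single-node concurrent systems. There are still some differences between the two in availability and performance.
    A system has a logical state that changes over time. For example, a single integer variable is a simple system, with states such as 0, 3, and 42. A mutex system has two states: locked and unlocked.
  • Operations
    An operation is a transition of the system from one state to another. For example, a single-variable system might have operations such as read and write, which get and set the value of the variable. A counter might have operations such as increment, decrement, and read.
  • Histories
    A history is a collection of operations, including their concurrent structure. Here it is represented as an ordered list of invocation and completion operations.
  • Consistency Models
    A consistency model is a set of histories. We use consistency models to define which histories are "good" or "legal" in a system. When we say that a history violates serializability, or is not serializable, we mean that the history is not in the set of histories allowed by the serializable consistency model.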
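The four terms can be tied together in a small sketch. It assumes a single-integer-variable system that starts at 0, and it simplifies a history to a plain list of completed operations with no concurrency. A consistency model, being a set of histories, is represented by its membership test. All type and function names here (Op, Write, Read, History, ConsistencyModel, sequentialRegister) are hypothetical and chosen for illustration.

```kotlin
// Operations on a single-variable system (hypothetical names).
sealed class Op
data class Write(val value: Int) : Op()
data class Read(val observed: Int) : Op()

// Simplification: a history is an ordered list of completed operations,
// so concurrent invocations are not modelled here.
typealias History = List<Op>

// A consistency model is a set of histories; we represent the set by a
// predicate that says whether a given history belongs to it.
typealias ConsistencyModel = (History) -> Boolean

// Allows exactly the histories in which every read observes the most
// recently written value (the variable is assumed to start at 0).
val sequentialRegister: ConsistencyModel = { history ->
    var state = 0
    history.all { op ->
        when (op) {
            is Write -> { state = op.value; true }
            is Read -> op.observed == state
        }
    }
}

fun main() {
    println(sequentialRegister(listOf(Write(3), Read(3), Write(42), Read(42)))) // true
    println(sequentialRegister(listOf(Write(3), Read(0))))                      // false
}
```

The second history is "illegal" under this model simply because it is not a member of the set the predicate describes.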

Read the full article

Common Pitfalls in JPA(Hibernate)

ORM plays an important role in object-oriented programming today, and JPA is considered the standard approach to ORM in the Java ecosystem. In this post, I summarize several behaviors that violate my intuition and are error-prone.

Since JPA itself is just a specification, there are various underlying implementations. This post focuses only on the Hibernate implementation. In fact, I have never used or tested any other implementation, so some of these problems may not be reproducible with other JPA implementations.

Prerequisites

As in the post Common Pitfalls of Declarative Transaction Management in Spring, all samples are written in Kotlin, and Spring Data JPA is used for convenience. The full source code can be found at common-pitfalls-in-jpa-hibernate.

Pitfall 1: Don't be fooled by the equals and hashCode methods

You may already know that there are several contracts we have to obey when implementing the equals and hashCode methods, namely reflexivity, symmetry, transitivity, consistency, and "non-nullity". For a JPA entity, things become even more difficult because entity state transitions must be taken into account. In other words, equals and hashCode must behave consistently across all entity state transitions. We can therefore immediately conclude that the surrogate key (usually auto-generated when the entity is first persisted) should not be used. AbstractPersistable from the Spring Data JPA library is a perfect counterexample: it implements equals and hashCode based on the auto-generated id. The following code demonstrates the flaw:

    @Entity
    class Demo1 : AbstractPersistable<Int>()

    @Test
    fun test() {
        val demo = Demo1()
        val set = hashSetOf(demo)

        set.contains(demo) // true

        entityManager.persist(demo)
        entityManager.flush()

        set.contains(demo) // false
    }

The HashSet fails to recognize the same entity because its hash code changed after it was persisted. This is clearly error-prone. For a similar reason, the default equals and hashCode inherited from java.lang.Object are not suitable for JPA entities either. The code below shows that a merged entity is not equal to its original instance, because entityManager.merge may return a different object reference.

    @Entity
    class Demo2 : Persistable<Int> {
        @Id
        @GeneratedValue
        private var id: Int? = null

        override fun getId(): Int? {
            return id
        }

        override fun isNew(): Boolean {
            return id == null
        }

        // inherit equals and hashCode from Object
    }

    @Test
    fun test() {
        val demo = Demo2()
        val set = hashSetOf(demo)

        set.contains(demo) // true

        entityManager.persist(demo)
        entityManager.flush()
        entityManager.detach(demo)

        val managed = entityManager.merge(demo)

        set.contains(managed) // false
    }

Now the only option left is to implement equals and hashCode based on some business key, and never change that key after the entity is created. However, you cannot always find such a key in practice. In that case, the best we can do is to be aware of the shortcomings of whichever implementation we choose and document them clearly.
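As a rough illustration of the business-key approach, here is a plain-Kotlin sketch. The JPA annotations are left out so that it runs without any dependencies, and persisting and merging are only simulated by assigning the id and by creating a second instance. The Book class and its isbn business key are hypothetical.

```kotlin
// Hypothetical entity; @Entity, @Id and @GeneratedValue are omitted so the
// sketch runs without JPA on the classpath.
class Book(val isbn: String) {   // business key, immutable after creation
    var id: Int? = null          // stands in for the generated identifier

    // Equality depends only on the business key, never on the id.
    override fun equals(other: Any?): Boolean =
        this === other || (other is Book && other.isbn == isbn)

    override fun hashCode(): Int = isbn.hashCode()
}

fun main() {
    val book = Book("isbn-123")
    val set = hashSetOf(book)

    book.id = 1                  // simulates the id being assigned on persist
    println(set.contains(book))  // true: the hash code did not change

    // Simulates merge returning a different object for the same row.
    val merged = Book("isbn-123").apply { id = 1 }
    println(set.contains(merged)) // true: equality ignores object identity
}
```

Both lookups succeed because neither the generated id nor the object reference takes part in equals or hashCode.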

Read the full article

Common Pitfalls of Declarative Transaction Management in Spring

Spring supports two types of transaction management: programmatic and declarative. Although programmatic management is more flexible, declarative management is usually preferred because it is less invasive to application code. In this post, I summarize several pitfalls you may encounter while using declarative transaction management. Certainly, if you read the official documentation thoroughly, you should know how to avoid them on your own. But if you think it is all about annotating your method with @Transactional, as I did, you may never figure them out until the day a customer reports that their balance is incorrect.

Prerequisites

Our samples are written in Kotlin. In addition, I assume that you are already familiar with the following frameworks:

  • Spring
  • JPA(Hibernate implementation)
  • Spring Data JPA

Examples in this post are based on the following class:

    // entity
    @Entity
    class DemoEntity(
        val name: String
    ) : AbstractPersistable<Int>()

    // repository
    interface DemoRepo : CrudRepository<DemoEntity, Int>

Read the Spring Data JPA manual if you are not able to understand the above code. The related test code is available at common-pitfalls-of-declarative-transaction-management-in-spring.

Pitfall 1: @Transactional annotation may have no effect at all

It is common to put some code in a private method so that it can be reused. If the code involves a transaction, we may end up writing code like this:

    import java.util.concurrent.ThreadLocalRandom
    import org.springframework.stereotype.Service
    import org.springframework.transaction.annotation.Transactional

    @Service
    class DemoService(
        private val demoRepo: DemoRepo
    ) {
        fun persistAndDoSomething(demo: DemoEntity) {
            persist(demo)
            // do something
        }

        fun persistAndDoOtherthings(demo: DemoEntity) {
            persist(demo)
            // do other things
        }

        @Transactional
        private fun persist(demo: DemoEntity) {
            // you may think this action will be rolled back if an exception occurs
            demoRepo.save(demo)
            unpredictableMethod()
        }

        // simulate a method which may or may not throw an exception
        fun unpredictableMethod() {
            if (ThreadLocalRandom.current().nextBoolean())
                throw Exception("Oops!")
        }
    }

In this case, the persist method is invoked as if no @Transactional annotation were present. To understand why, we need to know that declarative transactions are implemented on top of AOP (Aspect-Oriented Programming) proxies. A proxied method invocation looks like this:
[Figure: transactional-proxy]
From the picture, it is not hard to see that the beginTransaction, commit, and rollback logic is implemented in a so-called "advice" component. "Advice" is a core concept of AOP (read the Spring AOP documentation for more information), and Spring transaction management supports two advice modes, "PROXY" and "ASPECTJ". As the documentation says:

When using proxies, you should apply the @Transactional annotation only to methods with public visibility. If you do annotate protected, private or package-visible methods with the @Transactional annotation, no error is raised, but the annotated method does not exhibit the configured transactional settings. Consider the use of AspectJ if you need to annotate non-public methods.

Now the reason is clear. To fix the problem, we can either switch the advice mode from "PROXY" (the default) to "ASPECTJ", or remove the private modifier from the persist method. Suppose we choose to remove the modifier: persist is still invoked without any transaction, because we have just fallen into the next pitfall.
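The proxy mechanism described above can be imitated with a hand-written wrapper. This is only a sketch of the idea, not Spring's implementation: Spring generates its proxies at runtime and applies configurable rollback rules, whereas this version is written by hand, records its steps in a list, and rolls back on any exception. The names DemoOperations, DemoTarget, and TransactionalProxy are hypothetical.

```kotlin
// Hypothetical interface shared by the real object and its proxy.
interface DemoOperations {
    fun persist(name: String)
}

// The real object: it knows nothing about transactions.
class DemoTarget(private val log: MutableList<String>) : DemoOperations {
    override fun persist(name: String) {
        log += "save $name"
        if (name == "bad") throw IllegalStateException("Oops!")
    }
}

// Hand-written stand-in for the proxy: the "advice" wraps the call to the
// target with begin / commit / rollback steps.
class TransactionalProxy(
    private val target: DemoOperations,
    private val log: MutableList<String>
) : DemoOperations {
    override fun persist(name: String) {
        log += "begin"
        try {
            target.persist(name)
            log += "commit"
        } catch (e: Exception) {
            log += "rollback"
            throw e
        }
    }
}

fun main() {
    val log = mutableListOf<String>()
    val target = DemoTarget(log)
    val proxy: DemoOperations = TransactionalProxy(target, log)

    proxy.persist("ok")
    println(log) // [begin, save ok, commit]

    log.clear()
    runCatching { proxy.persist("bad") }
    println(log) // [begin, save bad, rollback]

    log.clear()
    target.persist("ok") // bypasses the proxy, so no advice runs
    println(log) // [save ok]
}
```

The last call shows the key point: the transactional behavior lives in the wrapper, so any invocation that does not go through the wrapper runs without it.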

Read the full article

Understanding Zombie Process

As a programmer, I usually feel uncomfortable when the top command reports zombie processes on my computer. After some study, I found that zombie processes are not as scary as I thought. This article briefly introduces zombie processes in UNIX-like systems.

What is a "zombie process"?

"In UNIX System terminology, a process that has terminated, but whose parent has not yet waited for it, is called a zombie."

After we create a process with the fork function, we have a parent process and a child process. The parent sometimes needs to know how the child terminated. Normally, it calls wait or waitpid to fetch the termination status. However, a child process could terminate before its parent waits for it. In that case, if the system cleared the child's information completely, the parent would not be able to learn its status. The kernel therefore has to keep a small amount of information after a process terminates. A process that has terminated but has not yet completely disappeared is called a zombie process.
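The "parent fetches the child's termination status" step can be sketched in Kotlin, the language used elsewhere on this blog, through the JVM's ProcessBuilder API rather than by calling fork and waitpid directly. The sketch assumes a UNIX-like system with sh on the PATH, and runAndWait is a hypothetical helper name. Here waitFor() plays the role that wait or waitpid plays in a C program: it blocks until the child has terminated and returns its exit status.

```kotlin
// Hypothetical helper: starts a child process and waits for it to finish.
// Assumes a UNIX-like system where the given command exists.
fun runAndWait(vararg command: String): Int {
    val child = ProcessBuilder(*command).start()
    // Blocks until the child terminates, then returns its exit status.
    return child.waitFor()
}

fun main() {
    val status = runAndWait("sh", "-c", "exit 3")
    println("child exited with status $status") // child exited with status 3
}
```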

Note that zombie processes should not be confused with orphan processes: an orphan process is a process that is still executing but whose parent has died. When orphans terminate, they do not remain as zombie processes; instead, like all orphaned processes, they are adopted by init, which waits on its children. The result is that a process that is both a zombie and an orphan will be reaped automatically.

Read the full article