Notes on High Performance MySQL

Posted on 2024-06-29 | Category: Backend

Notes on High Performance MySQL, for future reference.

Chapter 4: Optimizing Schema and Data Types

Choosing Identifiers

Integers are usually the best choice for identifiers. Avoid string types for identifiers if possible, because they take up a lot of space and are generally slower than integer types. You should also be very careful with completely “random” strings, such as those produced by MD5(), SHA1(), or UUID().
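To put a rough number on the "a lot of space" point, here is a small sketch that compares a UUID stored as text with the same value stored as raw bytes. The identifier_sizes helper is a hypothetical name used only for this illustration; for comparison, a MySQL BIGINT column takes 8 bytes.

```python
import uuid


def identifier_sizes(value):
    """Return the size in bytes of a UUID as hex text and as raw bytes."""
    return {
        "char36": len(str(value).encode("ascii")),  # e.g. a CHAR(36) column
        "binary16": len(value.bytes),               # e.g. a BINARY(16) column
    }


print(identifier_sizes(uuid.uuid4()))  # {'char36': 36, 'binary16': 16}
```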


Exploration vs Exploitation

Posted on 2024-06-29 | Category: Diary

A short note on Stanford CS234 Reinforcement Learning (2019), Lecture 1.
How should an RL agent balance its actions?

Exploration: trying new things that might enable the agent to make better decisions in the future

Exploitation: choosing actions that are expected to yield good reward given past experience

Often there is an exploration-exploitation tradeoff: we may have to sacrifice reward in order to explore and learn about a better policy.

To make the idea concrete, suppose you go to a restaurant that has several different dishes, and you want to end up ordering the best one. The best strategy actually depends on how long you will be living near that restaurant. If you are going to live there for a long time, the best strategy is to try them all; if instead this is your last visit, you should order the best dish you know.
The underlying idea is fairly simple. Applied to human lives, it means you should try different things while you are young, and stick to whatever interests you when you get old.
It also suggests that "Treat every day as if it's your last" is actually a terrible strategy. If it really is your last day, you should always do whatever gives you the maximum pleasure, but if you have a future, you should spend more time on "exploration".
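The restaurant example can be sketched as a toy calculation. This is only an illustration under strong assumptions that are not from the lecture: each dish gives a fixed, noise-free reward, you already know one dish, and the reward numbers are made up. The function names are hypothetical.

```python
def always_exploit(known_reward, visits):
    """Order the one dish we already know on every visit."""
    return known_reward * visits


def explore_then_commit(known_reward, unknown_rewards, visits):
    """Try the unknown dishes one per visit, then keep ordering the best dish seen."""
    tried = unknown_rewards[:visits]  # we may run out of visits before trying them all
    total = sum(tried)
    remaining = visits - len(tried)
    best = max([known_reward] + tried)
    return total + remaining * best


# Made-up rewards: we know a dish worth 6; the untried dishes are worth 2, 9 and 4.
for visits in (1, 3, 4, 20):
    print(visits, always_exploit(6, visits), explore_then_commit(6, [2, 9, 4], visits))
# 1 6 2
# 3 18 15
# 4 24 24
# 20 120 168
```

With a single visit left, exploring only costs reward; with twenty visits ahead, the early sacrifice pays for itself.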

Why I think Kotlin is preferable to Java

Posted on 2019-10-26 | Category: Programming Language

Although I've been quite impressed by Rust recently, Kotlin is still my favorite language. In this post, I will share the main reasons that convinced me to leave Java two years ago. It won't cover every bright side of Kotlin, but it will be enough to make my point.

TL;DR

Java the Good Parts and the Bad Parts

If you asked me whether Java is a good programming language, I would definitely say yes. Compared to languages such as C++, VB, and JavaScript, writing code in Java is much more pleasant. More specifically, its virtues include, but are not limited to:

Cross Platform
Statically Typed
Automatic Memory Management
Open Community
(After all, when I could not make a living writing fancier languages, it was Java that gave me a job, so that I could complain about it all day.)

Anyway, like other older languages, Java has made many design mistakes. I won't dive into language design here, as I'm not a specialist in programming languages (or in any other field). I just want to share some issues that really bother me, from a mediocre programmer's perspective, and then see how they are solved in Kotlin.


When "Soft Delete" Meets "Unique Index"

Posted on 2019-05-21 | Category: Backend

Some casual writing to practice my English.

Recently, I was asked to enable soft delete for all the tables I had created. It sounded like a breeze, and as an experienced noob, I "finished" it immediately without even thinking about it. Here is what I did: add a boolean column named "deleted" to each table, then change every unique index to include the "deleted" column. Done! It turns out I was too naive.

What's wrong with my naive solution?

Imagine that we have a user table:

CREATE TABLE `user` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `username` varchar(50) NOT NULL,
  `deleted` tinyint(1) NOT NULL DEFAULT 0,
  PRIMARY KEY (`id`),
  UNIQUE KEY `uq_user` (`username`,`deleted`)
)

Whenever we need to "soft delete" a user, we set the "deleted" column to 1. What could possibly go wrong?
Now, let's say we have a user "John Snow". After he was killed in GOT season 5, we deleted the corresponding record with the following command:

update user set deleted = 1 where id = #{id};

Then we insert him again after they bring him back in season 6:

insert into user(username) values ('John Snow');

Everything works smoothly so far, except that we won't be able to delete him again. This time, update user set deleted = 1 where id = #{id}; raises a duplicate entry error, because a row with the same username and deleted = 1 already exists. That is exactly what the unique constraint is for, but it clearly defeats our intention.
The problem is that we only want the username to be unique while the user is active; we don't care whether multiple deleted users share a username. In other words, we want a partial unique constraint that applies only to active users.
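The failure, and what a partial unique constraint would look like, can be reproduced with a small sketch. It uses Python's built-in sqlite3 module rather than MySQL so that it is self-contained, and the column types are adapted to SQLite. The partial index (the WHERE clause on CREATE UNIQUE INDEX) is a SQLite feature that MySQL does not offer, so this illustrates the idea only; it is not necessarily the fix the full post arrives at. The helper names are hypothetical.

```python
import sqlite3

# The naive approach from above: uniqueness over (username, deleted).
NAIVE_INDEX = 'CREATE UNIQUE INDEX uq_user ON "user" (username, deleted)'
# Partial index: uniqueness is enforced only over rows where deleted = 0.
PARTIAL_INDEX = 'CREATE UNIQUE INDEX uq_user ON "user" (username) WHERE deleted = 0'


def new_db(index_sql):
    """Create an in-memory user table with the given unique index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "user" ('
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " username TEXT NOT NULL,"
        " deleted INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(index_sql)
    return conn


def create_then_soft_delete(conn, username):
    """Insert an active user, then soft delete that same row."""
    cur = conn.execute('INSERT INTO "user" (username) VALUES (?)', (username,))
    conn.execute('UPDATE "user" SET deleted = 1 WHERE id = ?', (cur.lastrowid,))


def can_delete_twice(index_sql):
    """Can the same username be created and soft deleted two times in a row?"""
    conn = new_db(index_sql)
    try:
        create_then_soft_delete(conn, "John Snow")
        create_then_soft_delete(conn, "John Snow")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


def rejects_duplicate_active(index_sql):
    """Are two active users with the same username still rejected?"""
    conn = new_db(index_sql)
    try:
        conn.execute('INSERT INTO "user" (username) VALUES (?)', ("John Snow",))
        conn.execute('INSERT INTO "user" (username) VALUES (?)', ("John Snow",))
        return False
    except sqlite3.IntegrityError:
        return True
    finally:
        conn.close()


print(can_delete_twice(NAIVE_INDEX))            # False
print(can_delete_twice(PARTIAL_INDEX))          # True
print(rejects_duplicate_active(PARTIAL_INDEX))  # True
```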


The Good Old Transaction

Posted on 2019-05-07 | Category: Backend

Some casual notes on transactions.

ACID

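Atomicity

Atomicity here differs subtly from atomicity in multithreaded programming. In a multithreaded context, if a method is atomic, other threads cannot observe the intermediate state of its execution, but nothing guarantees that the statements in the method take effect all or nothing. Conversely, atomicity in ACID guarantees all or nothing, but says nothing about whether other transactions can see the transaction's intermediate state; in ACID, that property is provided by isolation. Consider the following program: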

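import java.util.concurrent.ThreadLocalRandom

var counter: Int = 0

// Holds the lock for the whole call, so printCurrent never runs between the two increments.
@Synchronized
fun increase() {
    counter++
    // Fail at random between the two increments; the first increment is NOT undone.
    if (ThreadLocalRandom.current().nextBoolean())
        throw Exception("Oops!")
    counter++
}

@Synchronized
fun printCurrent() {
    println(counter)
}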

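Here @Synchronized makes increase atomic, which means that if no exception occurs, printCurrent can never print an odd number. But if an exception occurs, the first increment of counter is not rolled back; that call to increase increments counter by only 1. Contrast that with the following SQL: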

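-- SQL Server (T-SQL) syntax
set xact_abort on;  -- an error thrown below rolls back the whole transaction
begin transaction;
update counter set value = value + 1 where id = 1;
-- fail at random between the two updates
if ROUND(RAND(),0)=1
begin
    THROW 50000, 'Oops!', 1;
end;
update counter set value = value + 1 where id = 1;
commit;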

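Even if no exception occurs, without isolation (or with Isolation.level = READ_UNCOMMITTED), other transactions can see the intermediate state of this code. But if an exception does occur, the first increment is rolled back.
From this perspective, atomicity in ACID is more about automatically undoing earlier changes when an error occurs; perhaps "A" is better understood as Abortability.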
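A minimal runnable sketch of the same all-or-nothing behavior, using Python's built-in sqlite3 module instead of SQL Server purely so that it is self-contained. The counter table mirrors the one in the SQL; the increase_twice helper and its fail flag are hypothetical names added for illustration, replacing the random failure with a deterministic one.

```python
import sqlite3


def increase_twice(conn, fail):
    """Apply two increments in one transaction: both take effect, or neither does."""
    try:
        with conn:  # commits on success, rolls back if the block raises
            conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
            if fail:
                raise RuntimeError("Oops!")
            conn.execute("UPDATE counter SET value = value + 1 WHERE id = 1")
    except RuntimeError:
        pass  # by now the first increment has been rolled back


def current(conn):
    """Read the current counter value."""
    return conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)")
conn.execute("INSERT INTO counter (id, value) VALUES (1, 0)")
conn.commit()

increase_twice(conn, fail=True)
print(current(conn))  # 0: the first increment was undone
increase_twice(conn, fail=False)
print(current(conn))  # 2
```

Unlike the @Synchronized version, a failed call here never leaves the counter at an odd value.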

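Consistency

Consistency in ACID means that a transaction only moves data from one "correct" state to another "correct" state. For example, in a user trading system where every transaction only transfers money from one account to another, no matter how many transfers are executed, the balances of all accounts in the system remain "correct".
"Correct" is in quotation marks here because it is defined by the application. Apart from a few constraints such as foreign keys and unique constraints, the database cannot tell whether the current data matches your definition of "correct".
In other words, atomicity, isolation, and durability are properties of the database, whereas consistency is arguably better seen as a property of the application: the application relies on the atomicity and isolation that the database provides to keep its data consistent. So "C" does not really belong in "ACID" (it was said that the C in ACID was "tossed in to make the acronym work").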
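A small sketch of that division of labor, again with Python's built-in sqlite3 module and a hypothetical account table. The database enforces only what it is told (here, a CHECK constraint that balances cannot go negative); the invariant that the total amount of money stays the same is the application's own definition of "correct", and it holds only because each transfer runs as a single atomic transaction.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from src to dst atomically; return False if the database rejects it."""
    try:
        with conn:  # one transaction: both updates take effect, or neither does
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # overdraft hit the CHECK constraint; the credit was rolled back


def total(conn):
    """The application-level invariant: the sum of all balances."""
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

print(transfer(conn, 1, 2, 30), total(conn))   # True 150
print(transfer(conn, 1, 2, 500), total(conn))  # False 150
```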
