Do some casually writing to practice my English.

Recently, I was asked to enable soft delete for all the tables I created, it sounds like a breeze, as an experienced noob, I "finished" it immediately without even think about it. This is how I did, add a boolean column named "deleted" for each table, then replace every unique index to include the "deleted" column, done! Anyway, It turns out I was too naive.

What's wrong with my naive solution?

Imagine that we have a user table:

1
2
3
4
5
6
7
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`deleted` tinyint(1) NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
UNIQUE KEY `uq_user` (`username`,`deleted`)
)

Whenever we need to "soft delete" a user, we set the value of the "deleted" column to 1, what could possibly go wrong?
Now, let's say we have a user "John Snow", we deleted the corresponding record with the following command after he was killed in GOT season 5.

1
update user set deleted = 1 where id = #{id};

then we insert it again after they bring him back in season 6.

1
insert user(username) values ("John Snow");

Everything works smoothly so far, except that we won't able to delete him again. this time update user set deleted = 1 where id = #{id}; will raise a duplicate records error, that is exactly what the unique constraint does, but apparently, it violates our intention.
The problem is, we only want the username to be unique if the user is active, we don't care if there are multiple deleted user share a username. In other words, we only want a partially unique constraint which restricts the active user.

Partially Index in PostgreSQL

If you are using PostgreSQL, you are lucky, they do have implemented the "Partially index". the create statement may look like:

1
2
3
4
5
6
CREATE TABLE "user" (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
username varchar(50) NOT NULL,
deleted boolean NOT NULL DEFAULT false
);
CREATE UNIQUE INDEX uq_user ON "user" USING btree(username) WHERE NOT deleted;

Now everything works as our expectations.

In absense of partially index

However, the majority database products don't have the "Partially index" concept. We have to take a workaround, instead of storing the deleted flag as a boolean value, now we have to use a "delete token". This is how it is done, if a record is active, we keep the token to be 0, if we want to delete a record, we set the "delete token" column to a unique value to avoid violating unique constraint. A perfect delete token could be the timestamp when you want to delete the record(the auto-generated id of the deleted record is also a fair choice). The following script shows how to do it with MySQL.

1
2
3
4
5
6
7
CREATE TABLE `user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`deleted_at` int(11) unsigned NOT NULL DEFAULT 0,
PRIMARY KEY (`id`),
UNIQUE KEY `uq_user` (`username`,`deleted_at`)
)

Note that deleted_at timestamp DEFAULT NULL doesn't work since most databases do not treat multiple null as duplicate values, which means multiple active user with a same username can pass the unique constraint.
Also, the statement of deleting a user has now becoming

1
UPDATE user SET deleted_at = unix_timestamp() WHERE id = #{id};

Now, we are finally able to "kill" John Snow multiple times.