Here in Brazil we have a personal identifier called CPF. This identifier has a sequence of 11 digits and is unique for each Brazilian citizen. Possibly in other countries there is something like that too.
How is a unique identifier, I chose to set it in the database like a UNIQUE KEY. And now that things get complicated. A user could be registered by an administrator and be removed by some kind of problem. However, it is not actually removed, but hidden by a deleted flag. And in another moment, he could be recreated again (there is a need to recreate it, and not re-enable it).
Because it is an individual identifier, the CPF would be reused on recreated user and would conflict in UNIQUE KEY.
In this situation, I thought of some practical solutions to solve the problem as well as its disadvantages. But I like what is the most appropriate way to solve this kind of problem.
- Do not use a UNIQUE KEY. The disadvantage is that it could increase the query time, which could perhaps be solved by a KEY-only column.
- Set the column to NULL on pseudo-DELETE. The disadvantage is that I lose the information, however, could solve just creating a second column to store its value before being annulled.
- The UNIQUE KEY could group the column with a date removal column (NULL by default). The disadvantage is having to use two columns for each group UNIQUE KEY.
- Create a second table to store data removed without UNIQUE KEY. The disadvantages here are almost incalculable!
- Remove the row, rather than hide it. The disadvantage here is that I could lose the references in a user FK.
Currently I prefer the first option, and if very necessary, I would opt for the second or third option. But I wonder if there is a more correct way, following the concept more accepted by the community.
4
what you’ve described is a unique key – only you’re thinking of it as a unique column.
You can create a unique constraint on any number of columns in a table, so here you’d combine the CPF with the deleted (or some other column if you could have multiple deleted records) column as well.
For example:
ALTER TABLE Persons
ADD CONSTRAINT uc_PersonID UNIQUE (P_Id,LastName)
2
Do not, do not, DO NOT abuse your database. A unique or primary key should be unique, without exception. Even if you’re using an archaic database system like xBase which natively supports the idea of “marking a record as deleted”, it’s just a bad idea.
There are two good and simple options, however.
1: Don’t “delete”, “mark inactive” instead.
As a SQL verb, “delete” has special meaning in any RDMS or system. It means you’re telling the system to physically remove and destroy records. Any data that happens to be recoverable is such only as a shortcut by the RDMS, and your data model should not depend on that behavior.
With a simple “Active” or “Inactive” boolean value, it should be easy to mark your person
record, and relatively simple for your DBA to recover someone they had to manually adjust for some reason. (Not that such is good practice, but that’s another topic.) Don’t muck-about with duplicate CPF’s; just declare the thing unique, and train your admin staff on how to recover an inactive person.
Of course, this method while simple also prevents the ability to easily give a person a new surrogate key while maintaining past history. If that’s important, go with option #2.
2: Use a seperate table for CPF, and have “personID” be a property of the CPF instead of the inverse.
The obvious model for designing a system with people and their unique government number is to have said number be a field on the person
table.
CREATE TABLE persons (
personID int IDENTITY PRIMARY KEY,
name varchar(250),
CPF varchar(10)
)
However, in an instance where your design demands that two person
records may exist for the same CPF
, the simple design is to have CPF’s be unique, and have them point to their current person record.
CREATE TABLE persons (
personID int IDENTITY PRIMARY KEY,
name varchar(250)
)
CREATE TABLE cpfs (
CPF varchar(10) NOT NULL PRIMARY KEY,
personID int REFERENCES person(personID)
)
Give the tables above, it is a simple matter to create a view listing only the “real” person for each CPF, along with their personID.
SELECT C.CPF, P.*
FROM cfps C
INNER JOIN persons P
ON C.personID = P.personID
And it likewise becomes simple to query for only active persons.
SELECT *
FROM someTable
WHERE personID IN (SELECT personID FROM cfps)
Now, this system does introduce a second table, and includes a modest bit of complexity. If you don’t expect multiple “person” records to stick around as a regular thing and only anticipate some few exceptions, go with option #1 instead.