In my education I have been told that it is a flawed idea to expose actual primary keys (not only DB keys, but all primary accessors) to the user.
I always thought it to be a security problem (because an attacker could attempt to read stuff not their own).
Now I have to check if the user is allowed to access anyway, so is there a different reason behind it?
Also, as my users have to access the data anyway I will need to have a public key for the outside world somewhere in between. Now that public key has the same problems as the primary key, doesn’t it?
There has been the request of an example on why do that anyway, so here is one.
Keep in mind that the question is meant to be about the principle itself not only if it applies in this example. Answers addressing other situations are explicitly welcome.
Application (Web, Mobile) that handles activity, has multiple UIs and at least one automated API for intersystem communication (e.G. the accounting department wants to know how much to charge the customer based on what has been done).
The Application has multiple customers so separation of their data (logically, the data is stored in the same DB) is a must have of the system. Each request will be checked for validity no matter what.
Activity is very fine granular so it is together in some container object, lets call it “Task”.
Three usecases:
- User A wants to send User B to some Task so he sends him a link (HTTP) to get some Activity done there.
- User B has to go outside the building so he opens the Task on his mobile device.
- Accounting wants to charge the customer for the Task, but uses a third party accounting system that automatically loads the Task / Activity by some code that refers to the REST – API of the Application
Each of the usecases requires (or gets easier if) the agent to have some addressable identifier for the Task and the Activity.
7
Also, as my users have to access the data anyway I will need to have a public key for the outside world somewhere in between.
Exactly. Take the stateless HTTP, who would otherwise not know what resource it should request: it exposes your question’s ID 218306
in the URL. Perhaps you’re actually wondering whether an exposed identifier may be predictable?
The only places where I’ve heard a negative answer to that, used the rationale: “But they can change the ID in the URL!”. So they used GUIDs instead of implementing proper authorization.
I can imagine one situation where you don’t want your identifiers to be predictable: resource harvesting. If you have an site that publicly hosts certain resources others may be interesting in, and you host them like /images/n.jpg
or /videos/n.mp4
where n
just is an incrementing number, anyone looking at traffic to and from your website can harvest all your resources.
So, to directly answer your question: no, it is not bad to directly “expose” identifiers that only have meaning to your program, usually it is even required for your program to succesfully operate.
7
Because Primary Keys are an implementation detail.
If you migrate databases, your primary keys might change due to order of insertion, removal of old records… a few different reasons. If you migrate database platforms you may no longer have an actual primary key at all. Exposing the PK above the data access layer is a leaky abstraction, with all of the coupling concerns that entails.
8
You shouldn’t expose it because people who see it will start to use it as their ‘account number’ which it is NOT. For example, for my bank account I know what my account number is. I’ve memorized it, I use it on the phone with customer service, I use it when filling out forms for other banks to do transfers, for legal documents, for my auto-pay service, etc, etc. I don’t want it to change. The primary key (for my account) on the other hand, I don’t know or ever see.
The system that stores it changes over the years from one system to another, through bank merges, system upgrades and replacements, etc, etc.
The primary keys may change through some of these transformation, so if its never been exposed, written down or remembered by any regular user that’s a real good thing.
Keys with no business meaning are often termed surrogate keys and are often (but not always) used as primary keys.
btw, this even happens internally when people built interfaces and programs that misuse and expose primary keys and make them part of such systems instead of them just doing one thing – uniquely identifying a database record internally. I actually learned the above through a 6 year stint supporting a data warehouse system in a hospital.
7
This is a combination answer of the other ones (aka. what I have learned). If you feel like upvoting this one, you should at least upvote one of the other ones as well as they did the actual work. If you are more interested, read the other answers instead.
You should not expose the databases primary key but instead use a surrogate key
- If you want your users to be able to remember (at least a bit) or recognize the identifier of an entry. (Graystone28s Answer)
- If you want to plan ahead and consider that you might change systems (Database or otherwise) that will likely change your PK. (Telastyns Answer)
- If you want to ensure your users have a consistent way of accessing data that wont change even if your company shifts ownership and the data is thous migrated into a different system altogether. (Michael Durrants Answer)
- If your PK is predictable (like a sequence) your system can suffer resource harvest problems. (CodeCasters Answer) This only applies if your system has information that is worth harvesting and that is accessible by anybody or at least somebody who has a harvesting interest.
Note: Your created key should be (kinda) human comprehensible (Sqlvogels Answer).
If your system has no need for 1. to 4. then there is no reason not to use the databases PK as your public identifier (several of the answers). Also security is not an issue here (several of the answers).
One reason I’ve found, in the fullness of time I’ve seen end users request that their identifier means something (like having a prefix, or an indicator of the year it was incepted). Changing a PK is hard, but a surrogate is much easier.
Your primary key will likely be something you want your database indexing on for performance reasons, and you may in time for technical reason change it for example from a number to a guid… you just don’t know what reasons new technologies or knowledge might guide you down. Your pk is your technical item of data, the public key is for end users consumption.
4
For most applications it’s pretty much essential that you DO expose keys to users. To use an information system effectively the users of that system will normally need a way to identify the information within it and to relate that information to something in the world outside the database. In relational database terms, those identifiers are keys.
One well-used design pattern is to create an additional, purely “technical” key for database tables as a means of abstraction. For example to provide a stable (relatively unchanging) key where some alternative key is subject to change. Such technical keys typically aren’t exposed to end users because doing so undermines the intended abstraction from user requirements. It has nothing to do with security.
The problem / misunderstanding implicit in your question is due to inappropriate use of the term primary key. A primary key is just one among several “candidate” keys (several possible identifiers in a database table). The primary key doesn’t necessarily require any fundamentally different property to any other key so assertions and design principles that apply specifically to primary keys and not to other keys are always suspect and often wrong.
Given that you will usually need to expose a key to your user, what should that key be? Try to make your keys Familiar, Simple and Stable. Familiarity and simplicity make keys easy to read and remember and will help avoid data entry errors. Stability means the key changes infrequently which also helps avoid the possibility of misidentification.
3
It’s completely normal for entities have a unique identifier that is exposed to the outside world. For some objects it might be possible to find an identifier that actually has a meaning (for example invoice number) but for other no such identifier exists and therefore it has to be generated.
For the sake of consistency and readability I find it a good practice for all the entities in a system to use the exact same type and name for their identifier. Normally this identifier would be exposed (<type> getId()
) in some abstract base class.
For the same reason each service in the system (for example invoice service) should provide identical methods for accessing entities by their identifier. Normally this method (findById(<type> id)
) would be inherited from a generic service interface or base class.
This identifier doesn’t have to be the primary key of the entity but it can be one. Only thing one has to ensure is that the key generation strategy produces reasonably unique identifiers (not necessary universally unique but at least within the system).
If the system is later migrated (big if in my experience) to another database then it’s not a problem to use a different strategy (not based on primary keys) for creating the identifiers as long as the strategy is compatible with the original one.
2
This is from a comment on Greystone28’s answer by CodeCaster. It is an example of what you are saying:
I expose InvoiceNumber, which has a meaning to and is changeable by
the customer, but I also expose InvoiceID, which my code uses to
uniquely identify the invoice. You don’t have to (and more often don’t
want to) let the user-key be the storage-key. This question is about the latter.
What purpose in your app does diplaying the InvoiceID serve?
By expose, I’m assuming you mean the user can see it. Only expose it if the user needs it to use your app. It could be used by techniical support or some administrative stuff. I’ve worked with a few apps that do this. It does make it easier to provide support when I know the specific record in question.
3
Primary key is there, just as a handle to the tuple (record, row) you try to access as a developer. It’s also used in referential integrity (foreign key constraints), and maybe it has one or more use cases too.
Essentially, there is nothing bad about exposing it to users, or even hackers. Because I don’t know of an attack that uses primary key for example.
But in security, we have many principles (which we accept and don’t approve) and we need to adhere them:
- The principle of lease privilege
- Security through obscurity
And some other principles. What they say essentially is that:
If you don’t need to expose your data, why would you at all?
2