A long-standing question for me has been: when do I store data (actual values) in a database table, and when do I store them right in the code?
The untold consensus has typically been as such(*):
If it is a single variable or a simple structure, or an array of a few values, put data right in the code.
[* consensus has been debated in comments and answers, but basically I wanted some kind of a premise to jump-start the question, so feel free to challenge and improve upon it]
Example:
$number = 44;
$colors = array("blue", "yellow", ... "mauve");
If it has hundreds+ of rows of data of the same type, use a database.
But there seems to be a gray area; what about cases that are not so clear? What considerations and factors does one need to pay attention to in order to make a decision?
Example:
Say your company uses 10-15 different types of motors frames that can be represented as “412T”. You have about 30 of them and they change rarely. You can either create a DB table for those or hard-code them in a database. In this case, motors are static, physical things that are not likely to change often.
Keeping them in code subjects them to source control, where in a database, DB changes are typically not tracked. But keeping them in a database frees up (separates) code from data.
Another (actual) example I can use is this question of mine: https://stackoverflow.com/questions/26169751/how-to-best-get-the-data-out-of-a-lookup-table (currently 48 rows of option data).
3
I don’t think these two statements really represent a consensus about when to hard code data:
If it is a single variable or a simple structure, or an array of a few values, put data right in the code
If it has hundreds+ of rows of data of the same type, use a database
Simple counter-example (I am sure there are better ones): programming language syntax tables are complex large structures that are often hard-coded. Take a look at an example from the Perl source code.
Instead I would focus first on asking:
-
How often does the data change?
-
Who might need to change it?
If the answer to “how often” is “more often than I want to deploy a new version of my app,” then you should not hard code the data.
If the answer to “who changes it” is “anybody besides the programmer,” then you should not hard code the data.
In a small shop, it is possible the distinction between the coder and the user has disappeared, and the idea of “deployment” has also disappeared. In this case you are the king of your own domain and you can do whatever you like.
But even in that situation, the need to collaborate can come up, and that can be difficult if a custom application does not follow a convention that programmers typically do follow.
2
I’d go with the third option: a config file!
For applications that I work on (in Java, so all my examples are use Java+Spring ), such values are usually stored in config files and injected (via Spring) into the code that needs them when the application starts up. In a properties file:
motorFramesString=412T, 413T, ...
In the spring config:
<bean="motorFrameManager" class="myCompany.MotorFrameManager" >
<property name="motorFrames" value="${motorFrames}"/>
</bean>
The advantage of this is that you can change or add more of these mostly-static values easily enough, without a recompile, and you don’t have to worry about filling your (relational) database with reference data (since that seems to be a concern, and maybe not everything needs to be in the database anyway).
As for why these values should go into a config file instead of a reference table: Do you plan to use these values mostly in code, or mostly in the database? If you have many existing queries and view and procedures that depend on these values, then it might be best to put them in the database as reference data, as it’s easier than loading them from config files and sending them as parameters to every possible query/view/procedure that references them. If the values are mostly used in application code, then the config file is probably a better choice.
A more complicated example like what you link could be done too, with or with out properties.
In products.properties:
productA.name=Product A 123
productA.hasMotor=true
productA.numFeet=1
productA.hasOutlet=true
productA.needsManual=true
productB.name=Product B 456
productB.hasMotor=false
productB.numFeet=1
productB.hasOutlet=true
productB.needsManual=true
In your spring config file:
<bean name="productA" class="com.mycompany.Product">
<property name="name" value="${productA.name}"/>
<property name="hasMotor" value="${productA.hasMotor}"/>
<!-- rest omitted for brevity -->
</bean>
<bean name="productB" class="com.mycompany.Product">
<property name="name" value="${productB.name}"/>
<property name="hasMotor" value="${productB.hasMotor}"/>
<!-- rest omitted for brevity -->
</bean>
<!-- configure as many beans as needed -->
<bean="motorFrameManager" class="myCompany.MotorFrameManager" >
<property name="motorFrames"> <!-- assumes that MotorFrameManager has a property motorFrames which is a List<Product> -->
<list>
<ref bean="productA"/>
<ref bean="productB"/>
</list>
</property>
</bean>
The nice thing about this is that if your source data is a spreadsheet (like in the question you linked to) you can use macros in Excel to automatically generate the properties and spring snippets.
11
I think the premise of the question is not quite right. The dividing factor is not the quantity of records that need to change, but the frequency of changes as well as who changes them.
Frequency
When data is volatile, in the sense that it changes often and outside of a release cycle of the software, it needs to be able to be configured outside of hard-coded values or even configuration files. A database makes sense here, especially if the application itself is capable of maintaining it.
Who
When a customer needs to be able to change data, it needs to be modifiable in a user-friendly manner and outside of a release cycle.
Common Thread
The common thread here is that when data needs to change outside of a software release, it should be stored in a database. Databases might be upgraded during a release, but the data lives on without being reset or heavily modified. When a customer needs to be able to modify data (or configure functionality), it should be stored in a database with a nice front end that is idiot-proof.
Testing
Be sure to write unit tests that validate your software in a variety of configurations. Maybe your customer turns on an optional prompt, or redefines one meter to equal twelve feet. Regardless of how reasonable the change is, if your software allows it, it damn well better validate that the change works as expected no matter how inane it is.
It is neither the frequency of changes nor the amount of data that decides where to store data.
If the data is required to run the program, it is part of the program code, so store it as a constant. All other data goes in the database.
Of course, config files, images, sounds, etc are usually better stored on the filesystem.
3
If there’s even the slightest chance you will ever get a phone call resulting in you having to rebuild an application because something hard-coded has changed, then don’t hard-code it. At the very least keep it in a config file or db table. You don’t have to provide any UI to maintain it necessarily, but dialling in and changing a config file or running an SQL UPDATE on a table is surely preferable to rebuilding the whole shooting match.
The distinction is indeed a somewhat gray area, but my approach to this kinds of issues is: does the data change in production”? Anything that changes after deployment in a production environment should go in the database, even for things that may rarely change.
So the question you should be asking is not “how often will it change?”, but “can it change?”. If a value for a property can vary within the same code iteration (without touching anything else in the code) on the production environment, it goes into the database.
What you are also missing is “program control” type information say max buffer size, number of items aloud in an order, max page size for a screen is always better either hard coded in the program or in a configuration file.
If you keep this sort of data in a database there is always the possibility of someone changing it on the fly and completely changing the behavior of a system.
The other problem is that there is no easy way to get database entries through the source control/change management/automatic build systems that you should be using.
One thing you should never keep in the database are the details needed to access the database.
You’ve a bit of a catch-22 if you need to access the database to retrieve the connect strings, username, and password for that same database after all.
Hardcode it? hmm, might work if you’re using some sort of database that installs and ships with the application.
Configuration file? More promising but what about account security?
Of course you could store the database information in that config file in encrypted form, but then you’d need to have the decryption keys stored somewhere, and that’d be hardcoded almost certainly.
Apart from that, there are things that never change. Things like the natural constants (G, pi, e, you name it), things maybe like certain regular expressions, like email validation.