Generally speaking, is it considered bad practice to allow for user created fields in a database for a webapp?
For example, I am making a home inventory webapp for my wife, and she is going to want to define her own fields for different items. I was planning to allow her to create item categories, and add “features” to those categories. Features would just be key/value stored as strings. That way if she had a category called “Audio CDs” for example, she could add features for stuff like “artist”, “tracks”, etc. But in another category like “furniture”, she could add features for stuff like “material” (wood, plastic, etc). Then any item could belong to one (or many) categories, adding those features to the item.
I can see issues where searching by these features requires string comparisons, there’s no validation of data, etc. Following agile methodology, maybe it would be better to just have her come up with new categories and attributes and I would just have to create new tables as we go. In my example, it’s a small userbase (2 of us) and the amount of records created would be small, so not too bad.
Generally speaking though, how do people handle something like this in “real life”?
3
When you start getting to “user defined fields” as is often found in bug trackers, customer resource management, and similar business tools is that they are not backed with a table with a bajillion fields (if they are, then thats likely a problem of its own).
Instead what you find are Entity Attribute Value table designs and the associated administration tool to manage the valid attributes.
Consider the following table:
+--------------+ | thing | |--------------| | id | | type | | desc | | attr1 | | attr2 | | attr3 | | attr4 | | attr5 | +--------------+
This is after you’ve added a few attributes. Instead of attr1
pretend it reads artist
or tracks
or genre
or whatever attributes the thing has. And instead of 5, what if it was 50. Clearly that is unmanageable. It also requires an update of the model and redeployment of the application to handle a new field. Not ideal.
Now consider the following table structure:
+--------------+ +---------------+ +-------------+ | thing | | thing_attr | | attr | |--------------| |---------------| |-------------| | id |<---+| thing_id (fk)| +>| id | | type | | attr_id (fk)|+-+ | name | | desc | | value | | | +--------------+ +---------------+ +-------------+
You’ve got your thing with its basic fields. You have two more tables. One with the attributes. Each field is a row in the attr
table. And then there is the thing_attr
with a pair of foreign keys relating back to the thing
table and the attr
table. And this then has a value field where you store whatever the value of the field for that entity would be.
And now you’ve got a structure where the attr table can be updated at runtime and new fields can be added (or removed) on the fly without significant impact to the overall application.
The queries are a little bit more complex and validation becomes more complex too (either funky stored procedures or all client side). Its a trade off in design.
Consider also the situation where some day you need to do a migration and you come back to the application to find that there are now a half dozen or so more attributes than the schema you originally distributed. This makes for ugly migrations and upgrades where the Entity Attribute Value table, when used correctly, can be cleaner. (Not always, but can be.)
Are there any downsides to just modifying the schema at runtime? If the user thinks a thing needs a new attribute, just dynamically add a column to the table?
If you are working with the appropriate flavor of nosql database, you could probably do this (note that the appropriate flavor of the nosql for this would probably be a key-value store which is, well, the EAV table for relational ones described above) without too much trouble. However it comes with all the compromises for nosql which are described elsewhere in great detail.
If you are instead working on a relational database – you need to have the schema. Adding the column dynamically means some subset of the following things are true:
- You are doing meta-database programming. Instead of being able to cleanly map this column to that field with a nice ORM, you are probably doing things like
select *
and then doing some complex code to find out what the data actually is (see Java’s ResultSetMetaData) and then storing that in a map (or some other datatype – but not nice fields in the code). This then throws away a fair bit of type and typo safety that you have with the traditional approach. - You’ve likely abandoned the ORM. This means you’re writing raw sql for all the code instead of letting the system do the work for you.
- You’ve given up on doing clean upgrades. What happens when the customer adds a field with one name that your next version also uses? In the matchmaking site the upgrade that wants to add a
hasdate
field for storing a timestamp has already been defined ashasdate
with a boolean for a successful match… and your upgrade breaks. - You’re trusting that the customer doesn’t break the system by using some reserved word that breaks your queries too… somewhere.
- You’ve bound yourself to one database brand. The DDL of different databases is different. Database types are the easiest example of this.
varchar2
vstext
and the like. Your code to add the column would work on MySQL but not Postgres or Oracle or SQL Server. - Do you trust the customer to actually add the data well? Sure, the EAV is far from ideal but now you’ve got some horrendous obscure table names that you the developer didn’t add, with the wrong type of index (if any), with no constraints added in the code where there need to be and so on.
- You’ve given schema modification privileges to the user running the application. Little Bobby Drop Tables isn’t possible when you’re restricted to SQL rather than DDL (sure you can do a
delete * from students
instead, but you can’t really mess up the database in bad ways). The number of things that can go wrong with schema access either from an accident or malicious activity skyrockets.
This really boils down to “don’t do it.” If you really want this, go with a known pattern of the EAV table structure or a database that is entirely dedicated to this structure. Don’t let people create arbitrary fields in a table. The headaches just aren’t worth it.
9
Doing this well is hard.
For a one-off application like what you’re planning, you can, of course, just add a column for each field, and provide a UI that makes field definition by untrained users safer than giving them an SQL command line. Or you could follow the dreaded Entity-Attribute-Value pattern, which is a classic, if somewhat scary, response to this sort of problem. Building the UI for defining EAV fields is usually much more complex than for database columns, and the queries can get pretty hairy, but for large numbers of fields (i.e., highly-sparse-matrix schemata), it may be the only way to get the job done.
2
I came a cross some thing similar recently.
I made 2 tables.
1: table Objects
Id , name, type
He is all your objects. U set name of it .
And a type of this object :- for me available types was inventory , inventory_item , office.
And usual setup was n items are child or inventory which is also child of office and I used a join table to join objects to each other
2 table settings
organization_Id , title, value , type
Settings table contain every field name for that specific object type , and value in value.
Example properties of office
Location, phone , working hours
And for items
- Amount
- Price
- Barcode
Etc, all these properties are enforced by you model and saved in settings table as separate rows ( yet use replace not insert to avoid multiple rows for same field )
So when ever I want an office I load it easy with all its relations and settings where settings object_I’d in (requested objects)
After that I pivot all rows from settings and that’s it.
And in case I wanted a setting to be specific to an item in a inventory (not global) I set object_I’d = I’d from object_objects relations table and I set settings.type = relation_setting
I hope you understand what I mean I will try to reformat answer when I get to a laptop
2
Is it bad practice to allow user defined fields?
No, it is not bad practice. It is quite common. In OO terms this is called inheritance. You have a base class inventoryItem and two inherited classes AudioCD and furniture.
Generally speaking though, how do people handle something like this in “real life”?
You have to decide how inventoryItem, AudioCD and furniture are stored in database.
If easy-query is most important for you and db-space/normalisation doesn’t matter you would implement the “table-per-hierarchy” schema.
If space/normalisation is most important for you and more complicated queries are no problem for you would implement the “table-per-type” Schema.
For more details see dotnet table-per-type-vs-table-per-hierarchy-inheritance
or java hibernate inheritance.
1