I’m thinking to use an entity-attribute-value (EAV) model for some of the stuff in one of the projects, but all questions about it in Stack Overflow end up to answers calling EAV an anti-pattern.
But I’m wondering if it is that wrong in all cases.
Let’s say shop product entity, it has common features, such as name, description, image, and price, that take part in logic many places and has (semi)unique features, like watch and beach ball would be described by completely different aspects. So I think EAV would fit for storing those (semi)unique features.
All this is assuming, that for showing product list, it is enough info in product table (that means no EAV is involved) and just when showing one product/comparing up to 5 products/etc. data saved using EAV is used.
I’ve seen such approach in Magento commerce and it is quite popular, so are there cases when EAV is reasonable?
2
https://web.archive.org/web/20140831134758/http://www.dbforums.com/database-concepts-design/1619660-otlt-eav-design-why-do-people-hate.html
EAV gives a flexibility to the developer to define the schema as needed and this is good in some circumstances.
On the other hand it performs very poorly in the case of an ill-defined query and can support other bad practices.
In other words, EAV gives you enough rope to hang yourself and in this industry, things should be designed to the lowest level of complexity because the guy replacing you on the project will likely be an idiot.
8
In a nutshell, EAV is useful when your list of attributes is frequently growing, or when it’s so large that most rows would be filled with mostly NULLs if you made every attribute a column. It becomes an anti-pattern when used outside of that context.
5
Let’s say shop product entity, it has common features, like name, description, image, price, etc., that take part in logic many places and has (semi)unique features, like watch and beach ball would be described by completely different aspects. So I think EAV would fit for storing those (semi)unique features?
Using an EAV structure for has several implications that are trade offs.
You are trading off a ‘less space for the row because you don’t have 100 columns that are null
‘ against ‘more complex queries and model’.
Having an EAV typically means the value is a string that one can stuff any data into. This then has implications on validity and constraint checking. Consider the situation where you’ve put the number of batteries used as something in the EAV table. You want to find a flashlight that uses C sized batteries, but less than 4 of them.
select P.sku
from
products P
attrib Ab on (P.sku = Ab.sku and Ab.key = "batteries")
attrib Ac on (P.sku = Ac.sku and Ac.key = "count")
where
cast(Ac.value as int) < 4
and Ab.value = 'C'
...
The thing to realize here is that you can’t use an index reasonably on the value. You also can’t prevent someone from putting in something that isn’t an integer there, or an invalid integer (uses ‘-1’ batteries) because the value column is used again and again for different purposes.
This then has implications in trying to write a model for the product. You’ll have the nice typed values… but you’re also going to have a Map<String,String>
just sitting there with all sorts of stuff in it. This then has further implications when serializing it to XML or Json and the complexities of trying to do validation or queries against those structures.
Some alternatives or modifications to the pattern to consider is instead of a free form key, to have another table with valid keys. It means instead of doing string comparisons in the database, you are checking against the equality of foreign key ids. Changing the key itself is done in one spot. You’ve got a known set of keys which means that they can be done as an enum.
You could also have related tables that contain attributes of a specific class of product. A grocery department could have another table that has several attributes associated with it that the building materials doesn’t need (and vice versa).
+----------+ +--------+ +---------+
|Grocery | |Product | |BuildMat |
|id (fk) +--->|id (pk) |<---+id (fk) |
|expiration| |desc | |material |
|... | |img | |... |
+----------+ |price | +---------+
|... |
+--------+
There are times that especially call for a EAV table.
Consider the situation where you aren’t just writing a inventory system for your company where you know every product and every attribute. You are now writing an inventory system to sell to other companies. You can’t know every attribute of every product – they will need to define them.
One idea that comes out is “we’ll let the customer modify the table” and this is just bad (you get into meta-programming for table structures because you no longer know what is where, they can royally mess up the structure or corrupt the application, they’ve got the access to do wrong things and the implications of that access become significant). There’s more about this path at MVC4 : How to create model at run time?
Instead, you create the administrative interface to an EAV table and allow that to be used. If the customer wants to create an entry for ‘polkadots’ it goes into the EAV table and you already know how to deal with that.
An example of this can be seen in the database model for Redmine you can see the custom_fields table, and the custom_values table — those are parts of the EAV that allows the system to be extended.
Note that if you find your entire table structure to look like EAV rather than relational, you might want to look at the KV flavor of NoSQL (cassandra, redis, Mongo,. …). Realize that these often come with other tradeoffs in their design that may or may not be appropriate to what you are using it for. However, they are specifically designed with the intent of an EAV structure.
You may wish to read SQL vs NoSQL for an inventory management system
Following this approach with a document oriented NoSQL database (couch, mongo), you could consider each inventory item to be a document on a disk… pulling up everything in a single document is fast. Furthermore, the document is structured so that you can pull out any one single thing fast. On the other hand, searching all the documents for things that match a particular attribute can have less performance (compare using ‘grep’ against all the files)… its all a trade off.
Another approach would be LDAP where one would have a base with all of its associated items, but would then also have additional object classes applied to it for the other types of items. (see System Inventory Using LDAP)
Once you go down this path, you may find something that exactly matches what you are looking for… though everything comes with some tradeoffs.
6 years later
Now that JSON in Postgres is here, we have another option, for those who are using Postgres. If you only want to attach some extra data to a product, then your needs are fairly simple. Example:
CREATE TABLE products (sku VARCHAR(30), shipping_weight REAL, detail JSON);
INSERT INTO products ('beachball', 1.0, '{"colors": ["red", "white"], "diameter": "50cm"}');
SELECT * FROM products;
sku | weight | detail
-----------+--------+------------------------------------
beachball | 1 | {"colors": ["red", "white"], "diameter": "50cm"}
Here’s a smoother introduction to JSON in Postgres: https://www.compose.com/articles/is-postgresql-your-next-json-database/.
Note that Postgres actually stores JSONB, not plain text JSON, and it does support indexes on fields inside of a JSONB document / field, in case you discover that you actually do want to query against that data.
Also, note that fields within a JSONB field cannot be modified individually with an UPDATE query; you would have to replace the entire content of the JSONB field.
This answer may not directly address the question, but it does offer an alternative to an EAV pattern, that should be considered by anyone who is pondering the original question.
2
Typically people look the other way if you’re using it for lookup tables, or other situations where the benefit is to keep from having to create tables for one or two stored values. The situation you’re describing, where you’re basically storing item properties, sounds perfectly normal (and normalized). Widening a table to store a variable number of item attributes is a bad idea.
For the general case of storing disparate data in a long thin table…You shouldn’t be afraid to create new tables if you need to, and having just one or two long skinny tables isn’t much better than having only one or two short fat tables.
That being said, I am notorious for using EAV tables for logging. They do have some good utility.
3
EAV changes the problem of explicit structure, to implied perception. Rather than saying X is a table with columns A and B. You imply that columns A and B form table X. It’s the reverse in one sense but there isn’t a one-to-one mapping, necessarily. You could say that A and B both map to table (or type) X and Y. This could be important in the more involved domain where context matters.
I’ve been studying Datomic, for this type of approach and I think it’s a very useful and powerful system with limits on what you should do with it (not that you couldn’t).
That EAV would be slow, or “give you enough rope to hang yourself” is not a statement I would agree with. Rather, I would put more emphasis on the strengths of EAV and if it suits your problem space, you should consider it.
My experience is that it’s a wonderful almost unconstrained approach to modelling. Specifically, in the case of Datomic, they impose a ordered set semantic on top of everything. Any modelling decision which models a relationship can freely go from one, to many without having to redesign columns/tables. You can also go back as long as the constraint doesn’t violate the invariant. It’s all the same under the hood.
The problem with EAV has in my mind been with lack of an implementation like Datomic. Since this is a question about EAV I don’t want to rave on about Datomic but it is one of those things where I think they got everything right with respect to EAV.
Storing EAV as one big table with three columns entity_id, attribute_id, value
is an inefficient implementation of EAV.
For more efficiency: attributes should be columns.
Create new columns or tables when new attributes are introduced. Use a query engine that can automatically pick correct tables.
As far as I know there is no library that can do it for you so you migt have to make your own.
Example
Here is an example how you can store EAV efficiently given these entities:
e1,e2
has a1,a2,a3
e3,e4,e5
has a1,a5
table_a1a2a3
e | a1 | a2 | a3 |
---|---|---|---|
e1 | value | value | value |
e2 | value | value | value |
table_a1a5
e | a1 | a5 |
---|---|---|
e3 | value | value |
e4 | value | value |
e5 | value | value |
You can get the flexibility of EAV and the performance of hand-crafted database models this way.
For SQL database you could imagine some procedure that picks correct tables:
imagine query | generated SQL query |
---|---|
SELECT a1,a2,a3 |
SELECT a1,a2,a3 FROM table_a1a2a3 |
SELECT a1,a5 |
SELECT a1,a5 FROM table_a1a5 |
SELECT a1 |
SELECT a1 FROM table_a1a2a3 + SELECT a1 FROM table_a1a5 |
SELECT a1 WHERE a2 > a3 |
SELECT a1 FROM table_a1a2a3 WHERE a2 > a3 |
DELETE WHERE e = 1 |
DELETE FROM table_a1a2a3 WHERE e = 1 + DELETE FROM table_a1a5 WHERE e = 1 |
etc… | etc… |