Are there any good reasons to use ER Diagrams instead of UML Class Diagrams for data modeling, given the fact that class diagrams subsume ER diagrams? Or is it just for historical reasons because database people are used to ER modeling and are not familiar with UML? So, is ERD the COBOL of data modeling?
UML and ERD are two languages that can do the same thing: model entity (or object) types and their relationship types (or associations).
When I am working on a new feature, I always use ERDs. To me, the data structures are more important than the classes that will be used to interact with them, and it is important to remember that the two are not necessarily identical. At some point in the future, it may become important for me to split a class into multiple classes, or to combine the object representation of multiple tables into a single class. I may also write programs that rely on the same database using a different language, like Clojure or Haskell, where representing the result of a query as an object is unnatural.
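As a rough illustration of tables and classes not lining up one-to-one, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `invoice` tables and the `CustomerSummary` class are hypothetical names invented for this example; the point is only that a single class can be assembled from two tables without either table knowing about it.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical schema, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE invoice (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ada');
    INSERT INTO invoice VALUES (10, 1, 2500), (11, 1, 4000);
""")


@dataclass(frozen=True)
class CustomerSummary:
    """One class built from two tables; its shape is not the shape of any table."""
    name: str
    invoice_count: int
    total_cents: int


def load_summaries(db):
    """Join both tables and fold the result into a single class per customer."""
    rows = db.execute("""
        SELECT c.name, COUNT(i.id), COALESCE(SUM(i.total_cents), 0)
        FROM customer AS c
        LEFT JOIN invoice AS i ON i.customer_id = c.id
        GROUP BY c.id
        ORDER BY c.id
    """)
    return [CustomerSummary(*row) for row in rows]


summaries = load_summaries(conn)
```

If the class later needs to be split or reshaped, the tables stay as they are, and a program in another language can run the same query and consume plain rows.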
To my mind, UML is the “COBOL of data modeling,” because it represents a period of object orientation triumphalism, where it was assumed that a single object model was at the same tier as the database. It isn’t—and shouldn’t be. This, along with Rails-influenced use of software-level data integrity constraints, has led to a lot of pain, in my experience.
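To make the point about integrity constraints concrete, here is a small sketch (again with `sqlite3`, and with hypothetical `account` and `login` tables) of constraints declared in the database itself. Any client that writes to the database gets the same checks, whatever language or framework it is written in. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE account (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE login (
        id         INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES account(id)
    );
    INSERT INTO account (id, email) VALUES (1, 'a@example.com');
""")


def try_insert(sql):
    """Run one statement and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert("INSERT INTO login (account_id) VALUES (1)"),          # valid reference
    try_insert("INSERT INTO login (account_id) VALUES (99)"),         # no such account
    try_insert("INSERT INTO account (email) VALUES ('a@example.com')"),  # duplicate email
]
```

Here the second and third statements are refused by the database, not by application code.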
Some relevant quotes:
“Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.” – Fred Brooks
“I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important.” – Linus Torvalds
“Rule 5. Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.” – Rob Pike
“Fold knowledge into data so program logic can be stupid and robust.” – esr
Different modeling languages (Entity-Relation, Unified Modeling Language, and others) are simply notations for communicating a design to stakeholders. Communicating a design is technical communication, and one of the principles of good technical communication is to communicate the information clearly and concisely. Choosing a modeling notation that is understood by your audience and can communicate the desired information clearly is the first step to achieve this principle.
In his article A Comparison of Data Modeling Techniques, David Hay identifies a number of modeling notations and provides the same example model as expressed in each notation, including ER diagrams, Information Engineering, Barker’s notation, IDEF1X, Object Role Modeling, and UML. Hay discusses the difference between analysts (who need clear and easy to read diagrams that can be reasoned about) and designers (who need complete, rigorous, and expressive diagrams to use for implementation).
Scott Ambler also has some principles of Agile Modeling that are relevant to this:
Travel Light. Every artifact that you create, and then decide to keep, will need to be maintained over time. If you decide to keep
seven models, then whenever a change occurs (a new/updated
requirement, a new approach is taken by your team, a new technology is
adopted, …) you will need to consider the impact of that change on
all seven models and then act accordingly. If you decide to keep only
three models then you clearly have less work to perform to support the
same change, making you more agile because you are traveling lighter.
Similarly, the more complex/detailed your models are, the more likely
it is that any given change will be harder to accomplish (the
individual model is “heavier” and is therefore more of a burden to
maintain). Every time you decide to keep a model you trade-off agility
for the convenience of having that information available to your team
in an abstract manner (hence potentially enhancing communication
within your team as well as with project stakeholders). Never
underestimate the seriousness of this trade-off. Someone trekking
across the desert will benefit from a map, a hat, good boots, and a
canteen of water, but they likely won’t make it if they burden themselves
with hundreds of gallons of water, a pack full of every piece of
survival gear imaginable, and a collection of books about the desert.
Similarly, a development team that decides to develop and maintain a
detailed requirements document, a detailed collection of analysis
models, a detailed collection of architectural models, and a detailed
collection of design models will quickly discover they are spending
the majority of their time updating documents instead of writing
source code.
Multiple Models. You potentially need to use multiple models to develop software because each model describes a single aspect of your
software. “What models are potentially required to build modern-day
business applications?” Considering the complexity of modern day
software, you need to have a wide range of techniques in your
intellectual modeling toolkit to be effective (see Modeling Artifacts
for AM for a start at a list and Agile Models Distilled for
detailed descriptions). An important point is that you don’t need to
develop all of these models for any given system, but that depending
on the exact nature of the software you are developing you will
require at least a subset of the models. Different systems, different
subsets. Just like every fixit job at home doesn’t require you to use
every tool available to you in your toolbox, over time the variety of
jobs you perform will require you to use each tool at some point. Just
like you use some tools more than others, you will use some types of
models more than others. For more details regarding the wide range of
modeling artifacts available to you, far more than those of the UML as
I show in the essay Be Realistic About the UML.
Content Is More Important Than Representation. Any given model could have several ways to represent it. For example, a UI specification
could be created using Post-It notes on a large sheet of paper (an
essential or low-fidelity prototype), as a sketch on paper or a
whiteboard, as a “traditional” prototype built using a prototyping
tool or programming language, or as a formal document including both a
visual representation as well as a textual description of the UI. An
interesting implication is that a model does not need to be a
document. Even a complex set of diagrams created using a CASE tool may
not become part of a document, instead they are used as inputs into
other artifacts, very likely source code, but never formalized as
official documentation. The point is that you take advantage of the
benefits of modeling without incurring the costs of creating and
maintaining documentation.
He also has some practices for Agile Modeling to help achieve these principles:
Apply The Right Artifact(s). Each artifact has its own specific applications. For example, a UML activity diagram is useful for
describing a business process, whereas the static structure of your
database is better represented by a physical data or persistence
model. Very often a diagram is a better choice than source code — If
a picture is worth a thousand words then a model is often worth 1024
lines of code when applied in the right circumstances (a term borrowed
from Karl Wiegers’ Software Requirements) because you can often
explore design alternatives more effectively by drawing a couple
diagrams on whiteboards with your peers than you can by sitting down
and developing code samples. The implication is that you need to know
the strengths and weaknesses of each type of artifact so you know when
and when not to use them. Note that this can be very difficult because
you have Multiple Models available to you, in fact the Agile Models
Distilled page lists over 35 types of models and it is by no means
definitive.
Iterate To Another Artifact. When you are working on a development
artifact — such as a use case, CRC card, sequence diagram, or even
source code — and find that you are stuck then you should consider
working on another artifact for the time being. Each artifact has its
strengths and weaknesses, each artifact is good for a certain type of
job. Whenever you find you are having difficulties working on one
artifact, perhaps you are working on a use case and find that you are
struggling to describe the business logic, then that’s a sign that you
should iterate to another artifact. For example, if you are working on
an essential use case then you may want to consider changing focus to
start working on an essential UI prototype, a CRC model, a business
rule, a system use case, or a change case. By iterating to another
artifact you immediately become “unstuck” because you are making
progress working on that other artifact. Furthermore, by changing your
point of view you often discover that you address whatever it was that
was causing you to be stuck in the first place. See the essay Iterate to
Another Artifact for more thoughts.
Single Source Information. Information should be stored in one place and one place only. In other words, not only should you apply the
right artifact you should also model a concept once and once only,
storing the information in the best place possible. When you are
modeling you should always be asking the questions “Do I need to
retain this information permanently?”, “If so, where is the best place
to store this information?” and “Is this information already captured
elsewhere that I could simply reference?”. Sometimes the best place to
store information is in an agile document, often it’s in source code.
Read here for more details.
First, you need to identify who you are communicating with and what information they need. You should choose the appropriate modeling notation and models to communicate that information to them. Once the models are created, you should use them. They should be reviewed for consistency, they should be transformed into other models, they should be included in documents, or they should be used to guide an implementation.
If you need to, consider investing in training. If you’re working with Systems Engineers who use SysML, maybe consider training everyone to read SysML models. If the software team finds the UML notation easier, consider training everyone in UML. It doesn’t have to be a formal training class – it could be passing around links to useful websites, buying a few copies of a book for a company library, lunch and learn sessions, or external training (either off-site or a trainer brought in for a session). This may reduce the need to use multiple modeling notations.
Second, don’t be afraid to throw away models. Perhaps the first iteration of a model could be an ER diagram. That could be used to understand the data and to create your database. However, in order to add more detail, you may choose to evolve that into a different model type, such as a class diagram. Depending on stakeholder needs, you may need to maintain both models. If you don’t, though, throw the first model away so you don’t need to maintain it or risk someone finding it and working off of an incorrect model. Future updates to the database could be driven through changes to the class diagram. At the end of the day, though, you don’t want the same information captured in multiple places.
To answer your question directly: yes, there can be a good reason to use an ER diagram over a UML model. The reason is that the ER diagram is more useful to your stakeholders than a UML model would be. However, using an ER diagram once doesn’t mean that you will keep it for the life of a project or product, or that you won’t be creating another model in parallel or from your ER model.
I’d also recommend checking out Scott Ambler’s Agile Data site for more articles and information. It is connected to the Agile Modeling site and is part of the complete Disciplined Agile Delivery process, but it does have some good ideas regardless of the methodology you are following.
None of the answers so far seems to have picked up on the difference between conceptual and physical data modelling.
A UML conceptual model will show inheritance relationships, cardinality and all that good stuff, with the minimum of implementation detail.
The physical model (ER diagram) will differ:
- Inheritance is no longer obvious. There are three classical ways of mapping inheritance to a relational database – table per concrete class, table per class hierarchy and table per base class plus table per concrete class (holding extra fields only). Many databases have a mix of the three methods, so the ER diagram clearly shows the physical mappings.
- Many-to-many mappings in the conceptual model translate to a join table in the ER diagram. In the conceptual model this is just a pair of crow’s feet symbols at the ends of the relationship link. In the database it is a real table.
- Naming conventions are often different. For instance, my organisation would map the Java Date creationDateTime field to an Oracle column CREATION_TS TIMESTAMP.
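The first two points in the list above can be sketched as DDL. This is only an illustration under my own assumptions: the `party`/`person`/`company` hierarchy and the `employment` join table are hypothetical, and the SQL is run through Python's `sqlite3` module purely so the example is self-contained. A real schema would more likely pick one inheritance mapping per hierarchy than show all three side by side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 1. Table per class hierarchy: one table, a discriminator column,
    --    and nullable columns for the subclass-specific fields.
    CREATE TABLE party_single (
        id         INTEGER PRIMARY KEY,
        kind       TEXT NOT NULL CHECK (kind IN ('PERSON', 'COMPANY')),
        name       TEXT NOT NULL,
        birth_date TEXT,      -- PERSON only
        vat_number TEXT       -- COMPANY only
    );

    -- 2. Table per concrete class: shared columns are repeated in each table.
    CREATE TABLE person_concrete (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth_date TEXT
    );
    CREATE TABLE company_concrete (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL, vat_number TEXT
    );

    -- 3. Base table plus one table per subclass holding the extra fields only.
    CREATE TABLE party (
        id INTEGER PRIMARY KEY, name TEXT NOT NULL
    );
    CREATE TABLE person (
        party_id INTEGER PRIMARY KEY REFERENCES party(id), birth_date TEXT
    );
    CREATE TABLE company (
        party_id INTEGER PRIMARY KEY REFERENCES party(id), vat_number TEXT
    );

    -- Many-to-many: a single link in the conceptual model becomes a real table.
    CREATE TABLE employment (
        person_id  INTEGER NOT NULL REFERENCES person(party_id),
        company_id INTEGER NOT NULL REFERENCES company(party_id),
        PRIMARY KEY (person_id, company_id)
    );
""")

# List what physically exists; none of this is visible in the conceptual model.
tables = sorted(
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
)
```

One conceptual hierarchy with a single association turns into anywhere from two to four tables depending on the mapping, which is exactly the detail the ER diagram records.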
But given the choice, I always go for the ER diagram. You can’t easily write SQL given just the conceptual model. Given an unfamiliar and undocumented database, I usually use a reverse-engineering tool to create an ER diagram. With a good tool and a database that has referential integrity constraints defined, you get great results.
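For a feel of what such a reverse-engineering tool does under the hood, here is a minimal sketch that recovers the relationship lines of an ER diagram from declared foreign keys. It assumes SQLite and uses its `PRAGMA foreign_key_list`; other databases expose the same information through their own catalogs. The `author`/`book` schema and the `foreign_key_edges` helper are hypothetical names made up for this example.

```python
import sqlite3


def foreign_key_edges(db):
    """Return (child_table, child_column, parent_table, parent_column) tuples."""
    # Materialize the table list first so no cursor is open while running PRAGMAs.
    tables = [
        name for (name,) in db.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    edges = []
    for table in tables:
        # Row layout: id, seq, table, from, to, on_update, on_delete, match
        for row in db.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, row[3], row[2], row[4]))
    return edges


# Hypothetical "unfamiliar" database to inspect.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES author(id),
        title     TEXT NOT NULL
    );
""")

edges = foreign_key_edges(conn)
```

Each tuple is one line on the diagram. If the database has no referential integrity constraints defined, the list comes back empty, which is why the quality of the result depends so much on the schema.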
The target audience for conceptual models is likely to be enterprise architects and the more technical business analysts.
So in my view, UML and ER diagrams serve similar but quite distinct purposes.