If we need to represent classes in a class diagram for a big project that is not completely designed yet, and the classes have to be actual tables in a database, how would we predict and design the classes?
Let’s say we have a project that is going to have more than 15 tables. Do all these tables need to be classes in the class diagram?
How would classes be designed in this kind of situation? The language being used is Java and the class diagram has to be prepared in UML. I know how to design the classes; I just don’t know whether the tables should be represented as classes in this case.
Designing your classes to map directly to tables can be seen as an anti-pattern. There is a natural object-relational impedance mismatch when you combine OO code with a relational database.
Designing your application according to its domain independent of its data storage is often a recommended practice. Domain Driven Design is one such technique you might find useful.
It does not seem to be a big project if you have only a little more than 15 tables. However, the main point is which development approach and design you are planning to use.
Data modeling needs to be done before class design. While doing your data modeling, you may also continue to work on your architecture modeling. There are several alternatives to consider, and the choice will depend on your project specifications. Tools such as ER diagrams may help you design both high-level and low-level class diagrams.
Depending on your application type, you may get some hints on structuring your classes from open source project samples. One important point: as your project moves forward, you may need to change the structure slightly or add layers, depending on requirements and design.
Here are some good references to look at:
- Open Source Software in Java
- Open Source Projects at Sun
- Well written java open source projects (for learning)?
In general, I would say no. You should consider having a DTO class that mirrors the table’s structure, but its only use is for transferring data. There should be other classes in your persistence layer that support the particular functions your application needs. For example, you might have a Customer class that contains multiple Address instances, so it would make sense that getting a Customer also gets any associated Addresses. All the data access should be done from your DAO or manager classes, so all your application has to worry about is creating or using a Customer. If your business logic has to worry about what table some data is stored in, you’re doing it wrong.
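As a rough illustration of the Customer and Address idea, here is a minimal sketch. The names CustomerDao and InMemoryCustomerDao, the fields, and the two maps standing in for tables are all assumptions made up for this example; a real DAO would talk to the database through JDBC or an ORM instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Domain classes: shaped by what the application needs, not by table layout.
final class Address {
    private final String street;
    private final String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    String street() { return street; }
    String city() { return city; }
}

final class Customer {
    private final long id;
    private final String name;
    private final List<Address> addresses;

    Customer(long id, String name, List<Address> addresses) {
        this.id = id;
        this.name = name;
        // Defensive copy so callers cannot mutate the customer's address list.
        this.addresses = Collections.unmodifiableList(new ArrayList<>(addresses));
    }

    long id() { return id; }
    String name() { return name; }
    List<Address> addresses() { return addresses; }
}

// Hypothetical DAO contract: the only thing business logic gets to see.
interface CustomerDao {
    Optional<Customer> findById(long id);
    void save(Customer customer);
}

// In-memory stand-in for a real implementation (assumption for illustration).
final class InMemoryCustomerDao implements CustomerDao {
    // Two maps play the role of a customer table and an address table.
    private final Map<Long, String> customerRows = new HashMap<>();
    private final Map<Long, List<String[]>> addressRows = new HashMap<>();

    @Override
    public void save(Customer customer) {
        customerRows.put(customer.id(), customer.name());
        List<String[]> rows = new ArrayList<>();
        for (Address a : customer.addresses()) {
            rows.add(new String[] { a.street(), a.city() });
        }
        addressRows.put(customer.id(), rows);
    }

    @Override
    public Optional<Customer> findById(long id) {
        String name = customerRows.get(id);
        if (name == null) {
            return Optional.empty();
        }
        // Loading a customer also loads its addresses; callers never see the split.
        List<Address> addresses = new ArrayList<>();
        for (String[] row : addressRows.getOrDefault(id, Collections.emptyList())) {
            addresses.add(new Address(row[0], row[1]));
        }
        return Optional.of(new Customer(id, name, addresses));
    }
}

final class CustomerDaoDemo {
    public static void main(String[] args) {
        CustomerDao dao = new InMemoryCustomerDao();
        List<Address> addresses = new ArrayList<>();
        addresses.add(new Address("1 Main St", "Springfield"));
        dao.save(new Customer(1L, "Ada", addresses));

        Customer loaded = dao.findById(1L).orElseThrow(IllegalStateException::new);
        System.out.println(loaded.name() + " has " + loaded.addresses().size() + " address(es)");
    }
}
```

The calling code only ever handles a Customer; whether the data lives in one table, two tables, or none is hidden behind the DAO interface.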
You should start with use case modeling and process modeling, but never with class diagrams.
Processes will tell you how the entities relate to each other.
In fact, you usually don’t need all your persistent classes in one diagram: there’s no story behind that! It’s a nice family picture, and perhaps it helps to trace system elements later, but a common rule of thumb says that a UML diagram should never contain more than 15 elements, because beyond that design errors become easy to miss.
Imagine your system as a Rubik’s cube. In order to understand how to solve the Rubik’s cube, you usually look at one, two, or three sides. You don’t flatten the cube out!
Every single diagram should tell a story about your system: perhaps it tells the taxonomy of a family of classes, perhaps it tells how the system is used by the different actors, perhaps it tells how a certain scenario is played, perhaps it tells how a certain set of objects interoperate with each other in order to solve a problem, or how a component is structured internally.
That’s why you usually should have a lot of small diagrams: perhaps you could ask a special tool to bring them together, but since you’re a human, and most of your workmates are also humans, you’ll never be able to grasp the whole system at once.
One of the biggest challenges in software engineering (as opposed to other methodologies, like Agile) is to accept that your own skull is limited. You can grasp around 10 ± 5 things, and that’s it. You have to zoom in or out to hide details, turn the cube, or find some other way to reduce what you are looking at.
I have yet to find an application that started out as a single, huge class diagram and didn’t contain serious design errors, visible just by looking at the diagram for 10 minutes or, more usually, by looking at the changelogs, where serious changes had to be made.
I usually tell people to draw the data flows for each of the use cases first: what data comes in, what data goes out, what data is needed in order to get the results. Then ask what kind of pain points there can be for each step.
Shooting a big family portrait of persistent classes – this is rather limited. Of course, you should collect the classes needed to be persisted over these flow diagrams, but perhaps it’s better if a tool does it for you.
Design is not about how to solve a particular implementation. Deal with that later. First, find out what kind of classes you need. A UML class is an abstract construct: it might never even translate to a Java class. A UML class is just a way of saying: “well, we will have things which have these meaningful features from our current viewpoint”. A Java class is a way to tell the machine how to allocate memory and for what purpose.
Once UML tells you what you need, then, and only then, can you start thinking about how to realize it.
And yes, use Domain-Driven Design (the blue book with the same title is quite nice).
(And I know I’ll be blamed by the Agile guys that I’m not Agile enough, but let’s not deal with this, we’re talking about UML. Most of the guys who do Agile never really understood UML, and they don’t want to care about this question I hope.)
For such a generic question, I can only give you a generic answer: it depends.
There are data-driven applications (which are very boring to create) that deliver exactly what the customer wants. In that case, your domain (if you really need one) is an exact duplicate of your database schema.
There are also applications where the database is nothing more than a way to persist state in case of an application crash. In that case, the database design is not that important (assuming it won’t have a lot of concurrent users) and the database model will probably be a duplicate of your domain model.
There are also applications where the database is modelled using the right modelling techniques and the application domain is modelled using the right OOAD techniques. The difference between them can be resolved using an ORM or manual mappers.
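To make the manual mapper option a bit more concrete, here is a small sketch. Everything in it is hypothetical: OrderRow mirrors an imagined table that stores an amount in cents plus a currency code, while the domain side uses an Order holding a Money value object. The mapper is the only place that knows about both shapes.

```java
import java.math.BigDecimal;

// Row-shaped class: mirrors the columns of an imagined orders table.
final class OrderRow {
    final long id;
    final long amountCents;
    final String currency;

    OrderRow(long id, long amountCents, String currency) {
        this.id = id;
        this.amountCents = amountCents;
        this.currency = currency;
    }
}

// Domain value object: has no counterpart table of its own.
final class Money {
    private final BigDecimal amount;
    private final String currency;

    Money(BigDecimal amount, String currency) {
        this.amount = amount;
        this.currency = currency;
    }

    BigDecimal amount() { return amount; }
    String currency() { return currency; }
}

// Domain class: modelled around behaviour, not around columns.
final class Order {
    private final long id;
    private final Money total;

    Order(long id, Money total) {
        this.id = id;
        this.total = total;
    }

    long id() { return id; }
    Money total() { return total; }
    boolean isFree() { return total.amount().signum() == 0; }
}

// Manual mapper: translates between the two models in both directions.
final class OrderMapper {
    private OrderMapper() { }

    static Order toDomain(OrderRow row) {
        // Interpret the stored cents as an unscaled value with two decimal places.
        BigDecimal amount = BigDecimal.valueOf(row.amountCents, 2);
        return new Order(row.id, new Money(amount, row.currency));
    }

    static OrderRow toRow(Order order) {
        // longValueExact throws ArithmeticException if the amount has sub-cent precision.
        long cents = order.total().amount().movePointRight(2).longValueExact();
        return new OrderRow(order.id(), cents, order.total().currency());
    }
}

final class OrderMapperDemo {
    public static void main(String[] args) {
        Order order = OrderMapper.toDomain(new OrderRow(7L, 1250L, "EUR"));
        System.out.println(order.total().amount() + " " + order.total().currency());
    }
}
```

An ORM does essentially the same translation, driven by mapping metadata instead of hand-written code; the trade-off is less boilerplate against less direct control over how each column is converted.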
Tables do not necessarily drive class design.
For example, not all programs connect to a DB. If tables drove class design, such a program would have no classes at all.