I am trying to design a database for this case:
- Assignments have vectors; the relation is 1:N.
- Assignments have submissions; the relation is 1:N.
- Submissions have executions; the relation is 1:N.
- Every execution has one vector.
Business logic:
- A teacher creates assignments and defines test vectors.
- A student uploads a solution, which creates a record in submissions.
- After a submission compiles successfully, it is executed with the defined test vectors. Each execution is one record in executions (one execution per vector).
So a circular reference is created after a successful execution, but if compilation fails, no record is created in executions. The link between vectors and executions is needed for the score calculation, where the reference output from each vector is compared to the output of the corresponding execution.
So in my case the circular reference is not persistent; it depends on what happens at runtime. Is this a wrong design?
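For reference, the schema described in the question might look like this in SQLite, driven from Python's built-in `sqlite3` module. This is a sketch under assumptions: the column names (`input_text`, `reference_output`, `source`, `output`) and the `score` helper are hypothetical, since the question doesn't specify them. Each execution row references both its submission and its vector, which is what lets the score query compare outputs; a submission that failed to compile simply has no execution rows.

```python
import sqlite3

# Hypothetical schema: table names follow the question, column names are assumptions.
SCHEMA = """
CREATE TABLE assignments (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE vectors (
    id               INTEGER PRIMARY KEY,
    assignment_id    INTEGER NOT NULL REFERENCES assignments(id),
    input_text       TEXT NOT NULL,
    reference_output TEXT NOT NULL
);
CREATE TABLE submissions (
    id            INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id),
    source        TEXT NOT NULL
);
-- One row per (submission, vector); rows exist only if compilation succeeded.
CREATE TABLE executions (
    id            INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES submissions(id),
    vector_id     INTEGER NOT NULL REFERENCES vectors(id),
    output        TEXT,
    UNIQUE (submission_id, vector_id)
);
"""


def score(conn: sqlite3.Connection, submission_id: int) -> float:
    """Fraction of a submission's executions whose output matches the reference."""
    passed, total = conn.execute(
        """
        SELECT COALESCE(SUM(e.output = v.reference_output), 0), COUNT(*)
        FROM executions AS e
        JOIN vectors AS v ON v.id = e.vector_id
        WHERE e.submission_id = ?
        """,
        (submission_id,),
    ).fetchone()
    return passed / total if total else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO assignments VALUES (1, 'Sum two numbers')")
conn.executemany(
    "INSERT INTO vectors VALUES (?, 1, ?, ?)",
    [(1, "1 2", "3"), (2, "5 5", "10")],
)
# Submission 1 compiled and ran; submission 2 failed to compile (no executions).
conn.executemany(
    "INSERT INTO submissions VALUES (?, 1, ?)",
    [(1, "<source code>"), (2, "<source code>")],
)
conn.executemany(
    "INSERT INTO executions (submission_id, vector_id, output) VALUES (1, ?, ?)",
    [(1, "3"), (2, "11")],
)
print(score(conn, 1))  # 0.5
print(score(conn, 2))  # 0.0
```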
The tricky part is that relational databases don’t care about the direction of the relationship or circular dependencies. In the object model we do care.
Define your “has a” relationships, and then abstract one of the classes with an interface to break the circular dependency. For example:
- Assignment aggregates TestVector (i.e. has a collection of)
- Submission has an Assignment
- Execution has a Submission
- Execution aggregates ExecutionStep (interface)
- Vector is an ExecutionStep (interface)
See how Execution refers to an abstract ExecutionStep rather than a concrete Vector? That breaks the circular dependency, because now you can define other things for the execution to run without changing the object model.
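A minimal Python sketch of that object model. The class names come from the list above; the `check` method, the `Program` alias and the lambda standing in for a compiled solution are hypothetical. In a multi-module project, the module defining `Execution` would import only `ExecutionStep`, so it never depends on `Vector` directly.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a compiled solution: maps input text to output text.
Program = Callable[[str], str]


class ExecutionStep(ABC):
    """The abstraction Execution depends on, instead of the concrete Vector."""

    @abstractmethod
    def check(self, program: Program) -> bool:
        """Run `program` for this step and report whether it passed."""


@dataclass
class Vector(ExecutionStep):  # Vector is an ExecutionStep
    input_text: str
    reference_output: str

    def check(self, program: Program) -> bool:
        return program(self.input_text) == self.reference_output


@dataclass
class Assignment:
    title: str
    vectors: list[Vector] = field(default_factory=list)  # aggregates Vector


@dataclass
class Submission:
    assignment: Assignment  # has an Assignment
    program: Program


@dataclass
class Execution:
    submission: Submission  # has a Submission
    steps: list[ExecutionStep] = field(default_factory=list)  # aggregates the interface

    def passed(self) -> int:
        """Number of steps whose check succeeds for this submission's program."""
        return sum(step.check(self.submission.program) for step in self.steps)


hw = Assignment("Sum two numbers", [Vector("1 2", "3"), Vector("5 5", "10")])
sub = Submission(hw, lambda s: str(sum(map(int, s.split()))))
run = Execution(sub, list(hw.vectors))
print(run.passed())  # 2
```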
(However, looking at it now, you don’t need the foreign key from execution to vector, because you can get that with a join of Execution -> Submission -> Assignment -> Vector.)
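A sketch of that join in SQLite via Python's `sqlite3`, using a stripped-down hypothetical schema in which `executions` has no `vector_id` column. Note that the query returns whatever vectors the assignment has at the moment it runs.

```python
import sqlite3

# Stripped-down hypothetical schema: executions do NOT store a vector_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE assignments (id INTEGER PRIMARY KEY);
CREATE TABLE vectors (
    id            INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id)
);
CREATE TABLE submissions (
    id            INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id)
);
CREATE TABLE executions (
    id            INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES submissions(id)
);
INSERT INTO assignments VALUES (1), (2);
INSERT INTO vectors VALUES (10, 1), (11, 1), (20, 2);
INSERT INTO submissions VALUES (100, 1);
INSERT INTO executions VALUES (1000, 100);
""")

# Execution -> Submission -> Assignment -> Vector
VECTORS_FOR_EXECUTION = """
SELECT v.id
FROM executions  AS e
JOIN submissions AS s ON s.id = e.submission_id
JOIN assignments AS a ON a.id = s.assignment_id
JOIN vectors     AS v ON v.assignment_id = a.id
WHERE e.id = ?
ORDER BY v.id
"""
print([row[0] for row in conn.execute(VECTORS_FOR_EXECUTION, (1000,))])  # [10, 11]
```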
…
When you write the classes to model these tables, you end up with a circular code dependency, which is bad. You can break that code dependency with an interface, but that’s a separate issue from the database design.
In the database design, the reason Execution -> Vector is problematic is that it duplicates Execution -> Submission -> Assignment -> Vector. So you’d have to make sure those stay in sync.
Unless:
- you’re duplicating the data as a performance optimization and know the risks, or
- you want Execution to reflect the vectors involved when it ran.
That is, let’s say an Assignment adds a new Vector. Any existing Executions won’t see that new vector, because they already point to their list of vectors. But that may be a good thing, because it’s saying “When this Execution ran, it used these Vectors, even though an Assignment may have added or removed Vectors since then”.
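A small SQLite sketch of that snapshot behavior, with a hypothetical schema: the stored `vector_id` references keep the vectors an execution actually ran against, while the join through the assignment also picks up a vector added later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE vectors (id INTEGER PRIMARY KEY, assignment_id INTEGER NOT NULL);
CREATE TABLE submissions (id INTEGER PRIMARY KEY, assignment_id INTEGER NOT NULL);
CREATE TABLE executions (
    id            INTEGER PRIMARY KEY,
    submission_id INTEGER NOT NULL REFERENCES submissions(id),
    vector_id     INTEGER NOT NULL REFERENCES vectors(id)
);
INSERT INTO vectors VALUES (10, 1), (11, 1);
INSERT INTO submissions VALUES (100, 1);
-- The submission ran against the two vectors that existed at the time.
INSERT INTO executions (submission_id, vector_id) VALUES (100, 10), (100, 11);
-- Later, the teacher adds a third vector to the same assignment.
INSERT INTO vectors VALUES (12, 1);
""")


def ids(sql: str, arg: int) -> list[int]:
    """Run a one-parameter query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql, (arg,))]


# What the submission was actually run against (the stored references).
recorded = ids(
    "SELECT vector_id FROM executions WHERE submission_id = ? ORDER BY vector_id", 100
)
# What the join path Submission -> Assignment -> Vector yields today.
derived = ids(
    """
    SELECT v.id FROM submissions AS s
    JOIN vectors AS v ON v.assignment_id = s.assignment_id
    WHERE s.id = ? ORDER BY v.id
    """,
    100,
)
print(recorded)  # [10, 11]
print(derived)   # [10, 11, 12]
```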
I was assuming these artifacts wouldn’t change, i.e. that they are all immutable. But that may be a bad assumption, so I could be wrong: the reference from Execution to Vector may not be redundant. You’d know the answer to that better than me.
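If you do keep `executions.vector_id`, one common way to let the database enforce the consistency between Execution -> Vector and Execution -> Submission -> Assignment -> Vector (a technique not mentioned in the thread) is to copy `assignment_id` into `executions` and use composite foreign keys, so an execution can only pair a submission and a vector from the same assignment. A minimal SQLite sketch, with hypothetical table and column names and a hypothetical `add_execution` helper:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE assignments (id INTEGER PRIMARY KEY);
CREATE TABLE vectors (
    id            INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id),
    UNIQUE (id, assignment_id)  -- parent key for the composite FK below
);
CREATE TABLE submissions (
    id            INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL REFERENCES assignments(id),
    UNIQUE (id, assignment_id)
);
CREATE TABLE executions (
    id            INTEGER PRIMARY KEY,
    assignment_id INTEGER NOT NULL,
    submission_id INTEGER NOT NULL,
    vector_id     INTEGER NOT NULL,
    FOREIGN KEY (submission_id, assignment_id) REFERENCES submissions(id, assignment_id),
    FOREIGN KEY (vector_id, assignment_id)     REFERENCES vectors(id, assignment_id)
);
INSERT INTO assignments VALUES (1), (2);
INSERT INTO vectors VALUES (10, 1), (20, 2);
INSERT INTO submissions VALUES (100, 1);
""")


def add_execution(submission_id: int, vector_id: int, assignment_id: int) -> bool:
    """Return True if the row is accepted, False if a foreign key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO executions (assignment_id, submission_id, vector_id) "
                "VALUES (?, ?, ?)",
                (assignment_id, submission_id, vector_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(add_execution(100, 10, 1))  # True: vector 10 belongs to assignment 1
print(add_execution(100, 20, 1))  # False: vector 20 belongs to assignment 2
print(add_execution(100, 20, 2))  # False: submission 100 is not for assignment 2
```

In exchange for one extra column per execution, the two paths can no longer point at different assignments.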