Context
I’m designing a database which, simplified, should be able to handle users sending job requests to each other, and after that a job can be started, finished, and reviewed. The design should be scalable (think millions of users).
Approaches I’ve considered:
Gargantuan table
One approach, probably not the best one, would be to simply store ALL jobs in one huge table, jobs. This table would need a state column to represent which state the job is currently in (e.g. ACCEPTED, STARTED, FINISHED, REVIEWED, etc.). The biggest problem with this approach that I can see is that jobs in different states have different types of data that are relevant to them. For example, a job request has a preliminary agreed-upon price, but that could change before the job is started, and change again before the job is finished. This could of course be solved by just adding more columns to the table and naming them properly, but one table containing all the different types of possible data for all the different possible states of a job will probably become a huge performance bottleneck very early.
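For illustration, such a table would end up looking roughly like this (the column names are just examples I made up):

CREATE TABLE jobs (
    job_id           BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    requester_id     BIGINT NOT NULL,
    receiver_id      BIGINT NOT NULL,
    state            VARCHAR(20) NOT NULL,   -- ACCEPTED, STARTED, FINISHED, REVIEWED, ...
    requested_price  DECIMAL(10, 2),         -- only relevant while the job is a request
    started_price    DECIMAL(10, 2),         -- only relevant once the job has started
    final_price      DECIMAL(10, 2),         -- only relevant once the job is finished
    review_text      TEXT                    -- only relevant once the job is reviewed
);

Most of those state-specific columns would be NULL for most rows.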
Different tables for different states
This approach would be to have multiple tables, for example job_requests, jobs_started, and jobs_finished, which in turn can have sub-states, e.g. job_requests could have the sub-states PENDING and ACCEPTED, while the jobs_finished table would have the sub-states COMPLETED, CANCELLED, and REVIEWED.
With this approach each table only contains data which is relevant to the current job state, but on the other hand some data might be duplicated (for example the user ids of the job requester, and job receiver — on the other hand this information could be stored in yet another table?).
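For example, the first two tables might look roughly like this (again, illustrative names only), with the requester/receiver ids repeated in both:

CREATE TABLE job_requests (
    job_request_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    requester_id    BIGINT NOT NULL,
    receiver_id     BIGINT NOT NULL,
    sub_state       VARCHAR(20) NOT NULL,   -- PENDING, ACCEPTED
    agreed_price    DECIMAL(10, 2)
);

CREATE TABLE jobs_started (
    job_started_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    requester_id    BIGINT NOT NULL,        -- duplicated from job_requests
    receiver_id     BIGINT NOT NULL,        -- duplicated from job_requests
    started_at      TIMESTAMP NOT NULL,
    current_price   DECIMAL(10, 2)
);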
The problem with this approach is that I can't think of a good solution for how to archive all the information when transitioning between states. For example, once a job request has been accepted and then started, it should be deleted from the job_requests table and moved into the jobs_started table. But it's only a matter of time before a stakeholder wants to know, say, the average time between a job request being created and the job being started, at which point I'd need the data from the job_requests table to calculate it.
It feels like this type of problem should be easy to solve, but I really can’t think of any good solution which “feels right”, any solution I come up with feels ugly and I can immediately think of a number of things which makes the solution bad.
Very grateful for any feedback or tips on approaches I could take. Thanks in advance!
Sounds like you have 3 major categories of data you are trying to store:
- General job data (job id, job requester id, job receiver id etc)
- State transitions (job started, job finished)
- State-specific job data
- (optional) job-related events (price changed, job receiver user reassigned etc)
The key is to separate event-like data from everything else.
Schema design
Here are some details:
1. General jobs table
All of the information that’s NOT state-specific goes into the (let’s say) “jobs” table. Auto-generated primary key: job_id
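A minimal sketch, assuming a few common columns (the exact columns and the identity/auto-increment syntax will depend on your RDBMS):

CREATE TABLE jobs (
    job_id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    requester_id  BIGINT NOT NULL,
    receiver_id   BIGINT NOT NULL,
    created_at    TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);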
2. State transitions table
All information about state transitions goes into a "job_state_transitions" table, which might have the following columns:
- job_transition_id
- job_id
- created_at
- from_state
- to_state
Ideally this table is append-only. Nothing is ever updated or removed here.
Using such a table, you can find out the latest status of any particular job by selecting the latest row for a given job_id from the job_state_transitions table. You can further denormalize that and introduce a "job_state" column, the contents of which are updated every time a new row is inserted into job_state_transitions (stored procedures might help here if that's your thing).
You can also do all sorts of analysis on state transitions, because the timing data is preserved (created_at is a date/time field that can help with that).
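For example (illustrative SQL only: the identity syntax, LIMIT vs. TOP/FETCH FIRST, and date arithmetic all differ between databases, and the state names are just examples):

CREATE TABLE job_state_transitions (
    job_transition_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    job_id             BIGINT NOT NULL REFERENCES jobs (job_id),
    created_at         TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    from_state         VARCHAR(20),            -- NULL for the very first transition
    to_state           VARCHAR(20) NOT NULL
);

-- Latest state of a single job: the most recent transition row wins
SELECT to_state
FROM job_state_transitions
WHERE job_id = 42
ORDER BY created_at DESC
LIMIT 1;

-- Example analysis: average time from "request created" to "job started"
SELECT AVG(s.created_at - r.created_at)
FROM job_state_transitions r    -- the "request created" transition
JOIN job_state_transitions s    -- the "job started" transition
  ON s.job_id = r.job_id
WHERE r.to_state = 'PENDING'
  AND s.to_state = 'STARTED';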
3. State-specific job data
All of the state-specific data goes into “[state]-jobs” tables. Primary key: some sequence id. Main index: job_transition_id
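For example, the table for the FINISHED state might look like this (everything beyond the keys is invented for the example):

CREATE TABLE finished_jobs (
    finished_job_id    BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    job_transition_id  BIGINT NOT NULL REFERENCES job_state_transitions (job_transition_id),
    final_price        DECIMAL(10, 2),
    review_text        TEXT
);

CREATE INDEX finished_jobs_transition_idx ON finished_jobs (job_transition_id);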
4. Optional event data
You can also introduce an "audit trail" table that allows you to keep track of various changes that users might request for each job, like your "change the agreed-upon price" example. This is a generalization of the state transition table: one main table that contains events and one supplemental table for each event type (for example, a price_changes table with job_id, created_at, from_price and to_price columns).
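A sketch of such a supplemental table, using the columns mentioned above (types are assumptions):

CREATE TABLE price_changes (
    price_change_id  BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    job_id           BIGINT NOT NULL REFERENCES jobs (job_id),
    created_at       TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    from_price       DECIMAL(10, 2) NOT NULL,
    to_price         DECIMAL(10, 2) NOT NULL
);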
Scaling to millions of users
If the main “jobs” table grows to be unwieldy, you can shard it by job_id or requesting_user_id or something like that.
Likewise, the events table should be append-only and can be rotated, or purged of events related to jobs that have already been finished.
How about a hybrid of the two? A large table with a JobID and state (and perhaps other information common to all jobs and states), with other tables to manage the additional state information. That reduces (or eliminates) the duplication of data, but keeps things more manageable.
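Roughly, such a hybrid could look like this (table and column names are only examples):

-- Common data and current state for every job
CREATE TABLE jobs (
    job_id        BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    requester_id  BIGINT NOT NULL,
    receiver_id   BIGINT NOT NULL,
    state         VARCHAR(20) NOT NULL    -- REQUESTED, STARTED, FINISHED, ...
);

-- One detail table per state, joined on job_id when needed
CREATE TABLE job_request_details (
    job_id        BIGINT PRIMARY KEY REFERENCES jobs (job_id),
    agreed_price  DECIMAL(10, 2)
);

CREATE TABLE job_finished_details (
    job_id        BIGINT PRIMARY KEY REFERENCES jobs (job_id),
    final_price   DECIMAL(10, 2),
    finished_at   TIMESTAMP
);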
You might want to consider using a noSQL database for this specific data.
This way you can just put your object in a record, no matter what the 'columns' are.
Depending on the chosen database system (and implemented solution), you would be able to 'overwrite' a record with different data if you want. Of course, you can also save your historic data and just PUT a new record in the database.
You can start out with this record:
myStartedObject{
id: 1,
state: 'started',
agreedUponPrice: 100.00,
someOtherData: 'x'
}
and over time it would change to something like this (or, if you want to keep the history records, the id would change):
myFinishedObject{
id: 1,
state: 'finished',
finishedPrice: 209.00,
someFinishedData: 'y'
}
Storing all state data inside the object is also an option of course, something like this:
myObject
{
id: 1,
states: [{ state: 'started', agreedUponPrice: 100.00, someOtherData: 'x'},
{ state: 'finished', finishedPrice: 209.00, someFinishedData: 'y'}]
}
Keep in mind, this surely isn't a silver bullet, but it might be something to consider, as you state your 'schema' is quite loose and you don't want to pollute your RDBMS with all of those columns.
Also keep in mind that many RDBMSs can do this as well. I know that in MS SQL Server you have the XML data type. With this, you can store your data in an XML column, which can contain your serialized object, for instance.
It’s not a ‘real’ noSQL solution, but it looks a bit like one.
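In SQL Server, for example, that could look roughly like this (column names are assumptions):

CREATE TABLE jobs (
    job_id      INT IDENTITY PRIMARY KEY,
    state       VARCHAR(20) NOT NULL,
    state_data  XML NULL    -- serialized, state-specific payload
);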
If you treat this problem as a workflow problem, you can employ one of the workflow tools, such as jBPM or Windows Workflow Foundation, focus on the business logic, and let the tool handle the persistence for you.