Background: I am developing a wizard-based web form where users can log in, fill in some details, and press Next and Previous to go back and forth in the wizard before they actually submit. When the user submits, I make a service call that performs some processing on the data.
Issue: Not all of the data is readily available to the user, and the user may choose to “save” the data instead of “submitting” it, and then come back later (if s/he chooses to) and finish it. This intermediate data would be incomplete (and in some cases invalid) – but for convenience we would let the user save it.
Solutions I have seen in the past: I have dealt with this using two distinct solutions –
- Save the intermediate data in the regular tables (where submitted data is saved) and mark that data as “inactive”. The data is “activated” only when the user clicks submit and all of it is valid. This involves modifying the service that is called on submit so that, on save, it also accepts intermediate and possibly invalid data. It also puts the burden on the team to remember to check the “active” flag in every query.
- The other option is to store the intermediate saved data in huge denormalized CLOBs. The data could be structured within the CLOBs as XML or JSON. The intermediate (saved) data is completely disjoint from the completed (submitted) data. The downside here is that the form now needs to work with two different sets of services based on the status of the workflow cycle (if in progress, read the XML/JSON from the CLOB – if completed, read from the service that has valid data!). A rough sketch of this option follows below.
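To make the second option concrete, here is a minimal TypeScript sketch of how the draft payload might be handled; the `WizardData` shape is a made-up example, and the idea is simply that drafts are a `Partial<WizardData>` serialized to JSON for the CLOB column, with full validation deferred to submit:

```typescript
// Hypothetical shape of the fully completed wizard data.
interface WizardData {
  applicantName: string;
  email: string;
  annualIncome: number;
}

// A draft may be missing (or contain invalid) fields, so it is typed as Partial.
type WizardDraft = Partial<WizardData>;

// Serialize an in-progress draft for storage in the CLOB column.
function serializeDraft(draft: WizardDraft): string {
  return JSON.stringify(draft);
}

// Rehydrate a saved draft when the user resumes the wizard.
function deserializeDraft(clob: string): WizardDraft {
  return JSON.parse(clob) as WizardDraft;
}

// Full validation happens only on submit, never on save.
function toSubmittable(draft: WizardDraft): WizardData {
  if (!draft.applicantName || !draft.email || draft.annualIncome === undefined) {
    throw new Error("Draft is incomplete; cannot submit");
  }
  return draft as WizardData;
}
```

The save path never goes through the submit service’s validation, which is exactly what keeps the two sets of services disjoint.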
Question:
Is there a better way to solve this problem, since it seems like a universal issue, at least in the enterprise world? Could someone point me to any data or storage “patterns” that the community has been following to solve this storage issue in web workflows?
Note:
The underlying database technology can be thought of as a system that supports tables, where individual fields can contain very large strings such as CLOBs.
Answer 1:
Since you are designing a web site, there are multiple options, and the choice clearly depends on your requirements.
Save on the client side
You could use cookies or local storage to achieve this. The main advantage is that you don’t have to send and retrieve data back and forth to the server to persist it.
However, the user won’t be able to resume their work from another client.
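A minimal sketch of this client-side approach, assuming browser localStorage and a made-up storage key; as noted above, the draft cannot be resumed from another client or browser:

```typescript
const STORAGE_KEY = "wizard-draft"; // hypothetical key, one per wizard instance

// Persist the current (possibly incomplete) form state on every "save".
function saveDraftLocally(formState: Record<string, unknown>): void {
  localStorage.setItem(STORAGE_KEY, JSON.stringify(formState));
}

// Restore the draft when the user returns to the wizard, or null if none exists.
function loadDraftLocally(): Record<string, unknown> | null {
  const raw = localStorage.getItem(STORAGE_KEY);
  return raw === null ? null : JSON.parse(raw);
}

// Clear the draft once the wizard has been successfully submitted.
function clearDraftLocally(): void {
  localStorage.removeItem(STORAGE_KEY);
}
```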
Save on the server side
Persisting the data in the same tables and marking it as inactive will result in invalid data piling up in the system when users abandon the workflow. This can be mitigated by forcing the user to either continue or cancel a half-completed workflow the next time they use it. One advantage of this approach is that it is less prone to persistence errors, since the data is already in the database. However, it may not work depending on the database design.
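As a rough illustration of the status flag and the clean-up of abandoned drafts described above, here is a TypeScript sketch that uses an in-memory array as a stand-in for the real table; the field names and retention period are assumptions:

```typescript
type WorkflowStatus = "draft" | "submitted";

interface WizardRecord {
  id: string;
  userId: string;
  status: WorkflowStatus; // the "active"/"inactive" flag discussed above
  payload: string;        // serialized form data
  updatedAt: Date;
}

// In-memory stand-in for the database table.
const records: WizardRecord[] = [];

// Every read of "real" data must remember to filter on status.
function findSubmitted(userId: string): WizardRecord[] {
  return records.filter(r => r.userId === userId && r.status === "submitted");
}

// On resume, surface the half-completed draft so the user can continue or cancel it.
function findPendingDraft(userId: string): WizardRecord | undefined {
  return records.find(r => r.userId === userId && r.status === "draft");
}

// Periodic clean-up of abandoned drafts (30 days is an assumed retention period).
function purgeAbandonedDrafts(now: Date, maxAgeDays = 30): void {
  const cutoff = now.getTime() - maxAgeDays * 24 * 60 * 60 * 1000;
  for (let i = records.length - 1; i >= 0; i--) {
    if (records[i].status === "draft" && records[i].updatedAt.getTime() < cutoff) {
      records.splice(i, 1);
    }
  }
}
```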
Saving the data in XML or JSON format on the server requires additional logic, and depending on the workflow it could be complicated to implement (for example, converting the XML/JSON back into the domain model).
Answer 2:
We had to implement a “subscription tunnel” – a subscription workflow for collecting and filtering potential clients from the web.
What we did was implement a “provider” pattern. Behind this pattern, all workflow data was stored in a single (and highly de-normalized) table. This was done to keep the provider’s processing cost low and to allow the workflow steps to be re-arranged on demand. The table represents all of the “collectable” data. We did it with MSSQLServer. I can’t say this technical choice was the best one for us, but it worked, and it gave our business people agility on this strategic part of the system. The provider could be plugged in from configuration files with simple Dependency Injection.
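As a rough illustration of what such a configuration-driven provider could look like (the interface, implementation, and configuration key below are assumptions, not the poster’s actual code), here is a TypeScript sketch:

```typescript
// Contract every workflow-data provider must fulfil.
interface WorkflowDataProvider {
  save(workflowId: string, data: Record<string, unknown>): void;
  load(workflowId: string): Record<string, unknown> | undefined;
}

// Stand-in for the single de-normalized table described above.
class DenormalizedTableProvider implements WorkflowDataProvider {
  private rows = new Map<string, Record<string, unknown>>();

  save(workflowId: string, data: Record<string, unknown>): void {
    // All collectable fields for one workflow end up in one wide row.
    this.rows.set(workflowId, { ...(this.rows.get(workflowId) ?? {}), ...data });
  }

  load(workflowId: string): Record<string, unknown> | undefined {
    return this.rows.get(workflowId);
  }
}

// Simple dependency injection: the provider is chosen from configuration.
function createProvider(config: { provider: string }): WorkflowDataProvider {
  switch (config.provider) {
    case "denormalizedTable":
      return new DenormalizedTableProvider();
    default:
      throw new Error(`Unknown provider: ${config.provider}`);
  }
}

// Usage: the wizard steps talk only to the interface.
const provider = createProvider({ provider: "denormalizedTable" });
provider.save("workflow-123", { step1: { email: "user@example.com" } });
```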