I have a proof of concept application that uses Azure tables to associate DNA sequences to “something”.
Table 1 is the master table. It uniquely lists every DNA sequence. The PK is a load balanced hash of the RK. The RK is the unique encoded value of the DNA sequence.
Additional tables are created per subject. The PK is a load balanced hash, and the RK is the unique value of the DNA sequence. Assume that the quantity of RKs here is many orders of magnitude smaller than in the Master table. Each subject has a list of N DNA sequences, each with exactly one reference in the Master table, where N is > 100,000.
It is possible for many tables to reference the same DNA sequence, but in this case only one entry will be present in the Master table.
My Azure dilemma:
I need to lock the reference in the Master table while I work with the data. I need to handle timeouts, and prevent other threads from overwriting my data while one C# thread is working with the information. Other threads need to realise that a record is locked, move on to other unlocked records, and do the work there.
Ideally I’d like to get some progress report of how my computation is going, and have the option to cancel the process (and unwind the locks).
Question
What is the best approach for this?
I’m looking at these code snippets for inspiration:
Link
https://stackoverflow.com/q/4535740/328397
While I’ve not worked with Azure, only on-premise SQL Server, it seems like this is really a matter of asynchronous data access/concurrency.
Why not simply have each thread record, in a centralized location (e.g. another table), which set of data it’s working with? Subsequent threads can then skip any records that are already “checked out” to another thread. Once a thread has finished working with the data it has “checked out”, it removes its record from the centralized location.
Also, you could include the UTC timestamp of when the record was “checked out”, so that if something were to fail or otherwise time out, the “check out” record (i.e. the lock) could be cleared by the next thread, or even by a separate worker process.
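To make the idea concrete, here is a minimal in-memory sketch of the “check out” bookkeeping with a timestamp-based expiry. Everything in it is hypothetical: the `CheckoutRegistry` class, its method names, and the five-minute timeout are made up for illustration, and a plain dictionary guarded by a `lock` stands in for the centralized table. Against real table storage, the check and the write would have to be a single atomic operation on the storage side, which this sketch does not attempt to show.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for the centralized "check out" table.
public sealed class CheckoutRegistry
{
    // Record key -> (who checked it out, when in UTC).
    private readonly Dictionary<string, (string Owner, DateTime TakenUtc)> _leases =
        new Dictionary<string, (string Owner, DateTime TakenUtc)>();
    private readonly object _gate = new object();
    private readonly TimeSpan _timeout;

    public CheckoutRegistry(TimeSpan timeout)
    {
        _timeout = timeout;
    }

    // Returns true if the caller now holds the record; false if another
    // owner holds a check-out that has not yet expired.
    public bool TryCheckOut(string recordKey, string owner, DateTime nowUtc)
    {
        lock (_gate)
        {
            if (_leases.TryGetValue(recordKey, out var lease) &&
                nowUtc - lease.TakenUtc < _timeout)
            {
                return false; // still checked out, caller should move on
            }

            // Either never checked out, or the old check-out is stale:
            // the caller takes it over.
            _leases[recordKey] = (owner, nowUtc);
            return true;
        }
    }

    // Removes the check-out, but only if the caller still owns it.
    public bool Release(string recordKey, string owner)
    {
        lock (_gate)
        {
            if (_leases.TryGetValue(recordKey, out var lease) && lease.Owner == owner)
            {
                return _leases.Remove(recordKey);
            }
            return false;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        var registry = new CheckoutRegistry(TimeSpan.FromMinutes(5));
        var t0 = new DateTime(2020, 1, 1, 0, 0, 0, DateTimeKind.Utc);

        Console.WriteLine(registry.TryCheckOut("ACGT", "worker-1", t0));               // True
        Console.WriteLine(registry.TryCheckOut("ACGT", "worker-2", t0.AddMinutes(1))); // False
        Console.WriteLine(registry.TryCheckOut("ACGT", "worker-2", t0.AddMinutes(6))); // True (stale check-out taken over)
        Console.WriteLine(registry.Release("ACGT", "worker-1"));                       // False (no longer the owner)
    }
}
```

Passing the current time in as a parameter, rather than reading the clock inside the class, keeps the expiry rule easy to test. Checking the owner in `Release` means a thread whose check-out expired and was taken over cannot accidentally clear the new owner’s lock.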