We are overhauling a legacy accounting app that has some large reports. One concern that has been raised is due to the transactional nature of the data, if we start a long running report how do we ensure the data is not modified before we get around to processing it?
Locking everything on the transaction db is out, what has been suggested is maintaining a replication db that can be locked, this seems sledgehammery to me.
I’m wondering if anyone has some perspective you can share, for example what is the cost of dynamically creating a replication db at report time and replicating just what we need, then blowing the temp db away? What is the cost of dynamically creating a db and/or table schema just for a single report? How to handle the same report being run concurrently? Is there an easier way?
3
if we start a long running report how do we ensure the data is not modified before we get around to processing it?
That’s not a typical requirement. What are you afraid of that affecting? The report that is currently running, or the transactional workload?
If you are worried about the reporting getting different versions of data that could pollute the report or make is a misrepresentation of the data, then you would have to ensure a point-in-time view of the database right before the report kicks off.
Note: I am a SQL Server professional so my further guidance speaks on behalf of SQL Server technology pertaining to my particular RDBMS
Database snapshots are a great solution here. When you create a database snapshot it uses a sparse file to record changed pages from the moment the snapshot is created. The snapshot itself is a transactionally-consistent static view of the database at the time of the snapshot creation. It harnesses a copy-on-write operation every time the original database is modified.
If you want to offload this to a separate server, you could leverage transactional replication for a subset of the source database (the publisher), by specifying particular articles to replicate, as opposed to the whole database. Consider this a narrow and offloaded approach, as well as a data-focused approach (vs the whole database) to reporting. You can pause the distribution of the data to the subscriber(s) when you are ready to run your report workload, and then resume when complete.
The easier solution here would be the database snapshot. Create the snapshot before you run your report. Run your report against the snapshot. Then when the report is done and you no longer need the static view of your database from that snapshot time, drop the snapshot.
2