We are looking into enabling SQL Server Change Data Capture (CDC) for a new subsystem we are building.
It’s not really because we need it, but we are being pushed to provide complete history traceability, and CDC would nicely solve this requirement with minimal effort on our part.
We are following an agile development process, which in this case means that we frequently make changes to the database schema, e.g. adding new columns, moving data to other columns, etc.
We did a small test where we created a table, enabled CDC for that table, and then added a new column to the table. Changes to the new column are not registered in the CDC table.
Is there a mechanism to update the CDC table to the new schema, and are there any best practices to how you deal with captured data when migrating the database schema?
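For reference, our small test looked roughly like this (the table and column names are just placeholders):

```sql
-- Enable CDC at the database level, then for one table:
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'TestTable',
    @role_name     = NULL;

-- Then alter the schema:
ALTER TABLE dbo.TestTable ADD NewColumn int NULL;
-- Subsequent changes to NewColumn do not show up in the capture table.
```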
We have also recently started looking at CDC. I’m not an expert on the subject, but I think I have some answers for your questions.
For the most part, CDC will help you achieve your goal of a completely traceable history, but I don’t think it will get you all of the way there.
First off:
we frequently make changes to the database schema … Is there a mechanism to update the CDC table to the new schema
And this is where I think CDC will fail you. The MSDN documentation, under the section “Understanding Change Tracking Overhead”, is pretty clear that it won’t track schema changes for you. For example, with ALTER TABLE ... ADD COLUMN:
If a new column is added to the change tracked table, the addition of the column is not tracked. Only the updates and changes that are made to the new column are tracked.
DROP COLUMN is a little more complex.
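One workaround worth noting: a CDC-enabled table can have up to two capture instances at once, so after a schema change you can create a second instance that reflects the new column list, migrate consumers to it, and then drop the old one. A sketch, with hypothetical table and instance names:

```sql
-- After altering the schema, add a second capture instance
-- that picks up the new column list:
EXEC sys.sp_cdc_enable_table
    @source_schema    = N'dbo',
    @source_name      = N'MyTable',
    @role_name        = NULL,
    @capture_instance = N'dbo_MyTable_v2';

-- Once consumers have switched over, drop the old instance:
EXEC sys.sp_cdc_disable_table
    @source_schema    = N'dbo',
    @source_name      = N'MyTable',
    @capture_instance = N'dbo_MyTable_v1';
```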
However, you should be using DB scripts to alter your schema anyway, so you don’t necessarily have to rely on CDC here. Scripts let you keep your QA and Production schemas consistent: every change to QA should be applied by script so that exactly the same changes can be applied to Prod. It shouldn’t be too hard to extract the schema changes from those scripts. This may mean that the “time” dimension of your history is driven by version instead of actual time, but the end result will be the same.
If you don’t have one already, create a table to track the version of your database schema. And then place that database schema version table under CDC so you can align macroscopic changes to the schema against the microscopic changes within a particular table.
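A minimal sketch of that idea (all names here are hypothetical, and the database must already be CDC-enabled via sys.sp_cdc_enable_db):

```sql
-- A simple schema-version table, itself placed under CDC:
CREATE TABLE dbo.SchemaVersion (
    VersionId   int IDENTITY(1,1) PRIMARY KEY,
    Version     nvarchar(32)  NOT NULL,
    AppliedAt   datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    Description nvarchar(256) NULL
);

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'SchemaVersion',
    @role_name     = NULL;
```

Each migration script then inserts a row into dbo.SchemaVersion, giving you a CDC-tracked timeline of schema versions to correlate against the per-table change data.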
To my understanding, you should still see the data added to the new column(s) even though CDC doesn’t record the schema change itself. The data migration from table to table should also be picked up by CDC.
are there any best practices to how you deal with captured data when migrating the database schema?
Treat it like you would treat an audit. You need to understand what it is you’re examining, why you’re examining it, and how long you need to keep that information around. Scope and retention are the two biggest bugaboos when it comes to a task like this.
CDC’s reporting tools are understandably austere, so you have to know the context of the changes. It’s too easy to say “track everything!” and end up with nothing that’s usable as a result. Likewise, you could be doubling the size of your database by keeping a copy of every change. On a high churn table with many inserts and deletes, you’ll end up with astronomical growth. That’s not bad in and of itself, but you need to budget for that growth and have a means to examine all of the data that’s generated.
So this gets you back to understanding why you are being pushed to have complete traceability. There are certainly valid reasons for that requirement. But you won’t be able to structure your effective auditing of the database until you know why you must meet that requirement.
You can track the addition of columns with DDL triggers.
CREATE TRIGGER trigger_name
ON { ALL SERVER | DATABASE }
[ WITH <ddl_trigger_option> [ ,...n ] ]
{ FOR | AFTER } { event_type | event_group } [ ,...n ]
AS { sql_statement [ ; ] [ ,...n ] |
EXTERNAL NAME < method specifier > [ ; ] }
You can use the event group DDL_TABLE_EVENTS to fire for the CREATE, DROP, or ALTER of a table.
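A concrete sketch of such a trigger, logging the event XML from EVENTDATA() into a hypothetical log table:

```sql
-- Log table for captured DDL events (name is hypothetical):
CREATE TABLE dbo.DdlLog (
    EventTime datetime2 NOT NULL DEFAULT SYSUTCDATETIME(),
    EventData xml       NOT NULL
);
GO

-- Fires on CREATE, ALTER, or DROP TABLE in this database:
CREATE TRIGGER trg_LogTableDdl
ON DATABASE
FOR DDL_TABLE_EVENTS
AS
BEGIN
    INSERT INTO dbo.DdlLog (EventData)
    VALUES (EVENTDATA());
END;
```

The EVENTDATA() XML includes the event type, the object name, and the full T-SQL command, so you can later align an ALTER TABLE against the CDC data for that table.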
However, you might want to take a look at SQL Server Audit if you are using Enterprise Edition of SQL Server 2008 or later, as it “combines all the auditing capabilities into an audit specification”. This MSDN article covers it in detail: http://msdn.microsoft.com/en-us/library/dd392015(v=sql.100).aspx.
CREATE SERVER AUDIT audit_name
TO { [ FILE (<file_options> [, ...n]) ] |
APPLICATION_LOG | SECURITY_LOG }
[ WITH ( <audit_options> [, ...n] ) ] [ ; ]
<file_options>::=
{FILEPATH = 'os_file_path'
[, MAXSIZE = { max_size { MB | GB | TB } | UNLIMITED } ]
[, MAX_ROLLOVER_FILES = integer ]
[, RESERVE_DISK_SPACE = { ON | OFF } ] }
<audit_options>::=
{ [ QUEUE_DELAY = integer ]
[, ON_FAILURE = { CONTINUE | SHUTDOWN } ]
[, AUDIT_GUID = uniqueidentifier ]}
Then you create Server or Database Specifications.
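For example, a file-based server audit plus a database audit specification that tracks schema changes might look like this (audit and database names are hypothetical):

```sql
-- Server-level audit writing to files:
CREATE SERVER AUDIT SchemaChangeAudit
    TO FILE ( FILEPATH = 'C:\SqlAudit\', MAXSIZE = 1 GB );
GO
ALTER SERVER AUDIT SchemaChangeAudit WITH ( STATE = ON );
GO

USE MyDatabase;
GO
-- Database-level specification capturing schema object changes:
CREATE DATABASE AUDIT SPECIFICATION SchemaChangeSpec
FOR SERVER AUDIT SchemaChangeAudit
    ADD ( SCHEMA_OBJECT_CHANGE_GROUP )
WITH ( STATE = ON );
```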
Simple-Talk has a great article on change data capture and a very thorough guide on CDC, and the MSDN documentation explains the various methods depending on the requirements. Both can help with your specific requirements.