I have written a T-SQL script to count non-blank values in each column across all tables in a database. The script currently uses a cursor to iterate through each column and dynamically generate and execute SQL to count the values, excluding blanks and, in the case of date columns, excluding the date ‘1900-01-01’. Here is the script:
IF OBJECT_ID('tempdb..#ColumnValueCounts') IS NOT NULL
DROP TABLE #ColumnValueCounts;
SET NOCOUNT ON
-- Temporary table to store the results
CREATE TABLE #ColumnValueCounts (
TABLE_CATALOG NVARCHAR(255),
TABLE_SCHEMA NVARCHAR(255),
TABLE_NAME NVARCHAR(255),
COLUMN_NAME NVARCHAR(255),
ORDINAL_POSITION NVARCHAR(5),
DATA_TYPE NVARCHAR(128),
NON_BLANK_COUNT INT
);
-- Cursor to fetch table and column names
DECLARE @TableCatalog NVARCHAR(255), @TableSchema NVARCHAR(255), @TableName NVARCHAR(255)
, @ColumnName NVARCHAR(255), @DataType NVARCHAR(255)
,@ORDINAL_POSITION NVARCHAR(5), @DATA_TYPE NVARCHAR(128);
DECLARE @Sql NVARCHAR(MAX);
DECLARE table_cursor CURSOR FOR
SELECT TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, ORDINAL_POSITION, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS;
OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @TableCatalog, @TableSchema, @TableName, @ColumnName, @DataType
, @ORDINAL_POSITION, @DATA_TYPE;
WHILE @@FETCH_STATUS = 0
BEGIN
-- Dynamic SQL to count non-blank values
SET @Sql = 'INSERT INTO #ColumnValueCounts
(TABLE_CATALOG, TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, ORDINAL_POSITION, DATA_TYPE, NON_BLANK_COUNT)
' +
'SELECT ''' + @TableCatalog + ''', ''' + @TableSchema + ''', ''' + @TableName + ''', ''' + @ColumnName
+ ''', ''' + @ORDINAL_POSITION + ''', ''' + @DATA_TYPE + ''', COUNT(*) ' +
'FROM ' + QUOTENAME(@TableSchema) + '.' + QUOTENAME(@TableName) +
' WHERE ' + QUOTENAME(@ColumnName) + ' IS NOT NULL AND ' +
CASE WHEN @DataType IN ('date', 'datetime', 'datetime2',
'smalldatetime', 'datetimeoffset')
THEN QUOTENAME(@ColumnName) + ' <> ''1900-01-01T00:00:00'' AND '
ELSE ''
END +
'CONVERT(VARCHAR, ' + QUOTENAME(@ColumnName) + ') <> '''' ';
-- PRINT @Sql;
EXEC sp_executesql @Sql;
FETCH NEXT FROM table_cursor INTO @TableCatalog, @TableSchema, @TableName, @ColumnName, @DataType
, @ORDINAL_POSITION, @DATA_TYPE;
END;
CLOSE table_cursor;
DEALLOCATE table_cursor;
-- Select results from the temporary table
SELECT *
FROM #ColumnValueCounts;
-- Cleanup
-- DROP TABLE #ColumnValueCounts;
The script is functional but I’m concerned about performance, especially with large databases, due to the use of a cursor. I am looking for a way to achieve the same result without using a cursor. Is there a more efficient method, perhaps using a set-based approach, to count non-blank values across multiple tables and columns?
Expected Results:
I expect to have a result set similar to what the current script produces, which is a table listing each column in the database along with a count of its non-blank values. However, the new approach should ideally reduce execution time and resource usage.
Can anyone help on this please.
Sample output
TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | ORDINAL_POSITION | DATA_TYPE | NON_BLANK_COUNT |
---|---|---|---|---|---|---|
stackoverflow | dbo | WorkerDetail_1 | Name | 1 | varchar | 48 |
stackoverflow | dbo | WorkerDetail_1 | Group | 2 | varchar | 48 |
stackoverflow | dbo | WorkerDetail_1 | Date | 3 | date | 48 |
stackoverflow | dbo | WorkerDetail_1 | Hours | 4 | int | 48 |
4