I want to leverage the cores and SSDs on our servers to improve parallelism in T-SQL. I'm seeing lots of cores sitting idle and disk I/O running at a fraction of its aggregate throughput during long-running processes. In the days of mechanical drives, everything was bound by disk I/O, and with the exception of sorted parallel bcp operations loading into different filegroups, we executed our steps in sequence.
I'm curious whether there is syntax, or planned syntax, in newer versions of SQL Server that lets us run statements in parallel without rolling our own solution. For instance, suppose we're gathering information from several tables for some ETL piece. In pure T-SQL, we might see something like the following:
BULK INSERT StagingA FROM ...
BULK INSERT StagingB FROM ...
SELECT <column_list> INTO #A FROM StagingA
SELECT <column_list> INTO #B FROM StagingB
Splitting things out like this seems like it could be performant, but really ugly. I have also found mixed information on SqlConnection and thread safety. The pattern seems to improve performance, but based on the following post, I suspect the improvement I'm perceiving may in fact be due to MARS, interleaved statements, and cache effects rather than true parallelism: How many concurrent statements does SqlConnection support
//C#
var statements = new[]
{
    @"BULK INSERT StagingA FROM ...
      SELECT ... INTO #A",
    @"..."
};
// NOTE: cn is a single SqlConnection shared across all tasks (Dapper's
// ExecuteAsync). If MARS is enabled, these batches interleave on one
// session rather than running on separate sessions.
await Parallel.ForEachAsync(statements, async (sql, token) =>
{
    await cn.ExecuteAsync(sql);
});
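For comparison, a variant that opens a dedicated connection per statement would give each batch its own session, so the work runs as genuinely concurrent sessions on the server rather than MARS-interleaved batches. This is only a sketch, assuming Dapper and Microsoft.Data.SqlClient; connectionString is a placeholder:

//C#
// Sketch: one SqlConnection per statement, one session per batch.
// Assumes Dapper and Microsoft.Data.SqlClient; connectionString is hypothetical.
await Parallel.ForEachAsync(statements, async (sql, token) =>
{
    await using var cn = new SqlConnection(connectionString);
    await cn.OpenAsync(token);
    await cn.ExecuteAsync(sql);
});

One consequence: session-scoped #temp tables created in one batch are invisible to the other connections, which is part of why the ## naming idea below comes up at all.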
That leaves me with a couple of other options. One thought is to use DTC with a TransactionScope, creating my temporary tables as ## tables with a unique name per scope (or possibly MEMORY_OPTIMIZED tables using a similar naming convention). If the transaction fails, it should clean up the temporary data. Even with the TransactionScope, though, I'd still have to track my objects and clean them up.
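A minimal T-SQL sketch of that naming convention, assuming a NEWID-derived suffix (the @suffix variable and dynamic SQL here are illustrative, not an established pattern):

-- Sketch: global temp table with a per-scope unique name, built via dynamic SQL.
-- The suffix convention is an assumption for illustration.
DECLARE @suffix nvarchar(32) = REPLACE(CONVERT(nvarchar(36), NEWID()), '-', '');
DECLARE @sql nvarchar(max) =
    N'SELECT <column_list> INTO ##A_' + @suffix + N' FROM StagingA;';
EXEC sys.sp_executesql @sql;

Each parallel worker would generate its own suffix, so the ## tables never collide, but every generated name has to be recorded somewhere so the objects can be dropped afterward.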