Context
-
Reading from CSV files using SQL Server 2016 (no format option available).
-
The files are crlf/windows and each value will be wrapped in double quotes. Files are located here, note that I changed them to crlf for consistency.
Problem
Every now and then specifically I see this often with numbers, when we attempt to do a replace(value, ‘”‘, ”) and then cast to int, it will fail with error below
Error
Conversion failed when converting the nvarchar value ‘331877
‘ to data type int.
But if we do a substring() it produces a value that cannot be cast to int.
Super hacky IMHO because you need to guess at the length of the string using len().
I have verified that the results of both replace and substring look exactly alike, but when I run a
nullif(replace(value, '"', ''), substring(value, 2, len(value)-2))
it claims they are different somehow. (crosses eyes) They look the same… they are the same type… they are the same, but nullif recognizes that something is wrong. My code is below. Perhaps there is something with the coalition??
bulk insert #tmp
From 'C:acquisition_samples.csv'
WITH
(
CODEPAGE = '65001'
,FIRSTROW = 2
,FIELDTERMINATOR = ','
,ROWTERMINATOR = '0x0A'
,batchsize=10
,TABLOCK
);
-- DDL
insert into acquisition_sample(fdc_id_of_sample_food, fdc_id_of_acquisition_food) -- UPDATE file name, and columns
select
nullif(REPLACE(t.fdc_id_of_sample_food, '"', ''), substring(t.fdc_id_of_sample_food,2,LEN(t.fdc_id_of_sample_food)-2))
, nullif(REPLACE(t.fdc_id_of_acquisition_food, '"', ''), substring(t.fdc_id_of_acquisition_food,2,LEN(t.fdc_id_of_acquisition_food)-2)) as wtf
, CAST(substring(t.fdc_id_of_sample_food,2,LEN(t.fdc_id_of_sample_food)-2) AS INT)
, t.fdc_id_of_acquisition_food
, CAST(substring(t.fdc_id_of_acquisition_food,2,LEN(t.fdc_id_of_acquisition_food)-3) AS INT)
from #tmp t