This is a long one, so apologies in advance.
I typically use STUFF and XML to concatenate multiple values where I’m expecting one distinct row, e.g.
Category | Product |
---|---|
Fruit | Apple |
Food | Apple |
select distinct
t1.Product
, Categories = STUFF((
select distinct ' / ' + t1a.Category
from table1 t1a
where
t1a.Product = t1.product
for xml path('')),1,3,'')
from table1 t1
Output:
Product | Categories |
---|---|
Apple | Food / Fruit |
With that out of the way, I’m trying to do that same function, but in a more complex query involving linked servers. I originally had a query that joined 2 tables across 2 servers, but it was way too slow, so I discovered openquery, which seems to be OK and is returning me the results way, way faster, but I’m finding some duplicate rows because of one column with differeing values, so I’m trying to squash them into a single row with the STUFF.
The awkward thing is that the 2 tables I’m trying to compare/link, one of them is implemented normally in terms of column data types. The other is implemented very poorly. On the poor table, they’re just varchar columns, and the table is a bit of a dumping ground for data, so one column which might be 90% numeric values, the rest will have alphabetical values, or other characters, or just a blank value (not null, just ”).
So I’m trying to pre-filter this poor table so that it joins/matches to the good table, but no matter what I try, it seems to be trying to pull ALL rows from the bad table, which returns a “Conversion failed when converting the varchar value ‘ABC’ to data type int” error.
Below is where I’m up to at the moment, but still the same issue
with src as (
select * from openquery(
[LinkedServer], 'select * from [LinkedServer].[a].[b].[good_table] where ID in (
select ID from [LocalServer].[a].[b].[bad_table]
where
ID not like ''%[A-Z]%'' and ID not like ''%.%'' and ID not like ''''
)')
)
select distinct
Dates = STUFF((
select distinct ' / ' + bt2.aDate
from (
select distinct
convert(nvarchar(MAX), bt1.[date], 23) as 'aDate'
, bt1.ID
from bad_table bt1
where
bt1.ID not like '%[A-Z]%' and bt1.ID not like '%.%' and bt1.ID not like ''
) bt2
where
bt2.id = s1.id
for xml path('')),1,3,'')
, s1.ID
, s1.*
from src s1
Returns
Conversion failed when converting the varchar value 'ABC' to data type int.
I’m probably getting caught out by the order of operation or something, but how can it possibly still try to match on the non-numeric rows when I’m filtering them out with CTEs and subqueries within subqueries? The only thing that remotely works is when I add in
bt2.id = s1.id and bt2.ID not in ('.', 'abc','hhh','WWQS')
Which I don’t find a good solution.