I’m using SQL Server 2014 and SSMS, and I have some data that’s like ‘abc123-123’, I’d like to query whether all the entries look like this not.
Is there a way of creating this column “pattern”:
id | Value | Pattern |
---|---|---|
1 | jam21 | aaax11 |
2 | byt9*9 | aaa1x1 |
3 | 21y 21 | 11ax11 |
So that I can do a count grouped by pattern?
As long as it can differentiate alpha characters from numeric characters and those two from anything else, I can probably make it work!
Basically looking for something along the lines of:
SELECT PATTERN, COUNT(*)
FROM
(SELECT id, value, GETPATTERN(value) AS PATTERN
FROM have) AS A
GROUP BY PATTERN
ORDER BY 2 DESC
With some help around this GETPATTERN
function, assuming such a thing exists.
In Python it’s basically trivial:
def getalphanumberwhite(inString):
resultList = []
resultstring = ""
for i in inString:
if i.isnumeric():
resultList.append("1")
elif i.isalpha():
resultList.append("a")
else:
resultList.append("x")
resultstring=''.join(resultList)
return resultstring
print(getalphanumberwhite("Jam21&I 32"))
#Returns aaa11xax11
Thanks
4
Such UDF is simple to create…
CREATE OR ALTER FUNCTION dbo.F_COMPUTE_PATTERN (@STRING NVARCHAR(4000))
RETURNS NVARCHAR(4000)
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
IF @STRING = N''
RETURN N'';
DECLARE @I SMALLINT = 1, @C NCHAR(1), @RETURN NVARCHAR(4000) = N'';
WHILE @I <= LEN(@STRING)
BEGIN
SET @C = SUBSTRING(@STRING, @I, 1);
SET @RETURN = @RETURN
+ CASE
WHEN @C BETWEEN '0' AND '9' THEN '1'
WHEN @C BETWEEN 'a' AND 'z' THEN 'a'
ELSE '-'
END;
SET @I = @I + 1;
END;
RETURN @RETURN
END;
GO
And now you can do :
WITH
T AS (SELECT *
FROM (VALUES (1, 'jam21'),
(2, 'byt9*9'),
(3, '21y 21')) A (id, val)
)
SELECT *, dbo.F_COMPUTE_PATTERN(val) AS PATTERN
FROM T;
That’s the results :
id val PATTERN
---- ------- ---------
1 jam21 aaa-11
2 byt9*9 aaa1-1
3 21y 21 11a-11
But beware of collations…
1
On SQL Server 2014, you should be able to do this with an inline table-valued function, you just need to put in some bounds for your longest possible length of Value
:
CREATE FUNCTION dbo.RenderPattern2014
(
@Value varchar(10) -- or longest possible LEN(Value)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
WITH y AS
(
SELECT x.ordinal, result = COALESCE(p.result, 'x')
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)
-- keep adding up to longest LEN(Value)
-- or search for numbers table solutions,
-- "tally" CTEs, or recursive CTEs
) AS x(ordinal)
LEFT OUTER JOIN
(VALUES('%[A-Za-z]%', 'a'),('%[0-9]%', '1')) AS p(pattern, result)
ON SUBSTRING(@Value, x.ordinal, 1) LIKE p.pattern
WHERE x.ordinal <= LEN(@Value)
)
SELECT DISTINCT pattern = (SELECT y2.result
FROM y AS y2 ORDER BY y2.ordinal
FOR XML PATH, TYPE).value(N'.[1]',N'varchar(max)')
FROM y
);
In SQL Server 2022, you could be much tidier imo:
CREATE FUNCTION dbo.RenderPattern2022
(
@Value varchar(10) -- or longest possible LEN(Value)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
SELECT pattern = STRING_AGG(COALESCE(p.result, 'x'), '')
WITHIN GROUP (ORDER BY x.ordinal)
FROM STRING_SPLIT(REPLICATE(',', LEN(@Value)-1), ',', 1) AS x
LEFT OUTER JOIN (VALUES('%[A-Za-z]%', 'a'),('%[0-9]%', '1'))
AS p(pattern, result)
ON SUBSTRING(@Value, x.ordinal, 1) LIKE p.pattern
);
In either case, your query then becomes:
SELECT src.<columns>, src.Value, fn.pattern
FROM dbo.src
CROSS APPLY dbo.RenderPattern(src.Value) AS fn;
- Example db<>fiddle
Results:
id | Value | pattern |
---|---|---|
1 | jam21 | aaax11 |
2 | byt9*9 | aaa1x1 |
3 | 21y 21 | 11ax11 |
4 | Jam21&I 32 | aaa11xax11 |