I’m working with a table that has a ‘url’ field, and I need to replace specific patterns if they exist in each URL. Here is an example of my table:
<code>| url |
|-----------------------------|
| url-1.html |
| url-2.html |
| url-3.html?q=my+string-1 |
| url-3.html?q=my+string-2 |
| url-4.html?q=other-string |
| url-4.html?q=no-string |
</code>
<code>| url |
|-----------------------------|
| url-1.html |
| url-2.html |
| url-3.html?q=my+string-1 |
| url-3.html?q=my+string-2 |
| url-4.html?q=other-string |
| url-4.html?q=no-string |
</code>
| url |
|-----------------------------|
| url-1.html |
| url-2.html |
| url-3.html?q=my+string-1 |
| url-3.html?q=my+string-2 |
| url-4.html?q=other-string |
| url-4.html?q=no-string |
I have the following patterns to replace:
- text1 to textone
- my+string to mystringReplaced
- other+string to otherstringReplaced
I’m trying to accomplish this using SQL with the following query:
<code>WITH
['text1','my+string','other+string'] as replaced,
['textone','mystringReplaced','otherstringReplaced'] as substitute,
['url-1.html','url-2.html','url-3.html?q=my+string-1','url-3.html?q=my+string-2'] as urls
select * from (
select arrayMap((x,y)->if(positionCaseInsensitive(urllTemp,x)>0,(x,replaceAll(urllTemp,replaced[toInt64(y)],substitute[toInt64(y)])),(urllTemp,urllTemp)),replaced,range(1,length(replaced)+1)) as FinalURL
from
(select arrayJoin(urls) as urllTemp)) FORMAT TabSeparatedWithNames
</code>
<code>WITH
['text1','my+string','other+string'] as replaced,
['textone','mystringReplaced','otherstringReplaced'] as substitute,
['url-1.html','url-2.html','url-3.html?q=my+string-1','url-3.html?q=my+string-2'] as urls
select * from (
select arrayMap((x,y)->if(positionCaseInsensitive(urllTemp,x)>0,(x,replaceAll(urllTemp,replaced[toInt64(y)],substitute[toInt64(y)])),(urllTemp,urllTemp)),replaced,range(1,length(replaced)+1)) as FinalURL
from
(select arrayJoin(urls) as urllTemp)) FORMAT TabSeparatedWithNames
</code>
WITH
['text1','my+string','other+string'] as replaced,
['textone','mystringReplaced','otherstringReplaced'] as substitute,
['url-1.html','url-2.html','url-3.html?q=my+string-1','url-3.html?q=my+string-2'] as urls
select * from (
select arrayMap((x,y)->if(positionCaseInsensitive(urllTemp,x)>0,(x,replaceAll(urllTemp,replaced[toInt64(y)],substitute[toInt64(y)])),(urllTemp,urllTemp)),replaced,range(1,length(replaced)+1)) as FinalURL
from
(select arrayJoin(urls) as urllTemp)) FORMAT TabSeparatedWithNames
View this SQL in Fiddle
This query should produce the desired result:
<code>| FinalURL |
|-----------------------------------|
| url-1.html |
| url-2.html |
| url-3.html?q=mystringReplaced-1 |
| url-3.html?q=mystringReplaced-2 |
| url-4.html?q=otherstringReplaced |
| url-4.html?q=no-string |
</code>
<code>| FinalURL |
|-----------------------------------|
| url-1.html |
| url-2.html |
| url-3.html?q=mystringReplaced-1 |
| url-3.html?q=mystringReplaced-2 |
| url-4.html?q=otherstringReplaced |
| url-4.html?q=no-string |
</code>
| FinalURL |
|-----------------------------------|
| url-1.html |
| url-2.html |
| url-3.html?q=mystringReplaced-1 |
| url-3.html?q=mystringReplaced-2 |
| url-4.html?q=otherstringReplaced |
| url-4.html?q=no-string |
Can someone explain why this works and if there’s a more efficient way to achieve the same result?