I have a text file of several thousand URLs that I need to truncate or trim with regex. I am using BBEdit as a text editor as it has a great regex find/replace function.
This is an example of one of the URLs:
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUk2LfEXvKMZ48tpWUR607L5y_TRn-lXyajH_tJBOeWPqNFmfU1UV7pKginB78MHnuGS-luzq-RCIj1Z6rJ2y8VE3P93gIGeN_ZMjFii1Vnb2wZMnbyLTH241UTuu8kcvMZHFii1Vnb2wZMnbyLTH241gaZGDlgWTfx4EVdAlNFncc2XZJNz0fE0-JK1iDP7WgLEJWNg/w640-h196/Oscar.png
I need to truncate the longest subdirectory in the path, which is this one:
/AVvXsEhUk2LfEXvKMZ48tpWUR607L5y_TRn-lXyajH_tJBOeWPqNFmfU1UV7pKginB78MHnuGS-luzq-RCIj1Z6rJ2y8VE3P93gIGeN_ZMjFii1Vnb2wZMnbyLTH241UTuu8kcvMZHFii1Vnb2wZMnbyLTH241gaZGDlgWTfx4EVdAlNFncc2XZJNz0fE0-JK1iDP7WgLEJWNg/
What I need to do is truncate that one subdirectory so that it keeps the leading /AVvXsE
plus the next 20 characters to its right, and drops everything after that up to the next slash.
This is the result I need:
/AVvXsEhUk2LfEXvKMZ48tpWUR6/
so the resulting full URL path is this:
https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUk2LfEXvKMZ48tpWUR6/w640-h196/Oscar.png
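To make the intended transformation concrete, here is a minimal Python sketch of it. It is only an illustration under a few assumptions: the long subdirectory always starts with AVvXsE, it contains no slashes, and the names PATTERN and trim_url are hypothetical ones made up for this example. The same pattern and a \1 replacement would probably carry over to a grep-style find/replace in an editor, but that is not verified here.

```python
import re

# Assumption: the long subdirectory starts with "/AVvXsE" and never contains "/".
# Group 1 keeps "/AVvXsE" plus the next 20 non-slash characters;
# the trailing [^/]* consumes the rest of that subdirectory so it is dropped.
PATTERN = re.compile(r"(/AVvXsE[^/]{20})[^/]*")


def trim_url(url: str) -> str:
    """Shorten the AVvXsE subdirectory to its prefix plus 20 characters."""
    return PATTERN.sub(r"\1", url)


example = (
    "https://blogger.googleusercontent.com/img/b/R29vZ2xl/"
    "AVvXsEhUk2LfEXvKMZ48tpWUR607L5y_TRn-lXyajH_tJBOeWPqNFmfU1UV7pKginB78MHnuGS"
    "/w640-h196/Oscar.png"
)
print(trim_url(example))
```

A subdirectory with fewer than 20 characters after AVvXsE does not match the pattern, so such a URL is left unchanged.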
The first six characters of that subdirectory (after the leading slash), /AVvXsE
, are the same in all the URLs I need to truncate. I need to keep the next 20 characters to the right of /AVvXsE
so that the shortened paths stay unique, because I can see that the other subdirectories for the image files, e.g. w640-h196
, are shared by many other images.
How can I do this with regex? Or is regex not the best tool for this? Would sed be a better fit?
Regex Fiddle: https://regex101.com/r/W2t82Z/1