I have string values (filed name: filename) that follow a certain template:
filename_this_is_called_Plan_A
file_this_is_Plan_A
filename2024_this_is_known_as_Plan_A
While the strings are all a different length, they all end with _Plan_A
, also known as the segment.
How can I trim this in PostgreSQL in order to always trim the string by the third to last underscore? Basically, so my table looks something like this
filename | segment |
---|---|
filename_this_is_called_Plan_A | Plan_A |
file_this_is_Plan_A | Plan_A |
I’ve tried a lot of different things, but have trouble getting that nth character.
select
filename,
right(filename, charindex('_', reverse(filename)) -1),
count(1)from table
group by 1,2
Output will typically be something like ‘A’ by itself when the desired output is Plan_A
Additionally, this can follow many naming conventions… Plan_A is example. There can be _Plan_B
, _Plan_C
, _File_C
, etc.
3
always trim the string by the third to last underscore
Looks like your really want the second to last:
SELECT substring(filename, '_([^_]+_[^_]+)$');
Or, if all trailing strings are six characters like your sample suggests, simply:
SELECT right(filename, 6);
If your file name is always after the penultimate (second to last) underscore in the string – then you could use something like below. This code will do the job (using just the basic string functions) regardless the length of the specific name and/or sufix and regardless the number of underscores before the name.
-- S a m p l e D a t a :
Create Table tbl ( ID Int, FILE_NAME Varchar(64) );
Insert Into tbl VALUES
( 1, 'filename_this_is_called_Plan_A' ),
( 2, 'file_this_is_Plan_A' ),
( 3, 'filename2024_this_is_known_as_Plan_A' ),
( 4, 'filename2024_this_could_be_Whatever_BCDE' );
Select ID, FILE_NAME,
SUBSTR( SubStr(FILE_NAME, 1, SUFIX_POSITION - 1),
Length( SubStr( FILE_NAME, 1, SUFIX_POSITION - 1 ) ) -
Position('_' In Reverse(SubStr(FILE_NAME, 1, SUFIX_POSITION - 1))) + 2
) || SUFIX as FNAME
From ( Select tbl.*,
SubSTr( FILE_NAME, Length(FILE_NAME) - Position('_' In Reverse(FILE_NAME)) + 1 ) as SUFIX,
Length(FILE_NAME) - Position('_' In Reverse(FILE_NAME)) + 1 as SUFIX_POSITION
From tbl
)
/* R e s u l t :
id file_name fname
-- --------------------------------------------- -----------------
1 filename_this_is_called_Plan_A Plan_A
2 file_this_is_Plan_A Plan_A
3 filename2024_this_is_known_as_Plan_A Plan_A
4 filename2024_this_could_be_Whatever_BCDE Whatever_BCDE */
See the fiddle here.