I have a table in a PostgreSQL DB and I am trying to extract a substring with a certain pattern from a string as a result of extracting specific data from a jsonb column. The string of interest is extracted by using the following command (I include all the details in case of some side-effect that I didn’t notice):
SELECT (e.content #> '{RefOrder, Ordertxt}')::text
FROM cistec.kis_t_event e
WHERE kat::text = 'MEDI'::text
AND (sender='NEO_FK' OR sender='PAED_IPS') AND dat >= '2023-01-01' AND dat <= '2023-12-31' AND LEFT(patnr, 1) <> 'T'
AND patnr NOT IN ('381227', '381234', '381235', '225328') AND fkey NOT ILIKE '%IF%'
The problem is that I want to extract a substring with a certain pattern using a regular expression, but I don’t obtain the same result in two cases that for me they should be equivalent (obviously they are not and I need to understand why):
Case 1: In the first case, I pick one particular value from one of the records as result of the above query.
SELECT (regexp_matches('1. Res. Fieber, Schmerz: max. 1 Stk max. 4x/24h, min. Intervall 6h rektal; 20- (25) mg/kg/Dosis; NG 10-15mg/kg/Dosis; max. 4x/d; 20 mg/kg', '(d+(.d+)?s?(g|mg|ucg|mmol)/kgw*(/(h|d))?)$'))[1]
Result: 20 mg/kg
as it should be.
Case 2: In the second case, I apply the following CTE to the concerning table from the DB.
SELECT
(e.content #> '{RefOrder, Ordertxt}')::text,
regexp_matches((e.content #> '{RefOrder, Ordertxt}')::text, '(d+(.d+)?s?(g|mg|ucg|mmol)/kgw*(/(h|d))?)$')[1]
FROM cistec.kis_t_event e
WHERE kat::text = 'MEDI'::text
AND (sender='NEO_FK' OR sender='PAED_IPS') AND dat >= '2023-01-01' AND dat <= '2023-12-31' AND LEFT(patnr, 1) <> 'T'
AND patnr NOT IN ('381227', '381234', '381235', '225328') AND fkey NOT ILIKE '%IF%'
Result: empty
.
But even weirder: if I remove the dollar sign ($
) to the regex I obtain the expected result in the corresponding line:
SELECT
(e.content #> '{RefOrder, Ordertxt}')::text,
(regexp_matches((e.content #> '{RefOrder, Ordertxt}')::text, '(d+(.d+)?s?(g|mg|ucg|mmol)/kgw*(/(h|d))?)'))[1]
FROM cistec.kis_t_event e
WHERE kat::text = 'MEDI'::text
AND (sender='NEO_FK' OR sender='PAED_IPS') AND dat >= '2023-01-01' AND dat <= '2023-12-31' AND LEFT(patnr, 1) <> 'T'
AND patnr NOT IN ('381227', '381234', '381235', '225328') AND fkey NOT ILIKE '%IF%'
Result: 15mg/kg
.
But the problem is that I want to obtain the substring when it is the last part of the string (so 20 mg/kg
not 15mg/kg
. Hence, the use of $
to capture that feature.
Do you know why this behavior?