I am using Python. I have a DataFrame with a string column “description”. The strings contain references like “п. 5.6.2 ГОСТ 2.114-2016”, “п. 4.1 ГОСТ 2.102-2013”, “п.5 ГОСТ Р 51672-2000”, and so on. I want to remove them.
I tried this code:
import pandas as pd
import re
# Sample DataFrame
data = {'description': ["п. 5.6.2 ГОСТ 2.114-2016",
"п. 4.1 ГОСТ 2.102-2013",
"п.5 ГОСТ Р 51672-2000"]}
df = pd.DataFrame(data)
# Define the regex pattern to match the specified format
pattern = r'п\.\s*\d+(?:\.\d+)*\s*ГОСТ(?:\s*Р)?\s*\d+(?:-\d+)*'
# Remove the matched patterns from the 'description' column
df['description'] = df['description'].apply(lambda x: re.sub(pattern, '', x))
print(df)
But the result is:
description
0 .114-2016
1 .102-2013
2
What am I doing wrong?
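A likely cause, judging from the output: the last part of the pattern, `\d+(?:-\d+)*`, accepts digits followed by `-digits` groups but no dot. For "2.114-2016" it matches only "2" and stops, leaving ".114-2016" behind. The third row has no dot in the standard number, so it is removed completely. Below is a minimal sketch of one possible fix, assuming a standard number always looks like digits, optional `.digits` groups, then optional `-digits` groups. The names `GOST_REF` and `remove_gost_refs` are made up for illustration.

```python
import re

# Hypothetical names. The only change from the pattern in the question is the
# extra (?:\.\d+)* after the first digits of the standard number.
GOST_REF = re.compile(
    r'п\.\s*\d+(?:\.\d+)*'        # clause: "п. 5.6.2", "п.5"
    r'\s*ГОСТ(?:\s*Р)?'           # "ГОСТ" or "ГОСТ Р"
    r'\s*\d+(?:\.\d+)*(?:-\d+)*'  # number: "2.114-2016", "51672-2000"
)


def remove_gost_refs(text: str) -> str:
    """Delete every clause/standard reference and trim leftover whitespace."""
    return GOST_REF.sub('', text).strip()


samples = [
    "п. 5.6.2 ГОСТ 2.114-2016",
    "п. 4.1 ГОСТ 2.102-2013",
    "п.5 ГОСТ Р 51672-2000",
]
cleaned = [remove_gost_refs(s) for s in samples]
print(cleaned)  # ['', '', '']
```

With the DataFrame from the question, the same function could be applied as `df['description'].apply(remove_gost_refs)`. If real descriptions contain other number formats, the assumption about the standard number would need adjusting.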