This is my first post in this community and I am excited.
Environment
I am using a Notebook in Microsoft Fabric. The language is PySpark.
Objective
I want to convert a column that contains RTF into plain text.
pip install striprtf
# Import the modules needed
from striprtf.striprtf import rtf_to_text
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType
# Wrap function in UDF
my_udf = udf(lambda x: rtf_to_text(x), StringType())
# Create new column with plaintext
df.withColumn("plaintext", my_udf(col("formattedtext"))).show(truncate=False)
Challenge
I have installed the striprtf module, imported the required functions, and defined a UDF. Still, the last command fails with the error "No module named 'striprtf'". If I call rtf_to_text directly on a variable, it works.
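To narrow this down, here is a minimal diagnostic sketch. It assumes (not confirmed) that the direct call succeeds because it runs in the notebook's driver process, while the UDF runs in executor processes where striprtf may not be installed. The helper names probe_module and probe_executors are hypothetical and only for illustration; spark is assumed to be the session object the notebook provides.

```python
import importlib.util
import socket


def probe_module(name: str = "striprtf") -> str:
    """Report whether `name` is importable in the current Python process."""
    found = importlib.util.find_spec(name) is not None
    status = "found" if found else "MISSING"
    return f"{socket.gethostname()}: {name} {status}"


def probe_executors(spark, name: str = "striprtf", partitions: int = 4):
    """Run probe_module inside Spark tasks and return the distinct reports.

    Hypothetical helper: `spark` is assumed to be an active SparkSession.
    """
    rdd = spark.sparkContext.parallelize(range(partitions), partitions)
    return sorted(set(rdd.map(lambda _: probe_module(name)).collect()))


# Driver-side check: runs in the notebook's own Python process.
print(probe_module("striprtf"))

# Executor-side check: uncomment inside the notebook, where `spark` exists.
# print(probe_executors(spark, "striprtf"))
```

If the driver line reports the module as found while the executor reports list it as MISSING, that would be consistent with the package being installed only for the driver.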