The following command when run in BASH cleans The Raven.
cat The_Raven.txt | gawk '{print tolower($0)}' | tr -d "!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~«»"
The following Python code uses subprocess
to clean “The Raven”.
command = "cat The_Raven.txt | gawk '{print tolower($0)}' | tr -d "!\"#$%&'()*+,-./:;<=>?@[\\]^_\`{|}~""
cleaned_text_from_command = subprocess.run(command, shell = True, capture_output = True, text = True, encoding = 'utf-8').stdout
Inserting «»
after ~
in the above Python code causes the following error.
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
How do I address this error?