I have a peculiar issue with reading a CSV file with pandas. The file looks something like this:
col1 col2 col3\r\n
1 this is a place for some text 50\r\n
2 this is some other \n text 100\r\n
3 and finally \n a bit more \n text 150\r\n
Note that, as is standard on Windows, the line terminators are \r\n, while the data sometimes contains a bare \n. The separator between the columns is a tab.
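To make the layout concrete, here is a minimal sketch that builds bytes with the same shape as the sample above. The name sample_bytes and the exact cell text are illustrative assumptions, not the real file.

```python
# Hypothetical sample mirroring the described layout:
# tab-separated columns, "\r\n" row terminators, bare "\n" inside some values.
sample_bytes = (
    b"col1\tcol2\tcol3\r\n"
    b"1\tthis is a place for some text\t50\r\n"
    b"2\tthis is some other \n text\t100\r\n"
    b"3\tand finally \n a bit more \n text\t150\r\n"
)

# Every "\r\n" contains one "\n", so the difference is the number of
# line feeds that sit inside column values rather than at row ends.
crlf_count = sample_bytes.count(b"\r\n")
lf_count = sample_bytes.count(b"\n")
embedded_lf = lf_count - crlf_count
```

Counting the two kinds of line feed this way is a quick check that a given file really has the mixed terminators described here.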
My code looks something like this:
df = pd.read_csv(io.StringIO(str(file_content_binary, 'utf-8')), sep='\t', lineterminator='\r', quoting=csv.QUOTE_NONE)
For the most part this works; I only need to strip the extra \n from the first column in post-processing, because pd.read_csv doesn’t support multi-character line terminators. The cleanup looks something like this:
data[col1] = data[col1].str.lstrip('\n')
data = data[data[col1] != '']
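The reason this cleanup is needed can be shown with plain strings, without pandas. The sketch below assumes a shortened, made-up sample. Splitting on "\r" alone leaves the "\n" half of each "\r\n" at the start of the next record, plus one trailing record that contains only "\n".

```python
# Hypothetical shortened sample with "\r\n" row ends and one embedded "\n".
text = "col1\tcol2\tcol3\r\n1\tsome text\t50\r\n2\tother \n text\t100\r\n"

# Splitting on "\r" only: each later record keeps a leading "\n".
records = text.split("\r")

# Same idea as the lstrip step: drop the leading "\n" ...
cleaned = [record.lstrip("\n") for record in records]
# ... then drop the record that is empty once its "\n" is stripped.
cleaned = [record for record in cleaned if record != ""]
```

The embedded "\n" in the second data record survives, because lstrip only touches the start of each string.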
Now, the above works as expected when testing locally on Windows (via VS Code). However, when I deploy it to Azure Function Apps, the code fails without raising an error (I have tried to capture one, without luck).
The only cause I can think of is that the Function App runs the code on Linux, and I found that Linux uses \n as the line terminator rather than \r\n.
Would anyone have any idea how I could process these files correctly while still keeping the \n in the column values? Additionally, it would be great if I could avoid the funky lstrip post-processing.
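For reference, one platform-independent direction is to split the decoded text on the literal two-character "\r\n" before any CSV parsing happens. This is only a sketch under assumptions: parse_tsv_crlf is a hypothetical helper name, the sample is made up, and it has not been verified on a Function App. It also assumes no field ever contains a tab or a "\r\n" of its own.

```python
def parse_tsv_crlf(raw, encoding="utf-8"):
    # Hypothetical helper: split rows on the literal "\r\n" so that a bare
    # "\n" is treated as ordinary data and stays inside the field value.
    text = raw.decode(encoding)
    # str.split with an explicit separator does not treat "\n" specially;
    # the filter drops the empty string left after the final "\r\n".
    records = [record for record in text.split("\r\n") if record != ""]
    # Columns are tab-separated; no quoting is handled (like QUOTE_NONE).
    return [record.split("\t") for record in records]


sample = b"col1\tcol2\tcol3\r\n1\tsome text\t50\r\n2\tother \n text\t100\r\n"
rows = parse_tsv_crlf(sample)
header, body = rows[0], rows[1:]
```

Because the split is done explicitly in Python, the result should not depend on the operating system's default line ending. If a DataFrame is still wanted, header and body could presumably be passed to pd.DataFrame(body, columns=header), with all values arriving as strings that need converting afterwards.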