Background
We have a system created for a client that uses a Logic App to process some files and pass them to an API which stores the files in blob storage and the file metadata in a database.
When passing files to the API directly, the filename is stored in the database and subsequent requests for the file return the filename exactly as it was uploaded. This includes characters with diacritics etc.
Issue
When using the Logic App, I can see in the run details of each step that the filename is as it was uploaded and it seems this is what is passed to the API.
However, requesting that info from the API shows these extended characters as a question mark (?).
We had similar issues with clients uploading text files that were not encoded with UTF-8, but we pointed this out and it was resolved.
However, we’ve come across a similar problem with PDFs with extended characters in the filename.
Now, as PDFs are not text files, they don’t use text encoding – but what about the filename itself?
Is there any way to determine the encoding used for a filename?
Is it dependent on the filesystem that created the file?
I’ve done a bunch of digging online and can’t find anything specific to filename text encodings. And the client is grumbling about their filenames with missing umlauted letters.
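To illustrate the symptom, here is a minimal Python sketch of what I suspect is happening. It is not the actual Logic App or API code: the sample filename and the `simulate_lossy_step` helper are made up for illustration. It assumes that some step between the blob and the database converts the name to a charset that cannot represent the character. It also shows that the same visible name can exist in two Unicode forms (precomposed NFC, or decomposed NFD as some filesystems produce), which changes where the ? lands.

```python
import unicodedata


def simulate_lossy_step(filename: str, charset: str = "ascii") -> str:
    """Round-trip a filename through `charset`.

    Characters the charset cannot represent are replaced with '?',
    mimicking a step that silently downgrades the text encoding.
    """
    return filename.encode(charset, errors="replace").decode(charset)


# Hypothetical filename; the same visible text in two Unicode forms.
name_nfc = unicodedata.normalize("NFC", "Müller_Übersicht.pdf")  # 'ü' is one code point
name_nfd = unicodedata.normalize("NFD", name_nfc)                # 'u' + combining diaeresis

print(simulate_lossy_step(name_nfc))             # M?ller_?bersicht.pdf
print(simulate_lossy_step(name_nfd))             # Mu?ller_U?bersicht.pdf
print(simulate_lossy_step(name_nfc, "latin-1"))  # Müller_Übersicht.pdf (unchanged)
print(simulate_lossy_step(name_nfd, "latin-1"))  # Mu?ller_U?bersicht.pdf
```

If this guess is right, the pattern in the DB would hint at the cause: a ? in place of the whole letter suggests a precomposed name went through an ASCII-only step, while a ? after the base letter suggests a decomposed name.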
Update
Logic App showing the steps to set various bits of info (filename, category etc), then a step to get the blob content, then an HTTP step to send it to the API:
(NB: ‘quote’ is simply double quotes. We were having issues getting this working until someone pointed out this handy trick.)
If we run it and look at the logs, we see:
And in the DB at the other end, once processed by the API:
The second row from the DB in this screenshot is from uploading the same file directly to the API via Postman. The filename is different because, in the Logic App run, it's prefixed with other info that the Logic App extracts and passes as other values.
That is irrelevant though – what is clear is that if the Logic App grabs the file from Blob Storage, when it calls the API, any characters with diacritics in the filename end up as ? in the DB, whereas if the file is uploaded to the API directly, the diacritics are retained in the DB.