I am facing an issue: I keep coming across strings that use fancy Unicode styling, such as the following:
<code>???????????????????? ????????????????????
???????????????????? ????????????????????
???????????????????? ????????????????????
</code>
I would like to turn these into normal text, <code>Lorem Ipsum</code>. However, encoding to ASCII does not work for me, since I want to keep legitimate text, e.g. to preserve languages that do not use the Latin alphabet.
My initial idea was to build a dictionary that maps every unwanted character to its plain equivalent. However, that is quite inelegant and inefficient. Is there a better solution, or an existing tool that does this?
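To make the dictionary idea concrete, here is a minimal sketch of what I mean. It assumes, purely for illustration, that the styled characters come from the Mathematical Bold range of Unicode (U+1D400 to U+1D433); the names `STYLED_TO_PLAIN` and `unstyle` are hypothetical, and a real table would need one such range per style, which is exactly why it feels clumsy.

```python
import string

# Assumption for illustration: only the Mathematical Bold letters are covered.
BOLD_UPPER_START = 0x1D400  # MATHEMATICAL BOLD CAPITAL A
BOLD_LOWER_START = 0x1D41A  # MATHEMATICAL BOLD SMALL A


def build_table():
    """Map each bold code point to its plain ASCII letter."""
    table = {}
    for offset, letter in enumerate(string.ascii_uppercase):
        table[BOLD_UPPER_START + offset] = letter
    for offset, letter in enumerate(string.ascii_lowercase):
        table[BOLD_LOWER_START + offset] = letter
    return table


# Hypothetical lookup table, keyed by code point as str.translate expects.
STYLED_TO_PLAIN = build_table()


def unstyle(text):
    """Replace mapped styled characters; leave everything else untouched."""
    return text.translate(STYLED_TO_PLAIN)


# Build a styled sample by inverting the table, so no literal fancy
# characters are needed in the source file.
PLAIN_TO_STYLED = {ord(plain): chr(cp) for cp, plain in STYLED_TO_PLAIN.items()}
styled_sample = "Lorem Ipsum".translate(PLAIN_TO_STYLED)

print(unstyle(styled_sample))
print(unstyle("Привет"))  # non-Latin text is not in the table, so it is kept
```

This keeps non-Latin text intact because only code points present in the table are replaced, but every additional style (italic, script, double-struck, and so on) needs its own entries.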
I am currently working in Python.