Tag Archive for unicode, python-unicode, unicode-normalization

How to ignore the space in the sound mark during Unicode composition/decomposition of Japanese text?

I have two tables of data. In one of them the Katakana-Hiragana sound mark is part of the preceding character (precomposed); in the other it is a separate character. I need to match values between the two tables. Unicode equivalence should handle these cases, but under compatibility normalization (NFKD/NFKC) U+309B (Katakana-Hiragana Voiced Sound Mark) unexpectedly decomposes into U+0020 (Space) and U+3099 (Combining Katakana-Hiragana Voiced Sound Mark). The space prevents U+3099 from combining with the preceding character.
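
For illustration, here is a minimal sketch of one possible workaround, not necessarily the best one: map the spacing sound marks to their combining counterparts before normalizing, so no space is ever introduced. The names `SPACING_TO_COMBINING` and `normalize_sound_marks` are hypothetical, and including U+309C (the semi-voiced mark, which decomposes the same way into U+0020 and U+309A) is an assumption about the data.

```python
import unicodedata

# Hypothetical helper table: spacing sound marks -> combining sound marks.
SPACING_TO_COMBINING = {
    0x309B: 0x3099,  # voiced sound mark (dakuten)
    0x309C: 0x309A,  # semi-voiced sound mark (handakuten); assumed to occur too
}


def normalize_sound_marks(text: str, form: str = "NFC") -> str:
    """Swap spacing sound marks for combining ones, then normalize.

    Doing the swap first avoids the U+0020 that compatibility
    decomposition would otherwise insert before the combining mark.
    """
    return unicodedata.normalize(form, text.translate(SPACING_TO_COMBINING))


separate = "\u304b\u309b"  # HIRAGANA KA followed by the spacing voiced mark
combined = "\u304c"        # precomposed HIRAGANA GA

# The problem from the question: NFKD turns U+309B into space + U+3099.
decomposed_mark = unicodedata.normalize("NFKD", "\u309b")

# With the swap applied first, both spellings compare equal.
values_match = normalize_sound_marks(separate) == normalize_sound_marks(combined)
```

Applying `normalize_sound_marks` to the key column of both tables before comparing should make the two spellings match. Note that the swap also attaches a spacing mark that was deliberately written on its own to whatever character precedes it, so it is only safe if the data never uses U+309B or U+309C as standalone symbols.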