I have JSON that contains escaped Unicode characters. For example:
{
"description": "This is an ellipsis: \u2026"
}
The JSON is parsed with Jackson. At a later stage, the strings are converted into bytes for an ISO-8859-15 (Latin-9) platform:
final byte[] d = description.getBytes(Charset.forName("ISO-8859-15"));
Obviously, the ellipsis character (…) is not in the ISO-8859-15/Latin9 character set (see https://www.charset.org/charsets/iso-8859-15).
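A minimal sketch of what currently happens (the class `Latin9Encoding` and its methods are hypothetical names for illustration): `String.getBytes(Charset)` silently substitutes the encoder's replacement byte, which is `?` for ISO-8859-15. A `CharsetEncoder` set to `CodingErrorAction.REPORT` throws an `UnmappableCharacterException` instead, which at least makes the lossy characters visible.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

public class Latin9Encoding {
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Lenient: getBytes replaces each unmappable character with '?'.
    static byte[] lenient(String s) {
        return s.getBytes(LATIN9);
    }

    // Strict: a fresh encoder configured to REPORT throws instead of substituting.
    static byte[] strict(String s) throws CharacterCodingException {
        ByteBuffer buf = LATIN9.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .encode(CharBuffer.wrap(s));
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] d = lenient("This is an ellipsis: \u2026");
        System.out.println(new String(d, LATIN9)); // This is an ellipsis: ?
        try {
            strict("This is an ellipsis: \u2026");
        } catch (CharacterCodingException e) {
            System.out.println("Unmappable character: " + e);
        }
    }
}
```

Switching to `REPORT` does not convert anything by itself, but it turns silent data loss into an error the caller can handle.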
I am looking for a way to convert unsupported Unicode characters to a sensible ISO-8859-15/Latin9-supported character or sequence of characters. Here, I would expect three dots (...).
Other characters that appear in the input, with their expected counterparts:
\u2013 -> – -> -
\u2018 -> ‘ -> '
\u2019 -> ’ -> '
\u201c -> “ -> "
\u201d -> ” -> "
\u2022 -> • -> .
Ideally, this is done without enumerating all possible inputs and outputs myself, as I don't want to maintain an extensive mapping table.
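For reference, a JDK-only sketch of a table-free fallback, assuming Unicode compatibility decomposition (NFKD via `java.text.Normalizer`) is an acceptable substitute for a real transliteration step. `Latin9Fallback` and `toLatin9` are hypothetical names. Each code point that ISO-8859-15 can encode is kept; the rest are replaced by their NFKD decomposition with combining marks stripped, if that is encodable, and by `?` otherwise. Note that among the characters listed above only U+2026 has a compatibility decomposition (to "..."), so the dash, quotes and bullet would still come out as `?`.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.text.Normalizer;

public class Latin9Fallback {

    private static final Charset LATIN9 = Charset.forName("ISO-8859-15");

    // Hypothetical helper: keeps every code point ISO-8859-15 can encode,
    // tries the NFKD compatibility decomposition (minus combining marks)
    // for the rest, and falls back to '?' when that is not encodable either.
    static String toLatin9(String input) {
        // CharsetEncoder is not thread-safe, so create one per call.
        CharsetEncoder encoder = LATIN9.newEncoder();
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
                return;
            }
            // U+2026 (horizontal ellipsis) decomposes to "..." under NFKD,
            // U+FB01 (fi ligature) to "fi"; most punctuation has no decomposition.
            String decomposed = Normalizer.normalize(ch, Normalizer.Form.NFKD)
                    .replaceAll("\\p{M}", "");
            if (!decomposed.isEmpty() && encoder.canEncode(decomposed)) {
                out.append(decomposed);
            } else {
                out.append('?');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toLatin9("This is an ellipsis: \u2026")); // This is an ellipsis: ...
        System.out.println(toLatin9("\u2018quoted\u2019"));           // ?quoted?
    }
}
```

Normalizing only the unencodable code points, rather than the whole string, matters: NFKD on the full input would also split characters such as é into a base letter plus a combining accent, even though é is already valid Latin-9.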
Is there a JDK class or external library that can do this conversion?