We have been asked to create an email address generator that produces a normalised address (restricted to the basic alphabet used by English) from a person's first name and last name.
So far we have a solution that works for accented Latin-script names, but we are stuck on how to deal with the characters used in Scandinavian names, namely å, ø and æ.
Right now we are likely going with:
- å -> a
- ø -> o (some sources suggest oe)
- æ -> ae
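The list above is essentially a lookup table. A minimal sketch of it in JavaScript follows. The uppercase entries are an assumption (whether Æ should become 'AE' or 'Ae' is a judgment call), and `NORDIC_TO_ASCII` and `replaceNordic` are hypothetical names, not part of our existing code:

```javascript
// Proposed replacements from the list above, plus assumed uppercase forms.
const NORDIC_TO_ASCII = {
  'å': 'a', 'Å': 'A',
  'ø': 'o', 'Ø': 'O', // some sources suggest 'oe' / 'OE' instead
  'æ': 'ae', 'Æ': 'AE'
};

// Replace only the mapped letters; every other character passes through.
function replaceNordic (text) {
  return text.replace(/[åÅøØæÆ]/g, (ch) => NORDIC_TO_ASCII[ch]);
}

console.log(replaceNordic('Bjørn Sæther Håkonsen')); // Bjorn Saether Hakonsen
```

Switching ø to 'oe' is then a one-line change to the table rather than a change to the replacement logic.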
We’ve looked for any accepted rules on how to ‘transliterate’ (we’re not even sure that is the right term) from extended Latin alphabets to the basic one used by English, but haven’t found any. Can anyone point us to a good reference?
The code we are using to ‘flatten’ / ‘normalise’ the text:
    function flattenText (text) {
      // Letters with no Unicode decomposition, and their replacements
      const extendedChar = ['ß', 'Œ', 'œ'];
      const englishChar = ['ss', 'oe', 'oe'];

      // Deal with accents: NFD splits e.g. 'é' into 'e' + U+0301,
      // then the combining marks (U+0300 to U+036F) are stripped
      text = text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');

      // Deal with other letter types via the lookup table above
      let newText = '';
      for (const char of text) {
        const charIdx = extendedChar.indexOf(char);
        newText += charIdx > -1 ? englishChar[charIdx] : char;
      }
      return newText;
    }
In context: https://gist.github.com/ajmas/0330c14944beba6b2f291e5bda42a82b
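For comparison, here is a sketch of the same flattening with the Scandinavian letters added to the lookup table, wired into a full address. It assumes a `first.last@domain` format, lowercases the input first (so only lowercase table entries are needed), and drops anything outside a-z and 0-9. `flattenName`, `makeEmailAddress` and the `example.com` default are hypothetical, not taken from the gist:

```javascript
// Letters with no Unicode decomposition, mapped to ASCII (lowercase only,
// because the input is lowercased first). 'ø' -> 'o' follows the current
// choice above; swap in 'oe' if preferred.
const EXTRA_LETTERS = { 'ß': 'ss', 'œ': 'oe', 'æ': 'ae', 'ø': 'o' };

function flattenName (text) {
  return text
    .toLowerCase()
    .normalize('NFD')                  // e.g. 'å' becomes 'a' + U+030A
    .replace(/[\u0300-\u036f]/g, '')   // drop the combining marks
    .replace(/[ßœæø]/g, (ch) => EXTRA_LETTERS[ch]);
}

// Hypothetical address format: first.last@domain, keeping only a-z and 0-9
// (so spaces, hyphens and apostrophes in names are removed).
function makeEmailAddress (firstName, lastName, domain = 'example.com') {
  const clean = (part) => flattenName(part).replace(/[^a-z0-9]/g, '');
  return `${clean(firstName)}.${clean(lastName)}@${domain}`;
}

console.log(makeEmailAddress('Søren', 'Kierkegaard'));   // soren.kierkegaard@example.com
console.log(makeEmailAddress('Ærling', 'Håland-Strøm')); // aerling.halandstrom@example.com
```

Dropping hyphens rather than keeping them is one possible policy; some organisations keep them, since '-' is valid in the local part of an address.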
Edit: It looks like only ‘Ø’ is causing issues now; the normalize() method dealt with the other cases. We would still be interested in a good reference, though.
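This behaviour follows from Unicode canonical decomposition. å (U+00E5) and é decompose into a base letter plus a combining mark, so NFD followed by the mark-stripping regex reduces them to ASCII. ø (U+00F8), Ø (U+00D8), æ (U+00E6) and ß (U+00DF) are separate letters with no decomposition, so they pass through unchanged and need lookup-table entries. A small check, using a hypothetical helper `describeNFD`:

```javascript
// Show the code points a string becomes after NFD normalisation.
function describeNFD (ch) {
  return [...ch.normalize('NFD')]
    .map((c) => 'U+' + c.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'))
    .join(' ');
}

for (const ch of ['å', 'é', 'ø', 'Ø', 'æ', 'ß']) {
  console.log(ch, '->', describeNFD(ch));
}
// å -> U+0061 U+030A
// é -> U+0065 U+0301
// ø -> U+00F8
// Ø -> U+00D8
// æ -> U+00E6
// ß -> U+00DF
```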