I recently implemented incoming emails for an application and boy, did I open the gates of hell. Since then, every other day an email arrives that makes the app fail in a different way.
One of those failures comes from emails encoded as UTF-7. Most emails come as ASCII, one of the Latin encodings, or, thankfully, UTF-8.
Hotmail error messages (like email address doesn’t exist or quota exceeded) seem to come as UTF-7. Unfortunately, UTF-7 is not an encoding Ruby understands:
> "hello world".encode("utf-8", "utf-7")
Encoding::ConverterNotFoundError: code converter not found (UTF-7 to UTF-8)
> Encoding::UTF_7
=> #<Encoding:UTF-7 (dummy)>
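A minimal sketch of how the app could check up front whether Ruby can transcode a charset, instead of waiting for the exception. `utf8_convertible?` is a hypothetical helper name, not something Ruby provides; it only wraps `Encoding::Converter.new`, which raises the same `ConverterNotFoundError` shown above.

```ruby
# Hypothetical helper: true when Ruby has a converter from `charset` to UTF-8.
def utf8_convertible?(charset)
  Encoding::Converter.new(charset, "UTF-8")
  true
rescue Encoding::ConverterNotFoundError
  # Raised for dummy encodings such as UTF-7, which have no transcoder.
  false
end

utf8_convertible?("ISO-8859-1") # Latin-1 has a converter
utf8_convertible?("UTF-7")      # dummy encoding, no converter
```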
My application doesn’t crash, it actually handles the email quite well, but it does send me a notification about the potential error.
I spent some time googling and couldn’t find anyone who has implemented the conversion, at least not as a Ruby 1.9.3 Encoding::Converter.
So my question is: since I have never received an email with actual content, from an actual person, in UTF-7, how relevant is that encoding? Can I safely ignore it?
The only relevant feature of UTF-7 (over UTF-8 for example) is that it’s a 7-bit encoding, just like good old ASCII is. That means that it works over a system that is not 8-bit clean.
The only large-scale system where this still matters today is email (don’t ask me why this wasn’t fixed 10-20 years ago; most servers are 8-bit clean, but apparently some still aren’t).
So UTF-7 is only relevant in email systems. Everywhere else, UTF-8 is the better choice.
Thanks to Charles Salvia’s comment, I found a method in the IMAP module that helped. Be aware that it decodes IMAP’s modified UTF-7 (RFC 3501), not the standard UTF-7 of RFC 2152:
require "net/imap"
Net::IMAP.decode_utf7(mail_body)
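For bodies in standard UTF-7, where `+` starts a Base64 run instead of IMAP’s `&`, a hand-rolled decoder is one option. The following is only a sketch under that assumption: `decode_standard_utf7` is a hypothetical helper, not part of Ruby or net/imap, and it does no validation of ill-formed input.

```ruby
# Hypothetical helper: decode standard (RFC 2152) UTF-7 into a UTF-8 string.
def decode_standard_utf7(text)
  # Work on raw bytes so the regexp and replacements never clash on encoding.
  bytes = text.dup.force_encoding(Encoding::BINARY)
  decoded = bytes.gsub(/\+([A-Za-z0-9+\/]*)-?/n) do
    b64 = $1
    if b64.empty?
      "+" # "+-" is the escape for a literal plus sign
    else
      # UTF-7 omits Base64 padding, so restore it before decoding.
      padded = b64 + "=" * ((4 - b64.length % 4) % 4)
      raw = padded.unpack("m").first
      # The payload is UTF-16BE; drop a trailing odd byte left by spare bits.
      raw = raw[0, raw.bytesize - raw.bytesize % 2]
      raw.force_encoding(Encoding::UTF_16BE)
         .encode(Encoding::UTF_8)
         .force_encoding(Encoding::BINARY)
    end
  end
  decoded.force_encoding(Encoding::UTF_8)
end

decode_standard_utf7("caf+AOk-")  # "café"
decode_standard_utf7("1 +- 1")    # "1 + 1"
```

Plain ASCII text passes through unchanged, so the helper can be applied to any body whose declared charset is UTF-7.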