I use imap-tools to access my emails.
My problem is that I'm trying to find emails sent from an address that contains special characters such as ø, which I can't encode correctly because from_ only accepts a string as input, so I am not getting anywhere.
import imap_tools

with imap_tools.MailBox('imap.gmx.net').login(email, password, 'INBOX') as mailbox:
    for msg in mailbox.fetch(imap_tools.AND(from_='beskeder@mød.dk')):
        print('Found')
I shortened my code. I expect my program to print Found when the email sent by beskeder@mød.dk is found in my mailbox. Emails from addresses without special characters are found.
The error message:
Traceback (most recent call last):
  File "/Users/user/Desktop/test.py", line 4, in <module>
    for msg in mailbox.fetch(imap_tools.AND(from_ = 'beskeder@mød.dk')):
  File "/Users/user/Library/Python/3.9/lib/python/site-packages/imap_tools/mailbox.py", line 130, in fetch
    nums = tuple((reversed if reverse else iter)(self.numbers(criteria, charset)))[limit_range]
  File "/Users/user/Library/Python/3.9/lib/python/site-packages/imap_tools/mailbox.py", line 67, in numbers
    encoded_criteria = criteria if type(criteria) is bytes else str(criteria).encode(charset)
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 17: ordinal not in range(128)
I also tried 'beskeder@hottemøder.dk'.encode('ascii', 'ignore'), but that does not work either.
Error:
TypeError: "from_" expected str value, "<class 'int'>" received
When I wrap the result in str(), nothing happens.
At the protocol level, an email address has traditionally consisted only of characters from the ASCII character set (even though internationalized email addresses are now a standard), so the library is not wrong to encode the provided string into bytes using the ascii codec. Since ø is not a valid ASCII character, the encoding fails with that error message.
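The failure can be reproduced without a mail server. The sketch below assumes the library renders the search criteria as `(FROM "beskeder@mød.dk")`; that string is a guess inferred from "position 17" in the traceback, not taken from the library's source.

```python
# Assumed textual form of the search criteria (inferred, not confirmed).
criteria = '(FROM "beskeder@mød.dk")'

failing_char = None
position = None
try:
    # Same kind of call as in the traceback: str -> bytes with the ascii codec.
    criteria.encode("ascii")
except UnicodeEncodeError as exc:
    # exc.start is the index of the first character that could not be encoded.
    position = exc.start
    failing_char = criteria[position]
    print(f"cannot encode {failing_char!r} at position {position}")
```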
The problematic character appears in the domain part of the address, which indicates that the domain is an IDN (internationalized domain name). IDNs are not encoded into bytes with any of the Unicode encodings but with their Punycode representation (related SO thread). As the library does not appear to support IDNs, the domain portion has to be encoded into Punycode manually, which gives xn--md-lka.dk, so the email address the library will understand is beskeder@xn--md-lka.dk.
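The manual conversion does not have to be done by hand. Here is a minimal sketch using Python's built-in `idna` codec; the helper name `to_ascii_address` is made up for illustration and is not part of imap-tools. It only converts the domain and leaves the local part untouched.

```python
def to_ascii_address(address: str) -> str:
    """Return the address with its domain converted to IDNA (Punycode) form."""
    # Split on the last "@" so the local part is passed through unchanged.
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError(f"not an email address: {address!r}")
    # The stdlib "idna" codec converts each non-ASCII label to its xn-- form.
    ascii_domain = domain.encode("idna").decode("ascii")
    return f"{local}@{ascii_domain}"


sender = to_ascii_address("beskeder@mød.dk")
print(sender)  # beskeder@xn--md-lka.dk
```

The resulting string is plain ASCII, so it can be passed as the `from_` value in the question's `imap_tools.AND(...)` call in place of the original address.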
Note that this only covers the domain part of the address, not the local part. If the local part also contains characters with code points outside the ASCII character set, they need to be encoded into bytes using UTF-8, as per RFC 6530.
Modern email-related libraries should be able to handle these requirements, but they are sometimes slow to adopt new standards, so workarounds such as manually encoding parts of an email address into the underlying encoding(s) may be required.