IMAP Message Gets UnicodeDecodeError: 'utf-8' Codec Can't Decode
After 5 hours of trying, it's time to get some help. I sifted through all the Stack Overflow questions related to this but couldn't find the answer. The code is a Gmail parser - works for
Solution 1:
Here is an example of how to retrieve and read mail parts with imapclient and the email.* modules from the Python standard library:
from imapclient import IMAPClient
import email
from email import policy


def walk_parts(part, level=0):
    print(' ' * 4 * level + part.get_content_type())
    # do something with part content (applies encoding by default)
    # part.get_content()
    if part.is_multipart():
        for subpart in part.get_payload():
            walk_parts(subpart, level + 1)


# context manager ensures the session is cleaned up
with IMAPClient(host="your_mail_host") as client:
    client.login('user', 'password')

    # select some folder
    client.select_folder('INBOX')

    # do something with folder, e.g. search & grab unseen mails
    messages = client.search('UNSEEN')
    for uid, message_data in client.fetch(messages, 'RFC822').items():
        email_message = email.message_from_bytes(
            message_data[b'RFC822'], policy=policy.default)
        print(uid, email_message.get('From'), email_message.get('Subject'))

    # alternatively search for specific mails
    msgs = client.search(['SUBJECT', 'some subject'])

    # do something with a specific mail:
    # fetch a single mail with UID 12345
    raw_mails = client.fetch([12345], 'RFC822')

    # parse the mail (very expensive for big mails with attachments!)
    mail = email.message_from_bytes(
        raw_mails[12345][b'RFC822'], policy=policy.default)

    # Now you have a python object representation of the mail and can dig
    # into it. Since a mail can be composed of several subparts we have
    # to walk the subparts.

    # walk all parts at once
    for part in mail.walk():
        # do something with that part
        print(part.get_content_type())

    # or recurse yourself into sub parts until you find the interesting part
    walk_parts(mail)
See the docs for email.message.EmailMessage. There you will find everything you need to read a mail message.
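If you only need the text body and the attachments, you may not have to walk the parts yourself. Below is a minimal sketch using get_body() and iter_attachments() from EmailMessage. The sample message is built locally and is purely hypothetical; it stands in for the bytes you would get from message_data[b'RFC822'].

```python
import email
from email import policy
from email.message import EmailMessage

# Hypothetical sample mail, built locally so the sketch runs without a server.
# In the IMAP code this would be the bytes in message_data[b'RFC822'].
sample = EmailMessage()
sample["From"] = "sender@example.com"
sample["Subject"] = "Sample"
sample.set_content("Grüße aus Köln")
sample.add_attachment(b"\x00\x01\x02", maintype="application",
                      subtype="octet-stream", filename="data.bin")
raw_bytes = sample.as_bytes()

# Parse the raw bytes; policy.default gives us the modern EmailMessage API.
mail = email.message_from_bytes(raw_bytes, policy=policy.default)

# Pick the plain-text body; get_content() decodes it using the part's charset.
body_part = mail.get_body(preferencelist=("plain",))
body_text = body_part.get_content() if body_part is not None else ""

# Collect (filename, payload) pairs for everything that is not the body.
attachments = [(part.get_filename(), part.get_content())
               for part in mail.iter_attachments()]

print(body_text)
print(attachments)
```

Note that get_body() returns None when no matching part exists, so the check before get_content() is worth keeping for real-world mail.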
Solution 2:
Solution 3:
I had the same issue. After a lot of research I realized that I simply needed to use the message_from_bytes function from email rather than message_from_string. So in your code, simply replace:
raw_email_str = raw_email.decode('utf-8')
email_message = email.message_from_string(raw_email_str)
with
email_message = email.message_from_bytes(raw_email)
That should work like a charm :)
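To see why this helps, here is a small sketch with a made-up raw message whose body is ISO-8859-1 sent as 8bit. The message and the helper decode_fails are hypothetical, just for illustration. Decoding the whole thing as UTF-8 fails, while message_from_bytes parses it and decodes the body using the charset declared in the Content-Type header.

```python
import email
from email import policy

# Hypothetical raw mail: ASCII headers, body encoded as ISO-8859-1 (8bit).
raw_email = (
    b"From: sender@example.com\r\n"
    b"Subject: Latin-1 test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"caf\xe9\r\n"
)


def decode_fails(data):
    """Return True if decoding the raw bytes as UTF-8 raises an error."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# The byte 0xE9 is not valid UTF-8 here, so the naive decode breaks.
utf8_decode_failed = decode_fails(raw_email)

# Parsing from bytes leaves the decoding to the email package,
# which honours the charset declared for the part.
email_message = email.message_from_bytes(raw_email, policy=policy.default)
body_text = email_message.get_content()

print(utf8_decode_failed)
print(repr(body_text))
```

The raw bytes of a mail are not guaranteed to be in any single encoding, since each part can declare its own charset, which is presumably why a blanket decode('utf-8') breaks on some messages and not others.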