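Downloading Email with Python (pop/imap)
I recently wrote a program that downloads email using Python's built-in poplib and imaplib libraries, and this article walks through it. For detailed descriptions of the modules, see the official Python documentation (poplib, imaplib).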
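1. Program source
The program works as follows:
- 1. Set the email address, password, mail server, transfer protocol (pop/imap), whether to use SSL, and the output directory, then parse them. (For convenience these settings are hard-coded in this article; change the input source to suit your needs.)
- 2. Create the output directory based on the email address.
- 3. Pick the pop or imap handler according to the transfer protocol, read the messages, decode them and save them.
- 4. When the script finishes or exits abnormally, delete the output directory if it is empty.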
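Step 1 hard-codes its settings. As one possible alternative, here is a minimal sketch that reads them from the command line instead. It assumes Python 3, and the option names and the helpers `build_parser` and `split_server` are made up for illustration; they are not part of the original program. Only the default ports are taken from the article.

```python
import argparse

# Default ports, same values as DEFAULT_PORT in the article's program
DEFAULT_PORT = {
    "pop3": {False: 110, True: 995},
    "imap4": {False: 143, True: 993},
}


def build_parser():
    """Build a command-line parser for the settings in step 1 (hypothetical options)."""
    parser = argparse.ArgumentParser(description="Download mail over POP3 or IMAP4")
    parser.add_argument("mailaddr", help="email address, e.g. username@163.com")
    parser.add_argument("server", help="mail server as host or host:port")
    parser.add_argument("--protocol", choices=["pop3", "imap4"], default="pop3")
    parser.add_argument("--no-ssl", dest="use_ssl", action="store_false",
                        help="connect without SSL")
    parser.add_argument("--outdir", default="result", help="main output directory")
    return parser


def split_server(server, protocol, use_ssl):
    """Split 'host' or 'host:port'; fall back to the protocol's default port."""
    host, sep, port = server.partition(":")
    if not host:
        raise ValueError("No available server")
    return host, (int(port) if sep else DEFAULT_PORT[protocol][use_ssl])
```

The password is left out of the options on purpose; in a script like this it could be prompted for with `getpass.getpass()` so that it does not end up in the shell history.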
```python
#! /usr/bin/env python
# -*- coding: utf-8 -*-

import os
import re
import time
import email
import poplib
import imaplib
import cStringIO
from hashlib import md5

# Configuration
# -------------

# Email address
MAILADDR = "username@163.com"
# Email password
PASSWORD = "password"
# Mail Server (pop/imap)
SERVER = "pop.163.com"
# Transfer protocol (pop3/imap4)
PROTOCOL = "pop3"
# Use SSL? (True/False)
USE_SSL = True
# Main output directory
OUTDIR = "result"

# Static variable
# ---------------

# Default port of each protocol
DEFAULT_PORT = {
    "pop3": {False: 110, True: 995},
    "imap4": {False: 143, True: 993},
}


# Function
# --------

def exit_script(reason, e=""):
    """Print error reason and exit this script

    :param reason: exit error reason
    :param e: exception
    """
    # Print exit string
    exit_str = "[-] {0}".format(reason)
    if e:
        exit_str += " ({0})".format(e)
    print(exit_str)

    # Remove result path
    remove_dir(result_path)

    # Exit script
    print("[-] Fetch email failed!")
    exit(-1)


def parse_protocol(protocol):
    """Parse transfer protocol

    :param protocol: transfer protocol
    :return: handled protocol
    """
    if protocol in ["pop", "pop3"]:
        return "pop3"
    elif protocol in ["imap", "imap4"]:
        return "imap4"
    else:
        exit_script("Parse protocol failed: {0}".format(protocol))


def parse_server(server, use_ssl, protocol):
    """Change server to host and port.
    If no port specified, use default value

    :param server: mail server (host, host:port)
    :param use_ssl: True if use SSL else False
    :param protocol: transfer protocol (pop3/imap4)
    :return: host and port
    """
    if not server:
        exit_script("No available server")

    server_item = server.split(":")
    server_item_len = len(server_item)

    if server_item_len > 2:
        exit_script("Too many colons in server: {0}".format(server))

    try:
        host = server_item[0]
        port = DEFAULT_PORT[protocol][use_ssl] if server_item_len == 1 else int(server_item[1])
    except BaseException as e:
        exit_script("Parse server format failed: {0}".format(server), e)

    return host, port


def create_dir(result_path):
    """Create output directory if not exist

    :param result_path: main result path
    """
    try:
        if not os.path.exists(result_path):
            os.mkdir(result_path)
            print("[*] Create directory {0} successfully".format(result_path))
        else:
            if os.path.isfile(result_path):
                exit_script("{0} is file".format(result_path))
            else:
                print("[*] Directory {0} has already existed".format(result_path))
    except BaseException as e:
        exit_script("Create directory {0} failed".format(result_path), e)


def remove_dir(result_path):
    """Remove output directory if no file in this directory

    :param result_path: main result path
    """
    try:
        if os.path.isdir(result_path):
            if len(os.listdir(result_path)) == 0:
                os.rmdir(result_path)
                print("[*] Remove directory {0} successfully".format(result_path))
            else:
                print("[*] Directory {0} is not empty, no need remove".format(result_path))
        else:
            print("[*] No directory {0}".format(result_path))
    except BaseException as e:
        print("[-] Remove directory {0} failed: {1}".format(result_path, e))


def protocol_manager(protocol, host, port, usr, pwd, use_ssl):
    """Choose handle function according to transfer protocol

    :param protocol: transfer protocol (pop3/imap4)
    :param host: host
    :param port: port
    :param usr: username
    :param pwd: password
    :param use_ssl: True if use ssl else False
    """
    import __main__
    if hasattr(__main__, protocol):
        getattr(__main__, protocol)(host, port, usr, pwd, use_ssl)
    else:
        exit_script("Wrong protocol: {0}".format(protocol))


def pop3(host, port, usr, pwd, use_ssl):
    """Pop3 handler

    :param host: host
    :param port: port
    :param usr: username
    :param pwd: password
    :param use_ssl: True if use SSL else False
    """
    # Connect to mail server
    try:
        conn = poplib.POP3_SSL(host, port) if use_ssl else poplib.POP3(host, port)
        conn.user(usr)
        conn.pass_(pwd)
        print("[+] Connect to {0}:{1} successfully".format(host, port))
    except BaseException as e:
        exit_script("Connect to {0}:{1} failed".format(host, port), e)

    # Get email message number
    try:
        msg_num = len(conn.list()[1])
        print("[*] {0} emails found in {1}".format(msg_num, usr))
    except BaseException as e:
        exit_script("Can't get email number", e)

    # Get email content and attachments
    for i in range(1, msg_num+1):
        print("[*] Downloading email {0}/{1}".format(i, msg_num))

        # Retrieve email message lines, and write to buffer
        try:
            msg_lines = conn.retr(i)[1]
            buf = cStringIO.StringIO()
            for line in msg_lines:
                print >> buf, line
            buf.seek(0)
        except BaseException as e:
            print "[-] Retrieve email {0} failed: {1}".format(i, e)
            continue

        # Read buffer
        try:
            msg = email.message_from_file(buf)
        except BaseException as e:
            print "[-] Read buffer of email {0} failed: {1}".format(i, e)
            continue

        # Parse and save email content/attachments
        try:
            parse_email(msg, i)
        except BaseException as e:
            print("[-] Parse email {0} failed: {1}".format(i, e))

    # Quit mail server
    conn.quit()


def imap4(host, port, usr, pwd, use_ssl):
    """Imap4 handler

    :param host: host
    :param port: port
    :param usr: username
    :param pwd: password
    :param use_ssl: True if use SSL else False
    """
    # Connect to mail server
    try:
        conn = imaplib.IMAP4_SSL(host, port) if use_ssl else imaplib.IMAP4(host, port)
        conn.login(usr, pwd)
        print("[+] Connect to {0}:{1} successfully".format(host, port))
    except BaseException as e:
        exit_script("Connect to {0}:{1} failed".format(host, port), e)

    # Initial some variable
    list_pattern = re.compile(r'\((?P<flags>.*?)\) "(?P<delimiter>.*)" (?P<name>.*)')
    download_num = 0
    download_hash = []

    # Get all folders
    try:
        type_, folders = conn.list()
    except BaseException as e:
        exit_script("Get folder list failed", e)

    for folder in folders:
        # Parse folder info and get folder name
        try:
            flags, delimiter, folder_name = list_pattern.match(folder).groups()
            folder_name = folder_name.strip('"')
            print "[*] Handling folder: {0}".format(folder_name)
        except BaseException as e:
            print "[-] Parse folder {0} failed: {1}".format(folder, e)
            continue

        # Select and search folder
        try:
            conn.select(folder_name, readonly=True)
            type_, data = conn.search(None, "ALL")
        except BaseException as e:
            print "[-] Search folder {0} failed: {1}".format(folder_name, e)
            continue

        # Get email number of this folder
        try:
            msg_id_list = [int(i) for i in data[0].split()]
            msg_num = len(msg_id_list)
            print "[*] {0} emails found in {1} ({2})".format(msg_num, usr, folder_name)
        except BaseException as e:
            print "[-] Can't get email number of {0}: {1}".format(folder_name, e)
            continue

        # Get email content and attachments
        for i in msg_id_list:
            print "[*] Downloading email {0}/{1}".format(i, msg_num)

            # Get email message
            try:
                type_, data = conn.fetch(i, "(RFC822)")
                msg = email.message_from_string(data[0][1])
            except BaseException as e:
                print "[-] Retrieve email {0} failed: {1}".format(i, e)
                continue

            # If message already exist, skip this message
            try:
                msg_md5 = md5(data[0][1]).hexdigest()
                if msg_md5 in download_hash:
                    print "[-] This email has been downloaded in other folder"
                    continue
                else:
                    download_hash.append(msg_md5)
                    download_num += 1
            except BaseException as e:
                print "[-] Parse message md5 failed: {0}".format(e)
                continue

            # Parse and save email content/attachments
            try:
                parse_email(msg, download_num)
            except BaseException as e:
                print "[-] Parse email {0} failed: {1}".format(i, e)

    # Logout this account
    conn.logout()


def parse_email(msg, i):
    """Parse email message and save content & attachments to file

    :param msg: mail message
    :param i: ordinal number
    """
    global result_file

    # Parse and save email content and attachments
    for part in msg.walk():
        if not part.is_multipart():
            filename = part.get_filename()
            content = part.get_payload(decode=True)

            if filename:  # Attachment
                # Decode filename
                h = email.Header.Header(filename)
                dh = email.Header.decode_header(h)
                filename = dh[0][0]

                result_file = os.path.join(result_path, "mail{0}_attach_{1}".format(i, filename))
            else:  # Main content
                result_file = os.path.join(result_path, "mail{0}_text".format(i))

            try:
                with open(result_file, "wb") as f:
                    f.write(content)
            except BaseException as e:
                print("[-] Write file of email {0} failed: {1}".format(i, e))


if __name__ == "__main__":
    print("[*] Start download email script")
    start_time = time.time()

    mailaddr = MAILADDR
    password = PASSWORD
    server = SERVER
    protocol = PROTOCOL
    use_ssl = USE_SSL
    outdir = OUTDIR

    result_path = os.path.join(OUTDIR, mailaddr)

    protocol = parse_protocol(protocol)
    host, port = parse_server(server, use_ssl, protocol)
    create_dir(result_path)
    protocol_manager(protocol, host, port, mailaddr, password, use_ssl)
    remove_dir(result_path)

    end_time = time.time()
    exec_time = end_time - start_time
    print("[*] Finish download email of {0} in {1:.2f}s".format(mailaddr, exec_time))
```
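The sections below explain the core functions (pop3, imap4, parse_email) in detail.
2. Core functions explained
2.1 pop3
- poplib.POP3_SSL(host, port) and poplib.POP3(host, port): connect to the POP3 server, with or without SSL.
- conn.user(usr): send the user name.
- conn.pass_(pwd): send the password.
- len(conn.list()[1]): get the number of messages in the mailbox. conn.list() returns a (response, listings, octets) tuple, and item [1] holds one entry per message.
- conn.retr(i)[1]: retrieve message number i as a list of lines.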
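The program above targets Python 2, where the lines returned by conn.retr(i)[1] are written into a cStringIO buffer. As a hedged sketch of the same step under Python 3 (an assumption, not the author's code), the lines arrive as bytes and can be joined and parsed directly. The helper name `message_from_retr_lines` is hypothetical.

```python
import email


def message_from_retr_lines(lines):
    """Build a message object from the list of byte lines that retr() returns.

    In Python 3, poplib hands back each line as bytes without its line ending,
    so the lines are joined with CRLF before parsing.
    """
    raw = b"\r\n".join(lines)
    return email.message_from_bytes(raw)
```

With a live connection this would be called as `message_from_retr_lines(conn.retr(i)[1])`; the result can then be walked in the same way as in parse_email.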
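2.2 imap4
- imaplib.IMAP4_SSL(host, port) and imaplib.IMAP4(host, port): connect to the IMAP4 server, with or without SSL.
- conn.login(usr, pwd): log in.
- conn.list(): get information about every folder in the mailbox. Each line contains three items: the flags, the hierarchy delimiter and the folder name. The folder name is encoded in modified UTF-7 (if you need to decode it, look for an implementation yourself, e.g. this implementation).
- conn.select(folder_name, readonly=True): select a folder; the readonly option opens it read-only. If no folder name is given, INBOX is used.
- conn.search(None, "ALL"): search the messages in the selected folder. The first argument is the charset, usually None. The second argument, "ALL", is the search criterion; see this article for its detailed usage. The return value contains the message numbers.
- conn.fetch(i, "(RFC822)"): fetch the full content of message number i. RFC822 here is the IMAP fetch item that requests the whole raw message, not a separate protocol.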
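The folder lines returned by conn.list() can be split with the same regular expression the program uses, and the modified UTF-7 folder name can be decoded with a few lines of standard-library code. The following is a minimal sketch assuming Python 3, where each line is bytes. The helper names `decode_modified_utf7` and `parse_list_line` are hypothetical, and the decoder only covers the common case: base64 sections between `&` and `-`, with `,` standing in for `/` and `&-` meaning a literal `&`.

```python
import base64
import re

# Same pattern as list_pattern in the program above
LIST_PATTERN = re.compile(r'\((?P<flags>.*?)\) "(?P<delimiter>.*)" (?P<name>.*)')


def decode_modified_utf7(name):
    """Decode an IMAP modified UTF-7 folder name into a normal string."""
    def _decode_chunk(match):
        chunk = match.group(1)
        if not chunk:
            return "&"  # "&-" is the escape for a literal ampersand
        b64 = chunk.replace(",", "/")       # modified base64 uses ',' for '/'
        b64 += "=" * (-len(b64) % 4)        # restore the stripped padding
        return base64.b64decode(b64).decode("utf-16-be")

    return re.sub(r"&([^-]*)-", _decode_chunk, name)


def parse_list_line(line):
    """Split one LIST response line (bytes) into flags, delimiter and folder name."""
    match = LIST_PATTERN.match(line.decode("ascii"))
    if match is None:
        raise ValueError("Unexpected LIST line: {0!r}".format(line))
    flags, delimiter, name = match.groups()
    return flags, delimiter, decode_modified_utf7(name.strip('"'))
```

Note that the undecoded name is still what has to be passed to conn.select(); the decoded form is only useful for display or for naming output files.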
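Note: in this implementation an md5 digest is computed for every message that is fetched. If the digest has been seen before (the same message appears in more than one folder), the duplicate is skipped and not saved again.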
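The duplicate check can be isolated into a small helper. This sketch assumes Python 3 and uses a set rather than the list (download_hash) found in the program, so that membership tests do not slow down as more messages are seen. The name `is_new_message` is hypothetical.

```python
import hashlib


def is_new_message(raw_message, seen_digests):
    """Return True the first time a raw message (bytes) is seen, False afterwards.

    seen_digests is a set of md5 hex digests that this function updates in place.
    """
    digest = hashlib.md5(raw_message).hexdigest()
    if digest in seen_digests:
        return False
    seen_digests.add(digest)
    return True
```

In the imap4 loop this would be called with the raw bytes from conn.fetch, i.e. data[0][1], and one set shared across all folders.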
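2.3 parse_email
- msg.walk(): iterate over every sub-part of the message.
- part.get_filename(): get the file name of the part (if there is one, the part is treated as an attachment, otherwise as the message body).
- part.get_payload(decode=True): get the content of the part, decoded according to its Content-Transfer-Encoding.
- email.Header.Header(filename): wrap the attachment's file name in a Header object.
- email.Header.decode_header(h): decode the header to obtain the decoded file name.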
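In Python 3 the module is spelled email.header (lower case), and decode_header returns a list of (value, charset) pairs that still have to be joined. The sketch below is one way to turn an encoded attachment name into a usable path under those assumptions. The helpers `decode_filename` and `attachment_path` are hypothetical, and the os.path.basename call is an added precaution against names that contain directory parts, not something the original program does.

```python
import os
from email.header import decode_header


def decode_filename(raw_name):
    """Decode a possibly RFC 2047 encoded file name into a plain string."""
    pieces = []
    for value, charset in decode_header(raw_name):
        if isinstance(value, bytes):
            # Fall back to ASCII when the header does not name a charset
            pieces.append(value.decode(charset or "ascii", errors="replace"))
        else:
            pieces.append(value)
    return "".join(pieces)


def attachment_path(result_path, index, raw_name):
    """Build the output path, following the program's mail<N>_attach_<name> scheme."""
    name = os.path.basename(decode_filename(raw_name))
    return os.path.join(result_path, "mail{0}_attach_{1}".format(index, name))
```

A name with an unknown charset label would still raise LookupError in this sketch; a real script would probably want to catch that and fall back to a generic file name.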
This article is licensed under CC 3.0. When reposting, please credit: reposted from Pythoner
Permalink: Downloading Email with Python (pop/imap)
The code looks rather long and a bit complicated.
() "/" "INBOX"
('OK', ['() "/" "INBOX"', '(\\Drafts) "/" "&g0l6P3ux-"', '(\\Sent) "/" "&XfJT0ZAB-"', '(\\Trash) "/" "&XfJSIJZk-"', '(\\Junk) "/" "&V4NXPpCuTvY-"', '() "/" "&Xn9USpCuTvY-"', '() "/" "&i6KWBZCuTvY-"'])
[*] Handling folder: INBOX
333
[-] Search folder INBOX failed: command SEARCH illegal in state AUTH, only allowed in states SELECTED
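Some people suspect that 163 mail can only be fetched with the Mail Master (邮箱大师) client and not with other clients.
Today 163 Mail blocked sending and receiving for all "unknown" clients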
http://www.v2ex.com/t/152504
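That is not the case. You need to enable client access in the mailbox settings. After I configured my 163 mailbox, I could send and receive mail.
The key for IMAP in the port definition is misspelled; it should be imap4:
DEFAULT_PORT = {
"pop3": {False: 110, True: 995},
"imap3": {False: 143, True: 993},
}
The source code has imap3, so fetching mail over imap raises an error.
Thanks, you are right.
Is there a more efficient way to check whether a message has already been downloaded, for example by using the message's unique identifier instead of fetching the message content and verifying it with md5?
Take a look at the IMAP4.uid method (https://docs.python.org/2/library/imaplib.html#imaplib.IMAP4.uid)
How should images in the message body be handled?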