一头乱码's OffIcE: Fetching Post Titles and Last-Post Times with Multiple Threads

Monday, March 2, 2009

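Fetching Post Titles and Last-Post Times with Multiple Threads

Some practice with Python regular expressions: one thread per pattern, each scanning the same downloaded forum page.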

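# Python 2 script: urllib2 and the print statement do not exist in Python 3.
import re
import threading
import urllib2
from time import ctime

# One pattern per thread. The first captures thread titles, the second
# captures last-post timestamps on the forum index page.
regex_match_element = [
    r'\s*<span\s*id="thread_\d*"><a\s*href="thread-\d*-\d*-\d*\.html">(.+?)</a></span>',
    r'<em><a\s*href="redirect\.php\?tid=\d*&goto=lastpost#lastpost">(\d*-\d*-\d*\s* \d*:\d*)</a></em>',
]

def RegexMatch(regmatch, page):
    # Print every match of one pattern together with the current time.
    for reg_temp in re.findall(regmatch, page):
        print reg_temp, ctime()

def main():
    # Download the page once and hand it to each thread as an argument,
    # rather than sharing it through a global that shadows the built-in `file`.
    f = urllib2.urlopen('http://bbs.cfan.com.cn/forum-53-1.html')
    page = f.read()
    f.close()

    threads = []
    for regmatch in regex_match_element:
        t = threading.Thread(target=RegexMatch, args=(regmatch, page))
        threads.append(t)
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print 'Done'

if __name__ == '__main__':
    main()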
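The script above prints titles and timestamps from two threads, so the two kinds of output can interleave. As a rough Python 3 sketch of one possible variation (not part of the original post), each thread can store its matches instead, and the results can be paired afterwards. The names `find_all`, `extract`, and `SAMPLE_HTML` are hypothetical, and the sample markup is made up to fit the two patterns; it is not real forum data. The pairing assumes every title on the page has exactly one matching timestamp, which may not hold for a real page.

```python
import re
import threading

# The same two patterns as in the script above.
TITLE_RE = r'\s*<span\s*id="thread_\d*"><a\s*href="thread-\d*-\d*-\d*\.html">(.+?)</a></span>'
TIME_RE = r'<em><a\s*href="redirect\.php\?tid=\d*&goto=lastpost#lastpost">(\d*-\d*-\d*\s* \d*:\d*)</a></em>'

# Hypothetical sample markup, invented to match the patterns.
SAMPLE_HTML = '''
<span id="thread_101"><a href="thread-101-1-1.html">First topic</a></span>
<em><a href="redirect.php?tid=101&goto=lastpost#lastpost">2009-3-2 10:15</a></em>
<span id="thread_102"><a href="thread-102-1-1.html">Second topic</a></span>
<em><a href="redirect.php?tid=102&goto=lastpost#lastpost">2009-3-1 22:40</a></em>
'''

def find_all(name, pattern, page, results):
    # Each thread writes to its own key, so no lock is used here.
    results[name] = re.findall(pattern, page)

def extract(page):
    results = {}
    threads = [
        threading.Thread(target=find_all, args=(name, pattern, page, results))
        for name, pattern in (("titles", TITLE_RE), ("times", TIME_RE))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Pair the n-th title with the n-th timestamp (assumes the lists line up).
    return list(zip(results["titles"], results["times"]))

if __name__ == "__main__":
    for title, last_post in extract(SAMPLE_HTML):
        print(title, last_post)
```

When feeding this a live page in Python 3, note that `urllib.request.urlopen(...).read()` returns bytes, so the body would need to be decoded with the page's actual character set before matching against these string patterns.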
