一头乱码's OffIcE: BBS_Post_Surveillant_v1.0, a tool for monitoring new threads and new replies (I hear long titles draw more readers)

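Approach: analyze the page's HTML source, use regular expressions to pull out the needed information, and compare it between refreshes.
Analysis diagram: [image not preserved]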
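The approach above can be sketched on a small piece of markup. This is only an illustration in Python 3 (the script itself targets Python 2): the `SAMPLE_HTML` string, its values, and the `extract_rows` helper are made up for the example, while the three patterns follow the ones used in the script.

```python
import re

# Made-up sample of the thread-list markup (illustrative values, not real forum data)
SAMPLE_HTML = (
    '<span id="thread_1"><a href="thread-1-1-1.html">Example thread</a></span>\n'
    '<em><a href="redirect.php?tid=1&goto=lastpost#lastpost">2009-3-11 20:15</a></em>\n'
    '<cite>by <a href="space.php?action=viewpro&username=example_user">example_user</a></cite>\n'
)

# One pattern per field: thread title, last-post date, last-poster ID
TITLE_RE = re.compile(
    r'<span\s*id="thread_\d*"><a\s*href="thread-\d*-\d*-\d*\.html".*?>(.+?)</a></span>')
DATE_RE = re.compile(
    r'<em><a\s*href="redirect\.php\?tid=\d*&goto=lastpost#lastpost">'
    r'(\d*-\d*-\d*\s* \d*:\d*)</a></em>')
POSTER_RE = re.compile(
    r'<cite>by <a href="space\.php\?action=viewpro&username=.+?">(.+?)</a></cite>')


def extract_rows(html):
    """Return one (title, date, poster) tuple per thread found in the markup."""
    titles = TITLE_RE.findall(html)
    dates = DATE_RE.findall(html)
    posters = POSTER_RE.findall(html)
    # zip pairs the fields by position and stops at the shortest list
    return list(zip(titles, dates, posters))


rows = extract_rows(SAMPLE_HTML)
```

Pairing the three lists by position assumes that every thread row on the page yields exactly one match for each pattern.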

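# BBS_Post_Surveillant1.0.py
# Written for Python 2 (urllib2, print statements)

import urllib2,re,threading
from time import sleep

# The variables below are used as globals.
# They hold the thread titles, last-post dates and last-poster IDs from before the refresh (*_old)
# and the same three lists from after the refresh.
first_loop=True
new_post=True
regex_match_title=[]
regex_match_date=[]
regex_match_lastpost=[]
regex_match_title_old=[]
regex_match_date_old=[]
regex_match_lastpost_old=[]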

# Function run by each thread: extracts the titles, last-post dates or last-poster IDs.
# The three jobs are identical apart from their arguments, so threads are used to save time.
def RegexMatch(regmatch,listindex):
    global regex_match_title
    global regex_match_date
    global regex_match_lastpost

    # Fetch the page source
    f=urllib2.urlopen('http://bbs.cfan.com.cn/forum-48-1.html')
    file_text=f.read()
    f.close()

    # Replace only the list that belongs to this thread,
    # so that one thread never clears the results of another
    matches=re.findall(regmatch,file_text)
    if listindex==0:
        regex_match_title=matches
    elif listindex==1:
        regex_match_date=matches
    else:
        regex_match_lastpost=matches

            

# The function name is poorly chosen -_-!!! It actually picks out the information about updated posts
def Threadworking():
    global new_post
    global first_loop
    global regex_match_title
    global regex_match_date
    global regex_match_lastpost
    global regex_match_title_old
    global regex_match_date_old
    global regex_match_lastpost_old
    # One pattern each for the title, the last-post date and the last-poster ID
    regex_match_element=[
        r'\s*<span\s*id="thread_\d*"><a\s*href="thread-\d*-\d*-\d*\.html".*?>(.+?)</a></span>',
        r'<em><a\s*href="redirect\.php\?tid=\d*&goto=lastpost#lastpost">(\d*-\d*-\d*\s* \d*:\d*)</a></em>',
        r'<cite>by <a href="space\.php\?action=viewpro&username=.+?">(.+?)</a></cite>']

    # Create the threads and run them, one per pattern
    threads=[]
    nloops=range(len(regex_match_element))
    for i in nloops:
        t=threading.Thread(target=RegexMatch,args=(regex_match_element[i],i))
        threads.append(t)
    for i in nloops:
        threads[i].start()
    for i in nloops:
        threads[i].join()

    # Decide which posts are new or updated

    if first_loop==True:
        # First pass: there is nothing to compare against yet
        first_loop=False
    else:
        len_new=range(len(regex_match_title))
        len_old=range(len(regex_match_title_old))
        for i in len_new:
            for j in len_old:
                if regex_match_title[i]==regex_match_title_old[j]:
                    new_post=False
                    # Known thread: a changed date or last poster means a new reply
                    if regex_match_date[i]!=regex_match_date_old[j] or regex_match_lastpost[i]!=regex_match_lastpost_old[j]:
                        print '"'+regex_match_title[i]+'" has a new reply   Last poster ID: '+regex_match_lastpost[i]+'   Last post date: '+regex_match_date[i]
                        print '==============================the incomparable dividing line=============================='
                        break
            if new_post==True:
                print 'New thread: "'+regex_match_title[i]+'"   Last poster ID: '+regex_match_lastpost[i]+'   Last post date: '+regex_match_date[i]
                print '==============================the incomparable dividing line=============================='
            else:
                new_post=True
    # Keep the current lists for comparison after the next refresh
    regex_match_title_old=list(regex_match_title)
    regex_match_date_old=list(regex_match_date)
    regex_match_lastpost_old=list(regex_match_lastpost)

            

# The main function keeps fetching the page source and handing it over for analysis, sleeping 10 seconds between rounds
def main():
    while True:
        Threadworking()
        sleep(10)

if __name__=='__main__':
    main()
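The comparison step in the script walks two lists with nested loops. As a rough alternative sketch, not the script's own method, the same decision can be made with dictionaries keyed by thread title. It is written in Python 3, and the `snapshot` and `diff_snapshots` names are hypothetical.

```python
def snapshot(titles, dates, posters):
    """Map each thread title to its (last-post date, last-poster ID) pair."""
    return {t: (d, p) for t, d, p in zip(titles, dates, posters)}


def diff_snapshots(old, new):
    """Compare two snapshots and return (new_threads, updated_threads) as title lists."""
    # A title that was not on the page before is a new thread
    new_threads = [t for t in new if t not in old]
    # A known title whose date or last poster changed has a new reply
    updated = [t for t in new if t in old and new[t] != old[t]]
    return new_threads, updated


before = snapshot(["A", "B"], ["3-11 10:00", "3-11 10:05"], ["x", "y"])
after = snapshot(["C", "A", "B"], ["3-11 10:20", "3-11 10:00", "3-11 10:30"], ["z", "x", "y"])
result = diff_snapshots(before, after)
```

One difference from the list-based version: two threads with the same title collapse into a single dictionary entry, so this sketch assumes titles on the page are unique.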

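Output: [screenshot not preserved]

Note: your own replies are also reported as new replies. If that matters, you can match the logged-in ID as well and check against it.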
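One simple way to act on this note is to drop every update whose last poster is your own ID. This is a minimal Python 3 sketch; the `replies_from_others` helper and the tuple layout are assumptions for illustration, and here the ID is passed in by hand rather than matched from the page.

```python
def replies_from_others(updates, my_id):
    """Keep only the updates whose last poster is not my_id.

    updates: list of (title, last_poster_id, last_post_date) tuples.
    """
    return [row for row in updates if row[1] != my_id]


updates = [
    ("Thread one", "example_user", "2009-3-11 20:15"),
    ("Thread two", "me", "2009-3-11 20:16"),
]
filtered = replies_from_others(updates, "me")
```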

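Formal statement: this program is my own original work and is published only on my Blogger blog, my Baidu blog, my QQ Zone and the programming board of the official 电脑爱好者 forum. Anyone distributing it elsewhere without my authorization is infringing, and I reserve the right to take legal action.
Authorized links: http://sruing.blogspot.com
http://hi.baidu.com/sruingking
http://bbs.cfan.com.cn/thread-840294-1-1.html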


Wednesday, March 11, 2009
