模仿b站做了一个网页爬虫,但是运行错误,不知道是哪里出错了。

0 0 python u7f51u9875u722cu866b
帅气丶月半
帅气丶月半

声望值:104 0人

2019-03-11 09:06:22 提问

关注 0关注

收藏 0收藏, 209浏览

代码如下:

import requests
import re

def getHTMLText(url):
    try:
        r = requests.get(url, timeout = 1000)
        r.raise_for_status()
        r.encoding = r.apparent.encoding
        return r.text
    except: 
        return ''

def parsePage(ilt, html):
    try:
        llt = re.findall(r'\"title\"\:\".*?\"',html)
        plt = re.findall(r'\"initialPrice\"\:\"[\d\.]*\"',html)
        for i in range (len(plt)) :
            location = eval(llt[i].split(':')[1])
            price = eval(plt[i].split(':')[1])
            ilt.append([location, price])
    except:
        print('')

def printGoodsList(ilt):
    tlpt = '{:4}\t{:20}\t{:8}'
    print(tlpt.format('序号','房子','价格'))
    count = 0
    for g in ilt:
        count = count + 1
        print(tlpt.format(count,g[0],g[1]))
  
def main():
    depth = 30
    start_url = 'https://sf.taobao.com/list/50025969__1___%BA%BC%D6%DD.htm?spm=a213w.7398504.pagination.1.Hn2fOe&auction_start_seg=-1'
    infoList = []
    for i in range(2,1,depth):
        try:
            url = start_url + '&page=' + str(i)
            html = getHTMLText(url)
            parsePage(infoList, html)
        except:
            continue
    printGoodsList(infoList)

main( )

运行结果如下图
图片描述

请先 登录 后评论

0个回答

注册新账号

悬赏追问
10
  • 10
  • 20
  • 50
  • 100
  • 200
  • 输入数值
发布追问