I recently needed to scrape trending news, so I'm warming up with Weibo's hot search list.

One page should hold 50 entries, so this script scrapes the real-time top 50.

Libraries used:

requests, bs4 (BeautifulSoup)
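The script indexes into BeautifulSoup's `.contents`, which is easy to misread: it includes the whitespace text nodes between tags, not just the child tags. The minimal sketch below shows this on a hypothetical cell, `SAMPLE_CELL`, which is only assumed to resemble one row of the Weibo page (the real markup may differ or change over time).

```python
import bs4

# Hypothetical snippet, assumed to mirror one hot-search cell on the page
SAMPLE_CELL = """
<td class="td-02">
<a href="/weibo?q=example">Example topic</a>
<span>123456</span>
</td>
"""

soup = bs4.BeautifulSoup(SAMPLE_CELL, "html.parser")
td = soup.find("td", attrs={"class": "td-02"})

# .contents keeps the newline text nodes between tags, so the tags
# land at the odd indices: 1 is the <a>, 3 is the <span>
for index, node in enumerate(td.contents):
    print(index, repr(node))

title = td.contents[1].contents[0]  # text inside <a>
heat = td.contents[3].contents[0]   # text inside <span>
print(title, heat)
```

If the page's whitespace between tags ever changes, these positional indices shift, which is why looking tags up by name (`td.find("a")`) is usually sturdier.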

Full code:

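import requests
import bs4

url = "https://s.weibo.com/top/summary"

# Browser-like headers; without a User-Agent the request is more likely to be rejected
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36',
    'Accept-Language': 'zh-CN,zh;q=0.9',
}

response = requests.get(url=url, headers=headers)
response.raise_for_status()  # fail early on a non-2xx response
htmlText = response.text
soup = bs4.BeautifulSoup(htmlText, "html.parser")
# Each hot-search entry sits in a <td class="td-02"> cell
newList = soup.find_all("td", attrs={"class": "td-02"})
num = 0
# Skip the first cell (presumably the pinned topic, which has no heat count)
for i in newList[1:]:
    # .contents also holds whitespace text nodes, so the <a> (title)
    # is at index 1 and the <span> (heat count) is at index 3
    title = i.contents[1].contents[0]
    heat = i.contents[3].contents[0]
    num += 1
    print(str(num) + "、", title, heat)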

Scraping result (screenshot not preserved).
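The script relies on the position of each cell and of the tags inside it, so it raises an `IndexError` if any row lacks a `<span>`. A more defensive variant, sketched below, looks tags up by name, skips rows without a heat count, and stops at 50. The function name `parse_hot_search` and the `SAMPLE_PAGE` markup are hypothetical; the layout is only assumed to match the real page, and on the live page the `<span>` may carry extra text besides the number.

```python
import bs4


def parse_hot_search(html_text, limit=50):
    """Return up to `limit` (rank, title, heat) tuples from the page HTML.

    Assumes each entry is a <td class="td-02"> holding an <a> (title)
    and, for ranked entries, a <span> (heat count).
    """
    soup = bs4.BeautifulSoup(html_text, "html.parser")
    results = []
    for cell in soup.find_all("td", attrs={"class": "td-02"}):
        link = cell.find("a")
        span = cell.find("span")
        # Skip cells without a title or heat count (e.g. a pinned row)
        # instead of relying on its position in the list
        if link is None or span is None:
            continue
        title = link.get_text(strip=True)
        heat = span.get_text(strip=True)
        results.append((len(results) + 1, title, heat))
        if len(results) >= limit:
            break
    return results


# Hypothetical page fragment for an offline check
SAMPLE_PAGE = """
<table>
<tr><td class="td-02"><a href="/weibo?q=pinned">Pinned topic</a></td></tr>
<tr><td class="td-02"><a href="/weibo?q=a">Topic A</a><span>987654</span></td></tr>
<tr><td class="td-02"><a href="/weibo?q=b">Topic B</a><span>123456</span></td></tr>
</table>
"""

for rank, title, heat in parse_hot_search(SAMPLE_PAGE):
    print(str(rank) + "、", title, heat)
```

Against the live page, the same function would be called as `parse_hot_search(response.text)` with the `response` from the full script.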