[Scraping Case Study] Collecting Weather Data for Beijing, Shanghai, Guangzhou, and Shenzhen
Target site: https://tianqi.2345.com/
Goal: collect daily weather data for Beijing, Shanghai, Guangzhou, and Shenzhen for 2020~2022, including high temperature, low temperature, weather condition, wind force/direction, and air quality index (AQI), and store it in a CSV file.
Historical weather pages:
- Beijing: https://tianqi.2345.com/wea_history/54511.htm
- Shanghai: https://tianqi.2345.com/wea_history/58362.htm
- Guangzhou: https://tianqi.2345.com/wea_history/59287.htm
- Shenzhen: https://tianqi.2345.com/wea_history/59493.htm
Scraping code:
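The per-month query URL assembled in the scraping code below can also be built with the standard library's `urllib.parse.urlencode`, which keeps the bracketed parameter names readable. A minimal sketch, assuming the endpoint and parameter names shown in the code below; `build_history_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

def build_history_url(area_id, year, month):
    """Build the GetHistory query URL for one city and one month."""
    params = {
        "areaInfo[areaId]": area_id,   # station ID, e.g. 54511 for Beijing
        "areaInfo[areaType]": 2,
        "date[year]": year,
        "date[month]": month,
    }
    # urlencode percent-encodes the brackets, which the server decodes normally
    return "https://tianqi.2345.com/Pc/GetHistory?" + urlencode(params)

print(build_history_url(54511, 2020, 1))
```

This avoids the multi-line f-string concatenation and makes it easy to add or change parameters in one place.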
import csv
import time

import requests
from bs4 import BeautifulSoup

with open(r'.\北上廣深歷史天氣.csv', mode='w', newline='', encoding='utf-8') as f:
    csv_writer = csv.writer(f)
    # CSV header: city, date, high temp, low temp, weather, wind force/direction, AQI
    csv_writer.writerow(['城市', '日期', '最高溫', '最低溫', '天氣', '風力風曏', '空氣質量指數'])
    # City name -> 2345.com area ID (Beijing, Shanghai, Guangzhou, Shenzhen)
    city_dict = {'北京': 54511, '上海': 58362, '廣州': 59287, '深圳': 59493}
    for city in city_dict:
        for year in range(2020, 2023):
            for month in range(1, 13):
                time.sleep(1)  # pause between requests to avoid hammering the server
                url = f'https://tianqi.2345.com/Pc/GetHistory' \
                      f'?areaInfo[areaId]={city_dict[city]}' \
                      f'&areaInfo[areaType]=2&date[year]={year}' \
                      f'&date[month]={month}'
                response = requests.get(url=url)
                json_data = response.json()
                html_data = json_data['data']  # the JSON payload carries an HTML table
                page = BeautifulSoup(html_data, "html.parser")
                table = page.find("table", attrs={"class": "history-table"})
                trs = table.find_all("tr")
                for it in trs[1:]:  # skip the table header row
                    td = it.find_all('td')
                    e1 = td[0].text  # date
                    e2 = td[1].text  # high temp
                    e3 = td[2].text  # low temp
                    e4 = td[3].text  # weather
                    e5 = td[4].text  # wind force/direction
                    e6 = td[5].text  # AQI
                    lst = [city, e1, e2, e3, e4, e5, e6]
                    print(lst)
                    csv_writer.writerow(lst)
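To sanity-check the output file, a small read-back helper can count the data rows per city. A minimal sketch with invented sample rows and a hypothetical `sample_weather.csv` path (the real file uses the Chinese header and city names written above):

```python
import csv

# Invented sample data mirroring the column layout of the scraped CSV.
SAMPLE_ROWS = [
    ["City", "Date", "High", "Low", "Weather", "Wind", "AQI"],
    ["Beijing", "2020-01-01", "5°", "-5°", "Sunny", "N force 2", "45"],
    ["Beijing", "2020-01-02", "4°", "-6°", "Cloudy", "N force 1", "60"],
    ["Shanghai", "2020-01-01", "10°", "4°", "Rain", "E force 3", "30"],
]

def count_rows_per_city(path):
    """Count data rows per city (first column) in the CSV, skipping the header."""
    counts = {}
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            counts[row[0]] = counts.get(row[0], 0) + 1
    return counts

# Write a small sample file, then read it back.
with open("sample_weather.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(SAMPLE_ROWS)

print(count_rows_per_city("sample_weather.csv"))  # → {'Beijing': 2, 'Shanghai': 1}
```

For a full 2020~2022 run, each city should show roughly 1096 rows (366 + 365 + 365 days); large gaps suggest failed requests or missing months.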
Final result:
![Screenshot of the scraped weather data CSV](http://image109.360doc.com/DownloadImg/2022/12/2611/258144028_1_20221226115745510_wm.png)