[Scraping Case Study] Collecting Weather Data for Beijing, Shanghai, Guangzhou, and Shenzhen
Target site: https://tianqi.2345.com/
Goal: collect daily weather data for Beijing, Shanghai, Guangzhou, and Shenzhen for 2020~2022, including high temperature, low temperature, weather condition, wind force/direction, and air quality index (AQI), and store it in a CSV file.
Historical weather pages:
- Beijing: https://tianqi.2345.com/wea_history/54511.htm
- Shanghai: https://tianqi.2345.com/wea_history/58362.htm
- Guangzhou: https://tianqi.2345.com/wea_history/59287.htm
- Shenzhen: https://tianqi.2345.com/wea_history/59493.htm
Scraping code:
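The per-month query URL assembled in the scraping code below can also be built with the standard library's `urllib.parse.urlencode`, which keeps the bracketed parameter names readable. A minimal sketch, assuming the endpoint and parameter names shown in the code below; `build_history_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

def build_history_url(area_id, year, month):
    """Build the GetHistory query URL for one city and one month."""
    params = {
        "areaInfo[areaId]": area_id,   # station ID, e.g. 54511 for Beijing
        "areaInfo[areaType]": 2,
        "date[year]": year,
        "date[month]": month,
    }
    # urlencode percent-encodes the brackets, which the server decodes normally
    return "https://tianqi.2345.com/Pc/GetHistory?" + urlencode(params)

print(build_history_url(54511, 2020, 1))
```

This avoids the multi-line f-string concatenation and makes it easy to add or change parameters in one place.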
import csv
import time

import requests
from bs4 import BeautifulSoup

with open(r'.\北上廣深歷史天氣.csv', mode='w', newline='', encoding='utf-8') as f:
    csv_writer = csv.writer(f)
    # CSV header: city, date, high temp, low temp, weather, wind force/direction, AQI
    csv_writer.writerow(['城市', '日期', '最高溫', '最低溫', '天氣', '風力風曏', '空氣質量指數'])
    # City name -> 2345.com area ID (Beijing, Shanghai, Guangzhou, Shenzhen)
    city_dict = {'北京': 54511, '上海': 58362, '廣州': 59287, '深圳': 59493}
    for city in city_dict:
        for year in range(2020, 2023):
            for month in range(1, 13):
                time.sleep(1)  # pause between requests to avoid hammering the server
                url = f'https://tianqi.2345.com/Pc/GetHistory' \
                      f'?areaInfo[areaId]={city_dict[city]}' \
                      f'&areaInfo[areaType]=2&date[year]={year}' \
                      f'&date[month]={month}'
                response = requests.get(url=url)
                json_data = response.json()
                html_data = json_data['data']  # the JSON payload carries an HTML table
                page = BeautifulSoup(html_data, "html.parser")
                table = page.find("table", attrs={"class": "history-table"})
                trs = table.find_all("tr")
                for it in trs[1:]:  # skip the table header row
                    td = it.find_all('td')
                    e1 = td[0].text  # date
                    e2 = td[1].text  # high temp
                    e3 = td[2].text  # low temp
                    e4 = td[3].text  # weather
                    e5 = td[4].text  # wind force/direction
                    e6 = td[5].text  # AQI
                    lst = [city, e1, e2, e3, e4, e5, e6]
                    print(lst)
                    csv_writer.writerow(lst)
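To sanity-check the output file, a small read-back helper can count the data rows per city. A minimal sketch with invented sample rows and a hypothetical `sample_weather.csv` path (the real file uses the Chinese header and city names written above):

```python
import csv

# Invented sample data mirroring the column layout of the scraped CSV.
SAMPLE_ROWS = [
    ["City", "Date", "High", "Low", "Weather", "Wind", "AQI"],
    ["Beijing", "2020-01-01", "5°", "-5°", "Sunny", "N force 2", "45"],
    ["Beijing", "2020-01-02", "4°", "-6°", "Cloudy", "N force 1", "60"],
    ["Shanghai", "2020-01-01", "10°", "4°", "Rain", "E force 3", "30"],
]

def count_rows_per_city(path):
    """Count data rows per city (first column) in the CSV, skipping the header."""
    counts = {}
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        for row in reader:
            counts[row[0]] = counts.get(row[0], 0) + 1
    return counts

# Write a small sample file, then read it back.
with open("sample_weather.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(SAMPLE_ROWS)

print(count_rows_per_city("sample_weather.csv"))  # → {'Beijing': 2, 'Shanghai': 1}
```

For a full 2020~2022 run, each city should show roughly 1096 rows (366 + 365 + 365 days); large gaps suggest failed requests or missing months.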
Final result:
![Screenshot of the scraped weather data CSV](http://image109.360doc.com/DownloadImg/2022/12/2611/258144028_1_20221226115745510_wm.png)