1. Crawl target: Douban Books TOP250
2. Python crawler code walkthrough
3. Walkthrough video
4. Complete source code

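1. Crawl target: Douban Books TOP250

This post walks through a Python crawler case study. The crawl target is the Douban Books TOP250 ranking:
https://book.douban.com/top250
[Figure: the Douban Books TOP250 web page]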

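Once the Python crawler has been written and the crawl succeeds, the resulting CSV data looks like this:
[Figure: the resulting data]

How does the code carry out the crawl? The Python implementation is explained step by step below.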

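2. Python crawler code walkthrough

First, import the libraries that will be needed:

import requests  # send HTTP requests
from bs4 import BeautifulSoup  # parse the HTML page
import pandas as pd  # write the CSV file
from time import sleep  # pause between requests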

Next, send a request to the Douban Books page:

res = requests.get(url, headers=headers)
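
The request line above uses `url` and `headers` without showing how they are defined. The following is a minimal sketch, assuming the ranking is paginated 25 books per page through a `start` query parameter and that a browser-like `User-Agent` header is sent. The helper name `build_page_urls` and the header value are illustrative and do not come from the original source.

```python
from urllib.parse import urlencode

BASE_URL = "https://book.douban.com/top250"
PAGE_SIZE = 25     # assumption: 25 books per result page
TOTAL_BOOKS = 250  # the ranking lists 250 books

# Assumed browser-like header; substitute your own browser's User-Agent string.
headers = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    )
}


def build_page_urls(base_url=BASE_URL, page_size=PAGE_SIZE, total=TOTAL_BOOKS):
    """Return one URL per result page (hypothetical helper)."""
    return [
        f"{base_url}?{urlencode({'start': start})}"
        for start in range(0, total, page_size)
    ]


page_urls = build_page_urls()
```

Each entry of `page_urls` can then be passed as `url` to `requests.get(url, headers=headers)`, with a short `sleep` between pages to keep the request rate low.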

Parse the response page with the BeautifulSoup library:

soup = BeautifulSoup(res.text, 'html.parser')
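
The extraction code further down works on a variable named `book`, which stands for one entry of the ranking. Here is a sketch of how such entries might be obtained from `soup`, assuming each book sits in a `tr.item` table row. The HTML fragment is a hand-written stand-in rather than the real page, so the selector should be checked against the live markup.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for res.text; the real page markup may differ.
SAMPLE_HTML = """
<table>
  <tr class="item">
    <td>
      <div class="pl2"><a href="https://example.com/book/1" title="Book One">Book One</a></div>
      <p class="pl">Author A / Publisher X / 2006-5 / 29.00</p>
      <span class="rating_nums">9.4</span>
    </td>
  </tr>
  <tr class="item">
    <td>
      <div class="pl2"><a href="https://example.com/book/2" title="Book Two">Book Two</a></div>
      <p class="pl">Author B / Publisher Y / 2010-1 / 35.00</p>
      <span class="rating_nums">9.1</span>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

book_name = []
book_star = []
for book in soup.select("tr.item"):  # assumed per-book row selector
    link = book.select_one(".pl2 a")  # select_one returns None when nothing matches
    rating = book.select_one(".rating_nums")
    # Append a placeholder instead of skipping, so all lists keep the same length.
    book_name.append(link.get("title", "") if link is not None else "")
    book_star.append(rating.text.strip() if rating is not None else "")
```

Using `select_one` with a `None` check avoids the `IndexError` that `select(...)[0]` raises when an entry lacks a field.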

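Using BeautifulSoup's select function (CSS selector syntax), write the extraction logic. Part of the core code:

name = book.select('.pl2 a')[0]['title']  # book title
book_name.append(name)
bkurl = book.select('.pl2 a')[0]['href']  # book link
book_url.append(bkurl)
star = book.select('.rating_nums')[0].text  # rating
book_star.append(star)
star_people = book.select('.pl')[1].text  # number of raters
# Data cleaning: drop spaces, the wrapping parentheses and the "人评价" ("people rated") suffix.
# The live page is in Simplified Chinese, so the suffix is matched as '人评价'.
star_people = (
    star_people.strip()
    .replace(' ', '')
    .replace('人评价', '')
    .replace('(\n', '')
    .replace('\n)', '')
)
book_star_people.append(star_people)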
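
The chained `replace` calls above depend on the exact whitespace and wording of the page. As an alternative sketch (not the author's approach), a regular expression can pull the digits out directly. The second helper shows one hypothetical way to fill lists such as `book_author` and `book_publisher`, assuming the publication line has the layout `author / [translator /] publisher / date / price`. Both function names are illustrative.

```python
import re


def parse_rating_count(raw):
    """Return the first run of digits in raw as an int, or None if there is none."""
    match = re.search(r"\d+", raw)
    return int(match.group()) if match else None


def split_pub_info(info):
    """Split a publication line of the assumed layout into named fields.

    Assumed layout: 'author / [translator /] publisher / date / price'.
    Returns None when fewer than four fields are present.
    """
    parts = [part.strip() for part in info.split(" / ")]
    if len(parts) < 4:
        return None  # unexpected layout; let the caller decide what to store
    publisher, pub_date, price = parts[-3:]
    return {
        "author": parts[0],
        # Assumption: a fifth field means the second one is the translator.
        "translator": parts[1] if len(parts) >= 5 else "",
        "publisher": publisher,
        "pub_date": pub_date,
        "price": price,
    }
```

Note that `parse_rating_count` returns an integer, whereas the original code stores the cleaned count as a string. Lines with several authors or other irregular layouts would need extra handling.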

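Finally, save the crawled data to a CSV file:

def save_to_csv(csv_name):
    """
    Save the collected data to a CSV file.
    :param csv_name: path of the CSV file to write
    :return: None
    """
    df = pd.DataFrame()  # initialize an empty DataFrame
    df['書名'] = book_name  # book title
    df['豆瓣鏈接'] = book_url  # Douban link
    df['作者'] = book_author  # author
    df['譯者'] = book_translater  # translator
    df['出版社'] = book_publisher  # publisher
    df['出版日期'] = book_pub_year  # publication date
    df['價格'] = book_price  # price
    df['評分'] = book_star  # rating
    df['評分人數'] = book_star_people  # number of raters
    df['一句話評價'] = book_comment  # one-line review
    df.to_csv(csv_name, encoding='utf8')  # write the data to the CSV file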

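Here, each list is assigned to one column of the DataFrame, which converts the list data into DataFrame data, and to_csv then writes it out directly.
In this way, the crawled data is persisted to disk.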
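
The list-to-DataFrame step can also be sketched with a dictionary passed to the constructor, which builds the same kind of table in one call. The two short lists and the English column names below are stand-ins for illustration only; in the article the lists are filled by the crawler.

```python
import pandas as pd

# Illustrative stand-in lists; every list must have the same length,
# otherwise pandas raises a ValueError when building the DataFrame.
book_name = ["Book One", "Book Two"]
book_star = ["9.4", "9.1"]

# One key per column, each mapped to its list of values.
df = pd.DataFrame({"name": book_name, "rating": book_star})

# index=False leaves out the automatic row-number column.
csv_text = df.to_csv(index=False)
```

When writing to a file instead, `df.to_csv(csv_name, index=False, encoding='utf_8_sig')` adds a byte order mark, which helps Excel detect UTF-8 and display Chinese text correctly.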

3. Walkthrough video

Accompanying walkthrough video: https://www.zhihu.com/zvideo/1464515550177546240

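4. Complete source code

The complete source code is attached in: 【Python crawler case study】Crawling Douban Books TOP250 data with a Python crawler!

I am @馬哥python說, and I keep sharing practical Python source code!

Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/556302.html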

