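What I want to do is reduce the time it takes to finish the scraping process and store all the data in a dictionary (the dictionary is Untiters;
the key is the username, and the value is the number of times that user has published a post under one of the specific titles). I am using this site as a tutorial, but I can't figure out how to apply what it explains to my code. Here is the code; sorry if most of it is unnecessary.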
from multiprocessing import Pool
import requests
from bs4 import BeautifulSoup

z = 0  # running total of untitled posts
Untitleds = ["Sin título","Untitled","Sans titre","İsimsiz","Ohne Titel","??? ?????",
             "Без названия","無標題","タイトルなし"]
Untiters = {}  # username -> number of untitled posts
Untits = []
x = 138
for i in range(1,20):
    y = x + 1
    x = y
    Id = y
    link = "https://folioscope.co/blank/" + str(Id)
    Url = (link)
    R = requests.get(Url)
    Soup = BeautifulSoup(R.text,"html5lib")
    Pretitle = (Soup.find("div",{"class":"container_padding"}))
    Title = Pretitle.div.text
    if Title in (Untitleds):
        Prename = Soup.find("div",{"class":"padding_bottom_normal"})
        Name = Prename.a.text
        Untitled = z + 1
        z = Untitled
        if Name not in Untiters:
            Untiters.update({Name : 1})
        else:
            c0 = Untiters[Name]
            c1 = c0 + 1
            Untiters[Name] = c1
        Untits.append(Title)
        print (Title, Name)
A uj5u.com user replied:
To use multiprocessing.Pool to fetch the data from the site, you can use the following example:
from multiprocessing import Pool

import requests
from bs4 import BeautifulSoup


def get_data(id_):
    # Runs in a worker process: fetch one page and extract title and username.
    url = "https://folioscope.co/blank/" + str(id_)
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    # Fall back to an empty string when the element is missing.
    title = soup.select_one("#animation_container .title") or ""
    if title:
        title = title.text

    username = soup.select_one(".username") or ""
    if username:
        username = username.text

    return id_, title, username


if __name__ == "__main__":
    with Pool() as pool:
        # Results arrive in completion order, not in id order.
        for id_, title, username in pool.imap_unordered(
            get_data, range(138, 158)
        ):
            if title and username:
                print("{:<4} {:<40} {}".format(id_, title, username))
                # here you can add the result to list, filter duplicates etc.
Prints:
153 First attempt CyberAly
149 Minecraft Loop MisterD
142 An Idea! Pyro
148 Untitled szymun
152 Thunder dpknyk1993
139 Untitled WoopDeDoo
146 Untitled szymun
144 Loop pjrd
138 Blink fairyfina
140 Test sknob
154 Dragon Ball kameha piedicmolkok
157 Boom animation33
156 Tree in wind CyberAly
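The comment in the example leaves the aggregation step open. One possible way to build the username-to-count dictionary described in the question is sketched below. The helper name count_untitled and the UNTITLED_TITLES set are assumptions made for illustration, not part of the original answer; the set holds only a subset of the question's Untitleds list. The function accepts any iterable of (id_, title, username) tuples, so it should also work when fed directly from pool.imap_unordered(get_data, ...).

```python
from collections import Counter

# Hypothetical subset of the question's Untitleds list, kept as a set
# so that membership checks are cheap.
UNTITLED_TITLES = {
    "Sin título", "Untitled", "Sans titre", "Ohne Titel",
    "Без названия", "無標題",
}


def count_untitled(results, untitled_titles=UNTITLED_TITLES):
    """Count, per username, how many posts carry an 'untitled' title.

    `results` is an iterable of (id_, title, username) tuples, in the
    shape returned by get_data in the example above.
    """
    counts = Counter()
    for _id, title, username in results:
        # Skip pages where either field is missing (empty string).
        if title and username and title.strip() in untitled_titles:
            counts[username.strip()] += 1
    return dict(counts)


if __name__ == "__main__":
    # A few rows taken from the printed output above.
    sample = [
        (148, "Untitled", "szymun"),
        (139, "Untitled", "WoopDeDoo"),
        (146, "Untitled", "szymun"),
        (144, "Loop", "pjrd"),
    ]
    print(count_untitled(sample))  # {'szymun': 2, 'WoopDeDoo': 1}
```

Counting in the parent process keeps the workers free of shared state: each worker only returns a tuple, and the dictionary is built in one place as results arrive.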