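I'm trying to scrape the trade publications from this site: https://www.webwire.com/IndustryList.asp
I can walk through each individual industry section (for example, "Airlines / Aviation" or "Automotive") without any problem, but when I reach the last page of results my loop gets stuck and never moves on to the next industry.
I don't think I'm hitting an exception either, so how do I end the loop when it reaches the last available page so that it continues with the next item in the for loop?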
import requests
from bs4 import BeautifulSoup

industries = ["AIR", "AUT", "LEI"]

for industry in industries:
    print(industry)
    print("==================")
    num = 1
    while True:
        url = f"https://www.webwire.com/TradePublications.asp?ind={industry}&curpage={num}"
        page = requests.get(url)
        soup = BeautifulSoup(page.content, 'html.parser')
        for e in soup.select('#syndication-list li'):
            print(e.get_text())
        num = num + 1
    else:
        break
Answer:
This example iterates over the industries list and fetches every page of each industry until the last one:
import requests
from bs4 import BeautifulSoup

industries = ["AIR", "AUT", "LEI"]
url = "https://www.webwire.com/TradePublications.asp?ind={}&curpage={}"

for ind in industries:
    # Start at the first page of the current industry.
    u = url.format(ind, 1)
    while True:
        soup = BeautifulSoup(requests.get(u).content, "html.parser")
        for li in soup.select("#syndication-list li"):
            print("{:<10} {}".format(ind, li.text))
        # Look for the pagination link whose text contains "Next".
        next_page = soup.select_one('a:-soup-contains("Next")')
        if next_page:
            u = (
                "https://www.webwire.com/TradePublications.asp"
                + next_page["href"]
            )
        else:
            # No "Next" link: this was the last page, go to the next industry.
            break
Prints:
...
LEI Women's Wear Daily/Fairchild Financial
LEI Worcester Quarterly Magazine
LEI Word Association/Econoguide Travel Books
LEI Worldwide Spa Review
LEI Worth Magazine
LEI Y Not Girl Magazine
LEI Yankee Driver
LEI Ziff Davis Media
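The answer above builds the next URL by concatenating the script path with the link's href, which works as long as the href is only a query string. As a hedged alternative, the sketch below resolves the href with the standard library's urljoin, which also handles relative paths and absolute URLs. The helper name next_page_url is hypothetical and the href shapes shown in the usage note are assumptions, not values confirmed from the site.

```python
from urllib.parse import urljoin


def next_page_url(current_url: str, href: str) -> str:
    """Resolve the href of a "Next" link against the page it was found on.

    Hypothetical helper: urljoin applies the standard relative-reference
    rules, so a query-only href replaces just the query string, while a
    full URL is returned unchanged.
    """
    return urljoin(current_url, href)
```

In the loop above, this would replace the string concatenation with u = next_page_url(u, next_page["href"]), whether the href looks like "?ind=AIR&curpage=2" or "TradePublications.asp?ind=AIR&curpage=2".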
Answer:
You can paginate with a for loop and the range function, as follows:
import requests
from bs4 import BeautifulSoup

industries = ["AIR", "AUT", "LEI"]

for industry in industries:
    # print(industry)
    # print("==================")
    # Pages 1 through 13; the upper bound is hard-coded.
    for page in range(1, 14):
        print(page)
        url = f"https://www.webwire.com/TradePublications.asp?ind={industry}&curpage={page}"
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        for e in soup.select("#syndication-list li"):
            print(e.get_text())
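A fixed range only works if every industry has the same number of pages. As a rough sketch of a variant that does not hard-code the page count, the loop below stops as soon as a page yields no list items or repeats the previous page. Both stopping conditions are assumptions about how the site behaves past the last page and have not been verified here; iter_publications, fetch_items and max_pages are hypothetical names introduced for illustration.

```python
from typing import Callable, Iterator, List


def iter_publications(
    fetch_items: Callable[[str, int], List[str]],
    industry: str,
    max_pages: int = 200,
) -> Iterator[str]:
    """Yield list items page by page, stopping at the first empty or repeated page."""
    previous: List[str] = []
    for page in range(1, max_pages + 1):
        items = fetch_items(industry, page)
        # Assumption: past the last page the site returns either no items
        # or the same items again; either case ends the loop.
        if not items or items == previous:
            break
        yield from items
        previous = items


def fetch_items(industry: str, page: int) -> List[str]:
    """Download one result page and return the text of its list items."""
    # Imported lazily so the pagination logic above has no third-party dependency.
    import requests
    from bs4 import BeautifulSoup

    url = f"https://www.webwire.com/TradePublications.asp?ind={industry}&curpage={page}"
    response = requests.get(url, timeout=30)
    soup = BeautifulSoup(response.content, "html.parser")
    return [li.get_text(strip=True) for li in soup.select("#syndication-list li")]
```

To use it, loop over the industries and call iter_publications(fetch_items, industry) for each one, printing every yielded name. Passing the fetch function as an argument keeps the stopping logic easy to check with a fake fetcher, and max_pages is only a safety cap in case neither stopping condition ever triggers.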