Scraping a table with an ID from a website using BeautifulSoup

I ran into a problem while scraping the table on this website. I expected to get the headings, but instead I got

AttributeError: 'NoneType' object has no attribute 'tbody'

I'm fairly new to web scraping, so any help would be great.

import requests
from bs4 import BeautifulSoup

URL = "https://www.collincad.org/propertysearch?situs_street=Willowgate&situs_street_suffix" \
      "=&isd[]=any&city[]=any&prop_type[]=R&prop_type[]=P&prop_type[]=MH&active[]=1&year=2021&sort=G&page_number=1"

s = requests.Session()

page = s.get(URL)
soup = BeautifulSoup(page.content, "lxml")

table = soup.find("table", id="propertysearchresults")
table_data = table.tbody.find_all("tr")

headings = []
for td in table_data[0].find_all("td"):
    headings.append(td.b.text.replace('\n', ' ').strip())

print(headings)

A uj5u.com community member replied:

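What happens?

Note: Always look at your soup first, because it is the source of truth. The content your script receives can differ from what you see in the browser's developer tools.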
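One way to "look at your soup" is sketched below. It is only an illustration: `summarize_page` is a hypothetical helper name, `blocked_html` is a made-up stand-in for `page.content`, and `html.parser` is used so the sketch does not depend on lxml.

```python
from bs4 import BeautifulSoup


def summarize_page(html, table_id):
    """Return a small report on what the server actually sent back."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    # Collapse whitespace so the preview fits on one line.
    text = " ".join(soup.get_text(" ").split())
    return {
        "title": title,
        "has_table": soup.find("table", id=table_id) is not None,
        "preview": text[:200],
    }


# Made-up block page for illustration; in the real script pass page.content.
blocked_html = """
<html><head><title>Access Revoked</title></head>
<body><h1>Access Revoked</h1><p>Your IP address has been blocked.</p></body></html>
"""
report = summarize_page(blocked_html, "propertysearchresults")
print(report)
```

If `has_table` is false, the preview usually shows why, before any `.tbody` access has a chance to fail.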

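Here the response is not the search results but a block page:

Access Revoked

Your IP address has been blocked.

We detected irregular, bot-like usage of our Property Search originating from your IP address. This block was put in place to reduce stress on our web server, to ensure that we provide optimal site performance to the taxpayers of Collin County.

We have not blocked your ability to download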

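Because this page contains no table with the id propertysearchresults, soup.find() returns None, and accessing .tbody on None raises the AttributeError. The site is blocking your request, so you should add some headers to it. In your specific case, adding a User-Agent is enough: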

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
}

URL = "https://www.collincad.org/propertysearch?situs_street=Willowgate&situs_street_suffix" \
      "=&isd[]=any&city[]=any&prop_type[]=R&prop_type[]=P&prop_type[]=MH&active[]=1&year=2021&sort=G&page_number=1"

s = requests.Session()

page = s.get(URL, headers=headers)
soup = BeautifulSoup(page.content, "lxml")

table = soup.find("table", id="propertysearchresults")
table_data = table.tbody.find_all("tr")

headings = []
for td in table_data[0].find_all("td"):
    headings.append(td.b.text.replace('\n', ' ').strip())

print(headings)
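Since the script already uses a Session, the header could also be set once on the session instead of on each call. This is a minimal sketch, not part of the original answer: `make_session` is a hypothetical helper name and `https://example.com/` is a placeholder URL. The request is only prepared, never sent, so nothing goes over the network.

```python
import requests

# The User-Agent string is the one used in the answer above.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
)


def make_session(user_agent=USER_AGENT):
    """Create a Session whose requests all carry the given User-Agent."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


session = make_session()
# Prepare (but do not send) a request to confirm the header is attached.
prepared = session.prepare_request(requests.Request("GET", "https://example.com/"))
print(prepared.headers["User-Agent"])
```

With the header on the session, later calls such as `session.get(URL)` for other result pages would send it without repeating `headers=headers`.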

With the headers added you will still get an error, but this time on the line:

headings.append(td.b.text.replace('\n', ' ').strip())

You should change it to

headings.append(td.text.replace('\n', ' ').strip())

because a td does not always contain a b, and td.b is None when the tag is missing.
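Both failure points (a missing table and a td without a b) can be guarded in one place. The sketch below is an assumption-laden illustration: `extract_headings` is a hypothetical helper, `sample_html` is invented markup rather than the real site's, and `html.parser` is used so the example runs without lxml.

```python
from bs4 import BeautifulSoup


def extract_headings(html, table_id):
    """Return the text of each <td> in the first row of the table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        # find() returns None when nothing matches, e.g. on a block page.
        raise ValueError(f"no <table id={table_id!r}> found; was the request blocked?")
    # Some pages omit <tbody>, so fall back to the table itself.
    body = table.tbody if table.tbody is not None else table
    first_row = body.find("tr")
    if first_row is None:
        return []
    # get_text() works whether or not the cell wraps its text in <b>;
    # split/join collapses newlines and repeated spaces.
    return [" ".join(td.get_text().split()) for td in first_row.find_all("td")]


# Invented markup for illustration only.
sample_html = """
<table id="propertysearchresults">
  <tbody>
    <tr><td><b>Property ID</b></td><td>Owner Name</td><td><b>Situs
Address</b></td></tr>
    <tr><td>1</td><td>Someone</td><td>123 Willowgate</td></tr>
  </tbody>
</table>
"""
headings = extract_headings(sample_html, "propertysearchresults")
print(headings)
```

Raising a descriptive ValueError is one design choice among several; returning an empty list would also work if a blocked request should not stop the script.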

Tags: python-3.x, web-scraping, beautifulsoup
