Python: Quoted URL is not converted correctly by the website when using requests

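I am trying to scrape some German sentences from Glosbe.com. The requested URL contains some UTF-8 characters. After the request completes, the website does not convert the quoted (percent-encoded) characters back to UTF-8 characters. The requested URL should look like this: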

https://glosbe.com/de/hu/abkühlen

But the URL requested from the website is not converted to UTF-8, and the word that gets searched for is this:

https://glosbe.com/de/hu/abk%C3%BChlen/

Code used:

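import urllib.parse

import requests
from bs4 import BeautifulSoup

def beautifulSoapPrepare(sourceLang, destLang, phrase):
    headers = {
        'User-Agent': 'My User Agent 1.0',
        'From': '[email protected]',  # This is another valid field
    }
    url = "https://glosbe.com/" + sourceLang + "/" + destLang + "/" + urllib.parse.quote(phrase) + "/"
    # Note: the second positional argument of requests.get is `params`, so "lxml" is sent as a query string
    r = requests.get(url, "lxml", headers=headers)
    soup = BeautifulSoup(r.content, features="lxml")
    return soup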

The image here shows the problem.

Can you help me solve this? I want the website to search for the German word abkühlen rather than its percent-encoded form.
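For context, the percent-encoded form is what `urllib.parse.quote` produces from the UTF-8 bytes of the word, and `urllib.parse.unquote` reverses it. A minimal sketch using only the standard library (the variable names are illustrative and not from the original post):

```python
from urllib.parse import quote, unquote

word = "abkühlen"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character
encoded = quote(word)

# unquote() decodes the percent-escapes back into the original text
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

The encoded string is what goes over the wire. Whether the site decodes it back into the intended word depends on how the server handles the path.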

Solution: The problem was in the URL. Once I removed the slash at the end of the URL, it worked.

Before:

url = "https://glosbe.com/" + sourceLang + "/" + destLang + "/" + urllib.parse.quote(phrase) + "/"

After:

url = "https://glosbe.com/" + sourceLang + "/" + destLang + "/" + urllib.parse.quote(phrase)
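One way to keep a trailing slash from slipping back in is to build the URL in a single place. The sketch below is only an illustration: `build_glosbe_url` and `BASE_URL` are hypothetical names, and passing `safe=""` to `quote` is an extra assumption so that a "/" inside the phrase is also encoded and stays within one path segment.

```python
from urllib.parse import quote

BASE_URL = "https://glosbe.com"  # assumed base, taken from the URLs in the question


def build_glosbe_url(source_lang, dest_lang, phrase):
    """Hypothetical helper: join the path parts without a trailing slash."""
    # safe="" makes quote() encode "/" as well, so the phrase is one path segment
    return f"{BASE_URL}/{source_lang}/{dest_lang}/{quote(phrase, safe='')}"


url = build_glosbe_url("de", "hu", "abkühlen")
print(url)
```

The resulting string can then be passed to `requests.get(url, headers=headers)` as in the code above.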

A uj5u.com community member replied:

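Given that your end goal is to get the translations of the specific word you are looking for, the following code will give you that (you can eventually wrap it in a class or a function, whatever you prefer):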

import requests
from bs4 import BeautifulSoup as bs

url = 'https://glosbe.com/de/hu/'

word = 'abkühlen'

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}
# No trailing slash after the word; requests percent-encodes the non-ASCII characters itself
r = requests.get(url + word, headers=headers)

soup = bs(r.text, 'html.parser')
translations = soup.select('h3.translation')
for t in translations:
    print(t.get_text(strip=True))

Result printed in the terminal:

lehűl
hűtés
lehűt
hűvös
hűtés
előhűtés

The Requests documentation can be found at https://requests.readthedocs.io/en/latest/

The BeautifulSoup documentation is at: https://beautiful-soup-4.readthedocs.io/en/latest/index.html
