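As far as I can tell, when the language button is pressed, the site https://www.learnit.nl/ gets its English version by sending a POST request to https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1. I don't know how to replicate this with Scrapy. I would appreciate any help.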
A uj5u.com user replied:
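The data comes from the JSON response of an API call made with the POST method, where the payload is one large JSON object. To replicate it with Scrapy, you can follow this example: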
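import json

import scrapy


class CourseSpider(scrapy.Spider):
    name = 'course'
    # Placeholder: paste the full JSON payload copied from the browser's network tab here.
    body = {}

    def start_requests(self):
        # Send the same POST request the site makes when the language button is pressed.
        yield scrapy.Request(
            url='https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1',
            callback=self.parse,
            body=json.dumps(self.body),
            method="POST",
            headers={
                # Assumption: the body is JSON, so declare it; add any other headers the API needs.
                'Content-Type': 'application/json',
            },
        )

    def parse(self, response):
        # The translated strings are in the 'to_words' list of the JSON response.
        data = json.loads(response.body)
        for resp in data['to_words']:
            yield {
                'course': resp
            }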
Output:
{'course': 'Writing clear texts'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML e-mail'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Basics'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML and CSS Continued'}
2022-04-28 22:03:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://cdn-api-weglot.com/translate?api_key=wg_6199f2422428fc4285eb776a1ab915c08&v=1>
{'course': 'HTML Training E-learning'}
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 1.879555,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 4, 28, 16, 3, 22, 536326),
'httpcompression/response_bytes': 36269,
'httpcompression/response_count': 1,
'item_scraped_count': 514,
... and so on
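The parsing step can also be tried without running a spider. The following is a minimal sketch, assuming the response body is a JSON object whose 'to_words' key holds a list of translated strings, as the spider's parse method suggests. The function name extract_courses and the sample body are made up for illustration; the two course names are taken from the output shown earlier.

```python
import json


def extract_courses(raw_body):
    """Turn a raw JSON response body (str or bytes) into a list of items.

    Assumes the body is an object with a 'to_words' list; a missing key
    yields an empty list instead of raising KeyError.
    """
    data = json.loads(raw_body)
    return [{"course": text} for text in data.get("to_words", [])]


# Hand-written sample body, not a real API response.
sample = json.dumps({"to_words": ["Writing clear texts", "HTML e-mail"]})
for item in extract_courses(sample):
    print(item)
# prints:
# {'course': 'Writing clear texts'}
# {'course': 'HTML e-mail'}
```

Using data.get instead of indexing is a design choice in this sketch: it keeps the function from raising if the API returns an error object without 'to_words'.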
Because the payload is one large JSON object, it exceeds the post length limit and cannot be included here. The full working code is here
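One way to keep a payload of that size out of the spider source is to save it to a file and load it at start-up. This is only a sketch under assumptions: the helper names load_payload and build_body are hypothetical, and the file path is whatever you saved the payload under after copying it from the browser's network tab.

```python
import json
from pathlib import Path


def load_payload(path):
    """Read a saved JSON payload from disk and return it as a Python object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def build_body(payload):
    """Serialize the payload compactly so it can be passed as the POST body."""
    return json.dumps(payload, separators=(",", ":"))
```

In the spider, the class attribute could then be set with something like body = load_payload('payload.json'), where payload.json is a hypothetical file name, and the existing json.dumps(self.body) call would work unchanged.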
Tags: python-3.x web-scraping scrapy