I'm trying to match a pattern in the lines of an HTML file.
Here is a snippet of the file:
<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-09-13</td>
<td>14393.5356</td>
<td><a href="https://support.microsoft.com/help/5017305" target="_blank" data-linktype="external">KB5017305</a></td>
</tr>
<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-08-09</td>
<td>14393.5291</td>
<td><a href="https://support.microsoft.com/help/5016622" target="_blank" data-linktype="external">KB5016622</a></td>
</tr>
<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-07-12</td>
<td>14393.5246</td>
<td><a href="https://support.microsoft.com/help/5015808" target="_blank" data-linktype="external">KB5015808</a></td>
</tr>
<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-06-14</td>
<td>14393.5192</td>
<td><a href="https://support.microsoft.com/help/5014702" target="_blank" data-linktype="external">KB5014702</a></td>
</tr>
<tr>
This is the code I'm running:
import re
with open('file.html') as htmltext:
    htmldata = htmltext.readlines()
pattern = "([\r\n].*?)(?:=?\r|\n)(.*?(?:14393).*)"
for data in htmldata:
    matchedx = re.search(pattern, data)
    if matchedx:
        print(matchedx)
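The pattern is meant to match a line containing 14393 and also return the line before it.
Testing it on regex101 (https://regex101.com/r/7vI31a/1) finds matches, but running it in Python finds none.
In Python, using this as the pattern does return matches: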
pattern = "(14393.*)"
Answer:
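As jasonharper commented, you need to apply the regex to all of the data at once, not line by line. readlines() gives you one string per line, and each of those strings contains at most one line break (the trailing \n). Your pattern needs two line breaks, one before the previous line and one between it and the 14393 line, so it can never match a single line. regex101 works because it tests the pattern against the whole text, and the simpler (14393.*) works in your loop because it never has to cross a line break.
This works for me: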
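A minimal check of this, using three consecutive lines from the snippet as readlines() would return them (each keeping its trailing \n): searched one at a time, the pattern finds nothing; joined into one string, it finds the date line together with the 14393 line.

```python
import re

# Same pattern as in the question, written as a raw string.
pattern = r"([\r\n].*?)(?:=?\r|\n)(.*?(?:14393).*)"

# Three consecutive lines from the snippet, as readlines() would return them.
lines = [
    "<td>CBB <span> • </span> CB <span> • </span> LTSB</td>\n",
    "<td>2022-09-13</td>\n",
    "<td>14393.5356</td>\n",
]

# Line by line: each string holds at most one newline, but the pattern
# needs two, so every search returns None.
per_line = [re.search(pattern, line) for line in lines]
print(per_line)

# Whole text: the newline after the CBB line and the one after the date
# line are both present, so the pattern can span the two lines.
m = re.search(pattern, "".join(lines))
print(m.groups())
```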
import re
# data = open('file.html').read()
data = """<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-09-13</td>
<td>14393.5356</td>
<td><a href="https://support.microsoft.com/help/5017305" target="_blank" data-linktype="external">KB5017305</a></td>
</tr>
<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-08-09</td>
<td>14393.5291</td>
<td><a href="https://support.microsoft.com/help/5016622" target="_blank" data-linktype="external">KB5016622</a></td>
</tr>
<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-07-12</td>
<td>14393.5246</td>
<td><a href="https://support.microsoft.com/help/5015808" target="_blank" data-linktype="external">KB5015808</a></td>
</tr>
<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-06-14</td>
<td>14393.5192</td>
<td><a href="https://support.microsoft.com/help/5014702" target="_blank" data-linktype="external">KB5014702</a></td>
</tr>
<tr>"""
pattern = re.compile(r"([\r\n].*?)(?:=?\r|\n)(.*?(?:14393).*)")
matches = pattern.findall(data)
for match in matches:
print(match)
Which prints:
('\n<td>2022-09-13</td>', '<td>14393.5356</td>')
('\n<td>2022-08-09</td>', '<td>14393.5291</td>')
('\n<td>2022-07-12</td>', '<td>14393.5246</td>')
('\n<td>2022-06-14</td>', '<td>14393.5192</td>')
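If you would rather keep a line-by-line approach, one sketch that avoids the cross-line regex entirely is to pair each line with the one before it and use a plain substring test instead of a regex. The function name pairs_with_previous and the abridged sample (KB lines omitted) are illustrative only, not from the original post. Unlike the regex version, the previous line comes back without the leading \n.

```python
def pairs_with_previous(lines: list[str], needle: str = "14393") -> list[tuple[str, str]]:
    """Return (previous_line, line) for every line that contains `needle`."""
    # zip(lines, lines[1:]) walks consecutive pairs; the first line has no
    # predecessor, so it can never be reported (same as the regex version).
    return [(prev, cur) for prev, cur in zip(lines, lines[1:]) if needle in cur]

# Abridged copy of the snippet from the question.
sample = """<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-09-13</td>
<td>14393.5356</td>
</tr>
<tr>
<td>CBB <span> • </span> CB <span> • </span> LTSB</td>
<td>2022-08-09</td>
<td>14393.5291</td>
</tr>"""

result = pairs_with_previous(sample.splitlines())
print(result)

# With the real file (hypothetical usage):
# with open("file.html", encoding="utf-8") as f:
#     result = pairs_with_previous(f.read().splitlines())
```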