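I'm trying to use a regular expression to get the text between sections of a document that has numbered headers. The document has a table of contents and section headings, and the section numbers contain periods. For example: 1. Introduction, 1.1 Something, 1.1.1 Something Else. I can parse the TOC fine and get the section numbers (1.1, 1.1.1, etc.), but I have not managed to extract the text of the document between two of those numbers.
Consider the following (assume the document text is just one big string):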
1.1 Introduction
There are some sentences in here that I want and I want to do other things with them. There could be hundreds of sentences, who cares.
1.1.1 Something Else
This is where we talk about something else in life.
...
5.1.1 Conclusion
For example, I tried the following to get the text between 1.1 and 1.1.1, along with a few variations of it, and I seem to be stuck.
(?s)1\.0(.*)1\.1
This works if the only things in the document are sections 1.0 and 1.1, but I don't have that luxury. Any help is greatly appreciated.
Reply from a uj5u.com user:
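Use re.split with a regex like the one below to split on the section numbers.
^\d+(?:\.\d+)*
This matches one or more digits (\d+), followed by zero or more repetitions of the subpattern (?:\.\d+)*, which is a period followed by one or more digits.
The items of the resulting list are the text between the numbers, including the text of the heading lines themselves.
If you need the section numbers too, use a capturing group in the regex (add parentheses around what is already there). The list will then contain both the section numbers and the text between them. The even-indexed items are the text in between, and the odd-indexed items are the section numbers.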
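The re.split approach described above could be sketched roughly as follows. This is only an illustration under a few assumptions that are not in the answer itself: the re.M flag is passed so that ^ matches at the start of every line, the sample text is shortened, and the names parts, numbers, bodies and sections are made up for the example.

```python
import re

text = (
    "1.1 Introduction\n"
    "There are some sentences in here that I want.\n"
    "1.1.1 Something Else\n"
    "This is where we talk about something else in life.\n"
    "5.1.1 Conclusion"
)

# Parentheses around the whole number pattern make re.split keep the numbers.
# re.M (an assumption here) lets ^ match at the start of each line.
parts = re.split(r"^(\d+(?:\.\d+)*)", text, flags=re.M)

# parts[0] is whatever precedes the first number (empty for this sample);
# after that, section numbers and the text following them alternate.
numbers = parts[1::2]
bodies = parts[2::2]

# Hypothetical lookup table: section number -> heading text plus body.
sections = {num: body.strip() for num, body in zip(numbers, bodies)}

for num, body in sections.items():
    print(repr(num), "->", repr(body))
```

One caveat with this sketch: any body line that happens to start with a digit would also be treated as a section boundary, so real documents may need a stricter pattern.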
Reply from a uj5u.com user:
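You can use 2 capture groups and a negative lookahead to match all the lines that do not start with digits and a dot:
^\d+(?:\.\d+)+\b(.*)((?:\n(?!\d+\.\d).*)*)
The pattern matches:
^ Start of the string (or of each line when re.M is used)
\d+(?:\.\d+)+ Match 1 or more digits, then repeat 1 or more times a . followed by 1 or more digits
\b A word boundary
(.*) Capture group 1, match the rest of the line
( Capture group 2
(?:\n(?!\d+\.\d).*)* Optionally repeat matching a newline and the rest of that line, as long as the line does not start with digits, a dot and a digit
) Close group 2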
Example
import re
pattern = r"^\d+(?:\.\d+)+\b(.*)((?:\n(?!\d+\.\d).*)*)"
s = ("1.1 Introduction\n"
"There are some sentences in here that I want and I want to do other things with them. There could be hundreds of sentences, who cares.\n"
"1.1.1 Something Else\n"
"This is where we talk about something else in life.\n"
"...\n"
"5.1.1 Conclusion")
print(re.findall(pattern, s, re.M))
Output
[(' Introduction', '\nThere are some sentences in here that I want and I want to do other things with them. There could be hundreds of sentences, who cares.'), (' Something Else', '\nThis is where we talk about something else in life.\n...'), (' Conclusion', '')]
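If the section number is needed as well, one possible variant of this answer's pattern wraps the number in an extra capture group. This is a sketch rather than part of the original answer: the added group and the names by_number, title and body are assumptions made for illustration.

```python
import re

# This answer's pattern with one extra group (group 1) around the section number.
pattern = re.compile(
    r"^(\d+(?:\.\d+)+)\b(.*)((?:\n(?!\d+\.\d).*)*)",
    re.M,
)

s = ("1.1 Introduction\n"
     "There are some sentences in here that I want and I want to do other things with them. There could be hundreds of sentences, who cares.\n"
     "1.1.1 Something Else\n"
     "This is where we talk about something else in life.\n"
     "...\n"
     "5.1.1 Conclusion")

# Hypothetical lookup table: section number -> title and body text.
by_number = {
    m.group(1): {"title": m.group(2).strip(), "body": m.group(3).strip()}
    for m in pattern.finditer(s)
}

print(by_number["1.1"]["title"])
print(by_number["1.1"]["body"])
```

Note that, like the original pattern, this only recognises numbers with at least one dot followed by a digit, so a top-level heading written as "1. Introduction" would not be matched.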
Reply from a uj5u.com user:
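I'm not entirely sure how this would be used in your Python code, but here is a regex that may help:
/([\d\.]+)/g
Or in Python:
import re
matches = re.findall(r"([\d\.]+)", your_string)
As an explanation:
\d means any digit character (0-9)
\. means a literal .
[<multiple_things>] means any one of <multiple_things>
+ means one or more of the preceding item
So the regex matches a run of one or more digits or periods, as long as nothing else comes between them.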
# Some examples it would match:
1
.
1.1
1.1.1
11.1.111
1.11111111.111111
1.1.
.1
1....
....
1111
# Examples it would NOT match:
1 .1
1a.2
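To see how this character class behaves on running text, here is a small sketch. The sample string is invented for the example, and the stricter line-anchored pattern is an assumption added for comparison, not something this answer proposed.

```python
import re

# Invented sample: two headings plus a sentence that contains a version number.
sample = "1.1 Introduction\nSee version 2.5. Done.\n1.1.1 Something Else"

# The character class from this answer: any run of digits and periods.
loose = re.findall(r"([\d\.]+)", sample)
print(loose)   # -> ['1.1', '2.5.', '.', '1.1.1']

# Assumed stricter alternative: digits separated by single periods,
# and only at the start of a line.
strict = re.findall(r"^\d+(?:\.\d+)*", sample, flags=re.M)
print(strict)  # -> ['1.1', '1.1.1']
```

The loose pattern also picks up sentence-ending periods and numbers inside the body text, so it finds candidates for section numbers but does not by itself tell headings apart from ordinary text.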
Reply from a uj5u.com user:
import re

text='''1.1 Introduction
There are some sentences in here that I want and I want to do other things with them. There could be hundreds of sentences, who cares.
1.1.1 Something Else
This is where we talk about something else in life.
...
5.1.1 Conclusion'''
# At the start of each line, take everything up to the first digit or period.
for e in re.findall(r'^[^\d\.]+', text, re.MULTILINE):
    print(e)
Because the pattern is anchored to the start of a line and stops at the first digit or period, heading lines (which begin with a digit) are skipped and only the first sentence of each body line is printed:
There are some sentences in here that I want and I want to do other things with them
This is where we talk about something else in life
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qukuanlian/508447.html