Get every word ending with a dot using Regex/VBA

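I'm using Excel 2019 and trying to extract, from a bunch of messy text cells, any words (up to 5) that end with a dot and come after a ].

Here is a sample of the text I want to parse/clean up: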

some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan.

I expect to get this: ost. ult. lot. sino. collan.

I'm using this function, which I found somewhere on the internet and which seems to do the job:

Public Function RegExtract(Txt As String, Pattern As String) As String

With CreateObject("vbscript.regexp")
    '.Global = True
    .Pattern = Pattern
    If .test(Txt) Then
        RegExtract = .Execute(Txt)(0)
    Else
        RegExtract = "No match found"
    End If
End With

End Function

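I call it from an empty cell: =RegExtract(D2; "([\]])(\s\w+[.]){0,5}")

This is my first time using regular expressions, so I may have done something horrible in an expert's eyes.

So this is my expression: ([\]])(\s\w+[.]){0,5}

Right now it returns only ] ost.

That is far more than I expected from my first attempt at regex, but:

1) I can't get rid of the first ], which I need in order to find where the useful bit starts inside the text block, since \K doesn't work in Excel. I could "find and replace" it afterwards like a clever barbarian, but I'd like to know the clean way, if one exists :)

2) I don't understand how the quantifier works to get all my "up to 5 occurrences": I expected the {0,5} after the second group to mean exactly "repeat the previous group again until the end of the text block (or until you have succeeded 5 times)".

Thanks for your time :)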

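uj5u.com user reply:

There is a way to return all matches in a string that start after a certain pattern, but I can't recall it right now.

In the meantime, the simplest approach seems to be to remove everything before the first ] and then apply the regex to the rest.

For example: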

Option Explicit
Sub findit()
  Const str As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
  ' Early binding: requires a reference to "Microsoft VBScript Regular Expressions 5.5"
  Dim RE As RegExp, MC As MatchCollection, M As Match
  Dim S As String
  Dim sOutput As String
  
S = Mid(str, InStr(str, "]"))

Set RE = New RegExp
With RE
    .Pattern = "\w+(?=\.)"
    .Global = True
    If .Test(S) = True Then
        Set MC = .Execute(S)
        For Each M In MC
            sOutput = sOutput & vbLf & M
        Next M
    End If
End With


MsgBox Mid(sOutput, 2)

End Sub

You can of course limit the number of matches to 5 by using a counter instead of the For Each loop.
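A minimal sketch of that counter-based variant, assuming late binding through CreateObject (so no library reference is needed). The function name FirstDottedWords and the MaxCount parameter are made up for this example, and the pattern \w+\. is used instead of the lookahead so that the dots stay in the output:

```vba
' Hypothetical helper: returns up to MaxCount dot-terminated words found after the first "]".
Public Function FirstDottedWords(ByVal Txt As String, Optional ByVal MaxCount As Long = 5) As String
    Dim MC As Object, i As Long, pos As Long, sOutput As String

    ' Drop everything before the first "]" (keep the whole text if there is none)
    pos = InStr(Txt, "]")
    If pos > 0 Then Txt = Mid$(Txt, pos)

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "\w+\."
        Set MC = .Execute(Txt)
    End With

    ' Index-based loop so we can stop after MaxCount matches
    For i = 0 To MC.Count - 1
        If i >= MaxCount Then Exit For
        sOutput = sOutput & " " & MC.Item(i).Value
    Next i

    FirstDottedWords = Trim$(sOutput)
End Function
```

On a sheet this could be called as =FirstDottedWords(D2), or =FirstDottedWords(D2; 3) to stop after three words.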


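uj5u.com user reply:

You can use the following regex:

([a-zA-Z]+)\.

Let me explain it a bit.

[a-zA-Z] ----> This looks for any letter from a to z or A to Z, but on its own it matches only a single letter.

+ ----> With this you tell it to keep matching letters until it finds something that is not a letter from a to z or A to Z.

\. ----> With this you are simply looking for a . at the end of the match.

Hope this is what you were looking for.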
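As a rough illustration of how that pattern behaves, here is a small sketch (the Sub name DemoLetterPattern is made up for this example). Note that, applied to the whole sample string, the pattern also picks up asred. from inside the brackets, since nothing in it skips the text before the ]:

```vba
Sub DemoLetterPattern()
    Const sample As String = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan."
    Dim M As Object

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = "([a-zA-Z]+)\."
        For Each M In .Execute(sample)
            ' M.Value is the whole match (word plus dot);
            ' M.SubMatches(0) is the captured word without the dot
            Debug.Print M.Value, M.SubMatches(0)
        Next M
    End With
End Sub
```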

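uj5u.com user reply:

Regex match:

In addition to the answer given by @RonRosenfeld, you can apply what some call "the best regex trick ever": first match what you don't want, then match what you do want in a capture group. For example:

^.*\]|(\w+\.)

In short, this means:

^.*\] - Match 0+ (greedy) characters from the start of the string up to the last occurrence of a closing square bracket;
| - Or;
(\w+\.) - A capture group holding 1+ (greedy) word characters followed by a dot.

Here is how it works in a UDF: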

Sub Test()

Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "

Debug.Print RegExtract(s, "^.*\]|(\w+\.)")

End Sub

'------

'The above Sub would invoke the below function as an example.
'But you could also invoke this through: `=RegExtract(A1,"^.*\]|(\w+\.)")`
'on your sheet.

'------

Public Function RegExtract(Txt As String, Pattern As String) As String

Dim rMatch As Object, arrayMatches(), i As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = Pattern
    If .Test(Txt) Then
        For Each rMatch In .Execute(Txt)
            If Not IsEmpty(rMatch.SubMatches(0)) Then
                ReDim Preserve arrayMatches(i)
                arrayMatches(i) = rMatch.SubMatches(0)
                i = i + 1
            End If
        Next
        RegExtract = Join(arrayMatches, " ")
    Else
        RegExtract = "No match found"
    End If
End With

End Function

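Regex replace:

Depending on your desired output, you can also use a replace function. You then have to match any remaining characters with another alternative. For example:

^.*\]|(\w+\.\s?)|.

In short, this means we added another alternative, which is simply any single character. The second small addition is the optional whitespace character \s? in the second alternative.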

Sub Test()

Dim s As String: s = "some text [asred.] ost. |Monday - Ribben (ult.) lot. ac, sino. other maybe long text; collan. "

Debug.Print RegReplace(s, "^.*\]|(\w+\.\s?)|.", "$1")

End Sub

'------

'There are now 3 parameters to pass to the UDF: String, Pattern and Replacement.

'------

Public Function RegReplace(Txt As String, Pattern As String, Replacement) As String

Dim rMatch As Object, arrayMatches(), i As Long

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = Pattern
    RegReplace = Trim(.Replace(Txt, Replacement))
End With

End Function

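Note that I used Trim() to remove a possible trailing space.

RegExtract and RegReplace currently both return a single string with the cleaned-up input, but the former does give you the option of working with the array held in the arrayMatches() variable.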
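If the array is what you are after, one possible sketch returns the captured words as a Variant array, capped at 5 as in the question. The name RegExtractArray and the MaxCount parameter are assumptions made for illustration, not part of the answer above, and the sketch assumes the pattern contains at least one capture group:

```vba
' Hypothetical variant of RegExtract that hands back the matches as an array.
Public Function RegExtractArray(ByVal Txt As String, ByVal Pattern As String, _
                                Optional ByVal MaxCount As Long = 5) As Variant
    Dim rMatch As Object, arrayMatches() As Variant, i As Long

    With CreateObject("VBScript.RegExp")
        .Global = True
        .Pattern = Pattern
        For Each rMatch In .Execute(Txt)
            If i >= MaxCount Then Exit For
            ' Only keep matches where the capture group took part
            If Not IsEmpty(rMatch.SubMatches(0)) Then
                ReDim Preserve arrayMatches(i)
                arrayMatches(i) = rMatch.SubMatches(0)
                i = i + 1
            End If
        Next rMatch
    End With

    If i = 0 Then
        RegExtractArray = Array()   ' empty array when nothing was captured
    Else
        RegExtractArray = arrayMatches
    End If
End Function
```

From VBA the result can be joined back into a string, for example Debug.Print Join(RegExtractArray(s, "^.*\]|(\w+\.)"), " "), or processed element by element.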
