Python: replace all elements in a DataFrame that do not contain certain words

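I have a very large DataFrame, and I want to replace every element that does not contain certain words with NaN (while leaving the first column, "id", unchanged).

For example: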

index  id    text1                        text2                        ...
1      123   {'"key'": '"living_space'"   '"value'": '"01.04.2022'"}   ...
2      124   {'"key'": '"rooms'"          '"value'": '"3'"}            ...
3      125   23                           {'"key'": '"rooms'"          ...
4      126   45                           Apartment sold               ...

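I want to keep every element of the DataFrame that contains the word key or value and replace all other elements with NaN, so that I end up with a DataFrame like this: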

index  id    text1                        text2                        ...
1      123   {'"key'": '"living_space'"   '"value'": '"01.04.2022'"}   ...
2      124   {'"key'": '"rooms'"          '"value'": '"3'"}            ...
3      125   nan                          {'"key'": '"rooms'"          ...
4      126   nan                          nan                          ...

I tried the following code, but it just cleared the entire dataset.

l1 = ['key', 'value']
df.iloc[:,1:] = df.iloc[:,1:].applymap(lambda x: x if set(x.split()).intersection(l1) else '')

Thanks in advance.

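Reply from a uj5u.com community member:

Consider the following approach to the problem. It consists of two parts. (1) The logic that decides whether to keep or drop a value is implemented in the function substring_filter: we simply check whether the target string contains any of the words in words. (2) The actual filtering is performed with np.where, a very handy helper function from numpy.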

import numpy as np
import pandas as pd


def substring_filter(target, words):
    # Return True if any of the words occurs as a substring of target.
    for word in words:
        if word in target:
            return True
    return False


if __name__ == '__main__':

    df = pd.DataFrame({
        'A': [1, 2, 3, 4],
        'B': [True, False, False, True],
        'C': ['{"key": 1}', '{"value": 2}', 'text', 'abc']})

    words_to_search = ('key', 'value')
    # Keep the original value where the filter matches, otherwise use None.
    df.loc[:, 'C'] = np.where(
        df.loc[:, 'C'].apply(lambda x: substring_filter(x, words_to_search)),
        df.loc[:, 'C'],
        None)
    print(df)

The result is:

   A      B             C
0  1   True    {"key": 1}
1  2  False  {"value": 2}
2  3  False          None
3  4   True          None
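
The answer above filters a single column and fills with None. As a rough sketch of how the same idea might be extended to what the question asks for (every column except the first, with NaN as the fill value), something like the following could work. The sample data and the column names id, text1 and text2 are assumptions modelled on the question, not the asker's real data, and the mask-based approach with DataFrame.where is a substitute for the np.where call used in the answer.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical sample data modelled on the question (assumed, not the real data).
df = pd.DataFrame({
    'id': [123, 124, 125, 126],
    'text1': ['{"key": "living_space"', '{"key": "rooms"', 23, 45],
    'text2': ['"value": "01.04.2022"}', '"value": "3"}',
              '{"key": "rooms"', 'Apartment sold'],
})

words_to_search = ('key', 'value')
# Build one regex alternation; re.escape guards against special characters.
pattern = '|'.join(re.escape(word) for word in words_to_search)

# Every column except the first one ('id').
cols = df.columns[1:]

# Boolean mask: True where the cell, viewed as a string, contains any word.
mask = df[cols].apply(lambda s: s.astype(str).str.contains(pattern))

# Keep the cells where the mask is True and replace the rest with NaN.
df[cols] = df[cols].where(mask, np.nan)
print(df)
```

Converting each column with astype(str) lets the substring test run on non-string cells such as the integers 23 and 45 without raising an error. Note also that the attempt in the question compares whole whitespace-separated tokens, so a token such as {"key": never equals key; a substring test avoids that mismatch.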
