Program that reads data, splits it on a delimiter, removes blanks, then counts

I am working on a program that needs to read a .txt file containing multiple lines of data like this:

[ABC/DEF//25GHI////JKLM//675//]

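My code below prints each sequence on a new line for analysis, but the function is where I am having trouble. I can get it to remove the standalone numeric value "675" and keep the alphanumeric tokens (so 675 is dropped from the sample).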

import re

a = "string.txt"
with open(a, 'r') as file:
  lines = [line.rstrip('\n') for line in file]
  print(*lines, sep = "\n")

cleaned_data = []
def split_lines(lines, delimiter, remove = '[0-9]+$'):
  for line in lines:
    tokens = line.split(delimiter)
    # Strip trailing digits, so an all-digit token such as "675" becomes empty
    tokens = [re.sub(remove, "", token) for token in tokens]
    # Drop empty or whitespace-only tokens
    clean_list = list(filter(lambda e:e.strip(), tokens))
    cleaned_data.append(clean_list)
    print(clean_list) # Quick check if function works
split_lines(lines, "/")

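It then prints the split line like this, with the blanks (where the "/" were) and the numeric value removed:

["ABC", "DEF", "25GHI", "JKLM"]

What I want to do next is take the "cleaned_data" list that holds these split lines and count them, to output:

4x ["ABC", "DEF", "25GHI", "JKLM"]

How can I use "cleaned_data" to go through each line and print a count of the repeated strings?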

A uj5u.com user replied:

unique_data = {}
cleaned_data = [1, 2, 3, 4, 5, 'a', 'b', 'c', 'd', 3, 4, 5, 'a', 'b',
                [1, 2], [1, 2]]
for item in cleaned_data:
    key = str(item)  # convert unhashable objects like lists to a hashable string
    if key not in unique_data:  # key does not exist yet
        unique_data[key] = 1, item  # store a count of 1 and the data
    else:  # a duplicate has been encountered
        # increment the count
        unique_data[key] = (unique_data[key][0] + 1), item

for k, v in unique_data.items():
    print(f"{v[0]}:{v[1]}")

Output:

1:1
1:2
2:3
2:4
2:5
2:a
2:b
1:c
1:d
2:[1, 2]
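
For the question's own data, where each entry of cleaned_data is a list of strings, the same counting idea could be sketched with collections.Counter from the standard library. This is only a sketch under assumptions: the sample rows below are made up to match the question's expected output, and format_counts is a hypothetical helper name, not something from the original post.

```python
from collections import Counter

# Hypothetical sample: four identical cleaned rows, as in the question.
cleaned_data = [["ABC", "DEF", "25GHI", "JKLM"] for _ in range(4)]

# Lists are unhashable, so convert each row to a tuple before counting.
row_counts = Counter(tuple(row) for row in cleaned_data)

def format_counts(counts):
    """Return one 'Nx [...]' string per distinct row, in first-seen order."""
    return [f"{n}x {list(row)}" for row, n in counts.items()]

for line in format_counts(row_counts):
    print(line)  # 4x ['ABC', 'DEF', '25GHI', 'JKLM']
```

Using tuples as keys avoids the str(item) conversion, so the original row can be recovered directly from the key.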

A uj5u.com user replied:

If you only need to remove duplicates:

    deduped_row_of_cleaned_data = list(set(row_of_cleaned_data))

If you need to know how many duplicates there are, just subtract len(deduped_row_of_cleaned_data) from len(row_of_cleaned_data).
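
Note that set() does not keep the original order of the items. If order matters, one possible variant is to deduplicate with dict.fromkeys, which keeps the first occurrence of each item. The sample row here is hypothetical and only serves to illustrate the subtraction described above.

```python
# Hypothetical sample row; not taken from the question's file.
row_of_cleaned_data = ["ABC", "DEF", "ABC", "25GHI", "DEF", "ABC"]

# dict.fromkeys keeps the first occurrence of each item, in order.
deduped_row_of_cleaned_data = list(dict.fromkeys(row_of_cleaned_data))

# Number of surplus (repeated) entries in the row.
duplicate_count = len(row_of_cleaned_data) - len(deduped_row_of_cleaned_data)

print(deduped_row_of_cleaned_data)  # ['ABC', 'DEF', '25GHI']
print(duplicate_count)              # 3
```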

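If you need a count for every item, you can build a dictionary from the deduplicated row that maps each item to its own empty list:

    # one separate list per key (dict.fromkeys(keys, []) would share a single list)
    empty_dict = {key: [] for key in set(row_of_cleaned_data)}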

Then loop over the list to add each value:

    for item in row_of_cleaned_data:
        empty_dict[item].append(item)

Loop over the dictionary to get the counts:

    for key, value in empty_dict.items():
        empty_dict[key] = len(value)

After that, you have the deduplicated items

    list(empty_dict.keys())

and the count for each item

    list(empty_dict.values())
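
The same per-item counts can probably be obtained in one step with collections.Counter from the standard library, which swaps the manual dictionary of lists for a ready-made counter. A minimal sketch, assuming the row holds hashable items such as strings (the sample row is hypothetical):

```python
from collections import Counter

# Hypothetical sample row; not taken from the question's file.
row_of_cleaned_data = ["ABC", "DEF", "ABC", "25GHI", "DEF", "ABC"]

# Counter maps each distinct item to the number of times it occurs.
item_counts = Counter(row_of_cleaned_data)

items = list(item_counts.keys())     # distinct items, in first-seen order
counts = list(item_counts.values())  # matching occurrence counts

for item, count in item_counts.items():
    print(f"{count}x {item}")
```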

