strtok() only prints the first word, the rest are (null)

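I am trying to parse a large text file and split it into individual words using strtok. The delimiters strip out all special characters, whitespace, and newlines. For some reason, when I printf() the result, it only prints the first word, and the rest print as a bunch of (null).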

    ifstream textstream(textFile);
    string textLine;
    while (getline(textstream, textLine))
    {
        struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
        char *line_c = new char[textLine.length() + 1]; // creates a character array the length of the line
        strcpy(line_c, textLine.c_str());               // copies the line string into the character array
        char *word = strtok(line_c, delimiters);        // removes all unwanted characters
        while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
        {
            MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
            word = strtok(NULL, delimiters);                                            // move to next word
            printf("%s", word);
        }
    }

A uj5u.com user replied:

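Rather than jumping through the hoops needed to use strtok, I would write a small replacement that works directly with strings and does not modify its input, along these general lines: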

#include <string>
#include <vector>

std::vector<std::string> tokenize(std::string const &input, std::string const &delims = " ") {
    std::vector<std::string> ret;
    std::string::size_type start = 0; // unsigned, so the comparison with npos is well defined

    while ((start = input.find_first_not_of(delims, start)) != std::string::npos) {
        auto stop = input.find_first_of(delims, start + 1);
        ret.push_back(input.substr(start, stop - start)); // substr clamps the count when stop is npos
        start = stop;
    }
    return ret;
}

At least to me, this seems to simplify the rest of the code considerably:

std::string textLine;
while (std::getline(textStream, textLine)) {
    struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] += textLine.length() + 1;
    auto words = tokenize(textLine, delims);
    for (auto const &word : words) {
        MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n';
        std::cout << word << '\n';
    }
}

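This also avoids (among other things) a massive memory leak: the original code allocates memory on every iteration of the loop but never frees any of it.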
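If strtok has to stay for some reason, the leak can also be avoided by letting a container own the scratch buffer. The following is only a minimal sketch under that assumption; `split_with_strtok` is a hypothetical helper name, not something from the original post.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` with strtok while a std::vector owns
// the mutable copy, so nothing has to be freed by hand.
std::vector<std::string> split_with_strtok(const std::string &line, const char *delimiters) {
    // Copy the characters plus the terminating '\0' into a writable buffer.
    std::vector<char> buffer(line.c_str(), line.c_str() + line.size() + 1);

    std::vector<std::string> words;
    for (char *word = std::strtok(buffer.data(), delimiters);
         word != nullptr;
         word = std::strtok(nullptr, delimiters)) {
        words.emplace_back(word); // copy the token out before the buffer is destroyed
    }
    return words; // buffer is released automatically here
}
```

Calling `split_with_strtok(textLine, delimiters)` inside the getline loop would then replace the `new char[...]` and `strcpy` pair. Note that strtok keeps hidden global state, so this sketch is not safe to call from several threads at once.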

A uj5u.com user replied:

Move the printf up two lines.

while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
    printf("%s", word);
    MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
    word = strtok(NULL, delimiters);                                            // move to next word

}

A uj5u.com user replied:

As @j23 pointed out, your printf is in the wrong place.

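As @Jerry-Coffin pointed out, there are more idiomatic, modern C++ ways to do what you are trying to do. Besides avoiding mutation of the input, you can also avoid copying the words out of the text string. (In my code below we read line by line, but if you know the whole text fits in memory, you could also read the entire contents into a std::string.)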

So, use std::string_view to avoid making extra copies; it is essentially a pointer into the string together with a length.

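Here is how the whole program looks for a use case where you do not need to store the words in another data structure, that is, some kind of one-pass processing of the words: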

#include <iostream>
#include <fstream>
#include <string>
#include <string_view>
#include <cctype>

template <class F>
void with_lines(std::istream& stream, F body) {
  for (std::string line; std::getline(stream,line);) {
    body(line);
  }
}

template <class F>
void with_words(std::istream& stream, F body) {
  with_lines(stream,[&body](std::string& line) {
    std::string_view line_view{line};
    while (!line_view.empty()) {
      // skip leading whitespace
      for (; !line_view.empty() && std::isspace(static_cast<unsigned char>(line_view[0]));
       line_view.remove_prefix(1));
      // find the end of the current word
      size_t position = 0;
      for (; position < line_view.size() &&
         !std::isspace(static_cast<unsigned char>(line_view[position]));
       position++);
      if (position > 0) {
        body(line_view.substr(0,position));
        line_view.remove_prefix(position);
      }
    }
  });
}

int main (int argc, const char* argv[]) {
  size_t word_count = 0;
  std::ifstream stream{"input.txt"};
  if(!stream) {
    std::cerr
      << "could not open file input.txt" << std::endl;
    return -1;
  }
  with_words(stream, [&word_count] (std::string_view word) {
    std::cout << word_count << " " << word << std::endl;
    word_count++;
  });
  std::cout
    << "input.txt contains "
    << word_count << " words."
    << std::endl;
  return 0;
}
