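I am trying to parse a large text file and split it into individual words using strtok. The delimiters strip out all special characters, whitespace, and newlines. For some reason, when I printf() the result, only the first word is printed and everything after it shows up as a string of (null).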
ifstream textstream(textFile);
string textLine;
while (getline(textstream, textLine))
{
    struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] = textLine.length() + 1;
    char *line_c = new char[textLine.length() + 1]; // creates a character array the length of the line
    strcpy(line_c, textLine.c_str()); // copies the line string into the character array
    char *word = strtok(line_c, delimiters); // removes all unwanted characters
    while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
    {
        MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
        word = strtok(NULL, delimiters); // move to next word
        printf("%s", word);
    }
}
Answer:
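Rather than jumping through the hoops necessary to use strtok, I would write a small replacement that works directly with strings and does not modify its input, something along these general lines: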
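#include <cstddef>
#include <string>
#include <vector>

std::vector<std::string> tokenize(std::string const &input, std::string const &delims = " ") {
    std::vector<std::string> ret;
    std::size_t start = 0; // same type as std::string::npos, so the comparison below is exact
    while ((start = input.find_first_not_of(delims, start)) != std::string::npos) {
        auto stop = input.find_first_of(delims, start + 1);
        ret.push_back(input.substr(start, stop - start)); // substr clamps the count when stop is npos
        start = stop;
    }
    return ret;
}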
At least to me, this seems to simplify the rest of the code quite a bit:
std::string textLine;
while (std::getline(textStream, textLine)) {
    struct_ptr->numOfCharsProcessedFromFile[TESTFILEINDEX] = textLine.length() + 1;
    auto words = tokenize(textLine, delims);
    for (auto const &word : words) {
        MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n';
        std::cout << word << '\n';
    }
}
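This also avoids (among other things) the massive memory leak in the original code, which allocates memory on every iteration of the loop but never frees any of it.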
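To make the search logic inside tokenize more concrete, here is a minimal sketch of a single step of it. The helper name next_token_bounds is hypothetical and not part of the answer; it only isolates the find_first_not_of / find_first_of pair so the returned indices can be inspected.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper (not from the answer above): returns the half-open
// index range [start, stop) of the first token found at or after `from`.
// `start` is npos when no token remains; `stop` is npos when the token
// runs to the end of the input.
std::pair<std::size_t, std::size_t> next_token_bounds(const std::string& input,
                                                      const std::string& delims,
                                                      std::size_t from = 0) {
    // Skip any leading delimiters.
    std::size_t start = input.find_first_not_of(delims, from);
    if (start == std::string::npos) {
        return {std::string::npos, std::string::npos};
    }
    // The token ends at the next delimiter, if there is one.
    std::size_t stop = input.find_first_of(delims, start + 1);
    return {start, stop};
}
```

For the input ", hello world" with delimiters " ,", the first call yields the range 2 to 7 ("hello"), and a second call starting at 7 yields a start of 8 with stop equal to npos. That last case is why tokenize can pass stop - start straight to substr: substr clamps an oversized count to the end of the string.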
Answer:
Move the printf up two lines.
while (word != nullptr && wordCount(struct_ptr->dictRootNode, word) > struct_ptr->minNumOfWordsWithAPrefixForPrinting)
{
    printf("%s", word);
    MyFile << word << ' ' << wordCount(struct_ptr->dictRootNode, word) << '\n'; // writes each word and number of times it appears as a prefix in the tree
    word = strtok(NULL, delimiters); // move to next word
}
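The reason the order matters: in the original loop, printf ran right after word had been reassigned, so the first token was never printed and the last call received a null pointer. Passing a null pointer to %s is undefined behavior; some C libraries, glibc among them, happen to print (null). Below is a minimal, self-contained sketch of the corrected order. The helper name split_with_strtok is hypothetical, and the std::vector<char> buffer is my own substitution for the new char[] in the question, chosen so that the buffer is freed automatically.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): collects the tokens of
// one line with std::strtok, using each token before advancing to the next.
std::vector<std::string> split_with_strtok(const std::string& line,
                                           const char* delimiters) {
    // strtok modifies its input, so work on a writable, null-terminated copy.
    // The vector releases its memory when the function returns.
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    char* word = std::strtok(buffer.data(), delimiters);
    while (word != nullptr) {
        tokens.emplace_back(word);               // use the current token first...
        word = std::strtok(nullptr, delimiters); // ...then move to the next one
    }
    return tokens;
}
```

Because the loop condition is checked immediately after each strtok call, the null pointer that marks the end of the line is never used as a string.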
Answer:
As @j23 pointed out, your printf is in the wrong place.
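As @Jerry-Coffin pointed out, there are more C++-style, modern ways to accomplish what you are trying to do. Besides avoiding mutation of the input, you can also avoid copying the words out of the text string. (In my code below we read line by line, but if you know the whole text fits in memory, you could just as well read the entire file into a std::string.)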
So, use std::string_view to avoid making extra copies; it is essentially just a pointer into the string plus a length.
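As a small illustration of the "pointer plus length" idea, the sketch below returns the first word of a string as a view into the original characters. The helper name first_word and the fixed whitespace set are assumptions made for this example only.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns the first whitespace-delimited word of `text`
// as a view into the same characters. No characters are copied.
std::string_view first_word(std::string_view text) {
    const char* whitespace = " \t\n\r\f\v";
    std::size_t start = text.find_first_not_of(whitespace);
    if (start == std::string_view::npos) {
        return {}; // only whitespace, or empty input
    }
    std::size_t stop = text.find_first_of(whitespace, start);
    // substr on a string_view yields another view; the count is clamped
    // to the end of the view when stop is npos.
    return text.substr(start, stop - start);
}
```

Calling first_word on a std::string holding "  alpha beta" gives a view whose data() pointer is the string's data() plus 2 and whose size is 5. The flip side is that the view is only valid while the underlying string is alive and unmodified.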
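Here is what that looks like for a use case where you do not need to store the words in another data structure, that is, some kind of one-pass processing of the words: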
#include <cctype>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>
#include <string_view>

// Calls body(line) for every line read from the stream.
template <class F>
void with_lines(std::istream& stream, F body) {
    for (std::string line; std::getline(stream, line);) {
        body(line);
    }
}

// Calls body(word) for every whitespace-separated word in the stream.
// Each word is a view into the current line, so it is only valid during the call.
template <class F>
void with_words(std::istream& stream, F body) {
    with_lines(stream, [&body](std::string& line) {
        std::string_view line_view{line}; // works in C++17; the iterator-pair constructor needs C++20
        while (!line_view.empty()) {
            // skip leading whitespace
            while (!line_view.empty() &&
                   std::isspace(static_cast<unsigned char>(line_view[0]))) {
                line_view.remove_prefix(1);
            }
            // find the end of the current word
            std::size_t position = 0;
            while (position < line_view.size() &&
                   !std::isspace(static_cast<unsigned char>(line_view[position]))) {
                ++position;
            }
            if (position > 0) {
                body(line_view.substr(0, position));
                line_view.remove_prefix(position);
            }
        }
    });
}

int main() {
    std::size_t word_count = 0;
    std::ifstream stream{"input.txt"};
    if (!stream) {
        std::cerr << "could not open file input.txt" << std::endl;
        return -1;
    }
    with_words(stream, [&word_count](std::string_view word) {
        std::cout << word_count << " " << word << std::endl;
        ++word_count;
    });
    std::cout << "input.txt contains " << word_count << " words." << std::endl;
    return 0;
}
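For the case mentioned earlier, where the whole text is known to fit in memory, one possible way to read everything into a single std::string is sketched below. The helper name read_all is hypothetical; the approach simply streams the underlying buffer into a std::ostringstream.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads everything remaining in `stream` into one string.
// Only sensible when the whole input comfortably fits in memory.
std::string read_all(std::istream& stream) {
    std::ostringstream contents;
    contents << stream.rdbuf(); // copy the stream's buffer in one go
    return contents.str();
}
```

A std::string_view over the returned string can then be split into words in the same way as a single line, with the difference that the views stay valid for as long as that one string is kept alive.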