30: wc
The wc command counts the number of characters, words, and lines in a file. This seems simple, but there are different definitions of "word" depending on the language and how it's parsed. We'll keep it simple and say you only have to deal with words in a .cpp file.
The Challenge
As I mentioned, defining "word" is difficult, but we'll keep it to whatever the original wc command considers a word in a .cpp file. You could go insane and try to make it handle a language like Thai, but that's for another time.
Your first version doesn't need to handle command line options, but when you push this further, try to get at least two options working.
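When you get to options, a hand-rolled loop over the arguments is probably enough. The sketch below assumes the flags -l, -w, and -c, which the real wc uses for lines, words, and bytes (here -c is treated as the character count to match this exercise). The Options struct and parse_args function are hypothetical names invented for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical settings struct; the names are illustrative only.
struct Options {
    bool lines = false;
    bool words = false;
    bool chars = false;
    std::vector<std::string> files;
};

// Sorts each argument into either a flag or a file name.
Options parse_args(const std::vector<std::string>& args) {
    Options opts;

    for(const auto& arg : args) {
        if(arg == "-l") {
            opts.lines = true;
        } else if(arg == "-w") {
            opts.words = true;
        } else if(arg == "-c") {
            opts.chars = true;
        } else {
            opts.files.push_back(arg);
        }
    }

    // with no flags at all, report everything
    if(!opts.lines && !opts.words && !opts.chars) {
        opts.lines = opts.words = opts.chars = true;
    }

    return opts;
}
```

In main you could build the argument list with std::vector<std::string> args(argv + 1, argv + argc); and pass it to parse_args.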
You are also required to use std::filesystem to handle the filenames given to your wc. This won't be too complex, so don't overthink it.
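As a rough idea of what std::filesystem can do for you here, this sketch checks a path before you try to open it. The check_input function and its messages are assumptions made up for this example; the exercise does not require this exact check.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: returns an empty string when the path looks
// usable, otherwise a short message explaining why it is not.
std::string check_input(const fs::path& file_name) {
    // the error_code overloads report problems without throwing
    std::error_code ec;

    if(!fs::exists(file_name, ec)) {
        return "no such file";
    }

    if(fs::is_directory(file_name, ec)) {
        return "is a directory";
    }

    if(!fs::is_regular_file(file_name, ec)) {
        return "not a regular file";
    }

    return "";
}
```

You would call this with the path built from argv[1] and print the message instead of counting when it is not empty.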
The Code
My code has a bug! Actually, the first version I wrote used regex but I realized this is an advanced topic and decided to write a new one that does simple word counting. The problem is this new version is broken, but I'm going to use this as an opportunity to debug it.
Here's my buggy version:
See my first version of wc
#include <fmt/core.h>
#include <filesystem>
#include <fstream>
#include <string>
#include <cctype>
int count_words(const std::string& buffer) {
    int words = 0;
    bool eat_space = false;
    // buffer.append("\n"); // this fails because const
    for(size_t i = 0; i < buffer.size(); i++) {
        if(isspace(buffer[i])) {
            if(!eat_space) {
                words++;
                eat_space = true;
            }
        } else {
            eat_space = false;
        }
    }
    return words;
}
int main(int argc, char* argv[]) {
    std::filesystem::path file_name{argv[1]};
    std::ifstream in_file{file_name};
    std::string buffer;
    int line_count = 0;
    int word_count = 0;
    int char_count = 0;
    while(in_file) {
        getline(in_file, buffer);
        line_count++;
        char_count += buffer.size() + 1; // one extra for \n
        word_count += count_words(buffer);
    }
    fmt::println("lines: {}, words: {}, chars: {}", line_count - 1, word_count, char_count - 1);
}
The Discussion
I think the only other documentation you should read for this is for std::string and cctype. There's also a comment in my code:
// buffer.append("\n"); // this fails because const
Uncomment this line and see the error message. It's an insane error message that seems like you're calling the wrong function, but it's actually because you can't call .append() on a const string. Remove the const from const std::string& buffer and see what happens when you keep the trailing \n (newline).
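If you want a string you can modify without touching the caller's data, one option is to take the parameter by value instead of by const reference, which is what copy_copy_count_words in wc_regex.cpp does. Here is a small sketch of the difference. The function names with_newline and length_of are invented for this example.

```cpp
#include <cstddef>
#include <string>

// Takes the string by value, so the function owns a private copy
// it is free to modify. The caller's string is left untouched.
std::string with_newline(std::string buffer) {
    buffer.append("\n");
    return buffer;
}

// Takes a const reference: reading is fine, but calling
// buffer.append("\n") in here would not compile.
std::size_t length_of(const std::string& buffer) {
    return buffer.size();
}
```

The cost of passing by value is one copy per call, which is usually acceptable for a single line of text.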
Debugging
My version of this program has a bug, but my "regex version" below does not. Rather than fix it I'm using this as a chance to debug this for you live so you can see how I would debug this. Before I do, try to find the bug yourself. Why does my version give incorrect results for words?
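If you'd like a starting point for your own hunt, one common approach is to run the suspect function on tiny inputs where you can work out the right answer by hand. The sketch below is only one possible way to do that. The check helper is hypothetical, it uses iostream rather than fmt to keep the example standalone, and count_words is copied from wc.cpp with the bug left in.

```cpp
#include <cctype>
#include <iostream>
#include <string>

// Copied unchanged from wc.cpp, bug included.
int count_words(const std::string& buffer) {
    int words = 0;
    bool eat_space = false;
    for(size_t i = 0; i < buffer.size(); i++) {
        if(isspace(buffer[i])) {
            if(!eat_space) {
                words++;
                eat_space = true;
            }
        } else {
            eat_space = false;
        }
    }
    return words;
}

// Hypothetical helper: runs count_words on one input and reports
// whether the result matches the count you worked out by hand.
bool check(const std::string& input, int expected) {
    int actual = count_words(input);

    std::cout << (actual == expected ? "ok   " : "FAIL ")
              << '"' << input << "\" expected " << expected
              << " got " << actual << '\n';

    return actual == expected;
}
```

Call check from main with an empty string, a single word, two words, and the same inputs with extra spaces at the start or end, then look for a pattern in which ones fail.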
Further Study
A first thing to try is implementing more command line options. You can do this with all of the tools until you get bored. My versions are always "clues" to get you going, and then you improve them using what you learn and can figure out on your own.
I've also written a very far future version of wc.cpp that uses regex in three different ways. Can you figure it out? Don't worry if you can't figure it out. Just think of this as an extreme challenge.
// WARNING: This is the advanced further study.
// If you don't get it that's totally alright. Come back to
// it when you can.
//
#include <fmt/core.h>
#include <filesystem>
#include <fstream>
#include <string>
#include <regex>
#include <iterator>
int count_words(const std::string& buffer) {
    std::regex words_regex("([^\\s]+)");
    std::sregex_iterator words_begin(buffer.begin(), buffer.end(), words_regex);
    std::sregex_iterator words_end{};
    return std::distance(words_begin, words_end);
}
int wtf_count_words(const std::string& buffer) {
    std::regex words_regex("([^\\s]+)");
    int word_count = 0;
    std::smatch sm;
    // sm.size() is the number of sub-matches, not a length, so
    // continue from the end of the match that was just found
    for(auto i = buffer.begin(); i != buffer.end(); i = sm[0].second) {
        if(std::regex_search(i, buffer.end(), sm, words_regex)) {
            word_count++;
        } else {
            break;
        }
    }
    return word_count;
}
int size_count_words(const std::string& buffer) {
    std::regex words_regex("([^\\s]+)");
    int word_count = 0;
    std::smatch sm;
    // position() is relative to where this search started, so
    // position() + length() moves i just past the current match
    for(size_t i = 0; i < buffer.size(); i += sm.position() + sm.length()) {
        if(std::regex_search(buffer.begin() + i, buffer.end(), sm, words_regex)) {
            word_count++;
        } else {
            break;
        }
    }
    return word_count;
}
int copy_copy_count_words(std::string buffer) {
    std::regex words_regex("([^\\s]+)");
    int word_count = 0;
    for(std::smatch sm; std::regex_search(buffer, sm, words_regex); buffer = sm.suffix()) {
        word_count++;
    }
    return word_count;
}
int main(int argc, char* argv[]) {
    std::filesystem::path file_name{argv[1]};
    std::ifstream in_file{file_name};
    std::string buffer;
    int line_count = 0;
    int word_count = 0;
    int char_count = 0;
    while(in_file) {
        getline(in_file, buffer);
        line_count++;
        char_count += buffer.size() + 1; // one extra for \n
        word_count += copy_copy_count_words(buffer);
    }
    fmt::println("lines: {}, words: {}, chars: {}", line_count - 1, word_count, char_count - 1);
}