Question

So, I am trying to create an English language trie data structure implementation in C++. I have created a Trie and TrieNode class. The TrieNode class takes in its constructor a vector<string> that is a list of words to construct the Trie from.

My solution to getting this vector<string> was to use an EnglishWordsListGenerator class, which in its constructor, takes in a string of the file name of a file which presumably contains a list of valid English words. (The ever popular enable1.txt file).

I'm new to C++, so I don't know if it is considered good practice or not to read the file directly from the constructor when initializing the object. After all, what if the operation fails? On the other hand though, I know Bjarne heavily pushes for RAII. What is the right thing to do here?

Was it helpful?

Solution

When seeing a class with a constructor signature like

 EnglishWordsListGenerator(const std::string &wordFileName)

I think it is pretty obvious that this constructor will read the given file (and so need some time), and it should not be to hard to understand that the caller has to care for possible exceptions from this (because file IO can fail). So despite what other answers are saying, this design is ok.

Don't get me wrong, in the future you might get requirements where this design might not be sufficient any more, but as long as this simple interface and behaviour is all your program needs, I would stick to the KISS and YAGNI principles and avoid overcomplicating things by providing a separate initialization method "just in case".

OTHER TIPS

If you are new to C++, it may be desirable to shy away from doing big work in a constructor. Constructors are wedded tightly to the way the language behaves, and you really don't want the constructor getting called unexpectedly (explicit is your friend!). Constructors are also rather unique in that they cannot return a value. This can make error handling more complicated.

This sort of recommendation becomes less important as you learn more about C++ and all the different things which can cause constructors to get called. In the perfect world, the correct answer is that your library should do exactly what the user wanted it to at all times. Quite often in C++ that is best implemented by doing things inside constructors. However, you just want to make sure that you understand constructors enough not to trap your users in a corner. Debugging why the program operates poorly because a constructor got invoked when a user did not intend it to be invoked can be a pain.

Likewise exception handling in constructors can be more difficult because the scope where the exception occurs is entwined with the scope where a variable gets created, rather than where you want to use it. Exceptions can be infuriating when they occur before main() is run, denying you the ability to handle that exception for the user. This can happen if you use global variables and the errors can be cryptic.

One option that sits between the extremes is to use factory methods. If your users expect to use factory methods, it makes sense that they could do a little extra initialization at that point. A factory method won't be called by accident like a constructor might, so it's harder to mess it up.

Once you are confident you understand the consequences of doing work in a constructor, it can be a powerful tool. The ultimate example of doing work in a constructor might be the lock_guard class. Lock guards are constructed by passing them a lock such as a mutex. They call .lock() on it during their constructor (aka doing real work) and .unlock() on it during their destructor.

These classes get away with doing real work in the constructor because they are designed to be used that way. Anyone working with a lock_guard is well aware that the constructor does work because code that uses it looks something like:

{
    lock_guard guardMe(myLock); // locks myLock
    doSomeStuff();
    doMoreStuff();
    // as I leave this block, guardMe will unlock myLock.  It will do so
    // even if I leave via an exception!
}

If you look at the implementations of lock_guard classes, you will find easily 80% of their attention or more is focused on how to manage the constructors correctly to prevent surprises. Properly implemented, they're very powerful. Improperly implemented lock_guards can be the bane of your existence because they can appear to do one thing when they actually do another.

It's not a good practice to load an object from a file in the constructor.

Beyond the good C++ arguments in the other answers, I'd add a clean code principle: a function should do only one thing. Ok, the constructor is not a normal function, but the principle still applies.

I suggest the following:

  • keep the Trie as general as possible to aim at reuse. There are plenty of other applications that could use it.
  • keep it independent of the way you feed it: ideally. Different application could use different approaches, for example to feed it from a string or from an istream. After all, future versions of your app could querry the words from a database with different languages.
  • use a builder design pattern, to assemble the pieces of the construction process. The builder would then load the trie from a source, using some kind of general approach (e.g. Create the trie, Open the source, iterate through source data and inserting it into the trie, once it is done return the trie). You could then make a concrete builder for loading from a file, and easily opt for other alternatives.

I don't think I would read a file in the constructor for a couple of reasons:

  1. it will significantly slow down object creation and potentially the startup of your program.
  2. as you mentioned, there are limited options for error handling. You would have to either put the entire program in a try...catch block or set some sort of flag if something goes wrong.
  3. what if you need to read it in more than once? You would have to have a duplicate of the code, or split the code out to a function the constructor calls, in which case just do it after the object is created anyway.

I'm sure there are other reason, those are just the first ones that came to my mind. If its a small file and you are only reading it the once, you are probably ok.

Licensed under: CC-BY-SA with attribution
scroll top