Question

I am writing a script that does something to a text file (what it does is irrelevant for my question though). So before I do something to the file I want to check if the file exists. I can do this, no problem, but the issue is more that of aesthetics.

Here is my code, implementing the same thing in two different ways.

def modify_file(filename):
    assert os.path.isfile(filename), 'file does NOT exist.'


Traceback (most recent call last):
  File "clean_files.py", line 15, in <module>
    print(clean_file('tes3t.txt'))
  File "clean_files.py", line 8, in clean_file
    assert os.path.isfile(filename), 'file does NOT exist.'
AssertionError: file does NOT exist.

or:

def modify_file(filename):
    if not os.path.isfile(filename):
        return 'file does NOT exist.'


file does NOT exist.

The first method produces an output that is mostly trivial, the only thing I care about is that the file does not exist.

The second method returns a string, it is simple.

My questions is: which method is better for letting the user know that the file does not exist? Using the assert method seems somehow more pythonic.

Was it helpful?

Solution

You'd go with a third option instead: use raise and a specific exception. This can be one of the built-in exceptions, or you can create a custom exception for the job.

In this case, I'd use IOError, but a ValueError might also fit:

def modify_file(filename):
    if not os.path.isfile(filename):
        raise IOError('file does NOT exist.')

Using a specific exception allows you to raise other exceptions for different exceptional circumstances, and lets the caller handle the exception gracefully.

Of course, many file operations (like open()) themselves raise OSError already; explicitly first testing if the file exists may be redundant here.

Don't use assert; if you run python with the -O flag, all assertions are stripped from the code.

OTHER TIPS

assert is intended for cases where the programmer calling the function made a mistake, as opposed to the user. Using assert under that circumstance allows you to make sure a programmer is using your function correctly during testing, but then strip it out in production.

Its value is somewhat limited since you have to ensure you exercise that path through the code, and you often want to additionally handle the problem with a separate if statement in production. assert is most useful in situations like, "I want to helpfully work around this problem if a user hits it, but if a developer hits it, I want it to crash hard so he will fix the code that calls this function incorrectly."

In your particular case, a missing file is almost certainly a user error, and should be handled by raising an exception.

From UsingAssertionsEffectively

Checking isinstance() should not be overused: if it quacks like a duck, there's perhaps no need to enquire too deeply into whether it really is. Sometimes it can be useful to pass values that were not anticipated by the original programmer.

Places to consider putting assertions:

checking parameter types, classes, or values
checking data structure invariants
checking "can't happen" situations (duplicates in a list, contradictory state variables.)
after calling a function, to make sure that its return is reasonable 

The overall point is that if something does go wrong, we want to make it completely obvious as soon as possible.

It's easier to catch incorrect data at the point where it goes in than to work out how it got there later when it causes trouble.

Assertions are not a substitute for unit tests or system tests, but rather a complement. Because assertions are a clean way to examine the internal state of an object or function, they provide "for free" a clear-box assistance to a black-box test that examines the external behaviour.

Assertions should not be used to test for failure cases that can occur because of bad user input or operating system/environment failures, such as a file not being found. Instead, you should raise an exception, or print an error message, or whatever is appropriate. One important reason why assertions should only be used for self-tests of the program is that assertions can be disabled at compile time.

If Python is started with the -O option, then assertions will be stripped out and not evaluated. So if code uses assertions heavily, but is performance-critical, then there is a system for turning them off in release builds. (But don't do this unless it's really necessary. It's been scientifically proven that some bugs only show up when a customer uses the machine and we want assertions to help there too. :-) )

Licensed under: CC-BY-SA with attribution
scroll top