Pergunta

This is sort of the complementary question to How to best protect from 0 passed to std::string parameters?. Basically, I'm trying to figure out whether there is a way to have the compiler warn me if a code path would unconditionally try to call std::string's char* constructor using NULL.

Run time checks are all well and good, but for a case like:

std::string get_first(const std::string& foo) {
    if (foo.empty()) return 0; // Or NULL, or nullptr
    return foo.substr(0, 1);
}

it's annoying that, even though the code is guaranteed to fail if that code path is exercised, and the system header is usually annotated with the precondition saying that the pointer must not be null, this still passes compilation under gcc, clang, etc., even with -std=c++11 -Wall -Wextra -pedantic -Werror. I can block the specific case of 0 on gcc with -Werror=zero-as-null-pointer-constant, but that doesn't help with NULL/nullptr and it's sort of tackling the related but dissimilar problem. The major issue is that a programmer can make this mistake with 0, NULL or nullptr and not notice it if the code path isn't exercised.

Is it possible to force this check to be compile time, covering an entire code base, without nonsense like replacing std::string with a special subclass throughout the code?

Foi útil?

Solução

There are different approaches, depending on whether it should work on all compilers, in very restricted circumstances and with some side-effects, or only on some, but in exchange far more broadly:

  1. Adding overloads which complain when used. There is [[deprecated]] since C++11 which will complain at the immediate call-site, as long as it's not suppressed, like normally in a system-header.

    GCC and CLANG provide a better suited custom attribute, __attribute__((error("message"))), which will always break the build if the function is used and name the call-site.

    The problem with adding overloads accepting all those things which could be nullpointer-literals, is that it might confuse other template's SFINAE, thus breaking code, and cannot catch an argument already of type char* which unfortunately just happens to be a nullpointer:

    // added overload, common lines:
    template<class X, class = typename
        std::enable_if<std::is_arithmetic<X>() && !std::is_pointer<X>()>
    // best on GCC / clang:
    string(X) __attribute__((error("nullpointer")));
    // best on others:
    [[deprecated("UB")]] string(X) /* no definition, link breaks */;
    
  2. The preferable alternative is marking the argument as non-null, and letting the compilers optimizer figure it out. You are asking for and heeding such warnings, right?
    That's only an option in GCC and CLANG, but it avoids the disadvantage of additional overloads and catches all cases the compiler can figure out, which means it works better with more optimization.

    basic_string(const CharT* s,
                 size_type count,
                 const Allocator& alloc = Allocator()) __attribute__((nonnull));
    basic_string(const CharT* s,
                 const Allocator& alloc = Allocator()) __attribute__((nonnull));
    

    Generally, one can ask GCC / clang whether it can determine the value of any expression at compile-time, using __builtin_constant_p.

Outras dicas

It wouldn't be done at compile time, but it would be fairly easy to define a string type that handled this more gracefully than (at least most implementations of) std::string currently does.

The basis would be the fact that NULL has type int and nullptr has type nullptr_t. As such, it would be fairly easy to overload on these types, and have them fail "fast and loud" when passed the wrong types of arguments:

#include <iostream>
#include <fstream>
#include <algorithm>
#include <iterator>
#include <vector>
#include <numeric>
#include <string>

class test_string {
    std::vector<char> data;
public:

    test_string(nullptr_t) { throw std::runtime_error("Bad input data"); }
    test_string(int) { throw std::runtime_error("Bad input data"); }
    test_string(void *) { throw std::runtime_error("Bad input data"); }
    test_string(char const *s) { size_t length = strlen(s); data.resize(length); std::copy_n(s, length, &data[0]); }
    size_t length() const { return data.size(); }

    friend std::ostream &operator<<(std::ostream &os, test_string const &s) {
        std::copy(s.data.begin(), s.data.end(), std::ostream_iterator<char>(os));
        return os;
    }
};

int main() {

    // These will all immediately throw if uncommmented:
    //test_string x(NULL);
    //test_string y(nullptr);
    //test_string z(0);

    // But this still works fine:
    test_string a("This is a string");
    std::cout << a;
}

Doing this with std::string might be a little more difficult. The problem is that std::string already has quite a few overloads of its constructor. The overload for nullptr_t should always be safe (the only thing of that type is nullptr, and you clearly never want that).

Having an overload to take int in place of each charT const * might be more difficult. The problem is that some of the overloads already distinguish the intent based on (for example) the position of the pointer vs. a count indicating the length. You'd have to look pretty carefully to be sure you were catching invocation with NULL, but not interfering with an existing (legitimate) use that takes (for example) a char and a size_t. I'm not convinced it's impossible, but it would take some care (and you might end up having to live with leaving a few cases that could still slip through).

To be very practical, however, this would probably have to be a modification to std::string though--having it as a separate class leads to some fairly serious impracticality (at least in my opinion).

Licenciado em: CC-BY-SA com atribuição
scroll top