سؤال

I came up with this little code but all the professionals said its dangerous and I should not write code like this. Can anyone highlight its vulnerabilities in 'more' details?

int strlen(char *s){ 
    return (*s) ? 1 + strlen(s + 1) : 0; 
}
هل كانت مفيدة؟

المحلول

It has no vulnerabilities per se, this is perfectly correct code. It is prematurely pessimized, of course. It will run out of stack space for anything but the shortest strings, and its performance will suck due to recursive calls, but otherwise it's OK.

The tail call optimization most likely won't cope with such code. If you want to live dangerously and depend on tail-call optimizations, you should rephrase it to use the tail-call:

// note: size_t is an unsigned integertype

int strlen_impl(const char *s, size_t len) {
    if (*s == 0) return len;
    if (len + 1 < len) return len; // protect from overflows
    return strlen_impl(s+1, len+1);
}        

int strlen(const char *s) {
   return strlen_impl(s, 0);
}

نصائح أخرى

Dangerous it a bit of a stretch, but it is needlessly recursive and likely to be less efficient than the iterative alternative.

I suppose also that given a very long string there is a danger of a stack overflow.

There are two serious security bugs in this code:

  1. Use of int instead of size_t for the return type. As written, strings longer than INT_MAX will cause this function to invoke undefined behavior via integer overflow. In practice, this could lead to computing strlen(huge_string) as some small value like 1, malloc'ing the wrong amount of memory, and then performing strcpy into it, causing a buffer overflow.

  2. Unbounded recursion which can overflow the stack, i.e. Stack Overflow. :-) A compiler may choose to optimize the recursion into a loop (in this case, it's possible with current compiler technology), but there is no guarantee that it will. In a best case, stack overflow will simply crash the program. In a worst case (e.g. running on a thread with no guard page) it could clobber unrelated memory, possibly yielding arbitrary code execution.

The problem with killing the stack that have been pointed out, ought to be fixed by a decent compiler, where the apparent recursive call is flattened into a loop. I verified this hypothesis and asked clang to translate your code:

//sl.c
unsigned sl(char const* s) {
  return (*s) ? (1+sl(s+1)) : 0;
}

Compiling and disassembling:

clang -emit-llvm -O1 -c sl.c -o sl.o
#                 ^^ Yes, O1 is already sufficient.
llvm-dis-3.2 sl.o

And this is the relevant part of the llvm result (sl.o.ll)

define i32 @sl(i8* nocapture %s) nounwind uwtable readonly {
  %1 = load i8* %s, align 1, !tbaa !0
  %2 = icmp eq i8 %1, 0
  br i1 %2, label %tailrecurse._crit_edge, label %tailrecurse

tailrecurse:                                      ; preds = %tailrecurse, %0
  %s.tr3 = phi i8* [ %3, %tailrecurse ], [ %s, %0 ]
  %accumulator.tr2 = phi i32 [ %4, %tailrecurse ], [ 0, %0 ]
  %3 = getelementptr inbounds i8* %s.tr3, i64 1
  %4 = add i32 %accumulator.tr2, 1
  %5 = load i8* %3, align 1, !tbaa !0
  %6 = icmp eq i8 %5, 0
  br i1 %6, label %tailrecurse._crit_edge, label %tailrecurse

tailrecurse._crit_edge:                           ; preds = %tailrecurse, %0
  %accumulator.tr.lcssa = phi i32 [ 0, %0 ], [ %4, %tailrecurse ]
  ret i32 %accumulator.tr.lcssa
}

I don't see a recursive call. Indeed clang called the looping label tailrecurse which gives us a pointer as to what clang is doing here.

So, finally (tl;dr) yes, this code is perfectly safe and a decent compiler with a decent flag will iron the recursion out.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top