Question

First, a bit of context: I'm trying to debug an issue that happens with neovim. I'm not sure whether it also happens with plain vim, but that's not all that relevant.

Even though the reporter uses Linux and I use OS X 10.9, I've been able to reproduce "similar" behaviour with specific compilers and flags:

When I use either gcc 4.8.2 or gcc 4.9 (dev) combined with even a little bit of optimization, fortification and stack smashing protection, neovim crashes on startup.

$ edit CMakeLists.txt
$ ... -Wall -O1 -g -g3 -ggdb -mtune=generic -pipe -fstack-protector --param=ssp-buffer-size=4 -D_FORTIFY_SOURCE=2 -Wextra -pedantic -Wno-unused-parameter -std=gnu99 ...

$ make clean && make cmake CMAKE_EXTRA_FLAGS="-DCMAKE_C_COMPILER=/usr/local/bin/gcc-4.8" && make

I've been trying to debug it with lldb (gdb doesn't seem to give me any symbols, even after codesigning). I've gotten as far as this:

compiled with gcc 4.9:

➜  neovim git:(fortify-and-stack-protector) ✗ lldb ./build/bin/nvim
Current executable set to './build/bin/nvim' (x86_64).
(lldb) run src/eval.c
Process 1295 launched: './build/bin/nvim' (x86_64)
Process 1295 stopped
* thread #1: tid = 0x242b4f, 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread, stop reason = signal SIGABRT
    frame #0: 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff93f34866:  jae    0x7fff93f34870            ; __pthread_kill + 20
   0x7fff93f34868:  movq   %rax, %rdi
   0x7fff93f3486b:  jmpq   0x7fff93f31175            ; cerror_nocancel
   0x7fff93f34870:  ret
(lldb) bt
* thread #1: tid = 0x242b4f, 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread, stop reason = signal SIGABRT
    frame #0: 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff91b6435c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff8ce68b1a libsystem_c.dylib`abort + 125
    frame #3: 0x00007fff8ce68c91 libsystem_c.dylib`abort_report_np + 181
    frame #4: 0x00007fff8ce8c860 libsystem_c.dylib`__chk_fail + 48
    frame #5: 0x00007fff8ce8c830 libsystem_c.dylib`__chk_fail_overflow + 16
    frame #6: 0x00007fff8ce8ca7f libsystem_c.dylib`__strcpy_chk + 83
    frame #7: 0x000000010002e1a6 nvim`call_user_func [inlined] add_nr_var(nr=1, name=<unavailable>, v=<unavailable>, dp=<unavailable>) + 42 at eval.c:18744
    frame #8: 0x000000010002e17c nvim`call_user_func(fp=0x000000010030bd30, argcount=0, argvars=0x00007fff5fbfed80, rettv=0x00007fff5fbfef50, firstline=1, lastline=1, selfdict=0x0000000000000000) + 425 at eval.c:18455
    frame #9: 0x000000010002ef33 nvim`call_func(funcname=<unavailable>, len=<unavailable>, rettv=0x00007fff5fbfef50, argcount=0, argvars=0x00007fff5fbfed80, firstline=1, lastline=1, doesrange=0x00007fff5fbfef44, evaluate=1, selfdict=0x0000000000000000) + 717 at eval.c:7363
    frame #10: 0x0000000100032d1a nvim`get_func_tv(name=0x000000010030be20, len=9, rettv=0x00007fff5fbfef50, arg=0x00007fff5fbfef48, firstline=1, lastline=1, doesrange=0x00007fff5fbfef44, evaluate=1, selfdict=0x0000000000000000) + 340 at eval.c:7222
    frame #11: 0x000000010003673e nvim`ex_call(eap=0x00007fff5fbff190) + 475 at eval.c:3086
    frame #12: 0x000000010005634b nvim`do_cmdline(cmdline=<unavailable>, fgetline=0x00000001000494e3, cookie=0x00007fff5fbff790, flags=7) + 13602 at ex_docmd.c:2103
    frame #13: 0x0000000100049d52 nvim`do_source(fname=0x000000010017bb3b, check_other=<unavailable>, is_vimrc=<unavailable>) + 1615 at ex_cmds2.c:2695
    frame #14: 0x00000001001702c3 nvim`main + 251 at main.c:2009
    frame #15: 0x00000001001701c8 nvim`main(argc=<unavailable>, argv=<unavailable>) + 5152 at main.c:1919
    frame #16: 0x00007fff8b32d5fd libdyld.dylib`start + 1
(lldb) frame variable
(lldb) frame info
frame #0: 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10
(lldb) frame select 7
frame #7: 0x000000010002e1a6 nvim`call_user_func [inlined] add_nr_var(nr=1, name=<unavailable>, v=<unavailable>, dp=<unavailable>) + 42 at eval.c:18744
   18741     */
   18742    static void add_nr_var(dict_T *dp, dictitem_T *v, char *name, varnumber_T nr)
   18743    {
-> 18744      STRCPY(v->di_key, name);
   18745      v->di_flags = DI_FLAGS_RO | DI_FLAGS_FIX;
   18746      hash_add(&dp->dv_hashtab, DI2HIKEY(v));
   18747      v->di_tv.v_type = VAR_NUMBER;

compiled with gcc 4.8.2:

➜  neovim git:(fortify-and-stack-protector) ✗ lldb ./build/bin/nvim
Current executable set to './build/bin/nvim' (x86_64).
(lldb) rune
error: 'rune' is not a valid command.
(lldb) run
Process 3242 launched: './build/bin/nvim' (x86_64)
Process 3242 stopped
* thread #1: tid = 0x2454cb, 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread, stop reason = signal SIGABRT
    frame #0: 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10
libsystem_kernel.dylib`__pthread_kill + 10:
-> 0x7fff93f34866:  jae    0x7fff93f34870            ; __pthread_kill + 20
   0x7fff93f34868:  movq   %rax, %rdi
   0x7fff93f3486b:  jmpq   0x7fff93f31175            ; cerror_nocancel
   0x7fff93f34870:  ret
(lldb) bt
* thread #1: tid = 0x2454cb, 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10, queue = 'com.apple.main-thread, stop reason = signal SIGABRT
    frame #0: 0x00007fff93f34866 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff91b6435c libsystem_pthread.dylib`pthread_kill + 92
    frame #2: 0x00007fff8ce68b1a libsystem_c.dylib`abort + 125
    frame #3: 0x00007fff8ce68c91 libsystem_c.dylib`abort_report_np + 181
    frame #4: 0x00007fff8ce8c860 libsystem_c.dylib`__chk_fail + 48
    frame #5: 0x00007fff8ce8c830 libsystem_c.dylib`__chk_fail_overflow + 16
    frame #6: 0x00007fff8ce8ca7f libsystem_c.dylib`__strcpy_chk + 83
    frame #7: 0x000000010002a969 nvim`eval_init + 129 at eval.c:868
    frame #8: 0x0000000100089ffd nvim`main(argc=1, argv=0x00007fff5fbffa58) + 140 at main.c:175
    frame #9: 0x00007fff8b32d5fd libdyld.dylib`start + 1
    frame #10: 0x00007fff8b32d5fd libdyld.dylib`start + 1
(lldb) frame select 7
frame #7: 0x000000010002a969 nvim`eval_init + 129 at eval.c:868
   865
   866    for (i = 0; i < VV_LEN; ++i) {
   867      p = &vimvars[i];
-> 868      STRCPY(p->vv_di.di_key, p->vv_name);
   869      if (p->vv_flags & VV_RO)
   870        p->vv_di.di_flags = DI_FLAGS_RO | DI_FLAGS_FIX;
   871      else if (p->vv_flags & VV_RO_SBX)

For this issue to appear, some optimization is needed. However, that means the compiler inlines functions and optimizes arguments away, which is annoying and keeps me from seeing the most important things at a glance. Is there a combination of flags I could try that keeps the problem but allows better debuggability?

  • Is this likely a compiler bug? Clang seems to avoid it somehow, and GCC too when I turn off optimization.
  • gcc 4.8.2 and gcc 4.9 give a similar error (both times at a STRCPY) but in different locations; this worries me even more.
  • Where is the code for __chk_fail_overflow and __chk_overlap? They get called from __strcpy_chk, which I assume is the replacement for strcpy that gets inserted when compiling with -D_FORTIFY_SOURCE=2. I haven't been able to find them by grepping. I've grepped in this repo: https://github.com/aosm/Libc, which appears to be the OS X 10.9 libc, as far as I could verify against Apple's own open-source site.
  • How does the compiler decide what the third argument to __strcpy_chk is? The original call to STRCPY doesn't include any size information:


STRCPY(v->di_key, name);
// I think gcc/clang replace this with:
__strcpy_chk(v->di_key, name, SOME_MAGIC_SIZE);

I hope some of the gurus on Stack Overflow can give me some tips on what I should do next!

EDIT: I've been able to compile with -Og and gcc 4.8.2 and still provoke the error; hopefully this will give some more info.


OTHER TIPS

So, I couldn't stop myself from digging further and finally got the idea of looking at the stack and registers on entry to the dreaded __strcpy_chk, where I found:

(lldb) register read
General Purpose Registers:
       rax = 0x0000000000000057
       rbx = 0x00000001001d3900  vimvars
       rcx = 0x6300f1e7a96add52
       rdx = 0x0000000000000001 /* this is probably the size parameter */
       rdi = 0x00000001001d3919  vimvars + 25
       rsi = 0x000000010016730e  "count"

On x86-64 the first three integer arguments are passed in rdi, rsi and rdx, so gcc had deduced that the destination (rdi, vimvars + 25) has size 1 and was passing that to __strcpy_chk. With -D_FORTIFY_SOURCE=2, the libc headers replace calls to well-known functions such as strcpy with their checked variants, and gcc fills in the destination size whenever it can deduce it. That is the case here, as we'll see: the dst parameter is the vv_di.di_key field of this struct:

static struct vimvar {
  char        *vv_name;         /* name of variable, without v: */
  dictitem_T vv_di;             /* value and name for key */
  char vv_filler[16];           /* space for LONGEST name below!!! */
  char vv_flags;                /* VV_COMPAT, VV_RO, VV_RO_SBX */
} vimvars[VV_LEN];

struct dictitem_S {
  typval_T di_tv;               /* type and value of the variable */
  char_u di_flags;              /* flags (only used for variable) */
  char_u di_key[1];             /* key (actually longer!) */
};

That field has size 1. Gcc of course doesn't realize that this is intentional: vv_filler is meant to hold the rest of the string, which is why the comment says "actually longer". vv_filler is declared directly after vv_di, so a key longer than one byte runs past di_key (possibly through padding at the end of dictitem_T) into vv_filler.

I can't really fault the compiler for this; it makes sense. Someone at the neovim project is busy rebuilding the interpreter as a translator from VimL to Lua. A "stupid" fix for now is to either disable source fortification or lower it a level to -D_FORTIFY_SOURCE=1, which should stop gcc from making these assumptions (at level 1 the destination size is taken from the whole enclosing object rather than from the di_key member). Ideally, though, we'd fix this while keeping -D_FORTIFY_SOURCE=2 and without regressing performance (vimvars is of course used often).

I have some extra questions, though. Is what the vim codebase does here legal? On the neovim issue tracker, some commenters suggested that relying on this is entirely undefined behaviour. One argument I found interesting was that there could be space (padding) between the end of struct dictitem_S and vv_filler, and the commenter argued that using this space is undefined behaviour (UB). Reading the relevant parts of the C99 standard doesn't give immediate clarity on this specific case. C99 sanctions the struct hack in the form of flexible array members, but we were not sure whether that extends to nested structs, i.e. here the struct hack appears at the end of dictitem_S, but definitely not at the end of its containing struct (vimvar).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow