Question

I've written native Ruby gems before and run them on Heroku with good results, but this one has me stumped.

All tests, including some aggressive ones, run fine under both Mac OS X (Mavericks) and Linux (though the only installation I have access to is Red Hat). On Heroku, the same gem segfaults on live data that are smaller and simpler than the test data. The gem also runs fine in my OS X Rails development environment with a copy of the live data pulled down from Heroku.

The stack trace from the Heroku logs is not helpful. The Ruby-level trace shows the offending call. Great. But the C call stack is weird. The top-level gem API call has a line number that makes no sense. Deeper frames name files from my gem and look roughly plausible, but their line numbers point to straight-line code and blank lines, not call sites. It looks as if the debug-information mapping is off.

Another problem: although the file names are from my gem, the shared object listed for those frames is wrong (it is zlib, in fact).

The general shape of the stack trace pointed to a memory allocation/deallocation bug or a bogus pointer dereference, so I followed these instructions to run the gem under Valgrind. It comes up completely clean.

Not sure where to go from here. Questions that seem reasonable to me, but YMMV:

  1. Is there a way to get the Heroku stack trace to make sense?
  2. Are there suggestions for reproducing the error in a dev environment?
  3. Any other suggestions?

Here is the relevant part of the Heroku trace. I can Gist the whole thing if anyone thinks it's relevant.

2014-05-05T01:20:29.366714+00:00 app[web.1]: /app/vendor/ruby-1.9.3/bin/ruby(rb_bug+0xb3) [0x573153] error.c:277
2014-05-05T01:20:29.366720+00:00 app[web.1]: /app/vendor/ruby-1.9.3/bin/ruby() [0x4b5f50] signal.c:644
2014-05-05T01:20:29.366722+00:00 app[web.1]: /lib/libpthread.so.0(+0xf8f0) [0x7f99ace9f8f0]
2014-05-05T01:20:29.366724+00:00 app[web.1]: /lib/libz.so.1(adler32+0x288) [0x7f99ab0325b8]
2014-05-05T01:20:29.366726+00:00 app[web.1]: /lib/libz.so.1(+0x4f8d) [0x7f99ab034f8d] qt.c:58
2014-05-05T01:20:29.366728+00:00 app[web.1]: /lib/libz.so.1(+0x699d) [0x7f99ab03699d] lulu.c:208
2014-05-05T01:20:29.366730+00:00 app[web.1]: /lib/libz.so.1(deflate+0x151) [0x7f99ab035281] qt.c:24
2014-05-05T01:20:29.366732+00:00 app[web.1]: /lib/libz.so.1(compress2+0xa6) [0x7f99ab032986] marker.c:76
2014-05-05T01:20:29.366734+00:00 app[web.1]: /app/vendor/bundle/ruby/1.9.1/gems/lulu-0.0.2/lib/lulu/lulu.so(+0x6e82) [0x7f99a6ddce82] lulu.c:223
2014-05-05T01:20:29.369302+00:00 app[web.1]:   428 /app/vendor/bundle/ruby/1.9.1/gems/sass-3.2.14/lib/sass/selector/abstract_sequence.rb

My gem is named lulu. The bottom entry is the initial call, and the higher frames end at signal.c, where the error is raised. marker.c and qt.c are my files, but they are compiled into lulu.so, not zlib.

Here is the source code, though I don't expect anyone to do a code review. I'm calling all the Ruby malloc routines like a good Ruby citizen. It's Ruby 1.9.3, incidentally.

Addition

Maybe I have a clue. The reference lulu.c:223 is a call to my own function compress. There is a compress in zlib, too. My guess is that the dynamic loader is picking up the wrong one. Stand by while I try renaming compress.
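If the name clash is the cause, renaming the helper removes it. A minimal sketch, assuming the helper also has to be callable from other files compiled into lulu.so (such as qt.c or marker.c), so `static` is not an option for it: give it a gem-specific prefix and mark it hidden. The name `lulu_compress` and the placeholder body (dropping adjacent duplicates from a sorted array) are hypothetical; the gem's real `compress` is not shown here. `visibility("hidden")` is a GCC/Clang extension.

```c
#include <stddef.h>

/* Hypothetical renamed helper. The "lulu_" prefix cannot collide with
 * zlib's exported compress(). Hidden visibility keeps the symbol out of
 * the shared object's dynamic symbol table, so the dynamic linker never
 * binds it to a same-named function from another library, while other
 * .c files linked into the same .so can still call it. */
__attribute__((visibility("hidden")))
size_t lulu_compress(double *xs, size_t n)
{
    /* Placeholder body: drop adjacent duplicates from a sorted array,
     * in place, and return the new length. */
    if (n == 0)
        return 0;
    size_t out = 1;
    for (size_t i = 1; i < n; i++) {
        if (xs[i] != xs[out - 1])
            xs[out++] = xs[i];
    }
    return out;
}
```

For example, `{1.0, 1.0, 2.0, 3.0, 3.0}` is compacted in place to `{1.0, 2.0, 3.0}` and the function returns 3.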


Solution

It did turn out that the loader was finding compress in zlib instead of the same-named function in my code.

The call and the definition are in the same C file! I did not realize that on Linux a call to a non-static function, even one defined in the same shared object, is resolved dynamically and can bind to a previously loaded function of the same name!

I fixed the problem by declaring my compress to be static.
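The fix can be sketched as follows, assuming a helper used only inside the file that defines it. With `static`, `compress` has internal linkage: it is not exported from the shared object, so the call inside the same file is bound at build time and zlib's `compress` can no longer be picked up. The body (compacting an int array by dropping zeros) and the exported wrapper `lulu_compact` are hypothetical placeholders, not the gem's actual code.

```c
#include <stddef.h>

/* Internal linkage: this compress() is private to the translation unit,
 * so calls below always use this definition, never zlib's compress().
 * Placeholder body: compact the array in place by dropping zero entries
 * and return the number of entries kept. */
static size_t compress(int *a, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (a[i] != 0)
            a[kept++] = a[i];
    }
    return kept;
}

/* Hypothetical exported entry point; in a real extension this would be
 * a method registered from the Init_ function. */
size_t lulu_compact(int *a, size_t n)
{
    return compress(a, n);
}
```

Compiling the same file without `static` (and with default visibility, as is typical for extensions built with `-fPIC`) leaves `compress` exported, which is the situation that let zlib's definition win on Heroku.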

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow