Thursday, February 25, 2010

panda athena jobs crash with ls/rm/bash segfault

During January we had a number of issues getting our Panda analysis queue back online.
15.5.1 jobs kept crashing with an "rm segfault".
After a lot of tracking we managed to narrow it down to a very specific combination of LD_PRELOAD and LD_LIBRARY_PATH that causes core binaries such as ls, rm and bash to segfault.

This behavior only occurs when the length of LD_LIBRARY_PATH falls within an 8-character window, so the length of the experiment software path decides whether a site will see it.
The same problem occurs in CentOS and RHEL5 as well, so it's an upstream issue.
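
To check whether a given site is at risk, it helps to look at the length of LD_LIBRARY_PATH directly. The sketch below is only an illustration: the helper names are made up here, and window_start is a hypothetical parameter, since the exact position of the 8-character window is not given above and has to be found by experiment on the affected node.

```python
import os


def ld_path_length(env=None):
    """Return the character length of LD_LIBRARY_PATH (0 if it is unset)."""
    env = os.environ if env is None else env
    return len(env.get("LD_LIBRARY_PATH", ""))


def in_crash_window(length, window_start, window_width=8):
    """Return True if length lies in [window_start, window_start + window_width).

    window_start is an assumption supplied by the caller; only the
    8-character width comes from the observation described above.
    """
    return window_start <= length < window_start + window_width
```

Printing ld_path_length() from inside the job environment, just before the first failing rm or ls, gives the number to compare against.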

A ticket has been filed with Red Hat:

https://bugzilla.redhat.com/show_bug.cgi?id=563759

For the moment we have just modified our local config to append an extra, otherwise useless entry to LD_LIBRARY_PATH, which changes its length. Unfortunately, CMT setup strips any path entries that do not exist or do not contain libraries, so we created an extra directory in the software area and put an empty dummy.so file in it.
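
A minimal sketch of that workaround, assuming a writable software area. The function name and the directory name ld_padding are hypothetical, not what we actually used, and the function only returns the new value instead of modifying the environment itself.

```python
import os


def add_padding_entry(software_area, env=None, dirname="ld_padding"):
    """Create a padding directory holding an empty dummy.so and return
    LD_LIBRARY_PATH with that directory appended.

    dirname is a hypothetical name chosen for this example.
    """
    env = os.environ if env is None else env
    pad_dir = os.path.join(software_area, dirname)
    os.makedirs(pad_dir, exist_ok=True)

    # Empty placeholder so that path cleaning, which drops directories
    # without libraries, keeps this entry.
    open(os.path.join(pad_dir, "dummy.so"), "a").close()

    current = env.get("LD_LIBRARY_PATH", "")
    entries = [entry for entry in current.split(":") if entry]
    if pad_dir not in entries:
        entries.append(pad_dir)
    return ":".join(entries)
```

The returned string can then be exported as LD_LIBRARY_PATH in the site-local setup. Calling the function a second time with the new value leaves it unchanged, so it is safe to run on every job start.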
