Friday, September 12, 2008

Laying down the GroundWork

Without a doubt, Nagios is a great way to monitor hosts and services on the grid. But those of us who have edited its convoluted configuration files by hand know the joy of syntax errors and an overload of falsely triggered alert emails, enough to send anyone on an office-destroying rampage. Thankfully, there are several good solutions out there in the form of frameworks.

At Australia-ATLAS we use Groundwork.

Far more fully featured than 'configuration generators' like NagiosAdmin (German), Lilac (alpha) and ignoramus (lacking), Groundwork wraps Nagios entirely and is very stable.

Groundwork uses a MySQL backend to manage all of the configuration before it is committed (the standard .cfg files are eventually fed to Nagios), which makes the interface smooth to use. Existing users of Nagios, take heart: loading your painstakingly produced .cfg files into Groundwork is well supported through the 'Load' functionality. This way, scripts written to automatically generate worker node host instances can still be used, though the same can also be done with Groundwork's 'clone host' tool.
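As an illustration of the kind of generator script mentioned above, here is a minimal Python sketch that writes Nagios host definitions for a run of worker nodes. It is not our actual script: the 'generic-host' template, the 'wn' prefix, the example.org domain and the 192.168.10 subnet are all made-up placeholders, and the numbering scheme is an assumption.

```python
def nagios_host_block(host_name, address, template="generic-host"):
    """Return one Nagios host object definition as text.

    The template name is a placeholder; use whatever host template
    your own configuration defines.
    """
    lines = [
        "define host {",
        f"    use        {template}",
        f"    host_name  {host_name}",
        f"    alias      {host_name}",
        f"    address    {address}",
        "}",
    ]
    return "\n".join(lines) + "\n"


def worker_node_config(prefix, domain, subnet, count, first_octet=1):
    """Build host definitions for <prefix>001 .. <prefix>NNN.

    Assumes worker nodes are numbered consecutively and sit on
    consecutive addresses in one /24 subnet.
    """
    blocks = []
    for i in range(count):
        name = f"{prefix}{i + 1:03d}.{domain}"
        address = f"{subnet}.{first_octet + i}"
        blocks.append(nagios_host_block(name, address))
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical site values, for illustration only.
    print(worker_node_config("wn", "example.org", "192.168.10", 3), end="")
```

Redirect the output to a .cfg file and it should be loadable like any other hand-written host file.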

Groundwork also takes care of all the mundane things, like running the Nagios daemon itself and managing users, roles and add-on packages.


All that sounds like a bit of an ad. Why would a time-starved grid admin move to Groundwork?

Configuration is easier.

You no longer need to remember anything: all of the options you have are in a drop-down box or multi-select list. That also means no more typos! Finding any host, service, command or profile is a two-click operation. You tend to use groups more because they're so much simpler to create - instead of adding them to every host in a file, you just select them from a list.

There are a couple of times when the improved robustness of Groundwork can be a little annoying. For example, when you update a service check, you need to remember to deploy it to the hosts/hostgroups; otherwise you can commit your changes and wonder why nothing has changed.

However, these are small in comparison with the improved productivity you gain.

So why not give Groundwork a try - you can get it at http://www.groundworkopensource.com/.

Thursday, September 11, 2008

Cfengine; fixes syslog-ng's wagon good

We just finished implementing syslog-ng to send all logs from the nodes in the TIER2 to a single logging server.

It seems simple at first. Unfortunately, most of the grid services do not log via the standard logging interface and instead write their own log files. It gets even worse: syslog-ng will not start if a log file it is supposed to track does not exist.

After a bit of pain we realised that cfengine could detect the presence of the gLite log files, rewrite the syslog-ng server config and restart syslog-ng.

A couple of shell scripts and we are away. For each log file they produce something like:

classes:

    s_var_log_gridftp_session = ( FileExists("/var/log/gridftp-session.log") )

editfiles:

    # /var/log/gridftp-session.log
    ###################################
    s_var_log_gridftp_session::
        { /etc/syslog-ng.conf
            DefineClasses "newsyslog_ng"
            BeginGroupIfNoLineContaining "# s_var_log_gridftp_session v15"
                DeleteLinesContaining "s_var_log_gridftp_session"
                Append "# s_var_log_gridftp_session v15"
                Append "source s_var_log_gridftp_session { file ('/var/log/gridftp-session.log' follow_freq(30) log_prefix('log_gridftp_session: ')); };"
                Append "log { source(s_var_log_gridftp_session); destination(d_stunnel); };"
            EndGroup
        }
    !s_var_log_gridftp_session::
        { /etc/syslog-ng.conf
            DefineClasses "newsyslog_ng"
            DeleteLinesContaining "s_var_log_gridftp_session"
        }

shellcommands:

    any::

    newsyslog_ng::
        "/sbin/service syslog stop" umask=022
        "/sbin/chkconfig --level 2345 syslog off"
        "/sbin/chkconfig --add syslog-ng"
        "/sbin/chkconfig --add syslog-ng-stunnel"
        "/etc/init.d/syslog-ng-stunnel restart" umask=022
        "/sbin/service syslog-ng restart" umask=022

So central logging is a go, and there is only one master config file for all nodes. Even better, if you start a new service, its logs get added automatically.
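For anyone wanting to reproduce the per-file stanzas, here is a rough Python sketch of a generator. Our real scripts are shell, so treat this as a stand-in rather than the original. The naming rules (class name from the path minus its extension, prefix from the file's base name) are inferred from the single gridftp example above, and all function names are invented for illustration.

```python
import posixpath
import re

VERSION = "v15"  # marker copied from the example; change it to force a rewrite


def _sanitise(text):
    """Replace anything that is not a letter or digit with an underscore."""
    return re.sub(r"[^A-Za-z0-9]", "_", text)


def class_name(log_path):
    """'/var/log/gridftp-session.log' -> 's_var_log_gridftp_session' (inferred rule)."""
    stem, _ext = posixpath.splitext(log_path)
    return "s" + _sanitise(stem)


def log_prefix(log_path):
    """'/var/log/gridftp-session.log' -> 'log_gridftp_session: ' (inferred rule)."""
    stem, _ext = posixpath.splitext(posixpath.basename(log_path))
    return "log_" + _sanitise(stem) + ": "


def classes_line(log_path):
    """Line for the classes: section, true only when the log file exists."""
    return f'{class_name(log_path)} = ( FileExists("{log_path}") )'


def editfiles_stanza(log_path, conf="/etc/syslog-ng.conf", destination="d_stunnel"):
    """Entries for the editfiles: section, mirroring the hand-written example."""
    cls = class_name(log_path)
    marker = f"# {cls} {VERSION}"
    source = (
        f"source {cls} {{ file ('{log_path}' follow_freq(30) "
        f"log_prefix('{log_prefix(log_path)}')); }};"
    )
    log = f"log {{ source({cls}); destination({destination}); }};"
    lines = [
        f"# {log_path}",
        "###################################",
        # Log file present: make sure the current stanza is in the config.
        f"{cls}::",
        f"{{ {conf}",
        'DefineClasses "newsyslog_ng"',
        f'BeginGroupIfNoLineContaining "{marker}"',
        f'DeleteLinesContaining "{cls}"',
        f'Append "{marker}"',
        f'Append "{source}"',
        f'Append "{log}"',
        "EndGroup",
        "}",
        # Log file absent: strip any stale stanza so syslog-ng can start.
        f"!{cls}::",
        f"{{ {conf}",
        'DefineClasses "newsyslog_ng"',
        f'DeleteLinesContaining "{cls}"',
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    path = "/var/log/gridftp-session.log"
    print(classes_line(path))
    print(editfiles_stanza(path), end="")
```

Loop it over a list of known gLite log paths and paste the two outputs into the classes: and editfiles: sections respectively.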