Features of mon 0.38.12

mon was developed under Linux, but it is known to work under Solaris 2.5 and 2.6. Because the clients and server are written entirely in Perl, portability should not be much of an issue.

The following is a list of some of the features of mon:

Monitors: "Monitors" are programs that check for a particular condition, and report success or failure to the server, along with any output. They are independent of mon, so to add a test for a new service, you can just write your monitor in any language, put it in the monitor directory, and it just works.
Asynchronous Events: Support for asynchronous events communicated to the mon server. This will be open-ended, like the monitor and alert scripts, so that you can trigger on anything. One obvious use is acting on SNMP traps.
Alerts: "Alert" scripts send a message or otherwise act on a failure that mon detects. These alerts, like the monitors, are not part of mon, and are easy to add. "Upalerts" are also supported; they trigger an alert when a server comes back up after being down for a long time.
Failure Handling: Failure of any monitor can trigger one or more alerts of any kind, to different people at different times. You can effectively construct "on call" schedules using this feature. For example, you can page all system administrators if a resource goes down before 8PM, and after 8PM page only Joe while sending email to everyone else.
Parallelization: Parallelizes the checking of services on different hosts or groups of hosts. For example, mon can ping your routers while it is also pinging your WWW servers. There's no queue that can postpone the scheduled testing of other services.
Repetitive Alert Suppression: Repetitive alerts can be suppressed. For example, only send email once an hour if a service continues to fail. As an option, small, transient failures of a service may be ignored.
Dependencies: Inter-service dependencies and even correlation. For example, if the router between the monitoring host and your WWW server is down, HTTP won't work, so only send an alert that the router is down. This prevents the cascading of zillions of alerts that happens when some critical resource is not accessible. Dependencies can be viewed as a hierarchy (a tree); when a failure occurs, the tree is traversed toward the node that has no unresolved dependencies.
Flexible Configuration: A very flexible (and extensible) configuration file. Hosts can be grouped together, and each host or group can have multiple services. Have a look at an example configuration file.
Client/Server Model: Has interactive command-line, WWW-based, and SkyTel 2-Way alphanumeric pager-based clients that query the server for status and history. The protocol is simple, and it is very easy to make clients of your own. Authentication is supported along with per-user access control.
Run-time Alert Acknowledgement and Disabling: A service failure can be acknowledged so that alerts are suppressed until the problem is fixed. This "ack" state is retrievable from the client interface so that users can see that support staff are working on the problem. Also, alerts for particular hosts, groups, or services can be temporarily disabled and re-enabled by the client, without stopping and restarting the server. If you're upgrading a particular server, you can disable the alert while you're doing the work, and re-enable it when you're done.
History: Keeps a historical list (queried by the clients) of both failures that were detected and alerts that were triggered.
Portability: Nothing to compile for the server or clients, and written in 100% Perl 5. This should help portability.
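
As an illustration of the "Monitors" item above, here is a minimal sketch of what a monitor could look like in Python. It is not part of mon. The function names (check_host, run, main), the choice of a TCP connection test, and port 80 are all hypothetical. The sketch also assumes that hosts arrive as command-line arguments, that an exit status of 0 means success and non-zero means failure, and that the first output line summarizes the failed hosts; check mon's own documentation for the exact convention before relying on this.

```python
"""Hypothetical TCP-connect monitor (an illustrative sketch, not part of mon)."""
import socket


def check_host(host, port=80, timeout=5.0):
    """Try one TCP connection; return (ok, detail)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        return False, str(exc)


def run(hosts, check=check_host):
    """Check every host; return (exit_code, output_lines)."""
    failures = []
    for host in hosts:
        ok, detail = check(host)
        if not ok:
            failures.append((host, detail))
    if not failures:
        return 0, []
    # Assumed convention: one summary line naming the failed hosts,
    # followed by one detail line per failure.
    lines = [" ".join(host for host, _ in failures)]
    lines += [f"{host}: {detail}" for host, detail in failures]
    return 1, lines


def main(argv):
    """Print the report and return the exit status for the given hosts."""
    code, lines = run(argv)
    if lines:
        print("\n".join(lines))
    return code
```

To use the sketch as a standalone script, import sys and end the file with sys.exit(main(sys.argv[1:])). The check parameter of run exists only so the reporting logic can be exercised without touching the network.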

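The traversal described under "Dependencies" can be sketched as follows. This is an illustration of the idea only, not mon's implementation or its configuration syntax: the function name root_causes, the dictionary layout, and the "host:service" labels are assumptions made for the example.

```python
def root_causes(service, depends_on, failed, _seen=None):
    """Return the failed services that should be alerted on for `service`.

    depends_on maps a service to the services it depends on.
    failed is the set of services currently failing.
    """
    seen = set() if _seen is None else _seen
    if service in seen:
        # Already visited: avoids repeated work and endless recursion
        # if the data accidentally contains a cycle.
        return set()
    seen.add(service)

    failed_deps = [d for d in depends_on.get(service, []) if d in failed]
    if not failed_deps:
        # No unresolved dependencies: this node is where the alert belongs.
        return {service}

    causes = set()
    for dep in failed_deps:
        causes |= root_causes(dep, depends_on, failed, seen)
    return causes


# Hypothetical example data: HTTP on the WWW server depends on the router.
depends_on = {"www:http": ["router:ping"]}
print(root_causes("www:http", depends_on, {"www:http", "router:ping"}))
print(root_causes("www:http", depends_on, {"www:http"}))
```

With the router down, only the router is reported; with the router up, the HTTP failure itself is reported. The sketch assumes the dependencies really do form a tree or at least contain no cycles, as the description above suggests.
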
trockij@transmeta.com