Features of mon 0.38.12
mon was developed under Linux and is known to work under Solaris 2.5
and 2.6. Since the clients and server are written
entirely in Perl, portability should not be much of an issue.
The following is a list of some of the features of mon:
- Monitors
- "Monitors" are programs that check for a particular condition,
and report success or failure to the server, along with
any output.
They are independent of mon, so to add a test for a
new service, you can just write your monitor in any language,
put it in the monitor directory, and it just works.
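To illustrate how small a monitor can be, here is a minimal sketch in Python.
It rests on assumptions the text above does not spell out: that the hosts to
check arrive as command-line arguments, that a non-zero exit status means
failure, and that whatever the program prints is passed along as its output.
The names check_host, run, and main, and the default port 80, are hypothetical.

```python
"""Hypothetical TCP monitor sketch; the calling convention is an assumption."""
import socket
import sys


def check_host(host, port, timeout=5.0):
    """Return None if a TCP connection succeeds, else a short error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return f"{host}:{port} {exc}"


def run(hosts, port=80, check=check_host):
    """Check every host; return (exit_status, failure_lines)."""
    failures = [msg for msg in (check(h, port) for h in hosts) if msg]
    return (1 if failures else 0), failures


def main(argv):
    """Print one line per failed host and return the exit status."""
    status, lines = run(argv)
    if lines:
        print("\n".join(lines))
    return status

# When installed as a script, one would finish with:
#     sys.exit(main(sys.argv[1:]))
```

The check function is passed in as a parameter so the reporting logic can be
exercised without touching the network.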
- Asynchronous Events
- Support for asynchronous events communicated to the
mon server. This will be open-ended, like the monitor
and alert scripts, so that you can trigger on anything. One
obvious use is acting on SNMP traps.
- Alerts
- "Alert" scripts send a message or otherwise act
on a failure that mon detects. These alerts, like
the monitors, are not part of mon, and are easy to add.
"Upalerts" are also supported; they trigger
an alert when a server comes back up after being down for
a long time.
- Failure Handling
- Failure of any monitor can trigger any (and multiple) alerts,
to different people at different times. You can effectively
construct "on call" schedules using this feature. For
example, you can send
a page to all system administrators if a resource goes down
before 8PM, but after 8PM, page only Joe, but send email to
everyone else.
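The 8PM example above can be sketched as a small table of time windows. This
is only an illustration of the idea, not mon's configuration syntax; the
SCHEDULE layout, the recipient names, and the alerts_for function are all
hypothetical.

```python
from datetime import time

# Hypothetical schedule: (start, end, action, recipients).
# Windows are half-open [start, end); an end of None means "until midnight".
# Exactly 8PM is placed in the later window, which is an arbitrary choice.
SCHEDULE = [
    (time(0, 0), time(20, 0), "page", ["all-admins"]),
    (time(20, 0), None, "page", ["joe"]),
    (time(20, 0), None, "email", ["everyone-else"]),
]


def alerts_for(now, schedule=SCHEDULE):
    """Return the (action, recipients) pairs whose window contains `now`."""
    return [
        (action, who)
        for start, end, action, who in schedule
        if start <= now and (end is None or now < end)
    ]
```

A failure detected at 09:00 would page all administrators, while one detected
at 22:30 would page Joe and email everyone else.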
- Parallelization
- Checks of services on different hosts or groups of hosts
run in parallel. For example, mon can ping your routers
while it is also pinging your WWW servers. There is
no queue that can postpone the scheduled testing
of other services.
- Repetitive Alert Suppression
- Repetitive alerts can be suppressed. For example, only
send email once an hour if a service continues to fail.
As an option, small, transient failures of a service may be ignored.
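One way to picture both behaviours (at most one alert per interval, and
ignoring short transient failures) is the sketch below. It is an assumed
design for illustration only; the AlertThrottle class and its interval and
min_failures parameters are hypothetical and do not come from mon.

```python
class AlertThrottle:
    """Decide whether a failed check should produce an alert."""

    def __init__(self, interval=3600.0, min_failures=1):
        self.interval = interval          # minimum seconds between alerts
        self.min_failures = min_failures  # consecutive failures before alerting
        self.consecutive = 0
        self.last_alert = None

    def record(self, ok, now):
        """Record one check result at time `now` (seconds); return True to alert."""
        if ok:
            # Service recovered: reset both the failure count and the timer.
            self.consecutive = 0
            self.last_alert = None
            return False
        self.consecutive += 1
        if self.consecutive < self.min_failures:
            return False  # treated as a transient failure
        if self.last_alert is not None and now - self.last_alert < self.interval:
            return False  # an alert was already sent within the interval
        self.last_alert = now
        return True
```

With interval=3600 and min_failures=2, the first failure is ignored, the
second sends an alert, and further failures stay quiet until an hour has
passed since that alert.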
- Dependencies
- Inter-service dependencies and even correlation. For example,
if the router between the monitoring host and your WWW
server is down, HTTP won't work, so only send an alert that
the router is down. This prevents the cascading of zillions
of alerts that happens when some critical resource is not
accessible. Dependencies can be understood as a hierarchy
(a tree): when a failure occurs, the tree is traversed
towards the node which has no unresolved dependencies.
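The traversal described above can be sketched as follows. This is one reading
of the description, not mon's actual implementation; the root_causes function,
the depends_on mapping, and the service names are hypothetical.

```python
def root_causes(service, depends_on, failed):
    """Return failed services reachable from `service` that have no failed dependencies.

    depends_on maps a service name to the list of services it depends on;
    failed is the set of services currently failing.
    """
    causes = []
    seen = set()

    def visit(node):
        if node in seen or node not in failed:
            return
        seen.add(node)
        failed_deps = [d for d in depends_on.get(node, []) if d in failed]
        if not failed_deps:
            causes.append(node)  # nothing beneath it is failing: alert here
        for dep in failed_deps:
            visit(dep)           # keep walking towards the underlying failure

    visit(service)
    return causes
```

In the router example, an HTTP failure whose router dependency is also failing
resolves to the router alone, so only one alert would be sent.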
- Flexible Configuration
- A very flexible (and extensible) configuration file.
Hosts can be grouped together, and each host or group
can have multiple services. Have a look
at an example configuration file.
- Client/Server Model
- Has interactive command-line,
WWW-based, and SkyTel 2-Way
alphanumeric pager-based clients
that query the server for status and history. The protocol is simple,
and it is very easy to make clients of your own.
Authentication is supported along with per-user access control.
- Run-time Alert Acknowledgement and Disabling
- A service failure can be acknowledged so that alerts are
suppressed until the problem is fixed. This "ack" state
is retrievable from the client interface so that users
can see that support staff are working on the problem.
Also, alerts for particular hosts, groups, or services can
be temporarily disabled and re-enabled by the client, without
stopping and restarting the server.
If you're upgrading a particular server, you can disable
the alert while you're doing the work, and re-enable it
when you're done.
- History
- Keeps a historical list (queried by the clients)
of both failures that were detected and alerts that were
triggered.
- Portability
- The server and clients are written in 100% Perl 5, so there
is nothing to compile. This should help portability.
trockij@transmeta.com