
A very common task in every enterprise environment is making sure that the applications installed in production are running and working properly. For example, a production Windows service can go down, start eating CPU, or simply fail to restart after a scheduled server patch. It has happened to me more than once.

How would you monitor this? Every monitoring product I looked at was expensive, complex, and required remote access with elevated permissions. That is a problem in a large enterprise: as the developer, I understand best how my service should behave, but I have no access to the production environment, while the production support staff who do have access know nothing about the service. I can schedule time with production support to set something up once, but it has to run locally on the server, like an agent. That way you never have to worry about remote access.

I wanted something simple and extensible: if I added a new item to monitor, I did not want to schedule a new installation. So I came up with a simple console application based on MEF (the Managed Extensibility Framework) that Windows Task Scheduler starts on a timer. At its core are just two foreach loops and two groups of components, Checks and Actions. The main program scans its directory, finds every Check component, and runs it. If something falls outside the range specified in the component's configuration, the Check reports an alert, and the second foreach loop passes the collected alerts to every Action, which acts on them, for example by sending an email.

 


Here is the code, starting with the interfaces that MEF composes:

public interface ICheck
{
    IEnumerable<Alert> Inspect();
}

Every component that checks something must implement the Inspect method, which returns a list of Alerts when the check fails and an empty list when everything is fine.
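To make the contract concrete, here is a minimal sketch of a Check that follows the "outside the configured range" rule described earlier. `RangeCheck` and its constructor parameters are hypothetical, and the sampled value is passed in as a delegate so the example runs without any real counters. `Alert` and `ICheck` are repeated (without the MEF `[Export]` attribute) so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Same shape as the post's Alert class.
public class Alert
{
    public string Message { get; set; }
    public string Source { get; set; }
    public string StackTrace { get; set; }
    public string Target { get; set; }
    public string Status { get; set; }
}

// Same contract as the post's ICheck (MEF [Export] attribute left out).
public interface ICheck
{
    IEnumerable<Alert> Inspect();
}

// Hypothetical Check: samples one value and alerts only when it falls
// outside the configured [min, max] range.
public class RangeCheck : ICheck
{
    private readonly string _target;
    private readonly Func<double> _sample;
    private readonly double _min;
    private readonly double _max;

    public RangeCheck(string target, Func<double> sample, double min, double max)
    {
        _target = target;
        _sample = sample;
        _min = min;
        _max = max;
    }

    public IEnumerable<Alert> Inspect()
    {
        var alerts = new List<Alert>();
        double value = _sample();
        if (value < _min || value > _max)
        {
            string shown = value.ToString(CultureInfo.InvariantCulture);
            alerts.Add(new Alert
            {
                Message = "Alert: " + _target + " is " + shown,
                Source = nameof(RangeCheck),
                Status = shown,
                Target = _target
            });
        }
        // An empty list means "healthy": the main loop adds nothing for this Check.
        return alerts;
    }
}
```

In a real component the target name and the range would come from the component's own configuration file, and the class would carry `[Export(typeof(ICheck))]` so MEF can find it.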

[Serializable]
public class Alert
{    
    public string Message { get; set; }    
    public string Source { get; set; }    
    public string StackTrace { get; set; }    
    public string Target { get; set; }    
    public string Status { get; set; }
}

Every component that takes an action must implement the PerformAction method, which receives all the alerts collected during the run.

public interface IAction
{
    void PerformAction(IEnumerable<Alert> messages);
}
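As an illustration of an Action, here is a sketch of an email notifier built on `System.Net.Mail.SmtpClient`. `EmailAction`, its constructor parameters, and the subject and body format are assumptions, not the author's component. One detail worth keeping: the main program calls every Action even when no Check reported anything, so an Action should treat an empty list as "nothing to do".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Mail;
using System.Text;

// Same shape as the post's Alert class.
public class Alert
{
    public string Message { get; set; }
    public string Source { get; set; }
    public string StackTrace { get; set; }
    public string Target { get; set; }
    public string Status { get; set; }
}

// Same contract as the post's IAction (MEF [Export] attribute left out).
public interface IAction
{
    void PerformAction(IEnumerable<Alert> messages);
}

// Hypothetical Action: emails all alerts from one run as a single message.
public class EmailAction : IAction
{
    private readonly string _smtpHost;
    private readonly string _from;
    private readonly string _to;

    public EmailAction(string smtpHost, string from, string to)
    {
        _smtpHost = smtpHost;
        _from = from;
        _to = to;
    }

    public void PerformAction(IEnumerable<Alert> messages)
    {
        List<Alert> alerts = messages.ToList();
        // Nothing failed this run, so do not send an empty email.
        if (alerts.Count == 0)
            return;

        string subject = alerts.Count + " alert(s) on " + Environment.MachineName;
        using (var client = new SmtpClient(_smtpHost))
        {
            client.Send(_from, _to, subject, FormatBody(alerts));
        }
    }

    // Builds a plain-text body with one line per alert.
    public static string FormatBody(IEnumerable<Alert> alerts)
    {
        var body = new StringBuilder();
        foreach (Alert alert in alerts)
        {
            body.AppendLine("[" + alert.Source + "] " + alert.Target + ": " + alert.Message);
        }
        return body.ToString();
    }
}
```

Sending one email per run rather than one per alert keeps a bad night from flooding the inbox.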

The main program is essentially this:

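// Needs references to System.ComponentModel.Composition and log4net.
using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;          // [ImportMany], ComposeParts()
using System.ComponentModel.Composition.Hosting;  // AggregateCatalog, DirectoryCatalog

[ImportMany]
public IEnumerable<ICheck> CheckSet { get; set; }

[ImportMany]
public IEnumerable<IAction> ActionSet { get; set; }

/// <summary>
/// Main run.
/// Runs Inspect on every Check DLL located in the program's directory,
/// then passes every alert collected from the inspections to each Action.
/// </summary>
public void Run()
{
    Log.Debug("Begin checking");
    var catalog = new AggregateCatalog();
    // Use the executable's folder rather than "." so the DLLs are found even when
    // Task Scheduler starts the program with a different working directory.
    catalog.Catalogs.Add(new DirectoryCatalog(AppDomain.CurrentDomain.BaseDirectory));
    using (var container = new CompositionContainer(catalog))
    {
        container.ComposeParts(this);
        var messages = new List<Common.Alert>();
        foreach (var check in CheckSet)
        {
            try
            {
                messages.AddRange(check.Inspect());
            }
            catch (Exception ex)
            {
                // A failing Check is reported as an alert instead of stopping the run.
                messages.Add(new Common.Alert
                {
                    Message = ex.Message,
                    StackTrace = ex.StackTrace,
                    Source = "Inspector"
                });
                Log.Error("Error running the Check", ex);
            }
        }
        foreach (var action in ActionSet)
        {
            try
            {
                action.PerformAction(messages);
            }
            catch (Exception ex)
            {
                // One broken Action must not prevent the others from running.
                Log.Error("Error while performing the action", ex);
            }
        }
    }
    Log.Debug("End Checking");
}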

As you can see, MEF locates the two sets of DLLs and the program simply calls the interface methods. What you check and how you report an Alert are entirely up to you, and both can change at any time: just drop new components into the directory or replace the existing ones.
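Because the two loops depend only on the interfaces, they can also be exercised without MEF by handing the components in directly, which is a cheap way to try out a new Check before dropping it into the directory. The sketch below is a hypothetical `Pipeline.Run` that mirrors the main program's loops with logging omitted; `FailingCheck` and `RecordingAction` are made-up test doubles.

```csharp
using System;
using System.Collections.Generic;

// Same shapes as the post's Alert, ICheck and IAction.
public class Alert
{
    public string Message { get; set; }
    public string Source { get; set; }
    public string StackTrace { get; set; }
    public string Target { get; set; }
    public string Status { get; set; }
}

public interface ICheck { IEnumerable<Alert> Inspect(); }

public interface IAction { void PerformAction(IEnumerable<Alert> messages); }

// Hypothetical helper: the same two loops as Run(), with the components
// passed in explicitly instead of being discovered by MEF.
public static class Pipeline
{
    public static List<Alert> Run(IEnumerable<ICheck> checks, IEnumerable<IAction> actions)
    {
        var messages = new List<Alert>();
        foreach (ICheck check in checks)
        {
            try
            {
                messages.AddRange(check.Inspect());
            }
            catch (Exception ex)
            {
                // A failing Check becomes an alert instead of stopping the run.
                messages.Add(new Alert { Message = ex.Message, StackTrace = ex.StackTrace, Source = "Inspector" });
            }
        }
        foreach (IAction action in actions)
        {
            try
            {
                action.PerformAction(messages);
            }
            catch (Exception)
            {
                // The real Run() logs this; one broken Action must not block the rest.
            }
        }
        return messages;
    }
}

// Test double: a Check that always throws.
public class FailingCheck : ICheck
{
    public IEnumerable<Alert> Inspect() => throw new InvalidOperationException("boom");
}

// Test double: an Action that records how many alerts it received.
public class RecordingAction : IAction
{
    public int Received { get; private set; } = -1;

    public void PerformAction(IEnumerable<Alert> messages) => Received = new List<Alert>(messages).Count;
}
```

A Check that throws still produces an alert with `Source = "Inspector"`, so a broken component shows up in the notifications instead of failing silently.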
Here is an example of the most common Check component, which checks the status of Windows services.

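using System;
using System.Collections.Generic;
using System.ComponentModel.Composition;   // [Export]
using System.Configuration;                // ConfigurationManager
using System.Reflection;                   // Assembly
using System.ServiceProcess;               // ServiceController

[Export(typeof(ICheck))]
public class CheckService : ICheck
{
    #region ICheck Members
    /// <summary>
    /// Inspects whether each configured Windows service on this machine has the expected status.
    /// </summary>
    /// <returns>One alert per service whose status differs or that cannot be found.</returns>
    public IEnumerable<Common.Alert> Inspect()
    {
        var messages = new List<Common.Alert>();
        // OpenExeConfiguration appends ".config" to this DLL's path, so the
        // component reads its own file (e.g. CheckService.dll.config).
        Assembly assembly = Assembly.GetExecutingAssembly();
        Configuration mainConfig = ConfigurationManager.OpenExeConfiguration(assembly.Location);
        var config = mainConfig.GetSection("servicesToCheck") as ServiceSection;
        if (config == null)
            return messages;
        foreach (ServiceElement service in config.Services)
        {
            using (var controller = new ServiceController(service.ServiceName))
            {
                try
                {
                    string status = controller.Status.ToString();
                    if (status != service.ServiceStatus)
                    {
                        messages.Add(new Common.Alert
                        {
                            Message = "Alert: Service " + service.ServiceName + " has status " + status,
                            Source = "CheckService",
                            Status = status,
                            Target = service.ServiceName
                        });
                    }
                }
                catch (Exception ex)
                {
                    // Record the alert and keep checking the remaining services;
                    // rethrowing here would discard the alerts collected so far.
                    messages.Add(new Common.Alert
                    {
                        Message = "Service not found. It is probably not installed. [exception=" + ex.Message + "]",
                        StackTrace = ex.StackTrace,
                        Source = "CheckService",
                        Target = service.ServiceName
                    });
                }
            }
        }
        return messages;
    }
    #endregion
}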

The code at the start of Inspect keeps each component's configuration together with its DLL. I did not want to maintain one huge .config file for all the Checks and Actions, so every DLL dropped into the directory, whether it belongs to the Check group or the Action group, comes with its own configuration file. As a result, the main program's configuration stays simple and contains only the log4net settings.
This is what the configuration for CheckService looks like:


    
        
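CheckService reads its list of services through `ServiceSection` and `ServiceElement`, classes that are not shown in the post. Below is a minimal sketch of how such classes might be written with `System.Configuration`, assuming the attribute names `serviceName` and `serviceStatus`, a collection element named `services`, and an assembly named `CheckService` (all hypothetical; the author's classes may differ). On .NET Framework `System.Configuration` ships with the framework; on newer .NET it comes from the System.Configuration.ConfigurationManager package.

```csharp
using System.Configuration;

// Hypothetical sketch of the custom section read by CheckService.
// A matching (equally hypothetical) CheckService.dll.config could be:
//
// <configuration>
//   <configSections>
//     <section name="servicesToCheck" type="ServiceSection, CheckService" />
//   </configSections>
//   <servicesToCheck>
//     <services>
//       <add serviceName="MyWindowsService" serviceStatus="Running" />
//     </services>
//   </servicesToCheck>
// </configuration>

public class ServiceElement : ConfigurationElement
{
    // Name passed to ServiceController.
    [ConfigurationProperty("serviceName", IsRequired = true, IsKey = true)]
    public string ServiceName
    {
        get { return (string)this["serviceName"]; }
        set { this["serviceName"] = value; }
    }

    // Expected status, compared against ServiceController.Status.ToString().
    [ConfigurationProperty("serviceStatus", DefaultValue = "Running")]
    public string ServiceStatus
    {
        get { return (string)this["serviceStatus"]; }
        set { this["serviceStatus"] = value; }
    }
}

// Collection of <add .../> entries, keyed by service name.
public class ServiceElementCollection : ConfigurationElementCollection
{
    protected override ConfigurationElement CreateNewElement()
    {
        return new ServiceElement();
    }

    protected override object GetElementKey(ConfigurationElement element)
    {
        return ((ServiceElement)element).ServiceName;
    }
}

// The <servicesToCheck> section itself.
public class ServiceSection : ConfigurationSection
{
    [ConfigurationProperty("services")]
    public ServiceElementCollection Services
    {
        get { return (ServiceElementCollection)this["services"]; }
    }
}
```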

That is pretty much it. Set Windows Task Scheduler to run the program every 5 to 10 minutes, and if your Windows service goes down you will know about it. In my environment, besides CheckService, I use CheckQueue, which checks how many messages are sitting in MSMQ, and CheckPerfCounter, which can be configured to check any performance counter on the server. The main point is that such components are easy to write and drop into the directory, so you can create whatever custom components your enterprise environment needs.

