Monitoring your Log Monitoring Process

Published: 2011-11-19. Last Updated: 2011-11-19 22:30:53 UTC
by Kevin Liston (Version: 1)

A review of this year's diaries on Log Monitoring

We write a lot about log monitoring and analysis. Some recent entries that focus on log analysis:

Monitor your log submissions

Today I wanted to focus on point 5 in Lorna's overview "Logs - The Foundation of Good Security Monitoring."

"Monitor your log submissions"

How do you know you are still getting the logs you asked for, at the level you want, and that nothing has changed? This is probably one of the toughest areas and the one most often overlooked. My experience has been that people hand folks an SOP with how to send their logs, confirm they are getting logs, and then that is it. I have to say this is tough, especially in a large organization where you can have thousands of devices sending you logs. How do you know if anything has changed? How can you afford not to when so much is riding on the logs you get?

As she points out: when you have many devices submitting logs, how do you know you're getting all of the logs that you should?

Feed Inventory

You have to have a list of what feeds you are collecting. Otherwise you can't tell if anything is missing. Any further effort that doesn't include an inventory is wasted.

The inventory should track:

  • the device
  • the file-name/format that it's expected to deliver
  • real-time or batch delivery
  • delivery period (e.g. hourly, daily)

If you maintain this level of detail, you can create a simple monitoring system that populates two more fields:

  • last delivery size
  • last delivery time/date

Once you are collecting this simple level of detail, you can start alerting on missing files very easily. Just periodically scan through the inventory and alert on anything with a delivery time older than (current time - delivery period). For example, you could check every morning and scan for anything that didn't drop overnight. This will work for a small shop running a manual check. Another environment may sweep through hourly and alert after 2 hours of silence.
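As a minimal sketch, assuming the inventory is kept as a CSV with one row per feed and hypothetical column names for the fields above, a periodic sweep might look like this:

import csv
import time

STALENESS_FACTOR = 1.0  # alert as soon as a feed is a full delivery period late

def find_stale_feeds(inventory_path="feed_inventory.csv", now=None):
    """Return inventory rows whose last delivery is older than their delivery period."""
    now = now or time.time()
    stale = []
    with open(inventory_path, newline="") as f:
        for row in csv.DictReader(f):
            period = float(row["delivery_period_seconds"])
            last_seen = float(row["last_delivery_time"])  # stored as a Unix timestamp
            if now - last_seen > STALENESS_FACTOR * period:
                stale.append(row)
    return stale

if __name__ == "__main__":
    for feed in find_stale_feeds():
        print("ALERT: no delivery from", feed["device"], "since", feed["last_delivery_time"])

Run it from cron at whatever sweep interval fits your environment; the same loop works whether you check hourly or once each morning.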

This will catch major outages. If there's a more subtle failure where files are arriving but contain no real entries, you need to inspect the delivery size. Granted, a simple alert on 0-byte files will be effective. I recommend that you have such an alerting rule in place. I've seen some instances where a webserver was delivering files that contained only the log header and no values. This could indicate that traffic isn't getting to the webserver, which is something you might be interested in detecting.
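A small check along those lines, assuming W3C-style logs where header lines start with "#" (adjust the test for your own format), might be:

import os

def looks_empty(path):
    """Flag a delivery that is zero bytes or contains nothing but header/comment lines."""
    if os.path.getsize(path) == 0:
        return True
    with open(path, errors="replace") as f:
        return all(line.startswith("#") or not line.strip() for line in f)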

A quick aside about units

I keep mentioning delivery size, but you could measure any unit you're interested in. A few that may suit your application:

  • File size
  • Alert/Event count (think IDS or AV)
  • Line Count

You'll want to capture the unit in your inventory, since it's unlikely that you'll use the same unit for every feed (except perhaps file size). I'll continue referring to file size below, but keep in mind you could substitute any other unit.

Should you maintain a history?

Since you're capturing the time and size of each delivery, it's tempting to simply keep a history for each feed. If you can afford it, I recommend it. This will let you go back when you need to and visualize some events better. You could make a pretty dashboard of the feeds and let your eye pick out issues. But what if you have several hundred feeds? We'll see below that while a history is nice, it's not required.

Simple Trending

Through the addition of one more field to the inventory, you can track a running average of the delivery size. We'll call this the "trend" value. When you update the inventory, just average the current delivery size with this trend value: (trend + current size) / 2. You can then set up alerts that compare the trend with the current size. For example, a quick rule that will detect sudden drops in size is to alert when the current size is less than 75% of the trend value.

More than a few readers will spot that this is simply a special case of Brown's simple exponential smoothing (http://en.wikipedia.org/wiki/Exponential_smoothing) with the smoothing factor set to 0.5. If you want to tune your monitoring rules, play with the smoothing factor: replace my simple average above with:

new_trend <- (smoothing_factor * old_trend) + ((1 - smoothing_factor) * current_size)

 If you're keeping a history of your feeds, you can experiment with the smoothing factor by plotting the size history against this smoothed version.
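As a sketch with hypothetical names, the trend update and the 75%-drop rule from above might look like this:

SMOOTHING_FACTOR = 0.5   # 0.5 reproduces the simple (trend + current size) / 2 average
DROP_THRESHOLD = 0.75    # alert when the current size falls below 75% of the trend

def update_trend(old_trend, current_size, smoothing_factor=SMOOTHING_FACTOR):
    """Brown's simple exponential smoothing: weighted mix of the old trend and the new sample."""
    return smoothing_factor * old_trend + (1 - smoothing_factor) * current_size

def check_feed(feed, current_size):
    """Return True if this delivery looks suspiciously small, then fold it into the trend."""
    alert = feed["trend"] > 0 and current_size < DROP_THRESHOLD * feed["trend"]
    feed["trend"] = update_trend(feed["trend"], current_size)
    return alert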

What about cycles?

It's almost certain that what you're logging is affected by human behavior. Web logs will ebb and flow as employees arrive at work, burst around lunch time perhaps, or lull on evenings and weekends. Your log feeds are going to have similar cycles. Smoothing will account for the subtle changes over time, and depending on your smoothing factor and the variance of your log sizes it may respond well. But it will likely create a lot of false alarms, notably on Saturday or Monday mornings when there are larger shifts. Expect a burst of alerts on holidays.

Depending on your sample period (e.g. daily, hourly) you'll see different cycles. A daily sample will hide your lunch rush, so you'll likely see a 7-day period in your cycles. An hourly sample will expose not only weekend lulls, but also highlight overnight spikes caused by your back-up jobs.

You can address this by adding additional fields to trend the data. If you want to track the 7-day cycle, keep a trend value for each day of the week. If you want to account for hourly changes, add another 24 fields, one for each hour. So at the cost of 31 more fields you can have a fairly robust model for predicting what your expected size should be.
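A sketch of what those extra fields could look like for a single feed, again with hypothetical names:

from datetime import datetime

SMOOTHING_FACTOR = 0.5

def new_feed_state():
    """One overall trend plus 7 day-of-week and 24 hour-of-day trends (31 extra fields)."""
    return {
        "trend": 0.0,
        "day_trend": [0.0] * 7,    # index 0 = Monday ... 6 = Sunday
        "hour_trend": [0.0] * 24,  # index 0 = midnight hour ... 23
    }

def update_seasonal(feed, current_size, when=None):
    """Update the overall trend and the seasonal trends matching this sample's day and hour."""
    when = when or datetime.now()
    smooth = lambda old, new: SMOOTHING_FACTOR * old + (1 - SMOOTHING_FACTOR) * new
    feed["trend"] = smooth(feed["trend"], current_size)
    feed["day_trend"][when.weekday()] = smooth(feed["day_trend"][when.weekday()], current_size)
    feed["hour_trend"][when.hour] = smooth(feed["hour_trend"][when.hour], current_size)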

Prediction?

Yes, I said prediction.

If you're going to the trouble of tracking those 31 values and building a 3rd-order exponential smoothing function, you can let it run for a few more steps to predict the expected size of the next delivery. You can now alert on cases where the predicted value is widely different from what was actually delivered. This is effective anomaly detection.
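Continuing the sketch above, and as a crude stand-in for a full Holt-Winters implementation, a prediction-based check could be as simple as:

DEVIATION_THRESHOLD = 0.5  # alert if actual differs from predicted by more than 50%

def predict_size(feed, when):
    """Blend the overall trend with the seasonal trends for this day and hour."""
    return (feed["trend"] + feed["day_trend"][when.weekday()] + feed["hour_trend"][when.hour]) / 3

def is_anomalous(feed, current_size, when):
    predicted = predict_size(feed, when)
    if predicted == 0:
        return False  # not enough history yet to judge
    return abs(current_size - predicted) / predicted > DEVIATION_THRESHOLD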

You'll have to tune the smoothing factors and seasonal factors for each class of feed, since I suspect that the values you use for web proxies might not match what works best for IDS logs. But they should be consistent within a general feed type. Once these values are set, your monitoring will not need any more tweaking; it will "learn" as time goes on and more samples are fed into it. And you'll have a system that will alert you when it "sees something odd." With a thousand feeds, having something point you to just the interesting ones is pretty valuable.

Don't Forget Content Checking

While I focused mostly on volume (mainly because it's universal to log feeds), do not ignore the content of the log files. Monitoring line counts and file sizes will catch outages, but it will miss other logging errors that can cause a lot of trouble downstream in the monitoring and analysis process. When you ingest the logs into your monitoring system, it should check that the logs are in the correct/expected format. It can be a real headache when you realize that the log format changed on a system and you're no longer getting a critical field, for example the server IP address in a web proxy log.
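As one concrete example of such a check, for W3C-style proxy or web logs that carry a "#Fields:" header you could verify on ingest that the columns you depend on are still present (the required field names here are only illustrative):

REQUIRED_FIELDS = {"date", "time", "c-ip", "s-ip", "cs-uri-stem", "sc-status"}

def missing_fields(log_path):
    """Return required fields absent from the '#Fields:' header, or all of them if no header is found."""
    with open(log_path, errors="replace") as f:
        for line in f:
            if line.startswith("#Fields:"):
                return REQUIRED_FIELDS - set(line.split()[1:])
    return set(REQUIRED_FIELDS)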

Keywords: logs

Comments

The exponential smoothing you used in the Simple Trending section is known to DSP types (Digital Signal Processing) as an instance of an IIR filter (Infinite Impulse Response). Such a filter is easy to implement, but suffers from the fact that a large deviation in a single sample can have a major effect on a large number of subsequent output values. The FIR filter (Finite Impulse Response) is a weighted average of the last n samples, and therefore is guaranteed to lose the effect of an outlier value after n samples have been processed. So for your log monitoring example, instead of (trend + current_size)/2, you would compute (s[t-n+1] + s[t-n+2] + ... + s[t]) / n, which gives the running average of the previous n samples. This filter does not suffer from the "integrator wind-up" problem that the IIR filter can exhibit.
