A technology blog for The Economist Group IT team

Thursday, April 13, 2006

Splunk > CSI for IT 

Splunk looks interesting. It gobbles up all your log file data, groups entries together and allows you to search that combined data easily.

Here's a scenario. Economist.com customers have reported that they couldn't use some parts of the site yesterday between 19:00 and 19:40 EST. Using Splunk we could search for events across all the platforms that make up Economist.com - ColdFusion on six web servers, IIS on six web servers, the firewall, the load balancer and the web servers themselves. Splunk groups events together, so that when several log entries belong to the same ongoing event you see just one entry when searching.
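To make the scenario concrete, here is a minimal sketch of the idea in Python. It is not how Splunk works internally (I don't know its grouping logic); it simply merges entries from several sources, keeps those inside a time window, and collapses repeats of the same message into one event. The log format, the sample lines, the function names and the five-minute gap are all made up for illustration.

```python
from datetime import datetime, timedelta

def parse(line, source):
    # Assumed (hypothetical) format: "YYYY-MM-DD HH:MM:SS message text"
    stamp, message = line[:19], line[20:]
    return (datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), source, message)

def search_window(logs, start, end):
    """Merge entries from every source and keep those between start and end, oldest first."""
    entries = [parse(line, source) for source, lines in logs.items() for line in lines]
    return sorted(e for e in entries if start <= e[0] <= end)

def group_events(entries, gap=timedelta(minutes=5)):
    """Collapse repeats of the same (source, message) that are at most `gap` apart."""
    events = []
    latest = {}  # (source, message) -> index of the most recent event with that key
    for when, source, message in entries:
        key = (source, message)
        idx = latest.get(key)
        if idx is not None and when - events[idx]["last"] <= gap:
            # Same event still going on: extend it rather than adding a new row.
            events[idx]["last"] = when
            events[idx]["count"] += 1
        else:
            latest[key] = len(events)
            events.append({"source": source, "message": message,
                           "first": when, "last": when, "count": 1})
    return events

# Invented sample data, not real Economist.com logs.
logs = {
    "coldfusion": [
        "2006-04-12 19:02:10 JDBC connection timeout",
        "2006-04-12 19:04:30 JDBC connection timeout",
        "2006-04-12 19:08:00 JDBC connection timeout",
        "2006-04-12 21:00:00 JDBC connection timeout",
    ],
    "load-balancer": [
        "2006-04-12 18:00:00 health check ok",
        "2006-04-12 19:03:00 node web3 marked down",
    ],
}

window = search_window(logs, datetime(2006, 4, 12, 19, 0), datetime(2006, 4, 12, 19, 40))
events = group_events(window)
for event in events:
    print(event["first"], event["source"], event["message"], "x", event["count"])
```

With the sample data, the three ColdFusion timeouts inside the window are reported as a single event with a count, next to the one load balancer entry, which is roughly the "only see the one entry" behaviour described above.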

This is a bit like a product I used on DEC kit ages ago that tracked levels of memory, CPU usage and other counters over time. The neat thing about that system was that it did what it called "auto-correlation". It looked at how the behaviour of each of the items being monitored was correlated with the others, so that you could see cause and effect (e.g. a long batch processing time could be caused by high CPU usage, which in turn could actually have been caused by increased disk wait times when swapping, and on and on...).
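I don't know what the DEC product actually computed, but the general idea can be sketched with a plain Pearson correlation between every pair of counters. The metric names, the sample readings and the 0.8 threshold below are all invented for illustration, and a high correlation only hints at a relationship; it doesn't prove which counter caused which.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / sqrt(var_x * var_y)

def correlated_pairs(metrics, threshold=0.8):
    """Return (name_a, name_b, r) for every pair of metrics with |r| >= threshold."""
    names = sorted(metrics)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(metrics[a], metrics[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs

# Invented readings taken at the same six points in time.
metrics = {
    "disk_wait_ms": [5, 6, 20, 35, 40, 8],
    "cpu_percent": [30, 32, 70, 90, 95, 35],
    "batch_minutes": [10, 11, 25, 38, 41, 12],
}

pairs = correlated_pairs(metrics)
for a, b, r in pairs:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

A real monitoring tool would also have to deal with lag (disk waits rising a few minutes before batch times do), which this sketch ignores.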

Anyway, Splunk adds a bit of typical Web 2.0 goodness: system administrators can tag events, and they can benefit from the knowledge of all other system administrators by looking up online how similar collections of events were resolved.

So why aren't we using it? We have other priorities right now, but it's certainly one to watch.