IT Operations Management (ITOM) has become a major industry since the development of the first SNMP management systems and event consolidation alerting tools in the late 1980s. Originally, ITOM was solely the domain of technology, in the guise of fault monitoring systems. Over the last 20 years, however, the realm has expanded across Fault, Configuration, Accounting, Performance and Security (FCAPS) to offer a suite of management technologies covering the whole IT infrastructure. Then, with the widespread adoption of the IT Infrastructure Library (ITIL) process guidelines, the IT department gained IT Service Management (ITSM): the integration of ITIL process guidelines with ITOM technology platforms to raise the level of service delivered by the IT infrastructure.
The practice of ITSM by the management of IT Operations is devoted solely to reducing the cost of managing their IT infrastructures, and cost savings can be demonstrated on paper through the deployment of FCAPS technologies in conjunction with processes that follow ITIL guidelines. In 2007, according to Gartner Group, more than 85% of medium and large enterprises had committed to implementing ITIL guidelines, and Network2 claims that global spend on network and systems management technology in 2007 was over $10 billion.
Despite this global investment in defined processes and supporting technology, when Gartner Group surveyed its customer base at its Data Center conference in November 2007, the need to troubleshoot problems faster and the need to prevent performance problems still came out as the top two issues. The reason these two consistently rank at the top of CIOs' wish lists is simple to comprehend: the results of IT infrastructure problems are plainly visible to their customers.
- NetworkWorld reported that 82% of network problems are identified through users complaining about application performance.
- Apparent Networks found in its testing that, in 38% of 20,000 helpdesk issues, network problems translated directly into calls about application issues.
- Telus Corporation even admitted that 78% of network problems are beyond their control [with today’s IT Operations Management technologies].
So, for an enterprise, the ability to change the perception of their customers by fixing problems sooner and providing more consistent application performance is paramount. This is the domain of the Fault Management technology platform. When a problem occurs, there is a corresponding impact upon the customers’ business operations. That impact carries a cost which is both measurable and real.
The cost of impact to each customer can be attributed to the time taken to resolve the problem. Therefore, if one can resolve the problem more quickly, one can reduce the impact and associated cost of the problem. Problem management has four clear phases:
(i) IT Infrastructure State Changes; the circumstances leading to the problem
(ii) Problem Identification; the knowledge that there is a problem
(iii) Problem Isolation; the isolation of the Root Cause, and
(iv) Problem Resolution; the return to normal operations
There is also a fifth phase, Problem Impact: the ability to understand which Services and which Customers are affected by the problem. This does not directly affect the cost of a problem to a business; it only allows that cost to be measured, and so allows the fault management process to be prioritized, i.e. resources to be focused on one problem over another. The more quickly the IT Operations team can resolve a problem, the better the perception by their customer.
Applying automation to accelerate problem identification and isolation reduces the total time taken to resolve the problem, and thereby the cost of its impact. This is why Fault Management technology, in the form of Event Consolidation, Correlation and Root-Cause Analysis (RCA) solutions, represents the major spend in an IT Operations Management budget.
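The argument above can be put into rough numbers. The following is a minimal sketch, not a real costing model: it assumes that impact cost accrues at a constant rate from the moment the problem occurs until normal operations return, and it only models the identification, isolation and resolution phases. The `Problem` class, the `with_automation` helper and all of the figures are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Problem:
    name: str
    identify_min: float  # minutes until someone knows there is a problem
    isolate_min: float   # minutes to isolate the root cause
    resolve_min: float   # minutes to return to normal operations
    cost_per_min: float  # assumed business cost per minute of impact

    def time_to_resolve(self) -> float:
        # Total time the customer is impacted.
        return self.identify_min + self.isolate_min + self.resolve_min

    def impact_cost(self) -> float:
        # Assumption: cost grows linearly with time to resolve.
        return self.time_to_resolve() * self.cost_per_min


def with_automation(p: Problem, speedup: float) -> Problem:
    """Return a copy of p whose identification and isolation phases
    are shortened by the given factor; resolution time is unchanged."""
    return Problem(
        p.name,
        p.identify_min / speedup,
        p.isolate_min / speedup,
        p.resolve_min,
        p.cost_per_min,
    )


manual = Problem("core router outage", 30, 60, 30, 100.0)
automated = with_automation(manual, speedup=3)

print(manual.impact_cost())     # 12000.0
print(automated.impact_cost())  # 6000.0
```

In this made-up case, cutting identification and isolation to a third of their original duration halves the impact cost even though the resolution phase itself is no faster.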
Businesses, both Telecom and Enterprise, have employed an array of automation technologies to help identify problems sooner, isolate the root cause, and align problem management processes to reduce the time to resolve issues. These include:
- Network monitoring tools:
  - HP OpenView Network Node Manager
  - Tivoli Netcool Precision
  - EMC SMARTS
  - BMC Patrol
- Enterprise Event Consoles:
  - Tivoli Netcool Omnibus
  - HP OpenView Operations
  - BMC Enterprise Manager
- Service Desk solutions:
  - HP OpenView ServiceDesk & ServiceCenter
  - BMC Remedy
  - IBM MRO
All of these technologies were conceived, designed and developed in the late 1980s and early 1990s, yet they still represent the pinnacle of IT Service Management technology innovation and expertise today. In fact, they now represent huge industries in themselves, with a whole supply chain evolved for the design, implementation, configuration and lifecycle maintenance of each tool.
Yet despite the investment in these tools and technologies, the ability of IT Operations Management to resolve problems faster has not materialized. In fact, the cost of Problem Impact is the same today as it was in the 1990s.
One thing people don’t tell you about running an open source project is that it takes quite a bit of software to support everything that you are trying to do. For Rivermuse, we’re running a whole slew of applications, many from Atlassian (all still in the process of being installed).
- JIRA
- Confluence
- Bamboo
- Crowd
- SVN
- Gforge
- jForums
And of course all of these applications live atop a Linux box running Apache. You can’t really do this overnight, contrary to popular belief. We’re also feeling some of the pain of migrating from a set of not-big-enough servers to a robust new set at Contegix.
Nonetheless, everything is moving and soon the force will be with us!
Our Community Edition is currently in alpha* and we are getting some great feedback on both functionality and performance. We hope to have a beta available for download very shortly.
For those of you eagerly awaiting the general release we wanted to give you some insight into the architecture. We will go into more detail on individual features in future blogs.
The architecture is an OpenMOS-based platform offering a scalable, resilient and secure high-performance environment. All data management uses MySQL optimised for the ProCool architecture. Passive data collection is undertaken using Rsyslog, and package deployment and automated updates are managed by RPM and YUM.
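To give a feel for what passive collection through syslog involves, here is a minimal sketch of turning one raw syslog line into a structured event. It is not ProCool code: the `parse_syslog` function and the shape of the returned dictionary are hypothetical, and it assumes the line arrives with the standard `<PRI>` prefix (what Rsyslog actually hands on depends on how its output is configured). The one fixed rule it relies on is that the priority value encodes facility and severity as facility × 8 + severity.

```python
import re

# Severity names in numeric order (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

# A syslog line starts with <PRI>, where PRI = facility * 8 + severity.
_PRI = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog(line: str) -> dict:
    """Split a raw syslog line into facility, severity and message.

    The returned dict layout is illustrative only.
    """
    match = _PRI.match(line)
    if match is None:
        raise ValueError("line has no <PRI> prefix")
    pri = int(match.group(1))
    if pri > 191:  # 23 facilities * 8 + 7 is the largest valid value
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    return {
        "facility": facility,
        "severity": SEVERITIES[severity],
        "message": match.group(2),
    }


event = parse_syslog("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
print(event["facility"], event["severity"])  # 4 crit
```

An event console would typically go on to store records like this and consolidate repeats, but those steps depend on the platform's own schema and are left out here.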
SOAP abstraction is leveraged to its best advantage, allowing rapid integration and transport versatility whilst maintaining platform independence.
We invite all readers of our blog to post comments and questions.
Just because we are busy finalizing the release of RiverMuse ProCool Community, don’t think that we have forgotten about our faithful followers.
We’ve done a bit of cleaning up on Rivermuse.com as well, so take a look and let us know if you have any questions, comments or concerns.