IT Event and Fault Management Industry Ready to Shift
Over the past five years we have witnessed an acceleration of open source software (OSS) adoption by service providers and enterprises alike. As Gartner stated in a November 2008 report, the primary advantages for customers adopting OSS tools were lower cost of ownership, ease of implementation, investment protection against a single vendor and faster time to market.
These key advantages have held true time and again, enabling open source to gain significant market share across the entire IT stack, from operating systems to middleware, tools and business applications, among both enterprises and service providers (see the discussion on this subject on the microSperience blog). In fact, in the Network and System Management (NSM) space, there is already a wide selection of open source monitoring tools that have gained a broad following.
Last year we launched RiverMuse, the first and only open source IT Operations Management platform. By our definition, Operations Management has the disciplines of IT event and fault management at its core. It is the primary platform that operations teams rely on to keep their services and infrastructure running and healthy 24/7.
The founders of RiverMuse recognized early on that the IT event and fault management market is on the cusp of a major shift. Technology innovation in this market has stagnated for nearly a decade, while the infrastructure environment has undergone transformational change. Existing tools have become overly complex and costly, and their return on investment has become questionable. Customers today demand a number of things that legacy vendors are unable to deliver: built-in support for today’s dynamic infrastructures, marked simplification of current management toolsets, a significant reduction in total cost of ownership (TCO) as required by the new software economics, and freedom from management vendor lock-in. Let’s take a brief look at each of these points.
First, most of the legacy operations/fault management platforms were developed to manage network-centric infrastructures that changed relatively slowly. As a result, they were built to handle network-related faults and not much more. Ten years later, the landscape has changed completely. Today’s infrastructure is abstracted away from the underlying physical assets and governed by policies tied to business and service priorities. It has evolved to support business agility: new projects can be deployed rapidly, resources can be dialed up or down as needed, and change is easily accommodated throughout a virtualized fabric of computing, storage and networking. With their proprietary designs, legacy fault management platforms are ill suited to cope with such a diverse and dynamic environment in an elegant and nimble way.
Second, these management toolsets have extremely rigid architectures built for a different era in computing. Rather than redesigning their products to meet new technology and business needs, legacy vendors have responded with bolt-on products obtained through acquisitions or partnerships. Consequently, their suites have multiple user interfaces and programming languages, and they do not support the standard reference architectures needed to fully capture the modern IT infrastructure. This in turn limits their integration and automation capabilities. The list of products and options a legacy vendor presents to a typical IT organization is bewildering and convoluted, and only the largest customers can afford it. Yet there is no reason why IT Operations Management platforms can’t be simpler; vendors like VMware have delivered simplicity in complex environments of their own. To meet market demands, NSM tools must facilitate agility in IT operations rather than get in the way, as they often do today, and that requires a complete change in attitude and engineering design.
Third, the overall cost of legacy operations/fault management solutions is prohibitive, particularly for support, maintenance and ongoing administration. How is an IT organization to achieve ROI on top of these exorbitant costs? For example, IBM Tivoli Netcool, the most widely deployed enterprise fault management solution, requires months of setup and offers minimal configuration automation, among other limitations, all of which push its TCO through the roof. This is a losing proposition. As more enterprises, Tier 2 and Tier 3 service providers, and managed service providers recognize the need to adopt a new class of IT Operations Management tools, they demand a level of affordability that only a fully functional open source platform like RiverMuse can deliver.
RiverMuse has arrived, and it will meet market demand because IT organizations have asked for change. The IT world evolves quickly, and these organizations simply can’t afford to keep waiting for promised features and integrations that never make it past the roadmap slides of many IT vendors. Too much ‘lock-in’ power rests with the legacy vendors. It is time to shift the balance of innovation and control from a few oligopolistic vendors to the many practitioners and end users who ultimately know best which IT operations management capabilities they really need, and when. It’s time for RiverMuse and its open source roots to shake up the status quo and move the needle forward.