Common Failing of Current IT Event and Fault Management Systems
A common feature of current-generation IT event and fault management systems is that you have to encode, in the system's configuration, a representation of the logic you use to manage the network, device, or application. A typical system management scenario involves a series of servers, routers and applications that you poll for specific data and then test against conditions to detect exceptions. Each exception can generate an alert that triggers actions.
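The poll, condition, alert and action chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the metric names, thresholds, host names and the `notify` action are hypothetical, and the polled data is hard-coded rather than collected from real devices.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    source: str
    metric: str
    value: float


@dataclass
class Condition:
    metric: str
    is_exception: Callable[[float], bool]  # returns True when the value is abnormal


def evaluate(source: str, sample: Dict[str, float],
             conditions: List[Condition]) -> List[Alert]:
    """Test one polled sample against every condition and collect alerts."""
    alerts = []
    for cond in conditions:
        if cond.metric in sample and cond.is_exception(sample[cond.metric]):
            alerts.append(Alert(source, cond.metric, sample[cond.metric]))
    return alerts


def notify(alert: Alert) -> str:
    """Hypothetical action: format a message instead of paging someone."""
    return f"ALERT {alert.source}: {alert.metric}={alert.value}"


# Hypothetical thresholds (assumptions, not taken from any real product).
conditions = [
    Condition("cpu_percent", lambda v: v > 90.0),
    Condition("packet_loss", lambda v: v > 0.05),
]

# Stand-in for the result of polling servers and routers.
polled = {
    "web-01": {"cpu_percent": 97.0, "packet_loss": 0.0},
    "router-01": {"cpu_percent": 12.0, "packet_loss": 0.0},
}

messages = [notify(alert)
            for source, sample in polled.items()
            for alert in evaluate(source, sample, conditions)]
```

Here only `web-01` breaches a condition, so `messages` holds a single alert. Note that the thresholds and host names live directly in the configuration, which is exactly the kind of encoding the post goes on to criticise.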
All legacy IT event and fault management approaches prior to RiverMuse either encode the logic in a heavily scripted way or require a detailed understanding of the managed system's topology or configuration to be built into the rules you write.
For example, to add an entry to a typical probe rules file you need to include device-specific information, such as an IP address, for every device you want to ping. Other systems consume an entire topology and output a code book that pattern-matches the alert streams, looking for particular root causes.
From a RiverMuse perspective, both approaches make it difficult to alter the configuration when the underlying network changes. In addition, collecting all the intellectual property that has gone into building the business logic and taking it to a completely different network, one configured differently (for example, with different hostnames and IP addresses), is also complicated.
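To make the portability problem concrete, here is a minimal Python sketch contrasting rules that embed IP addresses with rules written against device roles. It is not a description of how RiverMuse or any particular product works; the rule format, the role names, the inventories and the `expand` helper are all hypothetical.

```python
from typing import Dict, List

# Static style: every rule names a specific device, so the logic is tied
# to one network and must be rewritten when addresses change.
STATIC_RULES = [
    {"ip": "10.0.0.1", "check": "ping", "severity": "critical"},
    {"ip": "10.0.0.2", "check": "ping", "severity": "major"},
]

# Alternative style (an assumption for illustration): the logic refers
# only to roles, and a separate inventory maps addresses to roles.
ROLE_RULES = [
    {"role": "core-router", "check": "ping", "severity": "critical"},
    {"role": "web-server", "check": "ping", "severity": "major"},
]


def expand(role_rules: List[Dict[str, str]],
           inventory: Dict[str, str]) -> List[Dict[str, str]]:
    """Bind role-based rules to the devices in one site's inventory."""
    expanded = []
    for rule in role_rules:
        for ip, role in sorted(inventory.items()):
            if role == rule["role"]:
                expanded.append({"ip": ip,
                                 "check": rule["check"],
                                 "severity": rule["severity"]})
    return expanded


# Two hypothetical sites with different addressing.
site_a = {"10.0.0.1": "core-router", "10.0.0.2": "web-server"}
site_b = {"192.168.5.1": "core-router",
          "192.168.5.10": "web-server",
          "192.168.5.11": "web-server"}

rules_for_a = expand(ROLE_RULES, site_a)
rules_for_b = expand(ROLE_RULES, site_b)
```

With this split, `ROLE_RULES` is the part that could move between sites unchanged, while only the inventory differs. `STATIC_RULES`, by contrast, would have to be edited by hand for `site_b`.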
Hence, the work of configuring an IT event and fault management system becomes a consultancy-driven exercise. Teams of skilled people, with a deep understanding of both the management system being used and the systems or networks being managed, execute something akin to a software development exercise.
The result is often a single-purpose, environment-specific configuration for a particular system or network. If you want to replicate the functionality or solution elsewhere, you have to start over. If the network changes, you have to repeat large parts of the original exercise. RiverMuse terms all of these issues static and non-transportable business logic.
Static and non-transportable business logic is another reason Open Source software is a big threat to the CAs and BMCs of the BSS/OSS world. The open source community will make sure that Open Source software like RiverMuse has a faster time to value and is not stuck using legacy technology retained due to historical acquisitions. Reducing costs is not the only driving force: increasing the value of IT to the business by delivering services faster to the internal market is a key driver, and being transparent about where the IT $ are being spent in relation to deliverables is also a high priority. BSS/OSS can be shown to increase value to the business. Historically this type of ROI has been sales-department driven; the open source community will provide a much-needed sense of realism to ROI claims. Organisations that embrace open source will profit in many areas beyond cost reduction, many of which will translate into visible additional bottom-line growth.
Very few ENMS implementations have really implemented KM the way it ought to be. Not only does one have to deal with the technical knowledge of what an event or alert really means, one also has to deal with the idiosyncrasies of service delivery, customer knowledge, and situation knowledge beyond the scope of an individual alert.
Truth be known, KM is the last bastion of glueware that can make ENMS systems really work. Check out http://www.wiikno.com and Craig’s work on KM while @ Cisco.