RiverMuse harnesses the power of open source technology to produce a truly 21st century management platform.

— Michael Barnwell, Senior Software Engineer, RiverMuse
Welcome to the RiverMuse news blog
Here, we aim to provide commentary and news on topical subjects. Your opinions and thoughts are important to us, so feel free to become an active part of our community and add your own comments.
IT Event and Fault Management Industry Ready to Shift

Over the past five years we have witnessed an acceleration of open source software (OSS) adoption by service providers and enterprises alike. As Gartner stated in a November 2008 report, the primary advantages for customers adopting OSS tools were lower cost of ownership, ease of implementation, protection of investment against dependence on a single vendor and faster time to market.

These key advantages have held true for the open source industry time and again, enabling it to gain significant market share across the entire IT stack, from operating systems through middleware and tools to business applications, and across enterprises and service providers (see the discussion on this subject on the microSperience blog). In fact, in the Network and System Management (NSM) space there is already a wide selection of open source monitoring tools that have gained a broad following.

Driving Innovation in Event and Fault Management – January Survey
This is our January RiverMuse poll.

The poll results archive will be stored within the RiverMuse Community pages.

What is a MoM (Manager of Managers) in an IT Environment?
Posted by: Ian Best on January 15th, 2010
Filed under: Market, Product, blog

The concept of Manager of Managers (MoM) has existed for many years but does it mean the same today as it always has?

MoMs were introduced to overcome a problem: lower-order management systems, e.g. Element Management Systems (EMS), gave a very fragmented view of the network, leaving it to operations staff to piece together a picture of the complete network and its current status. This picture often existed only in the mind of the user, and its detail depended on the individual’s experience. Correlation of events across the diverse ‘stove pipe’ solutions was primarily a visual correlation performed by the user.

The introduction of MoMs enabled network operations staff to pull management information together into one central point, providing a single integrated view of the entire network and enabling the introduction of automated correlation systems. This is why a MoM is sometimes referred to as the ‘single pane of glass’. While the concept was sound, in the early days the practice was limited by the lack of integration capabilities supported by the lower-order systems: standard interfaces and APIs were few and far between. It probably wasn’t until the development of standards such as the Simple Network Management Protocol (SNMP) that MoM capability became a reality. The advent of web services, enabling more federated integration between individual management systems, has given this further impetus.

Today we see MoM concepts being widely adopted, although ironically the term itself seems to have faded from our common vocabulary. While the term ‘MoM’ has traditionally been associated with consolidating lower-order systems into the Network Layer, the same principles are now being applied to managing systems, applications, services, customers, business units and so on. Effective management at these higher levels still depends on collecting data and information from the underlying systems. So the deployment of MoMs continues to gather pace, and some existing network-centric MoMs are being repositioned for managing at these higher layers, with varying degrees of effectiveness. Undoubtedly, we will see wider adoption of Service Management Systems, SLA Management Systems, Business Management Systems and so on, each a new generation of MoM in its own right. For example, a BSM (Business Service Management) dashboard is a type of MoM, albeit designed less for real-time operations support than for broader business and technology alignment.

The question remains whether any of the current crop of MoM-type technologies is ready to take on the mantle of real-time dynamic infrastructure support. More about that in a later blog post.

Ian Best


Driving Disruptive Innovation in IT Event & Fault Management

On this community page you can view the latest RiverMuse slide set, offering an overview of the company, the event & fault management landscape, our objectives, architecture and key benefits. It is in SlideShare format and can be shared and copied. You can also post comments and questions directly on the page.

Why mega IT vendors’ infrastructure push is a boon for RiverMuse

A recent ComputerWorld article entitled “Big IT is Back, Say HP, IBM, Oracle, EMC, Cisco” discusses the recent formation of the Acadia joint venture between Cisco Systems, EMC and VMware to provide a complete virtual computing environment for enterprises. This alliance pits itself against similar initiatives from HP, IBM and Oracle aiming to offer a complete virtualized infrastructure stack to enterprises. The central argument is that a single-source integrated stack increases productivity in IT operations, strips out cost and offers a higher level of agility. We at RiverMuse agree that a dynamic IT infrastructure based on a strong virtualization fabric is the path forward for IT. Clearly the mega vendors listed above, along with Microsoft, are best positioned to win enterprises over to adoption, and that’s a good thing.

However, when it comes to managing such an abstracted and complex fabric, their management platforms fall short, and the other vendors in the space, including CA, BMC and Compuware, seem severely challenged as well due to a lack of genuine innovation. For years, even decades, the systematic recourse of all the large vendors has been an acquisition strategy whereby they absorb products, knowledge and engineering talent, then swiftly shift their focus from innovation to integration, touting “synergies” as the new panacea.

As the world of IT embraces more advanced computing and networking environments, there is a parallel requirement on the management platform to harness them effectively and efficiently. Event and fault management, which is at the very core of any infrastructure and service delivery management offering, has fallen prey to this rash of acquisition and integration and has seen little innovation for a decade. Legacy fault management tools from IBM, CA, HP and EMC are no longer adequate, as they were built for a more static world, with a low rate of change, no virtualization and a single focus on networks and network elements rather than on a broader computing infrastructure and the IT service delivery chain. Fault management is out; event management is in, as it captures, processes and renders all signals coming from the infrastructure, whether network, compute, middleware, service or security-related.

Hence RiverMuse was born out of necessity to address the event and fault management challenges of a dynamic IT infrastructure that mega vendors have so long ignored. Genuine innovation is back and it’s not just technical. See for yourself.

JL Valente


Your IT Infrastructure… Under New Management
Posted by: Phil Blades on November 5th, 2009
Filed under: Product, Uncategorized, blog

Thursday November 5th 2009 is a landmark for RiverMuse: for those of you in the UK today, you may see fireworks and bonfires as we celebrate the commercial launch of RiverMuse. From today we have a shiny new website, www.rivermuse.com, and a commercial support and professional services side to our open source software (OSS). The core software has undergone a major revision over the past 3 months, and the binaries are available for immediate download for RHEL 5 (or CentOS).

Surf the corporate site: see if you recognise anyone in the photos, download our white paper and data sheet, or dive into the community, which still has the same great features but has also had a bit of a facelift to match the style of the .com site.

As ever, your feedback is welcome: post on our forums or here on the blog.

User driven versus technology driven UI design
Posted by: Irene on September 28th, 2009
Filed under: Technology, blog

User driven UI design is a hot topic at the moment, but is it really the future of user experience? Should technology driven design be completely abandoned?

In answering this question it is important to be aware that it is part of a bigger issue: whether design should take into account the practical aspects of its application. By looking at this broader question, UI design can benefit from the experience of other disciplines.

One argument for design unconstrained by practicalities is that developers suffer from “when you have a hammer, every problem looks like a nail”; that is, they tend simply to adopt implementations they have used before rather than considering usability. The result is often a quick implementation but a poor user experience. There is no need to look to other disciplines for examples of this: simply browsing the internet gives plenty of fodder for this argument.

But if practical aspects are not given due care, the resulting design may be difficult to implement or may not withstand the extremes of usage. It could be even worse if implementation is undertaken by those lacking experience, as the design flaw may not be realized until too late. Here the experience of other disciplines gives weight to the proponents of practicality-aware design. Within IT, such designs could lead to abandoned projects or failures at testing, but these are visible only within the company in question. In other disciplines the effect is more apparent to the independent observer. For example, many buildings and bridges have collapsed because their design overlooked the potential physical stresses, be it the action of the wind on the narrow Galloping Gertie or the shaking of buildings when earthquakes strike.

Clearly, designs that do not consider practical limitations, from what is physically possible to the effect of edge-case usage, must be avoided. Yet poor usability is not acceptable either. Both the user experience and the realities of application within the discipline must be considered during design. In UI design terms, user driven and technology driven design both have merit, but they lie at opposite extremes. The compromise is to be aware of the limits of what is technologically possible while applying usability principles, without losing sight of the extremes of usage.

Another RiverMuse integration benefit example
Posted by: Chris Needham on September 23rd, 2009
Filed under: Product, blog

Step One: Monitoring free space in a file system
Let’s assume your key corporate documents are all stored on a file system that is on a disk in the corporate SAN. A process is in place to monitor the free space on the file system at regular intervals. When the monitoring process recognises that there is less than 10% (of space) available, it posts an event (to RiverMuse).

If you read my previous blog post using the bird table analogy, this would be the event generated by the webcam creating a video file.

RiverMuse then processes this alert for you, triggering a series of commands causing the disk on the SAN to be enlarged and subsequently, the file system grown to take advantage of this newly allocated space. Once this process is complete the system generates an event to mark the completion.

In the bird table analogy, this relates to the file conversion activity.
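To make Step One concrete, here is a minimal Python sketch of the monitoring check, assuming a simple script that samples free space with the standard library's shutil.disk_usage. The event fields (node, type, free_pct) are hypothetical, and how the event is actually posted to RiverMuse is deliberately left out.

```python
import shutil

# Threshold from the post: raise a "space low" event below 10% free.
LOW_SPACE_FRACTION = 0.10


def build_space_event(path: str, total: int, free: int) -> dict:
    """Turn raw byte counts into an event dict (hypothetical field names)."""
    if total <= 0:
        raise ValueError("total size must be positive")
    free_fraction = free / total
    status = "space_low" if free_fraction < LOW_SPACE_FRACTION else "space_ok"
    return {
        "node": path,
        "type": status,
        "free_pct": round(free_fraction * 100, 1),
    }


def check_file_system(path: str) -> dict:
    """Sample the real file system; disk_usage returns (total, used, free) in bytes."""
    usage = shutil.disk_usage(path)
    return build_space_event(path, usage.total, usage.free)


if __name__ == "__main__":
    # Delivery to RiverMuse (syslog, an agent, etc.) is not shown here.
    print(check_file_system("/"))
```

Returning a "space_ok" event as well as "space_low" anticipates the positive event described later in this post.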

Step Two: Reporting and recording of events
The completion message causes the following to happen:
1) The space low message is closed.
2) A message notifies the CMDB system that SAN resources have been reallocated.
3) A ticket is created in the helpdesk system, to prompt review of what triggered the expansion.

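In the bird table story, this corresponds to the actions triggered by the conversion program’s syslog message: closing the webcam alert, tweeting about the new video and raising a dated alert.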

If the space warning alert has a count of 2 or more, the external trigger is not fired, as an expansion process has already been initiated.

Our real world example has other business logic, including:
· If free space is OK, we still generate an event, but this time it’s a positive event: RiverMuse uses it to reset the clock on its silent failure protection rules. In the bird table example, this could be an event raised on the absence of a video upload, with the time frame based on the time of year (to account for the hours of darkness).

· Likewise, if the space low message has a count that exceeds a predetermined amount, an alarm is triggered to check whether the file system expansion process has failed silently. Again, the bird table analogy is that the video conversion process has failed.

· Finally, if free space drops below 5% before the expansion is complete, a further external action is triggered to obtain human intervention, as the file system is growing at a potentially unmanageable rate.
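The rules in Step Two and the bullets above can be sketched as a small decision function plus a heartbeat watchdog. This is an illustration in Python, not RiverMuse's rule syntax: MAX_LOW_COUNT and the watchdog's silence window are assumed values (the post only says “a predetermined amount”), and all names are hypothetical.

```python
from dataclasses import dataclass

# 5% comes from the post; MAX_LOW_COUNT is an assumed "predetermined amount".
CRITICAL_FREE_PCT = 5.0
MAX_LOW_COUNT = 5


@dataclass
class SpaceLowAlert:
    node: str
    free_pct: float
    count: int  # number of de-duplicated "space low" events received so far


def actions_for(alert: SpaceLowAlert) -> list:
    """Return the (hypothetical) actions a rule engine would take."""
    actions = []
    if alert.count == 1:
        # Only the first occurrence fires the expansion; repeats mean
        # an expansion is already under way.
        actions.append("trigger_expansion")
    if alert.count > MAX_LOW_COUNT:
        # Too many repeats: the expansion itself may have failed silently.
        actions.append("alarm_expansion_stalled")
    if alert.free_pct < CRITICAL_FREE_PCT:
        # Space is shrinking faster than expansion keeps up: get a human.
        actions.append("request_operator")
    return actions


class SilentFailureWatchdog:
    """Flags nodes whose positive "space OK" events have stopped arriving."""

    def __init__(self, max_silence_s: float):
        self.max_silence_s = max_silence_s
        self.last_ok = {}  # node -> timestamp of the last positive event

    def positive_event(self, node: str, now: float) -> None:
        self.last_ok[node] = now  # reset the clock for this node

    def overdue(self, now: float) -> list:
        return sorted(n for n, t in self.last_ok.items()
                      if now - t > self.max_silence_s)


if __name__ == "__main__":
    print(actions_for(SpaceLowAlert("/data", 8.0, 1)))
    dog = SilentFailureWatchdog(max_silence_s=3600)
    dog.positive_event("/data", now=0)
    print(dog.overdue(now=4000))
```

Note that this watchdog only tracks nodes that have reported at least once; a monitor that never starts would need a separate registration step.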

All of this can be bundled into a package, using RiverMuse’s transportable business logic, and applied to all the individual file systems and disks across the network. As business logic is updated, or new file systems are added, the changes will be seamlessly propagated through the system, giving you the peace of mind that routine events are managed by the system, leaving you to get on with managing the exceptions.

Failing silently
Posted by: Phil Blades on September 21st, 2009
Filed under: Market, Product, blog

Twice last week I sat in meetings where someone made a casual remark that seemed trivial. The remark went something like this: “I have to look at my spam folder; I registered for the community but didn’t get the confirmation email.” Two people saying this within days of each other caught my attention. After a little investigating (helped by the fact that we knew the two people’s email addresses), we discovered that about 12 RiverMuse Community membership applications had arrived by a non-standard path and were sitting in a queue requiring manual authorization. Of course we fixed the problem and contacted each of the community members who had been in limbo.

This is an example of failing silently, a key concept that we set out to address and mitigate when architecting RiverMuse. We thought our registration process was clearly defined, simple and effective, with no human intervention and with frequent automated checks that the application and SSO manager were working correctly. Reporting showed membership growing daily. What could go wrong…

In the even more complex world of fault management platforms with rules-based workflows, the opportunities to fail silently increase with the scale of the network and systems being monitored. An automation rule on a central server looks for a particular string or identifier, but for the event to get that far there is often another rule engine sitting at the element management or probe level. A simple error in rule creation, or an alteration of the string at this layer, may well leave the central rule unable to process the event or recognize it as significant enough to create an alert or fire a trigger action. Now move forward to the new challenges of virtualization and grid or cloud infrastructures, where critical events and services can be nomadic and what was important yesterday is non-critical today, or vice versa.
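The failure mode described above is easy to reproduce. In this hypothetical Python sketch, a probe-level rule rewrites an event's wording and the central rule, still matching the old string, quietly stops firing. Counting the events that match no central rule is one simple way to make such a gap visible; it is offered as an illustration only, not as a description of how RiverMuse addresses the problem.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical central automation rule, written against the original wording.
CENTRAL_PATTERN = re.compile(r"LINK DOWN on (\S+)")


def probe_rule(raw: str) -> str:
    """Hypothetical probe-level rule after a well-meant edit to the wording."""
    return raw.replace("LINK DOWN", "Link state: down")


def central_rule(event: str, unmatched: Counter) -> Optional[str]:
    """Return an alert name, or record the event as unmatched."""
    match = CENTRAL_PATTERN.search(event)
    if match:
        return f"alert:{match.group(1)}"
    # Without this counter the event would simply vanish: a silent failure.
    unmatched[event.split(" on ")[0]] += 1
    return None


if __name__ == "__main__":
    unmatched = Counter()
    print(central_rule(probe_rule("LINK DOWN on eth0"), unmatched))  # rewritten: no alert
    print(central_rule("LINK DOWN on eth1", unmatched))              # original wording: alert
    print(unmatched)
```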

In my own real-world example above, I had no idea it was possible to register by an alternative, unplanned route. Not knowing meant that, for those 12 people, the process had failed, and but for the chance conversation and a similar statement in an email we might never have realized, because all appeared to be well. This is failing silently. In my next post on this subject I will talk about how we have designed RiverMuse to avoid many of the pitfalls that can lead to this type of scenario.

Meanwhile, I would be interested to hear about any experiences you have had of failing silently.

The hypothetical bird table
Posted by: Chris Needham on September 12th, 2009
Filed under: Product, Technology, blog

We’re running a competition at the moment to come up with the most novel use of RiverMuse. This got me thinking back to some of the early uses of the web, such as ensuring a good supply of coffee. I was wondering what I could come up with along these lines to demonstrate the flexibility of the system. So here goes…

In my hypothetical system, a webcam watches a hypothetical bird-feeding table in my garden. Whenever the webcam detects motion, it records an AVI file and saves it to a directory on my home PC (my webcam only works with Windows). The webcam’s network access is recorded in the Windows event log, and this comes into RiverMuse Core as an event. The omosd component processes this event and de-duplicates it, which is important, as the next step in my process can only handle one event at a time.
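De-duplication of the kind described for omosd can be sketched as follows: repeated events that share a key collapse into a single alert whose count increases. This is a hypothetical Python illustration; in particular, keying on node plus event type is an assumption, as the post does not say which fields omosd compares.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class Alert:
    key: Tuple[str, str]  # (node, event type): an assumed de-duplication key
    summary: str
    count: int
    first_seen: float
    last_seen: float


class Deduplicator:
    """Collapses repeated events into one alert with a running count."""

    def __init__(self) -> None:
        self.alerts: Dict[Tuple[str, str], Alert] = {}

    def process(self, node: str, event_type: str, summary: str, ts: float) -> Alert:
        key = (node, event_type)
        alert = self.alerts.get(key)
        if alert is None:
            # First occurrence: open a new alert with count 1.
            alert = Alert(key, summary, 1, ts, ts)
            self.alerts[key] = alert
        else:
            # Repeat: bump the count and keep the latest summary and time.
            alert.count += 1
            alert.summary = summary
            alert.last_seen = ts
        return alert


if __name__ == "__main__":
    dedup = Deduplicator()
    for t, clip in enumerate(["clip1.avi", "clip2.avi"]):
        latest = dedup.process("home-pc", "webcam_motion", f"new file {clip}", float(t))
    print(latest.count, len(dedup.alerts))
```

The count on the surviving alert is what the next step in the bird table process keys its decisions on.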

If the webcam alert has a count of 1, yarpd executes an external command to convert the AVI file into a low-bandwidth format and upload it to a website. When this is done, the conversion program sends a syslog message back to RiverMuse. This causes a number of things to happen:

  1. The webcam alert is closed.
  2. I run a shell command to tweet about it, so all the fans of my bird table know there is a new video to watch.
  3. I fire an alert back into RiverMuse containing the date, and omosd de-duplicates it, so I can have a filtered view of alerts showing how many videos were uploaded each day.

If the count of the webcam alert is more than 1 then yarpd executes an external script to delete the extra video clips, as my conversion process can’t handle more than one at a time.
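The count-driven behaviour of yarpd and the follow-up actions can be sketched like this. The returned command strings are placeholders rather than real yarpd actions, and the per-day Counter stands in for the de-duplicated dated alert whose count shows the uploads for each day; all names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import List


def on_webcam_alert(count: int, clip: str) -> str:
    """First occurrence: convert and upload; repeats: discard the extra clip."""
    if count == 1:
        return f"convert_and_upload {clip}"
    # The conversion step only handles one clip at a time.
    return f"delete {clip}"


def on_conversion_done(done_at: datetime, uploads_per_day: Counter) -> List[str]:
    """Actions fired when the conversion program reports success via syslog."""
    day = done_at.date().isoformat()
    # Dated alert: de-duplicating on the date makes its count the uploads that day.
    uploads_per_day[day] += 1
    return ["close webcam alert", "tweet about new video", f"raise alert upload-{day}"]


if __name__ == "__main__":
    per_day = Counter()
    print(on_webcam_alert(1, "clip1.avi"))
    print(on_conversion_done(datetime(2009, 9, 12, 8, 30), per_day))
    print(per_day)
```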

As phase 2, I’m going to process the web logs so I can see which videos are the most watched.