User-driven UI design is a hot topic at the moment, but is it really the future of user experience? Should technology-driven design be completely abandoned?
In answering these questions it is important to be aware that they are part of a bigger issue. The more general question is whether design should take into account the practical aspects of its application. By looking at this broader question, UI design can benefit from the experience of other disciplines.
One argument for design unconstrained by practicalities is that developers suffer from “when you have a hammer, every problem looks like a nail”; that is, they tend to adapt implementations that have been used before rather than considering usability. The result is often a quick implementation but a poor user experience. There is no need to look to other disciplines for examples of this; simply browsing the internet gives plenty of fodder for this argument.
But if practical aspects are not given due care, the resulting design may be difficult to implement, or may not withstand the extremes of usage. It could be even worse if implementation is undertaken by those lacking in experience, as the design flaw may not be realized until too late. In this case, the experience of other disciplines gives weight to the proponents of practicality-aware design. Within IT such designs could lead to abandoned projects or failures at testing, but these are visible only within the company in question. In other disciplines the effect is more apparent to the independent observer. For example, many buildings and bridges have collapsed because their design overlooked potential physical stresses, be it the action of the wind on the narrow Galloping Gertie or the collapse of buildings when earthquakes strike.
Clearly, designs that do not consider practical limitations, what is physically possible, and the effect of edge usage must be avoided. Yet poor usability is not acceptable. Both the user experience and the realities of application within the discipline must be considered during design. In UI design terms both user-driven and technology-driven design have merit, but they lie at opposite extremes. The compromise is to be aware of the limits of what is technologically possible while applying usability principles, without forgoing an awareness of the extremes of usage.
Step One: Monitoring free space in a file system
Let’s assume your key corporate documents are all stored on a file system that is on a disk in the corporate SAN. A process is in place to monitor the free space on the file system at regular intervals. When the monitoring process recognises that less than 10% of the space is available, it posts an event to RiverMuse.
If you read my previous blog, using the bird table analogy, this would be the event generated by the web cam creating a video file.
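The check in Step One could be sketched in Python as follows. The 10% threshold comes from the description above; the `post_event` callable and the event fields are hypothetical placeholders for whatever delivers events to RiverMuse, not its actual API.

```python
import shutil

LOW_SPACE_THRESHOLD = 0.10  # post an event below 10% free, as described above


def free_fraction(path):
    """Return the fraction of the file system holding `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def check_free_space(path, post_event, threshold=LOW_SPACE_THRESHOLD):
    """Post a 'space low' event when free space drops below the threshold.

    `post_event` is a hypothetical callable standing in for the transport
    that delivers events to the event manager.
    """
    fraction = free_fraction(path)
    if fraction < threshold:
        post_event({"type": "space_low", "path": path, "free": fraction})
        return True
    return False
```

Run from a scheduler at regular intervals, something like `check_free_space("/", print)` would print an event only when the file system is nearly full.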
RiverMuse then processes this alert for you, triggering a series of commands that cause the disk on the SAN to be enlarged and, subsequently, the file system to be grown to take advantage of the newly allocated space. Once this process is complete the system generates an event to mark the completion.
In the bird table analogy, this relates to the file conversion activity.
Step Two: Reporting and recording of events
The completion message causes the following to happen:
1) The space low message is closed.
2) A message notifies the CMDB system that SAN resources have been reallocated.
3) A ticket is created in the helpdesk system, to prompt review of what triggered the expansion.
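In the bird table analogy, this relates to the actions triggered by the conversion program's completion message: closing the webcam alert, tweeting about the new video, and firing the dated alert back into RiverMuse.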
If the space warning alert has a count of 2 or more then the external trigger is not fired, as an expansion process has already been initiated.
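The count check and the completion handling can be illustrated with a small Python sketch. The class, the method names and the action labels are all hypothetical; this shows the logic described above, not how RiverMuse implements it.

```python
class SpaceLowTracker:
    """Tracks de-duplicated 'space low' alerts per file system."""

    def __init__(self):
        self.counts = {}   # file system -> count of the open alert
        self.actions = []  # actions taken, recorded here for illustration

    def on_space_low(self, filesystem):
        """De-duplicate the alert and fire the trigger on first sight only."""
        count = self.counts.get(filesystem, 0) + 1
        self.counts[filesystem] = count
        if count == 1:
            # First occurrence: fire the external expansion trigger.
            self.actions.append(("expand", filesystem))
        # A count of 2 or more means an expansion is already under way.
        return count

    def on_expansion_complete(self, filesystem):
        """Handle the completion event with the three steps listed above."""
        self.counts.pop(filesystem, None)                 # 1) close the alert
        self.actions.append(("close_alert", filesystem))
        self.actions.append(("notify_cmdb", filesystem))  # 2) tell the CMDB
        self.actions.append(("open_ticket", filesystem))  # 3) helpdesk ticket
```

Because the alert is closed on completion, the next space low event for the same file system starts again at a count of 1 and can trigger a fresh expansion.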
Our real world example has other business logic, including:
· If free space is OK, we still generate an event, but this time it’s a positive event. RiverMuse uses this to reset the clock on its silent failure protection rules. In the previous bird table example, the equivalent could be an event based on the absence of a video upload, with the time frame based on the time of year (to account for the hours of darkness).
· Likewise, if the space low message has a count that exceeds a predetermined amount, an alarm is triggered to check the file system expand process, guarding against its silent failure. Again, the bird table analogy is that the video conversion process has failed.
· Finally, if the free space drops below 5% before the expansion is complete, a further external action is triggered to bring in a human, as the file system is filling at a potentially unmanageable rate.
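The three rules above could be expressed roughly like this in Python. The 10% and 5% thresholds come from the text; `MAX_LOW_COUNT`, `SILENCE_LIMIT_SECONDS`, the function names and the action labels are assumptions made purely for illustration.

```python
LOW_THRESHOLD = 0.10         # from the text: warn below 10% free
CRITICAL_THRESHOLD = 0.05    # from the text: escalate below 5% free
MAX_LOW_COUNT = 5            # assumed; the text only says "predetermined"
SILENCE_LIMIT_SECONDS = 900  # assumed window for silent failure protection


def evaluate(free, low_count, expanding):
    """Return the actions for one free-space reading.

    free:      fraction of the file system that is free (0.0 to 1.0)
    low_count: current count of the de-duplicated space low alert
    expanding: True while an expansion has been started but not completed
    """
    if free >= LOW_THRESHOLD:
        # Positive event: proves monitoring is alive, so reset the clock.
        return ["reset_silence_clock"]
    actions = []
    if low_count > MAX_LOW_COUNT:
        # Too many repeats: the expand process may have failed silently.
        actions.append("check_expand_process")
    if free < CRITICAL_THRESHOLD and expanding:
        # Space is running out faster than the expansion can complete.
        actions.append("escalate_to_human")
    return actions


def is_silent(last_event_time, now, limit=SILENCE_LIMIT_SECONDS):
    """True when no event, positive or negative, arrived within the limit."""
    return now - last_event_time > limit
```

Keeping the rules as plain functions of their inputs makes it easy to apply the same logic to many file systems, which is the idea behind bundling them into a package.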
All of this can be bundled into a package, using RiverMuse’s transportable business logic, and applied to all the individual file systems and disks across the network. As business logic is updated, or new file systems are added, the changes will be seamlessly propagated through the system, giving you the peace of mind that routine events are managed by the system, leaving you to get on with managing the exceptions.
Twice last week I sat in a meeting where there was a casual remark that seemed trivial in nature. The remark went something like this: “I have to look at my spam folder, I registered for the community but didn’t get the confirmation email”. Two people saying this within days of each other caught my attention. After a little investigating (helped by the fact that we knew the email addresses of the two people) we discovered that about 12 RiverMuse Community membership applications had arrived by a non-standard path and were sitting in a queue requiring manual authorization. Of course we fixed the problem and contacted each of the community members who had been in limbo.
This is an example of failing silently, a key concept that we set out to address and mitigate when architecting RiverMuse. We thought we had our registration process clearly defined, simple and effective, with no human intervention and with frequent automated checks that the application and SSO manager were working correctly. Reporting showed that membership was growing on a daily basis. What could go wrong?
In the even more complex world of fault management platforms that have rules-based workflows, the opportunities to fail silently increase with the scale of the network and systems being monitored. An automation rule on a central server looks for a particular string or identifier, but for an event to get that far there is often another rule engine sitting at the element management or probe level. A simple error in rule creation, or an alteration of the string at this layer, may well lead to an inability to process or recognize an event as significant enough to create an alert or trigger an action. Now move forward to the new challenges of virtualization and grid or cloud infrastructures, where critical events and services can be nomadic in nature and what was important yesterday is non-critical today, or vice versa.
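To make this failure mode concrete, here is a small hypothetical sketch in Python. The two rule functions, the identifier strings and the raw message are invented for illustration; the point is that a one-character change at the probe layer stops the central rule from matching, and nothing raises an error.

```python
def probe_rule(raw, label):
    """Probe-level rule: tag a raw message with an identifier string."""
    return f"{label}: {raw}"


def central_rule(event, expected="LINK_DOWN"):
    """Central rule: alert only if the expected identifier is present."""
    return expected in event


raw = "interface eth0 lost carrier"

# Both layers agree on the identifier, so an alert is raised.
alert_before = central_rule(probe_rule(raw, "LINK_DOWN"))

# Someone edits the probe rule. The central rule no longer matches and
# nothing reports the mismatch: the event is simply ignored.
alert_after = central_rule(probe_rule(raw, "LINK-DOWN"))
```

Here `alert_before` is `True` and `alert_after` is `False`, yet both runs complete without any visible problem.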
In my own real world example above, I had no idea that it was possible to register using an alternative route that we had not planned. Not knowing meant that for those 12 people the process had failed, and but for the chance conversation and a similar statement in an email we might never have realized, because all appeared to be well. This is failing silently. In my next post on this subject I will talk about the way we have designed RiverMuse to avoid many of the pitfalls that can lead to this type of scenario.
Meantime I would be interested to hear if you have any experiences of failing silently.
We’re running a competition at the moment to come up with the most novel use of RiverMuse, see here. This got me thinking back to some of the early uses of the web, such as ensuring a good supply of coffee. I was wondering what I could come up with, along these lines, to demonstrate the flexibility of the system. So here goes…
In my hypothetical system, a web cam watches a hypothetical bird-feeding table in my garden. Whenever the web cam detects motion, it records this as an AVI file and posts it in a directory on my home PC (the webcam only works with Windows). The network access by the webcam is recorded in the Windows event logs, and this comes into RiverMuse Core as an event. Omosd processes this event and de-duplicates it, which is important, as the next step in my process can only handle one event at once.
If the webcam alert has a count of 1, then yarpd executes an external command to process the AVI file into a low bandwidth format and upload it to a website. When this is done the conversion program sends a syslog message back to RiverMuse. This causes a number of events to happen:
- The webcam alert is closed.
- I run a shell command to tweet about it so all the fans of my bird table will know there is a new video to watch.
- I fire an alert back into RiverMuse which contains the date, and omosd de-duplicates this so I can have a filtered view of alerts that shows me how many videos were uploaded in a day.
If the count of the webcam alert is more than 1 then yarpd executes an external script to delete the extra video clips, as my conversion process can’t handle more than one at a time.
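For anyone who prefers code to prose, the bird table flow might look something like this in Python. This is only an illustration of the logic: the function names, alert keys and action labels are hypothetical and say nothing about how omosd or yarpd work internally, and the tweet step is left out.

```python
def deduplicate(alerts, key):
    """Increment and return the count for `key` in the `alerts` dict."""
    alerts[key] = alerts.get(key, 0) + 1
    return alerts[key]


def on_webcam_event(alerts):
    """Pick the external action based on the de-duplicated count."""
    count = deduplicate(alerts, "webcam")
    # Only the first event starts a conversion; any extras are deleted
    # because the conversion process handles one clip at a time.
    return "convert_and_upload" if count == 1 else "delete_extra_clips"


def on_conversion_done(alerts, day):
    """Close the webcam alert and count the upload against its date."""
    alerts.pop("webcam", None)
    # One alert per date: its count is the number of uploads that day.
    return deduplicate(alerts, f"uploaded:{day}")
```

The trick in the last step is that putting the date in the alert key turns de-duplication into a simple daily counter.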
As a phase 2 I’m going to process the web logs so I can see which videos are the most watched.
RiverMuse Core 3.4.4 is now available for download. The new release includes fixes and some additional functionality. The details are available in the release note posted at: http://www.rivermuse.org/display/RIVERMUSE/RiverMuse+Core+3.4.4+release+notes
Open Source specialist Roberto Galoppini reviews the background to RiverMuse’s Community and how it hopes to grow both as a service management development and practitioners forum. Read the article…
One month after the launch of our open source event and fault management platform we want to hear how you are using our software and what it is that you are managing with it.
We are offering one lucky community member the chance to win an Apple iPhone 3GS or HTC Hero Android phone (your choice). All you have to do is tell us how you are using the RiverMuse software and what it is that you are monitoring with it. We are looking for the most interesting and unusual customisation here, so let your imagination run wild in how you can deploy the system to monitor your coffee machine, coke dispenser, highway toll booth or aircraft departure delays! The sky is the limit with our flexible event capture and rules engineering. So show us what you can do!
Check out the rules here.