Several management tools let the end user suppress an alarm or a device, but not every management tool provides this feature. By suppress, I mean clicking something in the native console to make the alarm or device go quiet for a period of time. For example, an alarm shows up saying a server is down. The first-level operator knows that a DBA is working on a problem right now and that the server is being rebooted a few times to test some configuration changes. The operator suppresses the alarm because it is a known outage and not a concern right now.
When you integrate your tools with Operations Center, a higher level of suppression is needed. Suppressing a single server is useful, but being able to suppress a Service and/or all of the technologies supporting it is even more valuable. Suppose you have maintenance windows set up to perform upgrades and fixes. The operator can suppress the Service (e.g. EMail, CRM) in order to ignore any errors identified during the maintenance window, which are most likely invalid. For example, when servers supporting the Service are rebooted, you do not want tickets automatically opened in the Help Desk tool or an outage counted against the SLA.
Operations Center provides a Suppress feature within the product. By default the Suppress Subsystem is turned off. There is an example script in the database\scripts\commands directory to turn it on (suppress.fs). The recommended way to enable it is to set up a scriptOnStarted for the server so it automatically starts. (hint: Adapters.ini, [Formula] topic, scriptOnStarted setting)
Once you have the Suppress Subsystem enabled, you can suppress an element or an alarm; there is even an Acknowledge capability. Instructions for setting this up are in section 11.6 of the Configuration Guide.
For those with advanced requirements, you can set up Jobs in Operations Center to read in your scheduled maintenance windows from a flat file or a database and pre-suppress items ahead of time. Java scripting is required, but I am aware of a few customers in the past who have done this.
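To make the idea concrete, here is a minimal Python sketch of the "read the maintenance windows from a flat file" half of such a job. Everything in it is an assumption for illustration: the CSV layout, the column names (element, start, end, note) and the function names are hypothetical, and it stops short of the suppress call itself, which a real Operations Center job would make from its own scripting environment.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple


class Window(NamedTuple):
    element: str      # identifier of the element to suppress (hypothetical column)
    start: datetime   # start of the maintenance window
    end: datetime     # end of the maintenance window
    note: str         # free-text reason, to be stored with the suppression


def parse_windows(text: str) -> list[Window]:
    """Parse CSV text with the header: element,start,end,note (ISO timestamps)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Window(
            row["element"],
            datetime.fromisoformat(row["start"]),
            datetime.fromisoformat(row["end"]),
            row["note"],
        )
        for row in reader
    ]


def pending_windows(windows: list[Window], now: datetime) -> list[Window]:
    """Keep windows that have not ended yet (in progress or still to come)."""
    return [w for w in windows if w.end > now]
```

A job would run something like this on a schedule, then hand each pending window to whatever suppress mechanism is available. Windows that have already ended are dropped so they are never suppressed after the fact.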
My point in this blog was not to go into significant detail on the Suppress Subsystem; it was more about visualizing the items that are suppressed. I figured I needed to provide a lead-in, and hopefully it is enough detail to convey the value and features.
Assuming you are leveraging (or plan to leverage) the Suppress feature, the next step is to provide an easy way to check what is currently suppressed (or even scheduled to be suppressed). With about 30 minutes of work, I was able to configure BDI (Data Integrator) to pull in details on suppressed items. I then set up the Element Properties Table portlet in the Dashboard to provide a tabular view of each suppressed item: who suppressed it, when, when the suppression expires, and any notes associated with the individual suppression.
Tips for setting this part up:
- Configure a BDI adapter to connect to the Operations Center database.
- Create a folder in the definition called something like: Suppressed Items
- Add a DBElement underneath the folder
- Set the query to pull back the rows from the mossuppress table.
- You can apply a where clause to only pull back the currently active suppressions (i.e. not the pre-scheduled ones) by stating: isactive = 1 in the "where clause". (You could even do one breakout for active and one for pre-planned, or breakouts based on the type of suppression, such as server, application, service, etc.)
- I used standard SQL functions to break up the dname in order to show the name of the suppressed element. The short name is not part of the suppress record in the table, just the longer dname, so substring functions are required to get the shortened name.
- I also used standard SQL functions to break up the dname to make the elementClass dynamic, so that each element shows the appropriate class name/icon. For example, on Postgres I used this to pull out the class name: substring( dname from 0 for position( '=' in dname ))
- I created a property page for each of the suppress columns (e.g. whoclosed, whoopened, isactive, endtime).
- I set the schedule to requery every 10 seconds so I could see whether it worked. For production, every several minutes, or maybe on the half hour, might be adequate; it really depends on your maintenance schedules, windows, process, etc.
- I also had to set up the core adapter definition to remove elements not returned by the query (i.e. elements whose suppression has expired should automatically be removed from this folder).
- After that, it was a matter of deploying the adapter.
- I then set up an Element Properties Table portlet to point at that folder, added just the suppress-related columns to be displayed, assigned easier-to-read column names, and I was done.
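If it helps to see the dname split outside of SQL, here is a small Python sketch of the same idea. It assumes the dname looks like class=name/class=name/..., with the suppressed element as the first segment; that layout, the sample value and the function name are illustrative assumptions, and it does nothing about escaped characters inside names.

```python
def split_dname(dname: str) -> tuple[str, str]:
    """Return (class_name, short_name) taken from the first dname segment."""
    # Assumption: segments are separated by '/', and the first one is the element itself.
    first_segment = dname.split("/", 1)[0]
    # The class name is everything before the first '=', as in the Postgres substring above.
    class_name, sep, short_name = first_segment.partition("=")
    if not sep:
        raise ValueError(f"no '=' found in first segment of dname: {dname!r}")
    return class_name, short_name


# Hypothetical dname, for illustration only
class_name, short_name = split_dname("server=db01/org=Servers/root=Elements")
```

The class name portion matches what the Postgres expression returns: starting the substring at position 0 with a length equal to the position of the '=' leaves out the '=' itself.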
Operations Center is providing a suppress/acknowledge capability on top of all of my management tools and my End to End Service views, assisting me with managing my SLAs, and providing visibility. I am now able to drill into the Data Integrator adapter to see which elements in my system are suppressed, when they were suppressed, for how long, etc. I am also able to create a nice view of the suppressed elements in the Dashboard, in a tabular layout that allows sorting and filtering.
As always, if you find this interesting and want to play with it, please do that in a dev environment.