Oct 12

The history of enterprise management paradigms is very seldom given the importance it deserves. Our industry is nearly 90 years old, so should we not look back and assess how mature it has become? Very seldom do we realize that we have been stuck with one protocol for over 22 years. These are questions that almost never come up. So today I went through the archives for the NMS, Service Assurance, BSM and SQM landscapes and put together a brief history of these paradigms. Here are some of the key milestones:

1920: the birth of the term “Network Management”. AT&T coined the term; supervisors used to roll around on skates to handle the network incidents requiring attention.

1962: the first “Network Control Center” is born at AT&T.

1977: the first “Network Operations Center” is born at AT&T.

1987-88: among the most important years for network management. The birth of the SGMP and SNMP Version 1 protocols at the IETF, and the birth of ITSM at the CCTA [birth of the Service Assurance concept].

1987: the first “Topology driven NOC”

1991: the birth of “Network Management as we know it”. Monitoring, surveillance, operations and administration are taken over by the NOC.

1993: birth of SNMP V2

1997: birth of SNMP V3

2001: the telecom bubble bursts, putting the highest emphasis on “doing more with less” and “Lean”. Birth of Business Service Management [Searching for hard citations, will update soon]

2003-05: the era of multiple mergers and acquisitions starts, in an effort to consolidate the enterprise management market.

2005: 3GPP goes big on planning for LTE, SAE, etc. to define 4G networks, and the industry focus on virtualization starts.

2007: Forrester publishes a study naming Managed Objects and BMC as leaders in BSM. This lights a fire under IBM, HP and others to put forward better offerings.

References:

http://www.corp.att.com/history/nethistory/management.html

http://140.134.26.20/wbem/eng/ch2.html

http://www.ir.bbn.com/~craig/

http://www.interesting-people.org/archives/interesting-people/200603/msg00182.html

http://www.forrester.com/rb/Research/wave%26trade;_business_service_management,_q1_2007/q/id/38931/t/2

Aug 22

As a firm believer in using formal techniques for building BSM solutions, I have studied the SEI documentation guidelines over the years and implemented them time and again. This has been one of the biggest reasons for most of the success I have had in my career. For this reason, I have decided to share a step-by-step documentation guidelines series with my readers, to nudge the BSM community towards building solutions BETTER!!

I would like to acknowledge Professor Jamshid Vayghan and Professor Jeff Thompson, whose teaching of Enterprise Architecture and Software Solutions Architecture gave me insight into how to deliver architecture-related artifacts better.

This series will detail the documentation guidelines for BSM solution strategy, architecture planning, implementation and lifecycle management. It will not cover documentation for troubleshooting, production support and maintenance, as these aspects depend on the tool suite, hardware, software and organizational environment.

Aug 17

Those who know me also know that I have been a Nagios supporter for a while, but I have had my share of complaints about the solution. For the past few months I have been hearing about and trying out GroundWork's open source solution, as it provides a better integrated offering than Nagios, so I did a study that I would like to share with my readers.

GroundWork 5.3 Community Edition is hosted on an enterprise-class application server and database, which came as a big relief after Nagios and its scalability issues. JBoss and MySQL give GroundWork a good foundation and provide some really nice Web 2.0 features like AJAX-driven views, role-driven dashboards, etc. Furthermore, GWS also provides some really cool reporting features. Note that the reporting functionality covers both real-time and historical information.

Hardware requirements for GWS are pretty straightforward: 4 GB of RAM, a 3 GHz CPU, 160 GB of hard drive, etc. Net net: nothing earth-shaking.

What was really interesting is that GWS comes with a lot of plug-ins incorporated, plus intuitive UI-driven hostgroup/host setup, servicegroup/service setup, dependency control and service metric reporting features. This was a BIG plus and provided value off the shelf. For existing Nagios users, GWS provides a seamless port of existing functionality. What's more, installation took only 30 minutes!! PLEASE do not confuse installation with a ready solution, though; I am only referring to the base solution.

If all of the aforementioned was not enough, GWS can also discover devices using a ping sweep and a TCP-layer discovery option. Auto discovery for free. Neat!!
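For readers who have not seen a discovery sweep before, here is a minimal Python sketch of the general idea. It is not how GWS implements discovery. A real ICMP ping usually needs raw sockets and elevated privileges, so this sketch assumes a plain TCP connect as the probe, and the default port 22 and the function names are hypothetical choices made for illustration.

```python
import ipaddress
import socket


def tcp_probe(host, port=22, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable or timed out: treat the host as not discovered.
        return False


def discover(cidr, probe=tcp_probe):
    """Probe every usable host address in the subnet and return the responders."""
    network = ipaddress.ip_network(cidr, strict=False)
    return [str(addr) for addr in network.hosts() if probe(str(addr))]
```

Calling discover("192.0.2.0/28") would walk the 14 usable addresses of that subnet one by one. The probe is passed in as a parameter so that a different check (for example an SNMP query) could be swapped in without touching the sweep itself.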

So to summarize, GWS Community Edition turned out to be a very useful time investment for the following reasons:

1) Easy to install; 20 minutes of initial setup

2) UI driven hostgroup/host/servicegroup/service/commands/dependencies

3) Historical and real-time reporting features

4) Auto discovery via ICMP

5) AJAX driven role based views/dashboards

6) MySQL archive

7) Checks for SNMP alarms [passive]

8) Active service checks [loads of them]

I would be interested to hear feedback from those who have used the Enterprise Edition of GWS on the BSM features it provides.

Aug 08

The success of a business or service is tied to human factors like relationship management, communication, motivations, aspirations, organization of the human chain, delivery models, planning of human resources, etc. Yet WE (the community) more often than not neglect, if not completely ignore, the human aspect of BSM.

So what are the various “Human” dimensions/perspectives of BSM?

  1. Skills: If you think you can get BSM/Service Assurance implementations done by a bunch of system engineers with expertise in a bunch of tools, you are heading for a disaster. What you need is someone who can envision the solution, understand the organization, communicate the ideas and implement them with agility. System engineers are an important part of the implementation, but they are biased towards what they know best, which works out OK only if what they know can scale to the needs of the organization.
  2. Ability to work with end users: This is another very important aspect of a successful BSM solution. A word of caution here: if you hear sales presentations saying BSM is only for VPs and Directors who want the “global strategic view”, stay away!! BSM belongs as much to the VPs as it belongs to the technician “Joe” sitting in the Operations Center, trying to understand why his walk through the datacenter and his pulling a cable can cause a loss of millions of dollars.
  3. P2P of the value chain: No, P2P here is not peer to peer but people to people. It means communicating the “value” of the BSM solution, and understanding how important it is to get service-impacting information to all expected stakeholders.

So, next time you are working on a BSM implementation, try this: WALK AROUND!! Ensure BSM is a part of every individual's goals and objectives, and that the value the VPs and Directors expect out of the solution is communicated to as many folks as possible. Also ensure that the value the technician Joe expects is communicated back, completing the full cycle of “value” flow. That is what makes the business better!! Have you ever seen all of the aforementioned happening in a BSM implementation? I can bet that the answer in more than 80% of cases is “NO”; and then we ask why 8 of 10 BSM efforts fail to deliver the value they proposed.

BSM will only be “BS” if WE (the community) miss the perspective that ensures the alignment of the most important asset and pillar of a successful business: “Humans”.

Jul 18

Managing vs. discarding events has been a topic of debate for many years in the Network Management community. Both sides have merits and demerits to consider, and while the reader may ask for a specific answer, the answer really is that it depends!! The real question is rather: what factors does it depend on?

For those of you who are not familiar with this topic, let me give a quick background. Most equipment vendors provide MIBs or off-the-shelf management modules to manage their equipment from a fault management/service assurance perspective, based on standard protocols running over TCP or UDP such as SNMP, TL1 or plain socket communication. NMS teams at various telecom and financial giants debate the feasibility of managing a huge number of events, often millions per day. Correlation and deduplication do reduce events to more actionable alarms, but they do not reduce the actual root causes. So this leads us to a bigger question: what are the root causes?
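To make the deduplication step concrete, here is a minimal Python sketch. It assumes each event is a dictionary with hypothetical `node`, `alarm_type`, `severity` and `timestamp` fields; real event formats depend on the vendor and the NMS in use. Note that it only shrinks the number of rows an operator sees. It says nothing about root causes.

```python
def deduplicate(events):
    """Collapse repeated events into one alarm per (node, alarm_type) pair.

    The field names are assumptions made for this illustration only.
    """
    alarms = {}  # dicts keep insertion order, so alarms stay in first-seen order
    for event in events:
        key = (event["node"], event["alarm_type"])
        if key in alarms:
            # Same problem reported again: bump the counter instead of adding a row.
            alarms[key]["count"] += 1
            alarms[key]["last_seen"] = event["timestamp"]
        else:
            alarms[key] = {
                "node": event["node"],
                "alarm_type": event["alarm_type"],
                "severity": event["severity"],
                "count": 1,
                "first_seen": event["timestamp"],
                "last_seen": event["timestamp"],
            }
    return list(alarms.values())


# Made-up sample events: two repeats of one link failure plus one unrelated alarm.
events = [
    {"node": "router-1", "alarm_type": "linkDown", "severity": "critical", "timestamp": 100},
    {"node": "router-1", "alarm_type": "linkDown", "severity": "critical", "timestamp": 160},
    {"node": "switch-7", "alarm_type": "authenticationFailure", "severity": "warning", "timestamp": 170},
]
alarms = deduplicate(events)
```

With the sample data, the two linkDown events from router-1 collapse into a single alarm with a count of 2, while the authentication failure stays as its own alarm.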

Does a NOC or Front Office technician really care about authentication failure alarms, or those annoying informational and warning alarms provided off the shelf by the vendor to “effectively manage the network”?

The following organizational factors should be considered for effective event management:

1) The size and skills of the Layer 1 support NOC/Front Office: OK, so if the Front Office is 4-5 guys, can they really handle 3000 critical alarms a day? Do they really need those trending alarms indicating that a T1 might be impacted in 4 hours, or would they rather focus on customer-impacting outages? [I know that some would question the org. structure itself; but I will not try to influence business decisions which weigh multiple dimensions of the picture, technology being only one of them.]

The size of the team responsible for incident management is a key input for the fault management/service assurance team, which has to ensure that the quality of alarming meets the expectations of the organization.

2) The size & complexity of the application platform/network: Size and complexity of the application platform/network play an important role in defining alarms.

Example: For the layer 1/core network, technicians may want to know all trends in order to prevent incidents from happening, whereas for the layer 2/layer 3 network, technicians may want only events indicating incidents that impact services.

Note: Understanding the network/applications from a usage perspective helps immensely.

3) Customers & Services: Provisioned services and customer associations are important to the overall business objective. Understand them!

Once you understand the aforementioned, you will know both the organizational perspective and the volume management perspective of events.

Now for the most important dimension of the debate: quality of alarming, which consists of accurate, complete and actionable alarms. Considering this factor, one might argue that we can reach the goal of quality only if we manage the alarms we have identified rather than whatever is provided off the shelf. Yes, I agree.

On the other hand, a few might argue that by discarding unknown alarms we let some information which might impact services go unnoticed. Yes, I agree with this too. But the challenge is to balance these discards at the right level, showing the events which indicate a real impact on the service.
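One way to picture that balance is a small triage rule. The Python sketch below is only an illustration under stated assumptions: the severity names, the `known_alarm_types` and `service_nodes` sets and the three outcomes ("display", "review", "discard") are all hypothetical, and every organization would tune them differently. The point is that unknown alarms on nodes that carry provisioned services are parked for review instead of being silently dropped.

```python
from dataclasses import dataclass

# Assumed severity scale for this illustration; real NMS products differ.
SEVERITY_RANK = {"info": 0, "warning": 1, "minor": 2, "major": 3, "critical": 4}


@dataclass
class Event:
    node: str
    alarm_type: str
    severity: str


def triage(event, known_alarm_types, service_nodes, min_severity="major"):
    """Return 'display', 'review' or 'discard' for a single event."""
    impacts_service = event.node in service_nodes
    if event.alarm_type not in known_alarm_types:
        # Unknown alarm: keep it for review only if the node carries services.
        return "review" if impacts_service else "discard"
    rank = SEVERITY_RANK.get(event.severity, 0)
    if impacts_service and rank >= SEVERITY_RANK[min_severity]:
        # Known, severe enough, and on a service-carrying node: show it to the NOC.
        return "display"
    return "discard"
```

For example, with known_alarm_types = {"linkDown"} and service_nodes = {"core-1"}, a critical linkDown on core-1 is displayed, an unrecognized vendor alarm on core-1 goes to review, and the same unrecognized alarm on a lab box is discarded.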

That is why the challenge is not in getting the Right tool; it's all about using the tool Right!!
