The Cisco – SP Partnership: Best Practices for Lifting Service Quality

By Carlos Cordero, Director, Service Provider Internet Business Solutions Group

Service providers (SPs) face a range of service quality challenges, most often stemming from hardware failures, software bugs, network outages, packet loss, and capacity issues. Most of these challenges are not new, and many may already have been resolved by SPs’ technology partners or by other operators. Indeed, SPs could capture significant operational benefits simply by adopting well-established best practices.

However, adopting these best practices requires a proactive and open relationship between SPs and their technology partners. Without open cooperation, adopting these practices and sustaining continuous improvement will remain a challenge.

To explore the relationship between an SP’s culture and the adoption of best practices, I will be writing a series of articles on the SP360 blog covering operational and engineering best practices, challenges, and benchmarks observed in the course of working with major service providers worldwide. The specific topics I will cover include: operational practices such as testing, certification, engineering rules, go-live, and incident management; as well as organizational capabilities (planning, program management, culture, management practices, IP skillsets, and staffing levels).

A good place to start is testing. Testing is critical, as any complex system will always have bugs. The way new network elements and software are tested prior to their integration into a production network can heavily influence that network’s quality. We have found that leading SPs test new software extensively, both in their own labs and in Cisco’s. Testing typically lasts eight to ten weeks and includes functional, scale, integration, and regression testing. However, there can be significant differences in how a given SP coordinates the testing with Cisco, configures the test environment, and performs the actual testing. SPs with the best service quality often develop common test plans with Cisco. Both the SP and Cisco then use these common test plans to coordinate and perform the testing in their respective environments.

In most instances, the SP and Cisco will each follow the same test procedures indicated by the common test plan to the letter, and then compare results with one another. That means that everything running on the SP’s production network is shared with Cisco. Furthermore, every change to the network, or newly added feature, is reflected in the test plan.
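To make the mechanics concrete, a shared test plan and the cross-lab result comparison could be modeled along these lines. This is a hypothetical sketch for illustration only, not Cisco's or any SP's actual tooling; all names (`TestCase`, `compare_results`, the case IDs) are invented:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TestCase:
    """One entry in the common test plan shared by the SP and the vendor."""
    case_id: str
    category: str      # functional | scale | integration | regression
    description: str

def compare_results(plan, sp_results, vendor_results):
    """Return cases where the two labs disagree or a result is missing.

    Both labs run the same plan; any mismatch is a discrepancy to be
    investigated jointly before the software reaches production.
    """
    discrepancies = []
    for case in plan:
        sp = sp_results.get(case.case_id)
        vendor = vendor_results.get(case.case_id)
        if sp is None or vendor is None or sp != vendor:
            discrepancies.append((case.case_id, sp, vendor))
    return discrepancies
```

For example, if the SP's lab records a failure on a scale test that passed in the vendor's lab, the comparison surfaces exactly that case for joint root-cause analysis, rather than letting the inconsistency slip through.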

In one instance, the SP and Cisco actually divided up testing responsibilities 50-50, essentially reducing the testing cycle by 50%. This was possible only because their respective labs closely resembled the SP’s actual production network, in several ways. First, they reproduced the network topology and routing architecture, using a sufficiently large number of routers to simulate real traffic flows. Second, the labs’ networks reproduced the feature functionality configured on the routers in the production network. This was critical because service-impacting bugs can be triggered by individual features, as well as by unintended interactions among them. Third, any external software, such as scripts and MIBs, interacting with the routers was simulated, as these could conceivably impact service availability. Last, they reproduced the absolute traffic levels of the production network and simulated multi-user environments, because some bugs only emerge when a router’s CPUs or interfaces are under heavy load. Other SPs went so far as to simulate adverse events, such as route flaps, because these can potentially trigger non-linear effects, which in turn can lead to a service outage.
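The point about load-dependent failures can be illustrated with a deliberately simplified sketch. This is a toy model, not real router code: a hypothetical control-plane worker drains route updates in fixed-size batches, and its latent defect (silently dropping overflow instead of queuing it) only manifests when a route-flap storm pushes the backlog past the batch limit, which lab-scale traffic never does:

```python
def apply_route_updates(pending, batch_limit=1000):
    """Toy control-plane worker: drains at most batch_limit updates per cycle.

    Latent bug: anything beyond batch_limit is silently dropped instead of
    being re-queued for the next cycle. At typical lab load the backlog
    never exceeds the limit, so every test passes; only a production-scale
    flap storm exposes the defect.
    """
    processed = pending[:batch_limit]
    dropped = max(0, len(pending) - batch_limit)
    return len(processed), dropped
```

At a lab-scale backlog of 200 updates the function behaves perfectly; with a flap storm of 5,000 updates it quietly discards 4,000 of them. This is precisely why reproducing production traffic levels and adverse events in the lab matters: the bug is invisible at any smaller scale.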

On the other hand, SPs with the worst service quality records tend not to follow many of these testing practices. As a result, they are likely to miss a significant number of software bugs, which then manifest on the production network and lead to service outages.

In closing, the need for extensive testing highlights that network-related issues are indeed a significant challenge for SPs. But, through the use of industry best practices and collaboration with technology partners, SPs can take a more proactive approach to better managing these issues and can dramatically improve operational quality. I will cover more best practices in upcoming columns.

For a deeper dive into operational excellence, please read about our success with Vodafone and our survey findings gleaned from multiservice operators (MSOs).

4 Comments.

Would love to see some verbiage around what exactly a "Service Provider" is these days. I believe this term has grown to encompass any company whose business is delivering IP-enabled applications or services over a network. This could be the internet, but it could also be their customers' private networks.
With that in mind, these sorts of practices apply to these companies as well as traditional SPs.
It would be great to hear how companies can transition from a more "enterprise" centric mode of operation to that of a Service-Provider by embracing these practices.
Thanks!
Derick

Hi,
Many thanks for this! From a decade of experience working in test labs for SPs who were close Cisco partners, I can safely vouch for the importance of testing to ensure high-quality service. As technology and customer demand and expectation got more and more sophisticated, MPLS-based networks delivering services such as IPTV, VoIP, etc. got more and more susceptible to unforeseen failures. With more vendors coming into play in the SP arena, testing methodology, which, prior to circa year 2000, was largely focused on Cisco's IOS behaviour in a given lab-based simulation of a customer network, became also a challenging test of vendor integration.
As such, as pointed out in this piece, the importance of vendor involvement and guidance in lab based tests of simulated customer networks and services offered are paramount. It is also unrealistic, speaking from experience, to be able to detect all possible failure scenarios and/or bugs in an SP lab testing exercise. There are bugs that surface only after live deployment with a combination of time, cpu load and anomalies of real life traffic as a trigger.
However, IMHO, it is of utmost importance, from the point of view of responsibility towards customers, that SPs do try their very best to ensure that all the failures and bugs that could be eliminated before actual live deployment have indeed been eliminated. Vendor guidance in this case is of very high importance.
The importance of testing cannot be stressed enough.
Thanks!
santanu

Derick,
Thank you for your comment. In the context of this article, the term "Service Provider" means a provider or carrier of telecommunications services (video, voice, data) such as Verizon, AT&T, Comcast, etc. However, you are correct in observing that enterprises could benefit greatly from adopting best practices such as those in my blog. In addition to enhanced testing, they would also benefit by implementing other carrier-grade measures such as service-level agreements and program management. I will write more on these topics in future blogs. Stay tuned!
Regards,
Carlos

Dear Santanu,
You are correct in pointing out that SPs need to work with vendors to do everything possible to prevent bugs from emerging in their production environment. From experience, I can attest that both vendors and SPs expend great effort prior to launching a new service or implementing new software or hardware upgrades on the network. However, the level of cooperation, and quite frankly, trust between the SP and the vendor is key. Trust enables them to work more closely and thus ferret out any potential deleterious effects, not only in testing, but also during the planning, requirements, and implementation phases. In turn, trust is predicated on several factors, such as culture, prior history, and how the vendor (and the SP) behave in a crisis situation.
Regards,
Carlos
