The dilemma of implementing NFV services: Why is it taking so long?

IBM’s Craig Farrell looks at making NFV implementation a reality. In this first part he examines why NFV is taking so long to move from trials into production services.

Craig Farrell, IBM

14 Nov 2016


In this two-part series, IBM’s Craig Farrell looks at making NFV implementation a reality. In this first part he examines why NFV is taking so long to move from trials into production services. In Part 2 he addresses the crucial question of what can be done about it.

The tantalizing promise of software-defined networking (SDN) and network function virtualization (NFV) is that they will allow communications service providers (CSPs) to purchase and deploy network functionality as software, without the need for expensive, often proprietary, hardware. This can lead to cost savings, shorter times to deliver new services, the ability to scale and adapt network services on demand, and closed-loop control. Over the last few years many CSPs have engaged in trials, proofs of concept and demonstrations, but there are very few large-scale NFV-based network services in production, and almost none that deploy network functions from multiple vendors.

Among the NFV services that do exist, it is difficult to find anything beyond anecdotal evidence of return on investment (ROI). Many CSPs still quote ‘expected’ cost savings and ‘anticipated’ reductions in the time to deploy new services. These observations point to an industry-wide slowness to move from trials to production SDN- and NFV-based network services. Given that the value proposition for NFV-based network services is compelling, and that trials have been going on for nearly five years and are often described as highly successful, what has delayed the transformation to production NFV network services, and what can be done about it?

What is delaying the transformation?

I believe that the sluggish pace of NFV transformation is the result of a handful of factors that need to be addressed as part of the implementation planning for any NFV transformation project. With CSPs announcing a growing number of SDN and NFV transformation projects, addressing these factors has become crucial, because some CSPs have already published estimated performance improvements, cost savings and timelines for many of these projects. So what are these factors?

Dilemma of transformation

The first issue hampering efforts to deliver SDN and NFV network services is the dilemma of transformation. This dilemma involves two major factors that cause telecom NFV transformation projects to stall. The first is the need for a very compelling business case. Moving to cloud-based SDN and NFV networks may require replacing legacy services that still generate significant, if flat or declining, revenue at high profit margins. Even after seeing the business case, many CSPs struggle with the idea of modifying a service that works and is highly profitable. And even for a brand new service, why take a risk on an unproven technology without a really compelling business case?

The second factor is that transforming to NFV services requires a migration strategy that maintains customer satisfaction and service levels throughout the move, so that churn does not increase during the changeover.

Lack of a clear, immediate trigger

The next factor that appears to hold back NFV transformation projects is also related to the business case. For many CSPs, the business case for NFV transformation lacked a clear trigger that made immediate transformation necessary. Despite published timelines, many network staff remarked that the risk to legacy services was low and that these services were very profitable, so why spend money transforming them right away? Why not wait until they become unprofitable? Many argued that, in the current environment of cost cutting and fiscal restraint, any investment dollars in the current budget would be better spent on other priorities such as optical build-out and radio access network (RAN) upgrades. Because optical build-out and RAN upgrades represent a much larger percentage of the network operations budget, there was no clear, pressing trigger to prioritize NFV transformation in the current budget cycle. Even for new services, the benefits in the NFV business case were not compelling enough to justify building them on NFV right away.

Contrast this with the earlier move to virtual environments in information technology (IT) data centers. There, server farms and storage represented a much larger percentage of the IT spend, so the financial trigger to virtualize was much clearer.

Cultural issues

A move to SDN and NFV for network services requires network organizations to change the way they design, build and operate those services. The multi-tenant cloud environment assumed by NFV is very different from a traditional network operating environment. Network architects, planners and operations teams are used to designing a solution around a particular piece of equipment operated in a specific way. The move to cloud-based, multi-tenant SDN and NFV operating environments introduces new questions such as:

When to use public, private or hybrid cloud environments?
Who will buy and manage the hardware and software?
Who does capacity planning?
Does the network operations team have to give up control to another operations team?
Who is liable if service level agreements are violated?
Can we trust an automated, self-service environment?

Trials of virtualized long-term evolution (LTE) evolved packet cores (EPCs) were successfully conducted more than five years ago. Despite this, production EPCs today are still deployed and managed on dedicated equipment, much as they always have been. Questions about changes to standard operating processes such as fulfillment and service assurance, questions about service level agreements, and the fear of outages convinced network staff to take a conventional, conservative approach when building production LTE networks, despite the successful trials.

As service providers continue to move their applications to the cloud and look to migrate network functions there as well, there is still a lack of trust that these new services can be made available on demand. To bring new services to market and offer network capabilities on demand, the network engineering and operations teams and suppliers who have been responsible for delivering 99.999 percent reliability must rethink and redesign deployment for a cloud that expects applications to adapt when a virtual machine (VM) goes down. The cloud mentality of treating servers as “cattle” (interchangeable and replaceable) rather than “pets” (individually nurtured and kept alive at all costs) directly conflicts with the traditional approach of deploying networks on extremely redundant hardware dedicated to single functions.

‘Fast follower’ business strategy

Telecommunications operators tend to be conservative by nature, which manifests itself in a number of ways including the widespread use of the ‘fast follower’ business strategy. The fast follower strategy means that few CSPs are prepared to take the risks associated with being the leader in implementing an idea or technology. Many CSPs would rather focus on being quick to replicate an idea once it has proven itself. For NFV this has translated into a situation where a number of CSPs have done trials but are waiting to see the results of production implementations by the few CSPs prepared to take the lead before they themselves implement production services.

Contrast this with the approach of telecom disruptor companies such as Facebook, Netflix, Amazon and Google. The disruptor companies are very comfortable testing new ideas and quickly moving them into production, irrespective of what others are doing. This highlights a clear difference in the culture between them and more traditional service providers.

Vendor community

CSPs are not solely responsible for the slower-than-expected appearance of production NFV services; vendors have contributed too. The slow pace of NFV adoption has certainly been affected by the maturity of the vendor community's solution offerings. Many vendors' NFV solutions are still in early releases, so few have been tested in production networks. NFV features such as dynamic provisioning translated into significant scalability requirements for vendors, and although these requirements appeared on product roadmaps, few vendors would claim to have mature solutions. There was also some concern that incumbent vendors might not fully support the transition to NFV because it could reduce revenue from existing products and enable new competitors. This is the dilemma of transformation again, this time facing network function vendors that are also legacy network equipment providers and are reluctant to cannibalize their own revenue streams.

Returning to the example of EPCs for LTE networks, CSPs' concerns about the maturity of vendor solutions raised questions about manageability, scalability and reliability (particularly the risk of outages). The long relationship between CSPs and vendors meant that many CSPs were conditioned to wait for vendors to deliver ‘carrier grade’ solutions before implementing them in a production network.

Again, this contrasts with the culture at disruptor companies such as Facebook, Netflix and Google, which are very comfortable implementing solutions they build themselves rather than waiting for a mature solution from a vendor.

The author would like to thank the following people for their insights, reviews, comments and updates: Lloyd Switzer, Telus; Kevin Salvadori, Clearbay Technology; and Zygmunt Lozinski, Steve Teitzel, Marcus Buckle, Charlie Hale, Stephen Laufer and Sarah Dudley, all of IBM.

Read Part 2 for Craig's take on how these issues can be addressed.

Craig Farrell

CTO, IBM