Building a hybrid ML model to predict CDN capacity

07 May 2019

In today’s information-driven world, technology is advancing exponentially, and so is the growth of data traffic over IP networks worldwide. According to Cisco’s Visual Networking Index (VNI) forecast, annual global IP traffic has already crossed the zettabyte (ZB) threshold and is expected to reach 4.8 ZB per year by 2022, a threefold increase over 2017. This growth has also expanded the role of content delivery networks (CDNs), which are expected to carry 72% of all Internet traffic by 2022.
Figure 1: Increase in Global IP Traffic (Cisco VNI Forecast: 2017 – 2022)
Figure 2: Global content delivery network Internet traffic (Cisco VNI Forecast: 2017 – 2022)
This exponential increase in CDN traffic is putting heavy pressure on service providers to find new ways to optimize existing capacity and accurately predict future capacity requirements. This article discusses the challenges service providers face in CDN capacity planning and how hybrid machine learning (ML) models can help address them.

Service providers’ capacity planning challenges


Traditionally, CDN capacity planning was mostly a manual process that used basic statistical tools to collect data and raise an alert when traffic crossed a static threshold. With today’s rapid bandwidth growth, these traditional approaches are increasingly likely to fail:

  • Network traffic is becoming extremely dynamic due to increased mobility and heterogeneous demands from users.

  • With the constant growth in media consumption, operators are experiencing significant seasonal fluctuations in capacity demand.

  • External influencing factors such as sports events, political campaigns or viral videos can cause sudden spikes in network traffic that throw capacity plans off course.

  • Internal influencing factors such as new product launches, new offers or sales promotions can also cause a significant fluctuation in capacity requirements.


In the current landscape, the traditional approach can produce inaccurate forecasts. As a result, service providers often run out of capacity, because rising data consumption and shifts in consumption patterns go unnoticed during capacity planning.

Building a robust capacity modeling tool to forecast capacity


Using a robust capacity modeling tool can help service providers to confront the above challenges and better plan out their network capacity.

As depicted in Figure 3, a capacity modeling tool collects data from different sources to predict when demand for a given resource may exceed the capacity to deliver it. Such a model makes it possible to determine which infrastructure upgrades will pay off through increased customer business and reduced churn.
Figure 3: Capacity Modeling Tool
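To make the "when will demand exceed capacity" question concrete, here is a minimal Python sketch. It assumes a demand forecast already exists (one peak value per period, in Gbps) and a hypothetical planning rule that a site should not run above 80% of its installed capacity. The function name `first_capacity_breach`, the threshold and the numbers are illustrative assumptions, not part of the tool described above.

```python
from typing import Optional, Sequence


def first_capacity_breach(
    forecast_gbps: Sequence[float],
    capacity_gbps: float,
    max_utilization: float = 0.8,  # hypothetical planning headroom rule
) -> Optional[int]:
    """Return the index of the first forecast period whose demand exceeds the
    usable capacity (capacity * max_utilization), or None if it never does."""
    usable = capacity_gbps * max_utilization
    for period, demand in enumerate(forecast_gbps):
        if demand > usable:
            return period
    return None


# Hypothetical monthly peak-demand forecast for one CDN site, in Gbps.
forecast = [620.0, 655.0, 690.0, 730.0, 775.0, 820.0]
print(first_capacity_breach(forecast, capacity_gbps=900.0))  # prints 3
```

The returned index is the planning lead time: here the fourth period (730 Gbps) is the first to exceed the 720 Gbps of usable capacity, so an upgrade would need to be in place before then.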
CDN traffic data spans a large number of datasets, and choosing a single forecasting model that performs best on all of them is not straightforward. Capacity can be predicted with various classical time series forecasting models as well as modern machine learning models. However, before narrowing down to the right set of forecasting models, we must analyze key factors such as the context of the forecast, the relevance and availability of historical data, the desired degree of accuracy, the time period to be forecast, the cost/benefit (or value) of the forecast to the company and the time available for analysis.
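One common way to cope with many datasets is to pick a model per series by backtesting: hold out the most recent points, forecast them with each candidate, and keep the candidate with the lowest error. The sketch below assumes three deliberately simple candidates (naive, weekly seasonal naive and a least-squares linear trend) scored by mean absolute error (MAE); the candidate set, the weekly period and the helper names are illustrative assumptions. A real setup would add stronger classical and ML candidates and weigh the other factors listed above.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Forecaster = Callable[[Sequence[float], int], List[float]]


def naive(history: Sequence[float], horizon: int) -> List[float]:
    # Repeat the last observed value.
    return [history[-1]] * horizon


def seasonal_naive(history: Sequence[float], horizon: int, season: int = 7) -> List[float]:
    # Repeat the value observed one season earlier (one week of daily data).
    n = len(history)
    return [history[n - season + (h % season)] for h in range(horizon)]


def linear_trend(history: Sequence[float], horizon: int) -> List[float]:
    # Ordinary least-squares line through (t, y), extrapolated forward.
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = mean(history)
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


CANDIDATES: Dict[str, Forecaster] = {
    "naive": naive,
    "seasonal_naive": seasonal_naive,
    "linear_trend": linear_trend,
}


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def select_model(series: Sequence[float], horizon: int) -> str:
    """Backtest every candidate on the last `horizon` points; return the lowest-MAE name."""
    train, test = series[:-horizon], series[-horizon:]
    scores = {name: mae(test, f(train, horizon)) for name, f in CANDIDATES.items()}
    return min(scores, key=scores.get)


# Synthetic daily peaks (Gbps): one with a weekend bump, one with steady growth.
weekly = [100.0 + (50.0 if d % 7 in (5, 6) else 0.0) for d in range(28)]
growing = [200.0 + 10.0 * d for d in range(28)]
print(select_model(weekly, 7), select_model(growing, 7))  # prints: seasonal_naive linear_trend
```

Because the selection runs independently per series, different CDN sites or traffic classes can end up with different models, which is the point made above about no single model winning everywhere.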

Evaluating the classical time series forecasting models


When it comes to predicting capacity in a CDN, the classical approach does not give the desired results. This is mostly because of the dynamic nature of video consumption patterns. About a decade ago, TV was the main way users consumed video, and simple statistical formulas worked well for predicting future capacity. Today, however, the video consumption pattern has changed completely. Users can watch videos on the go, on a variety of digital devices, over different networks and bandwidths. The classical approach mostly fails to capture this dynamically changing trend in traffic usage.

Evaluating modern machine learning models


Machine learning and deep learning methods are often considered a key solution to complex predictive problems. However, relying only on complex ML models has the following drawbacks:

Financially and computationally expensive


Training deep networks on big data in a reasonable amount of time requires high-end, expensive GPUs; without them, high performance is hard to achieve. Firms therefore need to consider carefully whether the cost of further gold plating is justified by the accuracy gained.

Difficult to incorporate input datasets for external and internal influencing factors


Predicting future demand from past data alone no longer works, so it is crucial to feed external and internal influencing factors into the model. However, complex modern ML models used on their own may not offer that flexibility.
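As a minimal sketch of what feeding an influencing factor into a model can look like, the code below fits an ordinary least-squares regression with an intercept, a linear time trend and a 0/1 flag marking days with a known event (for example a sports final as an external factor, or a sales promotion as an internal one). The regression form, the synthetic data and the helper names (`solve`, `fit_ols`, `design_row`) are illustrative assumptions. The point is only that a known future event can be put on the calendar and priced into the forecast, which a model trained on past traffic alone cannot do.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    # Normal equations: (X^T X) beta = X^T y.
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(t: int, event: int) -> List[float]:
    # Features: intercept, linear trend, and a 0/1 flag for a known event.
    return [1.0, float(t), float(event)]


def predict(coef: Sequence[float], row: Sequence[float]) -> float:
    return sum(c * x for c, x in zip(coef, row))


# Synthetic daily peak traffic (Gbps) with an 80 Gbps uplift on hypothetical event days.
EVENT_DAYS = {6, 13, 20}
rows = [design_row(t, int(t in EVENT_DAYS)) for t in range(28)]
traffic = [200.0 + 2.0 * t + 80.0 * (t in EVENT_DAYS) for t in range(28)]

coef = fit_ols(rows, traffic)
forecast = predict(coef, design_row(30, 1))  # day 30 has a scheduled event
print([round(c, 3) for c in coef], round(forecast, 3))  # prints [200.0, 2.0, 80.0] 340.0
```

The fitted event coefficient is the estimated uplift per event day; in practice several such flags (or a calendar of promotions and fixtures) would be added as extra columns.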

A sophisticated machine learning method can be costly without being guaranteed to work. A customized approach using hybrid ML methods is therefore needed to make capacity prediction more robust and better aligned with business needs.

Building a customized hybrid machine learning model


The customized hybrid model shown in Figure 4 combines classical forecasting methods with modern ML models to improve overall prediction and overcome the limitations described above. It can be built as a 3-step process. First, the time series is decomposed into its components (trend, seasonality and noise). Next, each component is modeled separately using the model best suited to it. Finally, the individual outputs are combined to produce the final forecast.
Figure 4: Customized hybrid machine learning model
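The three steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the model from Figure 4: it uses an additive decomposition with a centred moving-average trend (odd period only), and deliberately simple stand-ins for the per-component models (a linear regression for the trend, a repeated profile for seasonality, a clipped AR(1) for the residual). In a real hybrid model, an ML regressor could replace any of these component models; all function names are hypothetical.

```python
from statistics import mean
from typing import List, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    x_mean, y_mean = mean(xs), mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def decompose(y: Sequence[float], period: int):
    """Step 1: additive decomposition y = trend + seasonal + residual.

    The trend is a centred moving average (period must be odd in this sketch),
    so the first and last period // 2 points get no trend or residual value.
    """
    n, half = len(y), period // 2
    trend: List[Optional[float]] = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(y[i - half:i + half + 1])
    # Seasonal profile: average detrended value at each position in the cycle.
    buckets: List[List[float]] = [[] for _ in range(period)]
    for i, t in enumerate(trend):
        if t is not None:
            buckets[i % period].append(y[i] - t)
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    seasonal = [s - offset for s in raw]  # centre the profile on zero
    idx = [i for i in range(n) if trend[i] is not None]
    resid = [y[i] - trend[i] - seasonal[i % period] for i in idx]
    return idx, [trend[i] for i in idx], seasonal, resid


def hybrid_forecast(y: Sequence[float], period: int, horizon: int) -> List[float]:
    idx, trend, seasonal, resid = decompose(y, period)
    # Step 2a: trend component -> linear regression, extrapolated.
    a, b = fit_line(idx, trend)
    # Step 2b: seasonal component -> repeat the estimated cycle.
    # Step 2c: residual component -> AR(1) by least squares, clipped to stay stationary.
    den = sum(r * r for r in resid[:-1])
    phi = sum(r0 * r1 for r0, r1 in zip(resid[:-1], resid[1:])) / den if den else 0.0
    phi = max(-0.99, min(0.99, phi))
    out = []
    for h in range(horizon):
        t = len(y) + h
        # Step 3: recombine the component forecasts.
        out.append(a + b * t + seasonal[t % period] + resid[-1] * phi ** (t - idx[-1]))
    return out


pattern = [0.0, 0.0, 0.0, 0.0, 0.0, 40.0, 40.0]  # hypothetical weekend bump (Gbps)
history = [100.0 + 3.0 * t + pattern[t % 7] for t in range(28)]
print([round(v, 1) for v in hybrid_forecast(history, period=7, horizon=7)])
# prints [184.0, 187.0, 190.0, 193.0, 196.0, 239.0, 242.0]
```

Keeping the components separate is what makes the hybrid approach flexible: a component that behaves well (here the trend) can use a cheap classical model, while effort and compute go only where the structure is hard to capture.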

Business benefits achieved by a leading service provider in Europe


One of the leading service providers in Europe faced the critical challenge of running out of CDN capacity due to rising data consumption and changes in consumption patterns that were not identified during capacity planning. To overcome this challenge, the operator implemented a customized hybrid machine learning model for CDN capacity planning, built with efficient pattern recognition and self-learning capability, and achieved the following benefits:

  • Zero capacity failures observed in the past year for the service provider, which has a presence in 15+ countries and a customer base of 25 million+.

  • New areas of capacity risk in different parts of the network were identified in advance, helping the operator plan for and accommodate the required future capacity in time.