
Turk Telekom deploys AI and ML in network operations

Learn how Turk Telekom deployed AI and ML on a big-data platform using TM Forum's AI Toolkit and Business Process Framework, enabling data sharing and improved network performance.

07 May 2020

Who: Turk Telekom (Turk Telekomunikasyon AS) & Innova, both part of Turk Telekom Group

What: Deployed AI and ML on a big-data platform to identify anomalies and reduce potential customer-impacting problems on the network backbone

How: Using TM Forum's AI Toolkit and AI-based anomaly detection and network measurement tools within the Business Process Framework, part of the Open Digital Framework, enabled data sharing and improved network performance

Results:

  • Realized reductions in RTT delays of 8%, 5% and 9%, respectively, in peering, international and mobile links
  • Detected 1,800 anomalies in network traffic, leading to improved customer experience and monetization opportunities
  • Reduced the number of MPLS network devices reporting service tunnel convergence anomalies by over 60%

Turk Telekom is a converged operator providing services in the fixed, broadband and mobile domains to nearly 48 million customers. Like other such providers, Turk Telekom faces competitors, threats and rising customer expectations. Some of these challenges translate into network requirements. Growth in mobile gaming, increased streaming of HD and 4K video and the emergence of 5G all call for reduced delays in end-to-end connections. Preventing excessive use of its resources, whether malicious or not, calls for vigilant awareness of network traffic. And any operator running a multi-protocol label switching (MPLS) network needs to keep close watch on how long it takes service tunnels to recover.

Solution overview

To improve on its existing rules-based reports and controls, Turk Telekom worked with Innova on several artificial intelligence (AI) and machine learning (ML)-based scenarios. The project, which began at the end of 2017, drew on Turk Telekom for telecom domain expertise and Innova for most of the required AI/ML expertise. The plan was to collect resource performance, service performance and fault management data from various infrastructures, then analyze the data in near real time to detect anomalies and address the challenges mentioned above.

In addition to a half dozen AI/ML algorithms, the project had other requirements. Limited by its legacy relational database, Turk Telekom needed to build a new multi-cluster big data platform. Converting data into a structured, noise-free form was another, surprisingly burdensome, prerequisite. Analyzing service quality KPIs and resource performance data and then directing operations to improve resource and service performance also called for tremendous in-house coordination. Having implemented TM Forum’s Business Process Framework (also called eTOM), part of the Open Digital Framework, Turk Telekom was well positioned to collect and share data internally.

“The service management side works very closely with resource performance management,” Ayhan said. “The Business Process Framework is able to collect those two departments under the same roof, which helps us very much with collecting the data we need for these analyses.”

In the design phase of the project, Turk Telekom also depended upon the Forum’s AI Toolkit. Going forward, it anticipates applying TM Forum’s AI Lifecycle Management Model to its anomaly detection scenarios.

Three scenarios

Turk Telekom implemented anomaly detection scenarios that fall into three main categories: round-trip time (RTT), network traffic and alarm duration. All three provide enhanced visibility into network performance and opportunities for proactive network maintenance.

RTT delays

Turk Telekom deployed six RTT delay anomaly scenarios starting in April 2019 using three measurement applications (Sigos, Gezgin and TWAMP). The lead Turk Telekom department analyzed approximately 1 GB of RTT data over the next eight months and shared a total of 1,105 detected anomalies with net ops, which made improvements accordingly. As a result, Turk Telekom realized decreases in RTT delay of 8 percent, 5 percent and 9 percent in peering, international and mobile links, respectively.

The diagram below shows the changes in RTT measures.

TTfigure_1

Such reductions in RTT delay ultimately benefit customers. Research indicates that delays of more than 100 ms negatively impact mobile gaming. That number drops to 50 ms for HD and 20 ms for 4K video streaming. For 5G, experts recommend latencies of less than 10 ms. Being alerted to even minor anomalies through AI/ML-driven analytics enables net ops teams to perform proactive maintenance before RTT delays reach critical thresholds and service deteriorates.
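
The latency figures above can be read as per-service budgets. The following is a minimal Python sketch, not Turk Telekom's implementation, that checks a measured RTT against those budgets. The function name `services_at_risk` and the `warn_ratio` early-warning margin are assumptions made for illustration; only the four millisecond values come from the article.

```python
# Latency budgets quoted in the article, in milliseconds.
RTT_BUDGET_MS = {
    "mobile_gaming": 100.0,
    "hd_video": 50.0,
    "4k_video": 20.0,
    "5g": 10.0,
}


def services_at_risk(rtt_ms, warn_ratio=0.8):
    """Map each affected service to "exceeded" or "warning".

    warn_ratio is a hypothetical margin (not from the article): a service
    is flagged as "warning" once RTT reaches that fraction of its budget.
    """
    at_risk = {}
    for service, budget in RTT_BUDGET_MS.items():
        if rtt_ms > budget:
            at_risk[service] = "exceeded"
        elif rtt_ms >= warn_ratio * budget:
            at_risk[service] = "warning"
    return at_risk


if __name__ == "__main__":
    # A 45 ms RTT is fine for gaming, close to the HD budget,
    # and already beyond the 4K and 5G budgets.
    print(services_at_risk(45.0))
```

The warning band illustrates the point made above: flagging a link before it crosses a budget leaves time for proactive maintenance.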

Network traffic

Turk Telekom also deployed three anomaly detection scenarios focused on device, service and application-based network traffic values. Beginning in May 2019, the lead department analyzed approximately 20 GB of network traffic data, detecting a total of 1,800 anomalies, which it shared with the respective operations groups to improve customer experience. The traffic anomalies are shown in the table below.

TTtable_1

The particular anomalies determine the response. When application-based traffic drops, net ops first checks to see if the application server is down; then it looks into whether something in the network has gone awry. Major increases in network traffic, on the other hand, often indicate a malicious attack. They could also point to a customer exceeding agreed-upon limits. Such cases are an opportunity for sales and marketing to offer customers new services, such as network monitoring or higher throughput.
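
The response logic described above can be sketched as a small triage function. This is an illustrative Python sketch only: the `TrafficAnomaly` class, its field names and the action strings are hypothetical, and the real workflow at Turk Telekom is not documented in this article.

```python
from dataclasses import dataclass


@dataclass
class TrafficAnomaly:
    scope: str                      # "device", "service" or "application"
    direction: str                  # "drop" or "spike"
    exceeds_contract: bool = False  # hypothetical flag: customer is over agreed limits


def triage(anomaly):
    """Return an ordered list of follow-up actions for a traffic anomaly."""
    if anomaly.direction == "drop":
        if anomaly.scope == "application":
            # Rule out the application server before blaming the network.
            return ["check application server", "check network"]
        return ["check network"]
    if anomaly.direction == "spike":
        steps = ["investigate possible malicious attack"]
        if anomaly.exceeds_contract:
            steps.append("refer to sales and marketing")
        return steps
    raise ValueError(f"unknown direction: {anomaly.direction!r}")


if __name__ == "__main__":
    print(triage(TrafficAnomaly("application", "drop")))
    print(triage(TrafficAnomaly("service", "spike", exceeds_contract=True)))
```

Keeping the actions as an ordered list mirrors the text: for an application traffic drop, the server check comes first and the network check second.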

Service tunnel alarms

Finally, Turk Telekom focused on service tunnel convergence alarms and calculated the incident resolution time for its MPLS network. By detecting anomalies in service tunnel convergence, it significantly improved the overall health of the MPLS network. At the start of the project in March, the lead department reported anomalies on 88 of the more than 1,000 devices on which service tunnel alarms occurred. With constant monitoring and proactive measures taken by net ops teams, this number decreased to 34 devices by December, as shown in the diagram below.

TTfigure_2
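
As a quick check of the "over 60%" result, the drop from 88 to 34 devices can be worked through in a few lines of Python. The helper name `percent_reduction` is an assumption for illustration; the two device counts are the ones reported in the article.

```python
def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


if __name__ == "__main__":
    # 88 devices with anomalies in March, 34 in December.
    print(f"{percent_reduction(88, 34):.1f}%")  # 61.4%
```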

In normal conditions, convergence time is stable. When that number jumps, there is likely a problem. Sometimes it’s with the network device, sometimes with a line card. In any case, AI/ML-based analytics can help detect the variability and drive the corrective action required.
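
One simple way to spot such a jump is a Z score against recent history, which is among the statistical methods the article lists. The sketch below is a generic Python illustration, not the deployed detector: the function name, the trailing window of 5 samples, the threshold of 3.0 and the sample values are all assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(samples, window=5, threshold=3.0):
    """Return indices whose value deviates strongly from the trailing window.

    window and threshold are illustrative defaults, not values from the article.
    """
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as an anomaly.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical convergence times in milliseconds with one jump.
    convergence_ms = [50, 51, 49, 50, 52, 50, 51, 180, 50, 49]
    print(zscore_anomalies(convergence_ms))  # [7]
```

Note that a flagged value stays in the trailing window for the next few samples and inflates the standard deviation, so a production detector would likely exclude or down-weight it.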

Benefits and lessons

Turk Telekom is transforming its operations. Whereas it had relied entirely on rules-based reports to conduct health checks and set proactive controls, it now uses AI/ML-based analytics to guide operational maintenance. The shift away from reporting is significant. “In most cases, the reports just rain on our network ops teams, maybe 20 on some days,” Ayhan said. “They don’t have much time to check them, and if the trend changes, it’s difficult to see that in a report. But when the new system detects that something has gone wrong, they can identify the problem much more easily.”

Adoption of TM Forum’s Business Process Framework facilitated the collection and sharing of data that has driven Turk Telekom’s operational improvements. The AI/ML-based solutions have given net ops teams more consistent and accurate network visibility. Going forward, Turk Telekom plans to extend its AI/ML approach to include predictive maintenance and assisted AI use cases, adapting its processes to the TM Forum AI Lifecycle Management Model.

One of the biggest lessons learned involved data preparation. Collecting data at the pace needed for near-real-time detection without impacting source system performance was a challenge. Thanks to TM Forum’s Business Process Framework, the project’s lead department was able to collaborate with system owners on fine-tuning the collection frequency.

i. The algorithms used in this deployment included: Density-Based Spatial Clustering of Applications with Noise (DBSCAN); K-Nearest Neighbor (KNN) for easy implementation; Gaussian Mixture Model (GMM) for a probabilistic approach; statistical methods, such as Z score; Piecewise Aggregate Approximation (PAA) for pattern recognition; and Isolation Forest, for unsupervised outlier detection.

ii. Peering includes three measurements by Sigos; international is the measurement by Gezgin; and mobile includes two measurements by TWAMP.
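
Of the algorithms named in the notes above, Piecewise Aggregate Approximation is simple enough to show in full. The Python sketch below is a textbook version of PAA under the assumption that the series length divides evenly into the requested number of segments; it is not taken from the Turk Telekom deployment, and the sample series is made up.

```python
def paa(series, segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width frame.

    This sketch assumes len(series) is divisible by `segments`.
    """
    n = len(series)
    if segments <= 0 or segments > n:
        raise ValueError("segments must be between 1 and len(series)")
    if n % segments != 0:
        raise ValueError("this sketch requires len(series) divisible by segments")
    width = n // segments
    return [sum(series[k * width:(k + 1) * width]) / width for k in range(segments)]


if __name__ == "__main__":
    # Eight hypothetical samples reduced to four frame means.
    print(paa([1, 3, 2, 4, 10, 12, 5, 7], 4))  # [2.0, 3.0, 11.0, 6.0]
```

Reducing a long series to a few frame means makes its overall shape easier to compare against known patterns, which is why PAA is commonly used as a preprocessing step for pattern recognition.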