The NOC engineer’s new teammate: the Incident Co-pilot solution for faster incident resolution
The Incident Co-pilot project uses advanced genAI technologies and a multi-agent architecture to revolutionize NOC operations. By automating routine tasks and enhancing decision-making, the solution significantly improves the efficiency and effectiveness of incident management in complex telco networks.
Commercial context
In today's complex network environments, incident management is a critical task for ensuring business continuity and minimizing downtime. However, the root cause of incidents is often difficult to determine – especially when multiple sources of data, such as alarms, tickets and customer feedback, are involved. Traditional incident management tools can struggle to correlate this disparate data and provide actionable insights, and these tools also often lack the ability to suggest next-best actions, leaving incident management teams to rely on their experience and intuition.
In addition, NOC operations place almost the entire workload on the network operations center (NOC) engineer. Each task requires human input, often through multiple interfaces, and an issue typically takes between 4 and 8 hours to resolve, with some issues involving more than one engineer. Human skills of perception, analysis, and decision-making are stretched to their limits, and dependencies on other engineers and teams introduce further delay.
The solution
With the Incident Co-pilot, this changes dramatically. Routine tasks that once took hours are now completed in just minutes by a single engineer. Human involvement is minimized, allowing NOC engineers to focus on more complex issues. The Incident Co-pilot automates processes and provides swift, accurate incident management, enhancing overall efficiency and resilience.
The Incident Co-pilot combines NOC domain knowledge with advanced AI technologies for incident management. It applies techniques such as chain-of-thought prompting and retrieval-augmented generation (RAG), and it showcases a collaborative multi-agent approach to problem-solving: agents such as the Incident Agent, Healing Agent, and Optimization Agent work together to diagnose, resolve, and optimize network issues. The Incident Co-pilot also analyzes historical incident data and current network conditions to recommend the most effective course of action to resolve incidents.
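The hand-off between the three agents can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not the Catalyst's implementation: the `Incident` fields, the `HISTORY` lookup table (a stand-in for retrieval over historical incident data), the alarm names, and all function names are hypothetical, and real agents would call a language model rather than a dictionary.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical knowledge base: alarm type -> (past root cause, action that resolved it).
# A real system would retrieve this from historical incident data.
HISTORY = {
    "LINK_DOWN": ("fibre cut on access ring", "reroute traffic to protection path"),
    "HIGH_CPU": ("routing process overload", "restart routing process"),
}


@dataclass
class Incident:
    """Illustrative incident record; field names are assumptions."""
    incident_id: str
    alarms: list[str]
    matched_alarm: Optional[str] = None
    root_cause: Optional[str] = None
    proposed_action: Optional[str] = None
    follow_up: Optional[str] = None
    notes: list[str] = field(default_factory=list)


def incident_agent(incident: Incident) -> Incident:
    """Diagnose: match the current alarms against past incidents."""
    for alarm in incident.alarms:
        if alarm in HISTORY:
            incident.matched_alarm = alarm
            incident.root_cause = HISTORY[alarm][0]
            incident.notes.append(f"Incident Agent: {alarm} suggests '{incident.root_cause}'")
            break
    return incident


def healing_agent(incident: Incident) -> Incident:
    """Resolve: propose the action that fixed the matching past incident."""
    if incident.matched_alarm is None:
        incident.notes.append("Healing Agent: no diagnosis, escalating to the NOC engineer")
    else:
        incident.proposed_action = HISTORY[incident.matched_alarm][1]
        incident.notes.append(f"Healing Agent: propose '{incident.proposed_action}'")
    return incident


def optimization_agent(incident: Incident) -> Incident:
    """Optimize: schedule a post-resolution check once a fix is proposed."""
    if incident.proposed_action is not None:
        incident.follow_up = "review performance of the affected segment after resolution"
        incident.notes.append(f"Optimization Agent: {incident.follow_up}")
    return incident


def run_copilot(incident: Incident) -> Incident:
    """Pass the incident through each agent in turn."""
    for agent in (incident_agent, healing_agent, optimization_agent):
        incident = agent(incident)
    return incident


result = run_copilot(Incident("INC-001", ["HIGH_CPU"]))
unknown = run_copilot(Incident("INC-002", ["UNSEEN_ALARM"]))
for note in result.notes:
    print(note)
```

Each agent appends a note to the shared record, so the engineer can see which agent contributed which step. When no past incident matches, the sketch proposes nothing and hands the case back to the engineer.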
The system is designed to bridge the gap between AI algorithms and human trust by providing transparent and understandable explanations of its recommendations, thus maintaining essential human control over incident resolution. "A good Co-Pilot knows it’s not the pilot – it’s still the NOC engineer who is in charge," states Kevin McDonnell, Catalyst Lead and Senior Director at Huawei. The Catalyst uses the TM Forum Incident API (TMF724) and associated APIs to connect the platforms of communications service providers (CSPs) and vendors. This approach allows CSPs to leverage the strengths of multiple vendors’ tools while maintaining interoperability.
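One way to picture the principle that the engineer stays in charge is an approval gate: every recommendation carries its explanation and evidence, and nothing is marked for execution without a named engineer's sign-off. The Python sketch below is a minimal illustration under that assumption. The `Recommendation` class, its fields, and the `render` and `resolve` functions are hypothetical and are not taken from the Catalyst or from the TMF724 specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Recommendation:
    """Illustrative recommendation; field names are assumptions."""
    action: str
    explanation: str
    evidence: list[str] = field(default_factory=list)


def render(rec: Recommendation) -> str:
    """Show the engineer what is proposed and why."""
    lines = [f"Proposed action: {rec.action}", f"Why: {rec.explanation}"]
    lines += [f"  evidence: {item}" for item in rec.evidence]
    return "\n".join(lines)


def resolve(rec: Recommendation, approved_by: Optional[str]) -> str:
    """Release the action only when a named engineer has approved it."""
    if not approved_by:
        return f"PENDING: '{rec.action}' awaits engineer approval"
    return f"APPROVED by {approved_by}: '{rec.action}'"


rec = Recommendation(
    action="restart routing process",
    explanation="the same alarm pattern was resolved this way in past incidents",
    evidence=["HIGH_CPU alarm on the affected node"],
)
print(render(rec))
print(resolve(rec, approved_by=None))
```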
Applications and wider value
The Catalyst aims to reduce incident resolution times by up to 30%, lowering costs and boosting customer satisfaction as timely resolutions minimize service disruptions. “From the customer’s perspective, the primary improvement lies in network and service quality due to the reduction in instances of unavailability or degradation,” says Pedro Garcia Parra, Manager of Autonomous Network & Operations Transformation at Telefónica. “These solutions enhance the accuracy of diagnoses and resolutions, mitigating errors caused by human decision-making.”
Telefónica also values the system as a training tool, aiming to bring novice engineers to expert level three times faster than before. “It also promotes the adoption of AI within the company,” adds Pedro Garcia Parra. Implementing tools that abstract the complexity of AI model development makes these technologies more accessible to employees and reduces their initial mistrust.
There are also significant productivity improvements. “Our NOC engineers need to deeply understand the incidents in our customer environments, write up a narrative about each incident and proofread the narrative,” explains Massimo Banzi of TIM. “This ensures we have a clear, concise and crisp message to send to our customers or our field engineers. Incident Co-pilot allows us to accomplish this much more quickly. In some cases, it's up to 90%, with high-quality text summaries that surpass most of our engineers’ capability. It's incredibly powerful, saving time so our engineers can move on to the next case.”
By reducing the mean time to resolution (MTTR) and optimizing network performance after resolution, the solution contributes to improved connectivity and accessibility of telecoms services for individuals and businesses. For CSPs, long-term network stability means smoother operations and an improved business reputation, with a positive impact on customer retention and revenue generation. By avoiding ineffective actions and unnecessary interventions during incident resolution, the solution can also help reduce CSPs’ carbon footprint and save energy. Furthermore, once the Incident Co-pilot's agents are handling routine incident resolution tasks autonomously, incident resolution teams can focus on strategic initiatives rather than mundane operational tasks.
To learn more about how this Catalyst is using advanced genAI and a multi-agent architecture to revolutionize NOC operations, visit the project space on the TM Forum website.