As network function virtualization (NFV) continues to evolve, the understanding of closed-loop use cases is improving. But instead of a single approach, an entire assortment of closed-loop use cases has come to light, each with its own unique factors and requirements. We are seeing variety in scale, in the complexity of the decision making, and in the speed required to execute actions. As seen in figure 1 below, creating the typical closed loop requires four critical steps, but within each of these steps are many decisions that need to be made:
Figure 1. Steps for closed-loop automation
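The four steps above can be sketched as a single loop iteration. This is a minimal illustration, not a standard implementation; the function names and the Action shape are assumptions made for the sketch:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Action:
    """A corrective action aimed at some managed entity (illustrative)."""
    target: str
    operation: str


def closed_loop_iteration(
    collect: Callable[[], Dict],
    analyze: Callable[[Dict], List[str]],
    decide: Callable[[List[str]], List[Action]],
    execute: Callable[[List[Action]], None],
) -> List[Action]:
    """One pass through collect -> analyze -> decide -> execute."""
    metrics = collect()          # 1. collect raw telemetry
    problems = analyze(metrics)  # 2. analyze the data to identify problems
    actions = decide(problems)   # 3. decide on corrective actions
    execute(actions)             # 4. execute the actions
    return actions
```

Each step is passed in as a function, which mirrors the article's point: the loop shape is fixed, but where each step runs and how it decides are open design choices.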
Standards organizations such as TM Forum and open source projects such as ONAP, in collaboration with communications service providers (CSPs), have designed an underlying architecture to support these use cases. Several architecture options were designed to determine when and where each of the “collect, analyze, decide and execute” steps should take place. These options mainly vary on the following aspects:
- Centralized vs. distributed analysis for problem identification
- Centralized vs. distributed decision making and policy management
Centralized vs. distributed analysis for problem identification
There are two main options for where analysis should be conducted:
- Distributed (intra-layer) analysis: In this approach, the assurance and analysis function is embedded in each layer of the OSS solution (customer, service, network function, infrastructure). Localizing functions this way is also an advantage in cases where low latency is required.
- Centralized (global) analytics: Here, the OSS system includes a single analytics module that can analyze data across layers. The main advantage of this option is its support for cross-layer data sources, using analytics that may be based on advanced algorithms and artificial intelligence (AI).
While both options can co-exist as a hybrid analysis architecture option, I would argue that in most cases it is better to go with the second option above – the centralized, or global, approach.
Let’s examine why. A common use case that might appear simple at first glance is when too little processing power is allocated to a virtual machine (VM) or a virtual network function (VNF). The simplistic approach would be to allocate more resources. Yet a more holistic approach considers the root cause and applies logic like the following:
- If the need for more processing power is ‘natural’ (and that is to be defined, or self-learned), additional processing power should be allocated.
- If the need is coming from sudden, exceptional behavior of a specific VM/VNF, the resolution may be to reset, restart or reconfigure the VM/VNF.
- If the need is coming from a global phenomenon (such as a severe storm or a large sporting event), the course of action may be totally different, and part of a larger system response to the overall need. As the whole system may be very busy, the decision may be to preserve only vital VMs/VNFs, keeping the system running in tough conditions.
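The three cases above can be captured in a short decision function. This is a hedged sketch: the root-cause labels, action names, and the idea of a `vnf_is_vital` flag are all assumptions made for illustration, not part of any NFV standard:

```python
def decide_action(root_cause: str, vnf_is_vital: bool = True) -> str:
    """Map a diagnosed root cause to a corrective action (illustrative)."""
    if root_cause == "natural":
        # Organic growth in demand: simply allocate more processing power.
        return "allocate_more_cpu"
    if root_cause == "vm_anomaly":
        # Sudden, exceptional behavior of one VM/VNF: recycle it.
        return "restart_vnf"
    if root_cause == "global_event":
        # System-wide phenomenon (storm, sporting event): act as part of a
        # larger response and preserve only vital VMs/VNFs.
        return "preserve" if vnf_is_vital else "suspend"
    # Unknown root cause: hand the case to a human operator.
    return "escalate_to_operator"
```

The point of the sketch is that the same symptom (CPU starvation) maps to three very different actions, which is exactly the cross-layer context a centralized analytics function can supply.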
As with many areas of modern life, problems don’t typically live in isolation. In the case of a more global phenomenon, handling each VM or VNF separately may cause even greater system problems, demonstrating why distributed analysis is often the wrong approach.
Figure 2. Centralized Assurance & Analytics
Centralized vs. distributed decision making and policy management
Policy management and decision making are not the same task. Policy management is the mechanism for managing decisions, while decision making is likely to be handled by software modules that specialize in this function and may even use AI for ‘self-learning’. In these cases, the policy manager delegates the decision to each specialized module.
Decision making can vary in complexity, ranging from simple, pre-determined automatic responses (e.g. restarting the air conditioning when the temperature gets too high) to ad-hoc responses that rely on AI (e.g. planning network changes that will provide optimal coverage for a specific geographical area).
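The delegation described above can be sketched as a registry of specialized decision modules. The class name, the idea of string-keyed domains, and the fallback decision are illustrative assumptions, not any particular product's API:

```python
from typing import Callable, Dict


class PolicyManager:
    """Delegates decisions to specialized modules, one per domain (sketch)."""

    def __init__(self) -> None:
        self._deciders: Dict[str, Callable[[dict], str]] = {}

    def register(self, domain: str, decider: Callable[[dict], str]) -> None:
        """Attach a specialized decision module for one domain."""
        self._deciders[domain] = decider

    def decide(self, domain: str, context: dict) -> str:
        """Delegate the decision; fall back to a safe default if no module exists."""
        decider = self._deciders.get(domain)
        if decider is None:
            return "no_action"
        return decider(context)
```

A simple pre-determined rule and an AI-backed planner would both plug in through `register`, which is what lets the policy manager stay a thin coordination layer.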
There are two possible alternatives for a policy-based architecture. The first is to use a centralized policy manager for OSS decisions at all layers (e.g. the ONAP approach). The second option is to use distributed policy management with cascading policies – in which each orchestrator/manager has its own policy component.
We need to keep in mind that each management system oversees a different layer, and possibly different aspects as well. So, even if policies are defined using the same tools, they are very different in their content. Additionally, when decisions are simple but low latency is required, it makes sense to place the policy decision point (PDP) as close as possible to the policy enforcement point (PEP).
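One way to picture a PDP kept close to the PEP is a local table of simple, latency-sensitive rules, with everything else deferred to a central PDP. This is a minimal sketch under assumed names; the remote PDP is stood in for by a plain callable:

```python
from typing import Callable, Dict


class LocalPDP:
    """A decision point co-located with the enforcement point (sketch)."""

    def __init__(self, local_rules: Dict[str, str],
                 central_decide: Callable[[str], str]) -> None:
        self._rules = local_rules        # simple request -> decision, in-process
        self._central = central_decide   # stand-in for the remote, central PDP

    def decide(self, request: str) -> str:
        if request in self._rules:
            return self._rules[request]  # fast path: no network round trip
        return self._central(request)    # complex case: ask the central PDP
```

The trade-off is the one the article describes: local rules give low latency for simple decisions, while the central PDP keeps the cross-layer context for everything else.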
In the coming years, all signs are pointing to systems using both options, and possibly a hybrid version as well. This is where a centralized policy manager would be implemented alongside specific ‘islands’ of local policy management, possibly using a hierarchical approach.
The need for zero-touch deployments and zero-touch operations raises the requirement for cascading policies, as shown in figure 3 below. Some automation use cases dictate flows in which a policy that resides at a higher orchestration and management level is automatically translated, or cascaded down, into lower-level policies.
Figure 3. Cascading Policies
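Cascading can be illustrated as a simple translation step. The example below is a hedged sketch: the idea of a service-level latency budget split evenly across VNFs, and every field name in it, are assumptions for illustration only:

```python
from typing import Dict, List


def cascade_service_policy(policy: Dict, vnfs: List[str]) -> List[Dict]:
    """Translate one service-level policy into per-VNF policies (sketch).

    Splits a service latency budget evenly across the VNFs in the chain,
    tagging each derived policy with its parent for traceability.
    """
    per_vnf_budget = policy["max_latency_ms"] / len(vnfs)
    return [
        {"target": vnf, "max_latency_ms": per_vnf_budget, "parent": policy["name"]}
        for vnf in vnfs
    ]
```

Real cascading logic would be far richer (weighted splits, per-layer vocabularies), but the essential flow is the same: one higher-level intent in, several enforceable lower-level policies out.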
When cascading policies are implemented in a distributed policy management system, it is important to support the administration of policies through APIs. Building the necessary APIs is one of the TM Forum Interfaces group’s current areas of focus. For example, the newly published PM Threshold API and the upcoming SLA Management API will support assurance policies in a distributed policy management architecture.
As you can see, for closed loops one size does not fit all. Instead, closed loops will have different architecture implementation options. Fortunately, groups like TM Forum and its member companies, including TEOCO, are working to resolve these issues and help service providers around the globe create networks that will enable tomorrow’s services.