Internet-Draft | Distributed SFC control operation | March 2022 |
Bernardos & Mourad | Expires 22 September 2022 |
Service function chaining (SFC) allows the instantiation of an ordered set of service functions and subsequent "steering" of traffic through them. In order to set up and maintain SFC instances, a control plane is required, which typically is centralized. In certain environments, such as fog computing ones, such centralized control might not be feasible, calling for distributed SFC control solutions. This document describes a general framework for distributed SFC operation.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 22 September 2022.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Virtualization of functions provides operators with tools to deploy new services much faster than the traditional use of monolithic and tightly integrated dedicated machinery. As a natural next step, mobile network operators need to re-think how to evolve their existing network infrastructures and how to deploy new ones to address the challenges posed by increasing customer demands, as well as by the huge competition among operators. All these changes are triggering the need for a modification in the way operators and infrastructure providers operate their networks, as they need to significantly reduce the costs incurred in deploying a new service and operating it. Some of the mechanisms that are being considered and already adopted by operators include: sharing of network infrastructure to reduce costs, virtualization of core servers running in data centers as a way of supporting their load-aware elastic dimensioning, and dynamic energy policies to reduce the monthly electricity bill. However, this has proved to be tough to put into practice, and not enough. Indeed, it is not easy to deploy new mechanisms in a running operational network due to the high dependency on proprietary (and sometimes obscure) protocols and interfaces, which are complex to manage and often require configuring multiple devices in a decentralized way.¶
Service Functions are widely deployed and essential in many networks. These Service Functions provide a range of features such as security, WAN acceleration, and server load balancing. Service Functions may be instantiated at different points in the network infrastructure such as the data center, the WAN, the RAN, and even on mobile nodes.¶
Service functions (SFs), also referred to as VNFs, or just functions, are hosted on compute, storage and networking resources. The hosting environment of a function is called Service Function Provider or NFVI-PoP (using ETSI NFV terminology).¶
Services are typically formed as a composition of SFs (VNFs), with each SF providing a specific function of the whole service. Services are also referred to as Network Services (NS), according to ETSI terminology.¶
With the arrival of virtualization, the deployment model for service functions is evolving to one where the traffic is steered through the functions wherever they are deployed (functions do not need to be deployed in the traffic path anymore). For a given service, the abstracted view of the required service functions and the order in which they are to be applied is called a Service Function Chain (SFC). An SFC is instantiated through selection of specific service function instances on specific network nodes to form a service graph: this is called a Service Function Path (SFP). The service functions may be applied at any layer within the network protocol stack (network layer, transport layer, application layer, etc.).¶
The concept of fog computing has emerged driven by the Internet of Things (IoT) due to the need to handle the data generated by end-user devices. The term fog refers to any networked computational resource in the continuum between things and cloud. A fog node may therefore be an infrastructure network node such as an eNodeB or gNodeB, an edge server, a customer premises equipment (CPE), or even a user equipment (UE) terminal node such as a laptop, a smartphone, or a computing unit on-board a vehicle, robot or drone.¶
In fog computing, the functions composing an SFC are hosted on resources that are inherently heterogeneous, volatile and mobile [I-D.bernardos-sfc-fog-ran]. This means that resources might appear and disappear, and the connectivity characteristics between these resources may also change dynamically. These scenarios call for distributed SFC control solutions, where there are SFC pseudo controllers, enabling autonomous SFC self-orchestration capabilities. The concept of SFC pseudo controller (P-CTRL) is described in [I-D.bernardos-sfc-distributed-control], as well as different procedures for their discovery and initialization.¶
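The distinction between an SFC (the abstract, ordered list of functions) and an SFP (one concrete selection of instances on nodes) can be illustrated with a minimal Python sketch. The class names, function names and node labels below are hypothetical and chosen only for illustration; they are not defined by this document or by [RFC7665].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceFunctionChain:
    """Abstract view: which functions are required, and in which order."""
    functions: tuple[str, ...]


@dataclass
class ServiceFunctionPath:
    """One instantiation of a chain: each function is bound to a hosting node."""
    chain: ServiceFunctionChain
    placement: dict[str, str]  # function name -> hosting node (hypothetical labels)

    def hops(self) -> list[tuple[str, str]]:
        """Return (function, node) pairs in chain order."""
        missing = [f for f in self.chain.functions if f not in self.placement]
        if missing:
            raise ValueError(f"no instance selected for: {missing}")
        return [(f, self.placement[f]) for f in self.chain.functions]


# Hypothetical chain and placement, for illustration only.
sfc = ServiceFunctionChain(functions=("firewall", "wan-accelerator", "load-balancer"))
sfp = ServiceFunctionPath(
    chain=sfc,
    placement={
        "firewall": "node-1",
        "wan-accelerator": "node-2",
        "load-balancer": "node-3",
    },
)
print(sfp.hops())
```

The same chain can be bound to different nodes over time; only the placement changes, while the ordered list of functions stays the same.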
This document introduces a framework for local distributed SFC operation, by allowing P-CTRLs to temporarily substitute the central controller (C-CTRL) in its task to carry out the global lifecycle management of the given SFC.¶
The following terms used in this document are defined by the IETF in [RFC7665]:¶
The following terms are defined and used in this document:¶
Mobile network architectures are evolving to support network virtualization and service function orchestration. Current Service Function Chain (SFC) architectures [RFC7665] rely on a centralized controller/orchestrator (C-CTRL) which shall be connected to all the hosts participating in a given SFC. This poses issues and inefficiencies in fog computing environments especially because of the mobility and volatility of some hosts, as well as the associated signaling overhead.¶
These problems can be alleviated by enabling autonomous SFC self-orchestration (SOC), based on the concept of SFC pseudo controller (P-CTRL) introduced in [I-D.bernardos-sfc-distributed-control]. A pseudo controller is capable of substituting (at least temporarily and partially) the centralized SFC controller in situations where the centralized controller may not be able to perform its functions (e.g., when the connectivity with some hosts is broken).¶
[I-D.bernardos-sfc-distributed-control] introduces the role of the SFC pseudo controller and describes mechanisms to select and initialize a service-specific SFC pseudo controller among host nodes which are participating in the SFC. This document specifies mechanisms to enable an SFC pseudo controller to trigger and control NS lifecycle management operations, such as migration of NS functions, chains or parts of a chain.¶
Figure 1 shows an example scenario where a host UE makes use of an NS composed of the chain of SFs F1-F2-F3. These functions may be application functions -- using 3GPP jargon -- network functions or over-the-top functions. Non-limiting examples of these functions are: load balancers, traffic steering, performance enhancement proxies (PEPs), video transcoders, firewalls, etc. In this example, the F1 instance runs on a first UE (node A), the F2 instance runs on a second UE (node B), and the F3 instance runs on a gNB (node D). SFC pseudo controller instances are assumed to be running on UE node A and on node D (which is a gNB). Nodes A and B are connected via D2D communications. If all the UEs move out of the coverage of the gNB node D, the service chain will then need to be reconfigured to maintain service continuity, as gNB node D is hosting one function (F3) of the chain and would become disconnected. Since gNB node D is also providing the UEs with connectivity to the network infrastructure where the SFC central controller is hosted, this type of event cannot be resolved by the SFC controller, as the nodes hosting the functions would be disconnected from the central controller. Similar problems arise in highly mobile/volatile and/or latency-demanding scenarios, where centralized lifecycle management becomes unsuitable.¶
In these scenarios, an SFC pseudo controller can substitute (at least temporarily and partially) the centralized SFC controller when the latter is not available or able to perform a given task. This document proposes solutions addressing the following problem: How to enable SFC pseudo controllers to perform NS lifecycle management operations, such as migration of functions, service chains or parts of a chain? This requires solving the sub-problems listed below.¶
A key concept is to allow a P-CTRL to take over temporarily and partially from the C-CTRL to perform NS lifecycle management decisions. The definition, selection and initialization of a P-CTRL is covered in [I-D.bernardos-sfc-distributed-control].¶
Using Figure 1, we can think of an example where a function (F3) is migrated from node D to node C, triggered by the move of nodes A and B hosting F1 and F2 away from the coverage of node D hosting F3 (nodes A and B are UEs within coverage of node D, which is a gNB). The P-CTRL in node B performs OAM operations locally and monitors the NS-specific SLAs. Upon detecting or predicting that the NS-specific SLAs may not be met in the near future, P-CTRL B takes actions to temporarily and partially substitute C-CTRL, and starts performing local NS lifecycle management operations (e.g., instantiating F3 on node C, since the current hosting node -- node D -- is predicted to become unreachable soon).¶
Note that, in the previous example, the prediction and local NS lifecycle management operations could have been performed by a P-CTRL running at node D as well. We have assumed that the active (designated) P-CTRL is running at node B, but it could have been running at node D as well, which would imply the need to also migrate the active P-CTRL role to node B.¶
Thanks to enabling P-CTRL B to perform local NS lifecycle management decisions, service continuity will be guaranteed when C-CTRL fails or is out of reach.¶
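The migration decision in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_migrations`, the boolean reachability prediction and the "first reachable candidate" selection rule are hypothetical and are not procedures specified by this document.

```python
def plan_migrations(placement, reachable_soon, candidates):
    """Return {function: (old_node, new_node)} for every function whose
    hosting node is predicted to become unreachable.

    placement:      function name -> current hosting node
    reachable_soon: node -> True if predicted to remain reachable (assumed input)
    candidates:     nodes that could host a migrated function, in preference order
    """
    plan = {}
    for function, node in placement.items():
        if reachable_soon.get(node, False):
            continue  # host is expected to stay reachable, nothing to do
        # Assumption: pick the first candidate predicted to remain reachable.
        new_node = next((c for c in candidates if reachable_soon.get(c, False)), None)
        if new_node is None:
            raise RuntimeError(f"no reachable candidate node for {function}")
        plan[function] = (node, new_node)
    return plan


# Scenario from the text: F1 on A, F2 on B, F3 on D; D is about to go out of reach.
placement = {"F1": "A", "F2": "B", "F3": "D"}
prediction = {"A": True, "B": True, "C": True, "D": False}

plan = plan_migrations(placement, prediction, candidates=["C"])
for function, (_old, new_node) in plan.items():
    placement[function] = new_node  # local lifecycle action taken by the P-CTRL

print(plan)
print(placement)
```

Here only F3 is affected, so the resulting plan moves F3 from node D to node C and leaves F1 and F2 in place.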
The "activation" of P-CTRL operation only occurs when C-CTRL cannot properly operate (e.g., it is disconnected from the SFC or it is not reacting fast enough to the local changing conditions). For example, P-CTRL can send a scaling command to a given node, in order to adapt the resources to the current NS demands. P-CTRL would also notify C-CTRL of this as soon as the connection to C-CTRL is recovered, so that both are synchronized.¶
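The activation rule and the later synchronization with C-CTRL can be sketched as follows. This is a minimal illustration, not a specified state machine: the class `PseudoController`, its method names and the free-text report format are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PseudoController:
    """Hypothetical P-CTRL that acts only while C-CTRL cannot operate properly."""
    active: bool = False
    pending_reports: list[str] = field(default_factory=list)

    def on_c_ctrl_status(self, reachable: bool, reacting_in_time: bool = True) -> list[str]:
        """Update the activation state and return the reports to send to C-CTRL.

        The P-CTRL is active when C-CTRL is unreachable or too slow to react.
        Queued reports are handed back once C-CTRL is reachable again.
        """
        self.active = not (reachable and reacting_in_time)
        if reachable and self.pending_reports:
            flushed, self.pending_reports = self.pending_reports, []
            return flushed
        return []

    def scale(self, node: str, function: str, instances: int) -> bool:
        """Issue a local scaling command; refused while the P-CTRL is not active."""
        if not self.active:
            return False
        # Record the action so that C-CTRL can be synchronized later.
        self.pending_reports.append(f"scaled {function} on {node} to {instances}")
        return True


p_ctrl = PseudoController()
print(p_ctrl.scale("B", "F2", 2))                 # C-CTRL is in charge: refused
p_ctrl.on_c_ctrl_status(reachable=False)          # connectivity to C-CTRL is lost
print(p_ctrl.scale("B", "F2", 2))                 # P-CTRL acts locally
print(p_ctrl.on_c_ctrl_status(reachable=True))    # reports to send to C-CTRL
```

Queuing the actions taken locally, rather than dropping them, is one simple way to let C-CTRL reconstruct the current state of the NS once the connection is recovered.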
In order to support the operation of P-CTRLs complementing or replacing the operation of the C-CTRL, the following operations are needed:¶
Currently defined mechanisms assume a semi-static environment and the standardized message flows do not support dynamic migration of the SFC controller role to other entities. Therefore, new signaling flows need to be defined between C-CTRL and P-CTRLs and in-between P-CTRLs: (i) allowing prediction of events via local monitoring and faster reaction, (ii) enabling orchestration when C-CTRL is temporarily unreachable, and (iii) supporting migration of the CTRL role to P-CTRLs.¶
There are two main triggers for a P-CTRL to take over from the C-CTRL: a local monitoring event or a C-CTRL failure. We specify next the procedures for each of these two triggers.¶
In this case, the C-CTRL has delegated some monitoring actions to the P-CTRL, as indicated in the OAMD sent by the C-CTRL to the P-CTRL.¶
A detailed message sequence chart is shown in Figure 2. The different steps are described next:¶
P-CTRLs are running service-specific OAM monitoring actions, as indicated in the OAMD sent by the C-CTRL in the network service instantiation procedure. This requires signaling procedures. Various non-limiting example options are possible:¶
The interface between the P-CTRL and the SFC functions running on the UE to obtain OAM metrics may be a local API, a standard interface such as IETF SFC OAM, or an interface like the one between the 3GPP NWDAF and an NWDAF service consumer.¶
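As an illustration of how a P-CTRL might turn locally collected OAM metrics into a monitoring event, the sketch below flags a metric that has already crossed an SLA threshold or is predicted to cross it soon. The function name, the "higher is worse" metric (e.g., a latency value), and the linear extrapolation from the last two samples are assumptions made for this example; this document does not mandate any particular prediction method.

```python
def predict_sla_violation(samples, threshold, horizon):
    """Return True if the metric violates the threshold now, or is predicted
    to do so within `horizon` future sampling intervals.

    samples:   metric values, oldest first (higher is worse, e.g. latency)
    threshold: SLA bound for the metric (assumed to come from the OAMD)
    horizon:   number of future sampling intervals to look ahead
    """
    if not samples:
        return False  # nothing measured yet, no event
    current = samples[-1]
    if current >= threshold:
        return True  # violation detected
    if len(samples) < 2:
        return False  # not enough history to extrapolate
    # Assumption: simple linear extrapolation from the last two samples.
    slope = samples[-1] - samples[-2]
    return current + slope * horizon >= threshold


latency_ms = [10, 20, 30]
print(predict_sla_violation(latency_ms, threshold=50, horizon=1))
print(predict_sla_violation(latency_ms, threshold=50, horizon=2))
```

With these sample values the metric grows by 10 per interval, so a look-ahead of one interval stays below the threshold while a look-ahead of two intervals reaches it.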
In this case, the P-CTRL detects/predicts a C-CTRL failure (e.g., it becomes unreachable).¶
A detailed message sequence chart is shown in Figure 3. The different steps are described next:¶
When a C-CTRL failure is detected, the designated backup P-CTRL takes over the orchestration of the network service, by:¶
We describe next, using an example, how a C-CTRL may get back the orchestration control, temporarily delegated to a P-CTRL.¶
A detailed message sequence chart is shown in Figure 4. The different steps are described next:¶
In scenarios with no C-CTRL reachability, it might be necessary to transition from one P-CTRL to another one (e.g., because of mobility of the nodes while the C-CTRL is not reachable).¶
Reactive transition is supported as for the case of C-CTRL failure. Proactive/seamless transition is addressed as follows.¶
A detailed message sequence chart is shown in Figure 5. The different steps are described next:¶
N/A.¶
The work in this draft has been partially supported by the H2020 5Growth (Grant 856709) and 5G-DIVE (Grant 859881) projects.¶