Subscriber Traffic Steering at the Telco Edge Cloud
Edge computing is not new, but it has taken on a new level of importance with the digitization of operator networks. This opportunity depends on the efficient steering of enterprise or subscriber traffic to the correct edge application or service. Operators and new independent software entrants are exploring innovative use cases and services, and the operator community is developing new architectural approaches via the standards bodies (such as the Broadband Forum) to enhance its current fixed access network assets and take advantage of these rapidly emerging opportunities.
The industry is converging on defining the “edge” as locations with a maximum round trip time (RTT) to the end user of 20 milliseconds (ms).1 This type of access latency addresses most of the edge use cases such as augmented and virtual reality (AR/VR), edge video analytics and vehicle-to-everything (V2X) communications.
These edge service locations will include:
- Communication service provider (CoSP)-operated sites including central offices and regional data centers (DCs) or leased space in colocation or neutral host provided DCs.
- Cloud service provider (CSP)-operated sites including leased space in colocation provider DCs.
- The enterprise edge, including branch offices, industrial sites, regional DCs, and leased space in colocation provider DCs.
Although the industry has only begun to explore different use cases, it is becoming increasingly apparent that the latency and bandwidth advantages of the fixed access fiber edge will be critical to the rollout of higher-value edge services such as edge Internet of Things (IoT), AR/VR, and video analytics. These attributes place the CoSP-operated edge front and center in the edge service delivery value chain.
The pandemic has also placed a new emphasis on robust home broadband. Working from home, hybrid working, and home schooling put pressure on the fixed access network. Broadband connectivity is no longer the luxury it once was, used only for basic internet or IPTV access. During the COVID-19 pandemic it became the de facto means by which many people accessed their back-office applications, and by which schoolchildren accessed the classroom via online video collaboration tools. From 2020 to 2021, fixed traffic grew at an unprecedented 40 percent CAGR.2 During the same period the symmetry of the traffic also changed dramatically, with uplink traffic (online collaboration) growing strongly. Prior to the pandemic, the uplink-to-downlink ratio was 1:8; during the pandemic it went as low as 1:5,3 placing new demands on the fixed access network architecture.
The larger CSPs are also on the march. Amazon launched Outpost, Microsoft launched Azure Stack for network and Google has launched Google Anthos for the edge. These approaches are based on the premise of deploying edge compute platforms which are connected to and compatible with the edge cloud strategies offered by the CoSPs. New application and service outcomes coupled with quality of experience perception will become more important than traditional bandwidth speed tests when it comes to selecting a broadband supplier.
A more flexible approach, and the ability to dynamically provision access technologies to offer more bandwidth or lower latencies over mobile or fixed access, give the CoSP a new capability which can be tuned to the edge application or service being offered.
It is a similar landscape for enterprise use cases. Edge IoT will demand more flexible application-specific provisioning, which can be enabled through new fixed-access technology and a complementary evolution in broadband standards.
The answer must be a merging of technical and operational approaches to offer both cloud application and network services on the same platform in a variety of edge locations. This will be enabled using the same distributed cloud central office (CO) technologies emerging in the Broadband Forum today, coupled with application-specific steering capabilities which will allow the operator to automatically provision application-specific characteristics through its access network.
With modern application development cycles, the new digital fixed access network must be able to rapidly adjust to new application demands in real time. The implication is that the classic time to develop a new network-based service is no longer viable and new edge traffic steering and deployment methods must be adopted in network design and deployment for next generation edge service enabling.
Not all services are equal. Going forward, some will add value to customers with specific requirements. Subscribers will be connected to the requested services deployed in the location that can best meet the service latency and bandwidth requirements to deliver the right end user experience.
Upgrading a traditional fixed access network can be a time-consuming and difficult process. These networks have often been deployed with statically configured connectivity between the customer and the broadband network gateway (BNG) which connects the customer. This static provisioning means that maintenance activities such as network upgrades and capacity additions are often executed late at night, with long planning and deployment cycles that often involve “man in the van” or forklift upgrades. Consequently, networks are often overprovisioned to minimize the number of maintenance activities, which results in power, capacity, and space inefficiencies.
An evolution of the fixed access architecture is required, one which enables continuous integration and continuous delivery (CI/CD) development cycles and zero-touch upgrade operations. In such an architecture, live subscriber sessions are moved dynamically without service impact, and the resources consumed by the system can scale up and down to meet demand and power usage constraints.
The final, but not insignificant, consideration is network convergence. Since 2017, the Broadband Forum (BBF) and the 3rd Generation Partnership Project (3GPP) have been collaborating on a series of standards that will enable fixed and mobile convergence in the coming years. In effect, this means the fixed access and mobile access networks will converge. The evolution of the fixed access network to a more mobile-centric control platform means that fixed network operators must embrace the same cloud-native approaches deployed for 5G today.
Architecture Fundamentals of Fixed Access Edge Traffic Steering
The architecture above is a simplified view which integrates the key work being done by the Broadband Forum in its Cloud Central Office, Disaggregated BNG, and Subscriber Session Steering Projects. The architecture seamlessly enables the evolution to greater fixed/5G network convergence as the Service Gateway can also be an Access Gateway Function (AGF), as defined in the BBF Wireless Wireline Convergence work.
A key advantage of this change in architecture is that it moves away from the traditional statically provisioned connectivity between a subscriber and services. Instead, the network provides a dynamic ingress service selection and session load balancing capability that simplifies network operations, improves resilience, and ensures that subscriber sessions are connected to service gateways that can meet the customer's service needs, including low-latency edge services.
The architecture also enables a cloud-native and disaggregated implementation, where the functions are deployed in software on cloud infrastructure (in geographically separate locations), enabling improved resiliency and more rapid development and deployment of new capabilities and connected services.
The major components in this architecture are described in the following paragraphs.
Service Gateway (SG) is a generic name for the function responsible for providing the required network to the subscriber and providing access to application services. Examples of a Service Gateway include Broadband Network Gateway (BNG), Access Gateway Function (AGF), and Provider Edge Router (PE). The Service Gateway is decomposed into control plane and user plane functions based upon Control User Plane Separation (CUPS) protocols.
Service Gateway Control Plane (SG-CP) is the control component of the Service Gateway, with each control plane capable of controlling many Service Gateway user planes. It is responsible for functions such as authenticating subscribers and allocating IP addresses.
Service Gateway User Plane (SG-UP) is the user plane component of the SG and is responsible for forwarding traffic and providing access for the user to the required network and application services. User planes will be deployed in both edge and core locations, allowing services to be provided with appropriate latencies to deliver the required end user application experience. Furthermore, user planes may be implemented as hardware or software functions as required to meet the flexibility and traffic forwarding requirements of the operator.
Session Steering Control Plane is responsible for identifying the user plane to which any one subscriber should be connected. Key to this is the User Plane Selection Function (UPSF), which is queried whenever a new subscriber session is brought up and identifies the Service Gateway and user plane that can meet the subscriber's service and latency requirements whilst also maintaining balanced load across the domain. The UPSF is also responsible for proactively and reactively identifying any required change in service gateway and user plane mapping in response to network changes and maintenance activities.
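The selection step described above can be pictured with a minimal Python sketch. It is an illustration only, not the UPSF as specified or as built for the demo: the `UserPlane` record, its fields, and the "least loaded candidate wins" rule are all assumptions chosen to show how a latency bound and load balancing might combine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserPlane:
    """Hypothetical view of one SG-UP as seen by a selection function."""
    name: str
    rtt_ms: float          # assumed round-trip time from this SG-UP to the subscriber
    active_sessions: int
    max_sessions: int

    @property
    def load(self) -> float:
        return self.active_sessions / self.max_sessions


def select_user_plane(user_planes: list[UserPlane], max_rtt_ms: float) -> Optional[UserPlane]:
    """Return the least-loaded user plane that meets the latency bound, or None."""
    candidates = [
        up for up in user_planes
        if up.rtt_ms <= max_rtt_ms and up.active_sessions < up.max_sessions
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda up: up.load)


user_planes = [
    UserPlane("edge-up-1", rtt_ms=8.0, active_sessions=3000, max_sessions=4000),
    UserPlane("edge-up-2", rtt_ms=9.0, active_sessions=1000, max_sessions=4000),
    UserPlane("core-up-1", rtt_ms=35.0, active_sessions=100, max_sessions=4000),
]
chosen = select_user_plane(user_planes, max_rtt_ms=20.0)
print(chosen.name)  # edge-up-2
```

With a 20 ms bound the lightly loaded core instance is excluded on latency, so the less busy of the two edge instances is chosen. A real selection function would weigh further inputs, such as operator policy and maintenance state.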
Traffic Steering Function (TSF) is responsible for directing the packets for a particular session to and from the correct user plane. This is a relatively simple cross-connect function that can be built into the Physical Access Node, or an aggregation switch or router.
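As a rough illustration of the cross-connect idea, the sketch below models the TSF as a lookup table from a session key to a pair of ports. The choice of key (an S-VLAN and C-VLAN pair) and the port names are assumptions made for the example; the text above does not state how sessions are identified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CrossConnect:
    access_port: str       # port facing the access node
    user_plane_port: str   # port facing the selected SG-UP


class TrafficSteeringTable:
    """Hypothetical per-session cross-connect table."""

    def __init__(self) -> None:
        self._entries: dict[tuple[int, int], CrossConnect] = {}

    def connect(self, s_vlan: int, c_vlan: int, access_port: str, user_plane_port: str) -> None:
        """Install or replace the cross-connect for one session."""
        self._entries[(s_vlan, c_vlan)] = CrossConnect(access_port, user_plane_port)

    def egress_port(self, s_vlan: int, c_vlan: int, upstream: bool) -> Optional[str]:
        """Return the output port for a packet, or None for an unknown session."""
        entry = self._entries.get((s_vlan, c_vlan))
        if entry is None:
            return None
        return entry.user_plane_port if upstream else entry.access_port


tsf = TrafficSteeringTable()
tsf.connect(100, 7, access_port="pon-1", user_plane_port="up-a")
print(tsf.egress_port(100, 7, upstream=True))   # up-a
print(tsf.egress_port(100, 7, upstream=False))  # pon-1
```

Re-steering a session is then a single table update: calling `connect` again with a different `user_plane_port` replaces the entry in both directions.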
Access Session Detection (ASD) is the function that recognizes that a new session is active and needs to be connected to the correct Service Gateway and user plane. The first sign of life for a new session depends on the type of subscriber session, but will be activity such as a port coming up on the Physical Access Node or session request packets being received from a home or business site. As with the Traffic Steering Function, Access Session Detection may be provided by the Physical Access Node, or indeed by any other element that can recognize a new session's first sign of life.
Access Node Control Functions are the result of disaggregating the traditional Access Node: the control plane can be deployed in the cloud environment and provide access-technology-specific control of the Physical Access Node.
Physical Access Node is responsible for terminating the fiber or copper access connections from homes and businesses.
What has Changed? Supporting Demonstration and Major Use Cases
In order to prove out the architecture, Intel, Vodafone, and BISDN embarked on building a functional lab prototype.
Intel worked with BISDN to provide the BNG Session CP (SG-CP) extensions that enable interaction with the user plane selection function (UPSF), which was designed and built by Vodafone. Intel also provided the traffic steering function (TSF), implemented on an Intel® Tofino™ P4-programmable switch with 64 x 100G ports. The TSF is written in the Programming Protocol-independent Packet Processors (P4) language, which enables a programmatic and flexible approach to fixed access traffic classification. The Intel Tofino switch directs the traffic for a specific subscriber context from the access network to the BNG User Plane instance identified through the steering process, and in the reverse direction from the BNG User Plane to the access network.
The demo was executed in Intel's lab and was recorded for Broadband World Forum (BBWF) 2021. The emphasis of the demo was to show how some of the issues identified in the introduction can be addressed and solved using this approach.
The first demo story shows how the full system, including all the key control plane and user plane components, can be instantiated on a Kubernetes edge cloud, after which new subscribers are added to the system. The major difference here is the involvement of the UPSF, which the SG-CP now queries to understand current SG-UP loading and to decide on which SG-UP a new subscriber should be instantiated. This gives the operator much more dynamic control over resource usage and reduces the impact of an outage in which an overloaded SG-UP goes out of service and disconnects many active subscribers. This provisioning approach is more dynamic and cloud-like, using the same approaches used today in many 5G core deployments.
The second demo story covers service-based selection and demonstrates how the system can be used to dynamically connect subscribers to new edge services. In this case, the SG-CP creates a linkage in the UPSF that contains a list of service group IDs and the SG-UPs that can provide access to these services. The new session setup is similar to the sequence described above but, in this case, the operator policy (e.g., from a RADIUS server) includes additional information about the services (service group IDs) required by the new subscriber. Again, the UPSF decides, based on current loading, which SG-UP the subscriber should be connected to, and enables access to the requested edge services. This allows operators to dynamically create and assign new SG-UPs based on the characteristics of the new services being offered.
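The linkage between service group IDs and SG-UPs can be sketched as a small registry. The class name, the string-valued service group IDs, and the session counter used as a load measure are hypothetical; the sketch only shows the idea of restricting the choice to SG-UPs that offer every requested service group and then picking by load.

```python
from typing import Optional


class ServiceGroupLinkage:
    """Hypothetical registry of which SG-UPs serve which service groups."""

    def __init__(self) -> None:
        self._groups: dict[str, frozenset[str]] = {}
        self._sessions: dict[str, int] = {}

    def register(self, user_plane: str, service_groups: set[str]) -> None:
        """Record (or update) the service groups an SG-UP can reach."""
        self._groups[user_plane] = frozenset(service_groups)
        self._sessions.setdefault(user_plane, 0)

    def select(self, required_groups: set[str]) -> Optional[str]:
        """Pick the least-loaded SG-UP offering every required service group."""
        candidates = [up for up, groups in self._groups.items() if required_groups <= groups]
        if not candidates:
            return None
        chosen = min(candidates, key=lambda up: (self._sessions[up], up))
        self._sessions[chosen] += 1
        return chosen


linkage = ServiceGroupLinkage()
linkage.register("up-internet", {"internet"})
linkage.register("up-edge-1", {"internet", "ar-vr"})
linkage.register("up-edge-2", {"internet", "ar-vr"})

first = linkage.select({"ar-vr"})
second = linkage.select({"ar-vr"})
third = linkage.select({"internet"})
print(first, second, third)  # up-edge-1 up-edge-2 up-internet
```

In this sketch the required service groups would come from the operator policy returned at session setup, and a request for a group that no SG-UP offers returns `None`.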
The access network becomes more flexible and service aware in nature allowing the operators to match their network provisioning and spend with revenue generating services.
The next use case is in-field maintenance: the removal of an in-service SG-UP, or the upgrade of an SG-UP to a later software version containing new features or enhancements. This occurs frequently today when operators need to roll out new features (e.g., IPv4 to IPv6 services) or bug fixes, which can require “man in the van” interventions or downtime late at night. Here we introduce the concept of a shard: a group of subscribers that are treated in a similar manner. The operator initiates an SG-UP deletion request, and the UPSF identifies another in-service SG-UP that can support the currently active shards and subscribers. The UPSF then, via the correct SG-CP, installs the existing subscriber state onto the newly selected SG-UP. Next, the UPSF notifies the TSF to redirect the affected shards to the newly selected SG-UPs and reconfigures the downstream traffic toward the end user. The shards are thereby moved from the SG-UP under maintenance to the newly selected SG-UP. When the move is complete, the UPSF sends the final deletion message to the SG-CP, and the SG-UP is removed from service. It can then be upgraded with new software and subsequently re-instantiated in the Kubernetes (K8s) edge cluster.
Another use case concerns green strategy and power optimization. The UPSF is configured to periodically check the load on each of the in-service SG-UPs, which allows it to take time-of-day traffic heuristics into account. At busy hours, the UPSF will “scale out” SG-UP instances to accommodate the peak-hour traffic demand, which usually arises in the evenings. Conversely, as homes “switch off,” the UPSF can rebalance the subscriber shards and “scale in” the SG-UP nodes, turning off the underlying compute resources and saving the associated power.
To enable this new flexibility and scale, the service gateway user plane nodes (SG-UPs) are implemented in a cloud-native microservice fashion. This allows the user planes to be deployed onto multi-location Kubernetes (K8s) clouds and sized and scaled appropriately for the throughput and latency needs of the services hosted at these locations.
This cloud-native user plane architecture is similar in approach to that being taken in 5G and implements each BNG microservice in software using the vector packet processing (VPP) technologies available in the FD.io project. Each BNG-UP is implemented as a two-instance Docker container pod, each of which consumes two of the latest generation Intel® Xeon® SP 6338N processor cores.
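The maintenance sequence can be walked through with a short sketch. The `Domain` record, the capacity measure (spare shards per SG-UP), and the log strings are all hypothetical; the sketch only mirrors the order of operations described above: choose a target, install subscriber state, redirect the TSF, then delete the drained SG-UP.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Hypothetical view of one steering domain."""
    shard_to_up: dict[str, str]     # shard ID -> SG-UP currently serving it
    spare_shards: dict[str, int]    # SG-UP -> number of extra shards it can accept
    log: list[str] = field(default_factory=list)


def drain_user_plane(domain: Domain, leaving: str) -> None:
    """Move every shard off `leaving`, then delete it. Nothing changes on failure."""
    spare = {up: n for up, n in domain.spare_shards.items() if up != leaving}
    plan: dict[str, str] = {}
    for shard in sorted(s for s, up in domain.shard_to_up.items() if up == leaving):
        targets = [up for up, n in spare.items() if n > 0]
        if not targets:
            raise RuntimeError(f"no in-service SG-UP can take shard {shard}")
        # Prefer the SG-UP with the most spare capacity; break ties by name.
        target = min(targets, key=lambda up: (-spare[up], up))
        plan[shard] = target
        spare[target] -= 1

    # The plan is complete, so apply it in the order described in the text.
    for shard, target in plan.items():
        domain.log.append(f"SG-CP: install state of {shard} on {target}")
        domain.log.append(f"TSF: redirect {shard} from {leaving} to {target}")
        domain.shard_to_up[shard] = target
    domain.spare_shards = spare
    domain.log.append(f"SG-CP: delete {leaving}")


domain = Domain(
    shard_to_up={"shard-1": "up-a", "shard-2": "up-a", "shard-3": "up-b"},
    spare_shards={"up-a": 0, "up-b": 1, "up-c": 2},
)
drain_user_plane(domain, "up-a")
print(domain.shard_to_up)
print(domain.log[-1])  # SG-CP: delete up-a
```

Planning all moves before applying any of them is a design choice of the sketch: if no in-service SG-UP has room, the request fails before any subscriber is touched.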
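A scale-out or scale-in decision of this kind can be reduced to a small calculation. In the sketch below, the 4,000 subscribers per instance matches the per-instance figure in the appendix, while the 70 percent target utilization, the minimum of one instance, and the function names are assumptions made for illustration.

```python
def target_instances(active_sessions: int, sessions_per_instance: int = 4000,
                     target_utilization_pct: int = 70, min_instances: int = 1) -> int:
    """Number of SG-UP instances needed to keep each one under the target utilization."""
    usable = sessions_per_instance * target_utilization_pct
    needed = -(-active_sessions * 100 // usable)  # ceiling division on integers
    return max(min_instances, needed)


def scaling_action(running: int, active_sessions: int) -> str:
    """Compare the running instance count with the target and describe the change."""
    wanted = target_instances(active_sessions)
    if wanted > running:
        return f"scale out by {wanted - running}"
    if wanted < running:
        return f"scale in by {running - wanted}"
    return "hold"


print(scaling_action(running=3, active_sessions=40000))   # scale out by 12
print(scaling_action(running=15, active_sessions=6000))   # scale in by 12
```

The evening peak of 40,000 sessions needs 15 instances at these settings, while the overnight load of 6,000 sessions needs only 3, so 12 instances' worth of compute can be powered down once their shards have been rebalanced.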
For the BNG application, the Telecommunication (Comms) Dynamic Device Personalization (DDP) Package is used with the Intel® Ethernet Network Adapter E810. Once added, this package allows the Ethernet controller to steer traffic based on Point-to-Point Protocol over Ethernet (PPPoE) header fields, thus supporting control plane offload. The DDP PPPoE profile enables the network adapter to route packets to specific virtual functions or queues based on the unique PPPoE header fields, namely the protocol ID.
The Cloud BNG-UP instances used in the demonstration were developed by the Berlin Institute of Software Defined Networking (BISDN) and Intel (See Link to paper).
Throughput was measured on an Intel® Xeon® processor-based server with two 3rd Generation Intel Xeon Scalable Processors 6338N running vBNG container instances. The throughput scales linearly as the deployment grows from four to thirty-two vBNG instances in increments of four. With thirty-two instances deployed, the throughput is 661 Gbps using the RFC 2544 test methodology with 0.001% packet loss. This is achieved using 96 data-processing vCPUs (three vCPUs, or 1.5 cores with Intel® Hyper-Threading Technology enabled, per instance for thirty-two instances). All resources used by the BNG application are local to the socket. The workload was found to be I/O bound rather than CPU bound.
The flexibility of this approach lends itself well to edge traffic steering on a multi-location architecture.
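To make the classification concrete, the sketch below performs the equivalent decision in software on a raw Ethernet frame. It is not the DDP package or its interface, which runs on the network adapter; the function names and the "control"/"data" labels are hypothetical. The EtherType and PPP protocol values are the standard ones (0x8863 and 0x8864 for PPPoE discovery and session, 0x0021 and 0x0057 for IPv4 and IPv6 over PPP).

```python
import struct
from typing import Optional

ETH_P_PPPOE_DISCOVERY = 0x8863
ETH_P_PPPOE_SESSION = 0x8864
VLAN_ETHERTYPES = {0x8100, 0x88A8}
PPP_IPV4 = 0x0021
PPP_IPV6 = 0x0057


def classify_frame(frame: bytes) -> str:
    """Label a frame 'data', 'control', or 'other' from its PPPoE/PPP headers."""
    offset = 12  # skip destination and source MAC addresses
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    while ethertype in VLAN_ETHERTYPES:
        # Skip the 2-byte tag control field and read the inner EtherType.
        (ethertype,) = struct.unpack_from("!H", frame, offset + 2)
        offset += 4
    if ethertype == ETH_P_PPPOE_DISCOVERY:
        return "control"
    if ethertype != ETH_P_PPPOE_SESSION:
        return "other"
    # The PPPoE header is 6 bytes; the 2-byte PPP protocol ID follows it.
    (ppp_protocol,) = struct.unpack_from("!H", frame, offset + 6)
    return "data" if ppp_protocol in (PPP_IPV4, PPP_IPV6) else "control"


def pppoe_session_frame(ppp_protocol: int, vlan_id: Optional[int] = None) -> bytes:
    """Build a minimal test frame (zeroed MAC addresses, empty payload)."""
    header = bytes(12)
    if vlan_id is not None:
        header += struct.pack("!HH", 0x8100, vlan_id)
    header += struct.pack("!H", ETH_P_PPPOE_SESSION)
    pppoe = struct.pack("!BBHH", 0x11, 0x00, 1, 2)  # ver/type, code, session ID, length
    return header + pppoe + struct.pack("!H", ppp_protocol)


print(classify_frame(pppoe_session_frame(0x0021)))               # data
print(classify_frame(pppoe_session_frame(0xC021)))               # control
print(classify_frame(pppoe_session_frame(0x0057, vlan_id=100)))  # data
```

Here 0xC021 is the PPP Link Control Protocol, so that frame is treated as control traffic. Separating the two classes early is what allows data packets to go straight to forwarding queues while session signalling is handled elsewhere.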
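The headline figures can be cross-checked with a few lines of arithmetic. The appendix lists one uplink vCPU and two downlink vCPUs per instance; the sketch below works on that basis and additionally assumes that one vCPU corresponds to one hardware thread, with two threads per physical core when Intel® Hyper-Threading Technology is enabled.

```python
TOTAL_THROUGHPUT_GBPS = 661
INSTANCES = 32
UPLINK_VCPUS_PER_INSTANCE = 1    # from the appendix
DOWNLINK_VCPUS_PER_INSTANCE = 2  # from the appendix
THREADS_PER_CORE = 2             # assumption: Hyper-Threading enabled

vcpus_per_instance = UPLINK_VCPUS_PER_INSTANCE + DOWNLINK_VCPUS_PER_INSTANCE
total_vcpus = vcpus_per_instance * INSTANCES
cores_per_instance = vcpus_per_instance / THREADS_PER_CORE
per_instance_gbps = TOTAL_THROUGHPUT_GBPS / INSTANCES
per_vcpu_gbps = TOTAL_THROUGHPUT_GBPS / total_vcpus

print(total_vcpus)                  # 96
print(cores_per_instance)           # 1.5
print(round(per_instance_gbps, 2))  # 20.66
print(round(per_vcpu_gbps, 2))      # 6.89
```

On these assumptions the 96 processing units and the 1.5 cores per instance quoted above agree with each other, and each vBNG instance carries roughly 20.7 Gbps of the 661 Gbps total.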
This diagram is an extract from the Vodafone, Intel and BISDN demo which was demonstrated at BBWF 2021. https://youtu.be/k9P6a71FwNo.
The graphic clearly shows the various components involved, the subscribers and shards, the traffic steering function and the active user planes providing access to the edge services.
Conclusion
Fixed access service providers have a huge opportunity to differentiate their networks by moving away from the traditional static mapping of subscriber to service and allowing individual subscriber sessions to be steered to the right location to support their service needs, including appropriate latency for edge applications. At the same time, there is a need to support more continuous deployment of new CI/CD-like software capabilities into the network without requiring long planning cycles and network outages.
A new, more dynamic Fixed Access Architecture is being defined that supports cloud native principles to enable the control and user plane of the network to be scaled up in rapid response to increases in customer traffic load or service needs and allows new features to be rolled out to the control and user planes in a continuous deployment approach without costly and time-consuming outages for maintenance.
From an individual subscriber perspective, the network can now dynamically connect their sessions to a service gateway at the right location to meet their application requirements, including emerging applications that can benefit from deployment at low latency edge-compute locations.
Appendix
vBNG Server | |
---|---|
Platform | Intel® Server System M50CYP Family |
CPU | 2x Intel® Xeon® Gold 6338N Processor, 2.2 GHz, 32 Cores |
BIOS, Microcode | SE5C6200.86B.0020.P24.2104020811, 04/02/2021, 0xd0002c1 |
Memory | 16x 32GB DDR4 |
Hard Drive | Intel® SSD DC S4600 Series SSDSC2KG96 (960GB) |
Network Interface Card | 4x Intel® Ethernet Network Adapter E810-2CQDA2 (previously called Chapman Beach) |
Software | |
Host OS | Red Hat Enterprise Linux 8.2 (Ootpa) |
vBNG | vBNG 20.11 |
Linux Container | Docker version 20.10.5, build 55c4c88 |
DPDK | DPDK-v20.11 |
BIOS Settings | P-state Disabled, Intel® Hyper-Threading Technology enabled, Enhanced Intel SpeedStep® Technology disabled, Intel® Turbo Boost Technology disabled, C-States Disabled, SRIOV and VTd enabled |
Application Configuration per Instance
Uplink | |
---|---|
Frame Size: 650B*; Subscribers: 4K/Instance; 1x vCPU per Instance | |
ACL | |
Flow Classification | |
Policer/Metering | |
Routing | |
Downlink | |
Frame Size: 504B*; Subscribers: 4K/Instance; 2x vCPU per Instance | |
ACL | Reverse Path Forwarding – One Rule per Subscriber (4K) |
HQoS | 4 Level HQoS – Port, Pipe, Traffic Class, and Queue |
Routing | One Route per Subscriber (4K) |
*Frame size quoted is the maximum size of the frame at any point in processing (e.g., uplink 128 bytes = 120 bytes + 2 x 4-byte access VLAN tags). | |
Test Environment Configuration Information and Relevant Variables | |
Traffic Generator | IXIA* NOVUS* 100GE8Q28 |
Connection Details | Ixia Ports and DUT Ports Connected Back-to-Back (Eight Connections) |