
AI/ML Management for 5G Systems

Sep 11, 2023

By WG SA5, Authors: Yizhi Yao (SI/WI rapporteur, Intel), Hassan Al-kanani (SI/WI co-rapporteur, NEC), Stephen Mwanje (Nokia)

Artificial Intelligence/Machine Learning (AI/ML) techniques and applications are being increasingly adopted across industries and have proved successful. They are now being applied to the telecommunication industry, including mobile networks. The adoption of AI/ML technology is opening a new era of business value for 5G and future generations of mobile networks, in terms of improved system performance, higher efficiency, and enhanced end-user experience, as well as new business models and use cases. AI/ML capabilities are used in various domains of the 5G System (5GS), including management and orchestration (e.g., MDA), the 5GC (e.g., NWDAF), and the NG-RAN (e.g., RAN intelligence).


Figure 1: AI/ML in 5GS

Almost all 3GPP WGs, in both SA and RAN, are now engaged in standardization activities for AI/ML-related features and capabilities. To support and facilitate the efficient deployment and operation of AI/ML capabilities with suitable AI/ML techniques in the 5GS, the ML model and the AI/ML inference function need to be managed throughout their entire lifecycle.

ML models, once deployed into functions throughout the system, can directly affect the behaviour of the system and may improve or degrade its performance. The capability and performance of an ML model are determined by every operational phase of its lifecycle (training, emulation, deployment, and inference); therefore, the AI/ML capabilities need to be operable, manageable, and accountable in each phase.

The 3GPP WG SA5 (in short, "SA5") initially started the work to develop specifications for AI/ML management as part of the Rel-17 Management Data Analytics (MDA) work item, and concluded that a generic mechanism is needed for managing ML training for any kind of AI/ML-enabled capability (i.e., not restricted to Management Data Analytics). The Rel-17 MDA specifications are documented in TS 28.104, while the specifications for AI/ML management are documented separately in TS 28.105. In Rel-18, SA5 continued the development of the AI/ML management specifications, starting with a dedicated, comprehensive study, which has recently been completed and is documented in TR 28.908. The study describes the concepts and operational workflow, and addresses a wide range of use cases (capabilities) along with the corresponding potential requirements and solutions. Building on the completed study, SA5 has started normative work, which is currently progressing.

SA5 has already made good progress in specifying management services for AI/ML capabilities. In Rel-18, the specification work continues to be documented in TS 28.105. The Rel-18 version of TS 28.105 is planned to be completed and published in the first quarter of 2024.

Multi-vendor interaction and solution flexibility

Deploying an ML model from one vendor into a function of another vendor raises concerns and may require further investigation beyond the scope of the current SA5 specifications. Such deployment may introduce unpredictable complexities, e.g., exposure of the proprietary details of the ML model and its associated algorithm, which could pose security risks to both the ML model and the AI/ML inference function and open the door to model re-engineering. Therefore, SA5 agreed to define a more abstract manageable entity around the ML model, called the "ML entity". The ML entity is manageable as a single composite entity: it is either an ML model, or an entity containing an ML model and the ML model-related metadata. This abstraction brings implementation flexibility while still enabling the required management capabilities.
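
To illustrate the abstraction, the following minimal sketch models an ML entity as a composite of an opaque model artefact and its metadata. All class and field names here are illustrative assumptions, not the normative definitions in TS 28.105.

```python
from dataclasses import dataclass, field

@dataclass
class MLModelMetadata:
    """Illustrative metadata carried with the model (all names are assumptions)."""
    entity_id: str        # unique identifier of the ML entity
    version: str          # version produced by training or re-training
    inference_type: str   # capability the model supports, e.g. "MDA analytics"
    expected_context: dict = field(default_factory=dict)  # conditions under which the entity is valid

@dataclass
class MLEntity:
    """A single manageable composite: an ML model plus its related metadata.

    The model artefact stays opaque (vendor-proprietary bytes), so management
    operations never depend on, or expose, its internal structure.
    """
    metadata: MLModelMetadata
    model_artefact: bytes  # opaque, proprietary model representation
```

Keeping the artefact opaque mirrors the multi-vendor concern above: the management plane handles identity, versioning, and context, while the model internals remain hidden.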

AI/ML operational workflow

In the normative phase, SA5 has agreed on the generic AI/ML operational workflow for an ML entity, as depicted in Figure 2.


Figure 2: AI/ML operational workflow

The workflow involves four main phases: training, emulation, deployment, and inference. The main tasks in each phase are briefly described below, followed by a minimal end-to-end sketch:

Training phase:

  • ML training: training, including initial training and re-training, of an ML entity or a group of ML entities. It also includes validation of the trained ML entity, evaluating the variance between its performance on the training data and on the validation data. If the validation result does not meet expectations (e.g., the variance is not acceptable), the ML entity needs to be re-trained. ML training is the initial phase of the workflow.
  • ML testing: testing of the validated ML entity to evaluate its performance on testing data. If the testing result meets expectations, the ML entity may proceed to the next phase; otherwise it may need to be re-trained.

Emulation phase:

  • ML emulation: running an ML entity or AI/ML inference function for inference in an emulation environment, in order to evaluate its inference performance prior to applying it to the target operational network or system.

NOTE:   The emulation phase is considered optional and can be skipped in the AI/ML operational workflow.

Deployment phase:

  • ML entity loading: loading of a trained ML entity into the target AI/ML inference function which will use it for inference.

NOTE:   The deployment phase may not be needed in some cases, for example when the training function and inference function are co-located.

Inference phase:

  • AI/ML inference: the AI/ML inference function performs inference using the ML entity.
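
To make the phase transitions concrete, here is a minimal control-flow sketch of the workflow described above. The function parameters, the re-training loops, and the error handling are assumptions for illustration; the actual interfaces are defined by the management services in TS 28.105.

```python
from typing import Callable, Optional

def run_aiml_workflow(
    train: Callable[[], None],          # initial training or re-training step
    validation_ok: Callable[[], bool],  # variance between training/validation performance acceptable?
    testing_ok: Callable[[], bool],     # performance on testing data meets expectations?
    emulate: Optional[Callable[[], bool]],  # optional emulation run (None skips the phase)
    deploy: Optional[Callable[[], None]],   # loading step (None when training and inference are co-located)
    infer: Callable[[], object],        # inference by the AI/ML inference function
):
    """Illustrative end-to-end pass through the four workflow phases."""
    # Training phase: (re-)train until the validation variance is acceptable,
    # then test; a failed test sends the entity back to re-training.
    train()
    while not validation_ok():
        train()
    while not testing_ok():
        train()
        while not validation_ok():
            train()

    # Emulation phase (optional): evaluate inference performance in an
    # emulation environment before touching the live network.
    if emulate is not None and not emulate():
        raise RuntimeError("inference performance in emulation not acceptable")

    # Deployment phase: load the trained entity into the target inference function.
    if deploy is not None:
        deploy()

    # Inference phase: the AI/ML inference function uses the entity.
    return infer()
```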

AI/ML management capabilities

The AI/ML management study (TR 28.908) discussed and concluded on more than 40 use cases related to the management of AI/ML, categorized into management capabilities for the four operational phases of the AI/ML workflow. The management capabilities for each phase are briefly described below, and the complete list of use cases is given in Table 1.

Management capabilities for training phase

  • ML training management: allows the MnS consumer to request ML entity training, to consume and control producer-initiated training, and to set a policy for producer-initiated ML entity training (e.g., conditions that trigger ML (re-)training based on the AI/ML inference performance; see the sketch after this list). It also covers management of the ML entity training/re-training process, training performance management, and training data management.
  • ML validation: the ML training capability also includes validation, to evaluate the performance of the ML entity on the validation data and to identify the variance between its performance on the training data and on the validation data. If the variance is not acceptable, the ML entity needs to be tuned (re-trained) before being made available to the consumer and used for inference.
  • ML testing management: allows the MnS consumer to request the testing of a trained ML entity and to receive the testing results. It may also include capabilities for selecting the specific performance metrics to be used or reported by the ML testing function. The MnS consumer may also be allowed to set a policy for producer-initiated ML entity testing after training and validation, to receive a report on the outcome, or to trigger ML entity re-training based on the testing performance requirements.
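
As a hedged illustration of these capabilities, the sketch below shows what a consumer-issued training request and a producer-initiated re-training policy could look like. The class and attribute names (MLTrainingRequest, RetrainingPolicy, should_retrain) are assumptions, not the IOC definitions in TS 28.105.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class MLTrainingRequest:
    """Illustrative consumer-initiated training request (names are assumptions)."""
    inference_type: str                # which AI/ML capability to train for
    candidate_data_sources: List[str]  # where training data may be taken from
    performance_requirements: Dict[str, float] = field(default_factory=dict)  # e.g. {"accuracy": 0.95}

@dataclass
class RetrainingPolicy:
    """Producer-initiated re-training policy keyed on inference performance."""
    metric: str      # monitored inference performance indicator
    threshold: float # re-training is triggered when the metric drops below this

def should_retrain(policy: RetrainingPolicy, observed: float) -> bool:
    """The condition the consumer sets to trigger producer-initiated (re-)training."""
    return observed < policy.threshold

# Example: re-train once inference accuracy falls below the agreed floor.
policy = RetrainingPolicy(metric="accuracy", threshold=0.9)
print(should_retrain(policy, observed=0.87))  # True -> trigger re-training
```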

Management capabilities for emulation phase

  • AI/ML inference emulation: a capability allowing an MnS consumer to request inference emulation for a specific ML entity or entities (after training, validation, and testing) to evaluate the inference performance in an emulation environment prior to applying it to the target network or system.
  • ML inference emulation management: a capability allowing an authorized MnS consumer (e.g., an operator) to manage, control, and monitor a specific ML inference emulation process, e.g., to start, suspend, or resume the inference emulation, and to receive the emulation results (a minimal sketch follows this list).
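
The following sketch illustrates the kind of control surface such a capability implies: an emulation process that a consumer can start, suspend, resume, and query for results. The states and method names are assumptions, not standardized operations.

```python
from enum import Enum, auto

class EmulationState(Enum):
    READY = auto()
    RUNNING = auto()
    SUSPENDED = auto()
    FINISHED = auto()

class InferenceEmulation:
    """Illustrative ML inference emulation process under consumer control."""

    def __init__(self, entity_id: str):
        self.entity_id = entity_id
        self.state = EmulationState.READY
        self.results: dict = {}

    def start(self) -> None:
        self.state = EmulationState.RUNNING

    def suspend(self) -> None:
        if self.state is EmulationState.RUNNING:
            self.state = EmulationState.SUSPENDED

    def resume(self) -> None:
        if self.state is EmulationState.SUSPENDED:
            self.state = EmulationState.RUNNING

    def report(self) -> dict:
        """The emulation results the MnS consumer receives."""
        return {"entity": self.entity_id, "state": self.state.name, **self.results}
```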

Management capabilities for deployment phase

  • AI/ML deployment control and monitoring: capabilities for loading the ML entity into the target inference function. This includes informing the consumer when new entities are available, and enabling the consumer to request the loading of an ML entity, to set a policy for such deployment, and to monitor the deployment process (a minimal sketch follows).
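
A minimal sketch of consumer-monitored loading is shown below; the entity name, target function, notification payloads, and status values are all assumptions for illustration.

```python
from typing import Callable

def load_ml_entity(entity_id: str, target_function: str,
                   notify: Callable[[dict], None]) -> None:
    """Illustrative ML entity loading with deployment-progress notifications.

    `notify` is a callable the MnS consumer supplies to monitor the process.
    """
    notify({"entity": entity_id, "target": target_function, "status": "LOADING"})
    # ... transfer of the trained entity into the target inference function ...
    notify({"entity": entity_id, "target": target_function, "status": "LOADED"})

# Example: the consumer monitors progress by printing each notification
# (hypothetical entity and inference-function names).
load_ml_entity("mda-traffic-model-01", "mda-inference-function", print)
```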

Management capabilities for inference phase

  • AI/ML inference control: allows an MnS consumer to control the inference, i.e., to activate/deactivate the inference function and/or ML entity/entities (including instant, partial, schedule-based, and policy-based activation), to configure the allowed ranges of the inference output parameters, and to configure the context for performing inference (see the sketch after this list).
  • AI/ML inference performance evaluation: allows the MnS consumer to monitor and evaluate the inference performance of an ML entity or an AI/ML inference function.
  • AI/ML inference orchestration: enables the MnS consumer to orchestrate the AI/ML inference functions based on aspects such as the capabilities of the inference functions, the expected and actual running context of the ML entity, the AI/ML inference performance, and the AI/ML inference trustworthiness. For example, the MnS consumer may set the conditions that trigger specific inferences based on the expected outcomes of those inferences.
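
Under assumed names and rules, the sketch below illustrates two of these controls: policy-based activation and restricting inference outputs to consumer-configured allowed ranges.

```python
from typing import Dict, Tuple

def policy_allows_activation(policy: Dict[str, float],
                             context: Dict[str, float]) -> bool:
    """Policy-based activation (illustrative rule): activate only when every
    monitored context value meets or exceeds its policy threshold."""
    return all(context.get(key, 0.0) >= threshold
               for key, threshold in policy.items())

def clamp_outputs(outputs: Dict[str, float],
                  allowed: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Restrict inference outputs to the consumer-configured allowed ranges."""
    return {
        name: min(max(value, allowed[name][0]), allowed[name][1])
        if name in allowed else value
        for name, value in outputs.items()
    }

# Example (hypothetical parameter names): activate inference only with enough
# CPU headroom, and keep a recommended transmit power within permitted limits.
if policy_allows_activation({"cpu_headroom": 0.2}, {"cpu_headroom": 0.35}):
    print(clamp_outputs({"tx_power_dbm": 49.0}, {"tx_power_dbm": (10.0, 46.0)}))
```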

Common management capabilities for all phases

  • AI/ML trustworthiness management: allows the MnS consumer to configure, monitor, and evaluate the trustworthiness of an ML entity. This applies to the ML entity at every operational phase of the AI/ML operational workflow (a brief sketch follows).
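
As a brief hedged sketch, the snippet below treats trustworthiness management as configuring indicator targets and assessing measured values against them. The indicator names (explainability, fairness, robustness) follow the trustworthiness aspects discussed in TR 28.908, but the structure and comparison rules are assumptions.

```python
from dataclasses import dataclass

@dataclass
class TrustworthinessIndicators:
    """Illustrative trustworthiness indicators (semantics are assumptions)."""
    explainability: float  # e.g. share of decisions that come with an explanation
    fairness: float        # e.g. bias score, lower is better
    robustness: float      # e.g. accuracy under perturbed inputs

def assess(target: TrustworthinessIndicators,
           measured: TrustworthinessIndicators) -> bool:
    """Evaluate measured indicators against consumer-configured targets;
    applicable at every phase of the AI/ML operational workflow."""
    return (measured.explainability >= target.explainability
            and measured.fairness <= target.fairness
            and measured.robustness >= target.robustness)
```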

Table 1: List of use cases for AI/ML management in the operational phases

Management Capabilities for ML training phase
  • Event data for ML training
      - Pre-processed event data for ML training
  • ML entity validation
      - ML entity validation performance reporting
  • ML entity testing
      - Consumer-requested ML entity testing
      - Control of ML entity testing
      - Multiple ML entities joint testing
  • ML entity re-training
      - Producer-initiated threshold-based ML Retraining
      - Efficient ML entity re-training
      - ML entities updating initiated by producer
  • ML entity joint training
      - Support for ML entity modularity – joint training of ML entities
  • Training data effectiveness
      - Training data effectiveness reporting
      - Training data effectiveness analytics
      - Measurement data correlation analytics for ML training
  • ML context management
      - ML context monitoring and reporting
      - Mobility of ML Context
      - Standby mode for ML entity
  • ML entity capability discovery and mapping
      - Identifying capabilities of ML entities
      - Mapping of the capabilities of ML entities
  • Performance evaluation for ML training
      - Performance indicator selection for ML model training
      - Monitoring and control of AI/ML behavior
      - ML entity performance indicators query and selection for ML training
      - ML entity performance indicators selection based on MnS consumer policy for ML training
  • Configuration management for ML training
      - Control of producer-initiated ML training
  • ML Knowledge Transfer Learning
      - Discovering sharable Knowledge
      - Knowledge sharing and transfer learning

Management Capabilities for ML emulation phase
  • ML Inference emulation
      - AI/ML Inference emulation
      - Orchestrating ML Inference emulation

Management Capabilities for ML entity deployment phase
  • ML entity loading
      - ML entity loading control and monitoring

Management Capabilities for AI/ML inference phase
  • AI/ML Inference History
      - Tracking AI/ML inference decisions and context
  • Orchestrating AI/ML Inference
      - Knowledge sharing on executed actions
      - Knowledge sharing on impacts of executed actions
      - Abstract information on impacts of executed actions
      - Triggering execution of AI/ML inference functions or ML entities
      - Orchestrating decisions of AI/ML inference functions or ML entities
  • Coordination between the ML capabilities
      - Alignment of the ML capability between 5GC/RAN and 3GPP management system
  • Performance evaluation for AI/ML inference
      - AI/ML performance evaluation in inference phase
      - ML entity performance indicators query and selection for AI/ML inference
      - ML entity performance indicators selection based on MnS consumer policy for AI/ML inference
      - AI/ML abstract performance
  • Configuration management for AI/ML inference
      - ML entity configuration for RAN domain ES initiated by consumer
      - ML entity configuration for RAN domain ES initiated by producer
      - Partial activation of AI/ML inference capabilities
      - Configuration for AI/ML inference initiated by MnS consumer
      - Configuration for AI/ML inference initiated by producer
      - Enabling policy-based activation of AI/ML capabilities
  • AI/ML update control
      - Availability of new capabilities or ML entities
      - Triggering ML entity update

Common management capabilities for ML training and AI/ML inference phase
  • Trustworthy Machine Learning
      - AI/ML trustworthiness indicators
      - AI/ML data trustworthiness
      - ML training trustworthiness
      - AI/ML inference trustworthiness
      - Assessment of AI/ML trustworthiness


Further reading

AI/ML management related SIDs/WIDs in Rel-17 and Rel-18:

Release | UID    | Work item title                                            | Report/Specification | Work item document | Acronym
--------|--------|------------------------------------------------------------|----------------------|--------------------|-------------
Rel-17  | 850028 | Study on enhancement of Management Data Analytics Service  | TR 28.809            | SP-190930          | FS_eMDAS
Rel-17  | 910027 | Enhancements of Management Data Analytics                  | TS 28.104, TS 28.105 | SP-210132          | eMDAS
Rel-18  | 940039 | Study on AI/ML management                                  | TR 28.908            | SP-211443          | FS_AIML_MGMT
Rel-18  | 990119 | AI/ML management                                           | TS 28.105            | SP-230335          | AIML_MGT

Relevant specifications:

  • 3GPP TS 28.105: "Management and orchestration; AI/ML management".
  • 3GPP TR 28.908: "Study on AI/ML management".
  • 3GPP TS 28.104: "Management and orchestration; Management Data Analytics".
  • 3GPP TS 23.288: "Architecture enhancements for 5G System (5GS) to support network data analytics services".
  • 3GPP TS 38.300: "NR; NR and NG-RAN Overall description; Stage-2".
  • 3GPP TS 38.401: "NG-RAN; Architecture description".