By WG SA5, Authors: Yizhi Yao (SI/WI rapporteur, Intel), Hassan Al-kanani (SI/WI co-rapporteur, NEC), Stephen Mwanje (Nokia)
Artificial Intelligence/Machine Learning (AI/ML) techniques and applications are being increasingly, and successfully, adopted across the wider industry, and they are now being applied to telecommunications, including mobile networks. The adoption of AI/ML technology is clearly opening a new era for creating business value through improved system performance, higher efficiency, and enhanced end-user experience, as well as through new business models and use cases for 5G and future generations of mobile networks. AI/ML capabilities are used in various domains of the 5G System (5GS), including management and orchestration (e.g., Management Data Analytics (MDA)), the 5G Core (e.g., the Network Data Analytics Function (NWDAF)), and the NG-RAN (e.g., RAN intelligence).
Figure 1: AI/ML in 5GS
Almost all 3GPP working groups, across both SA and RAN, are now engaged in standardization activities related to AI/ML features and capabilities. To support and facilitate the efficient deployment and operation of AI/ML capabilities in the 5GS with suitable AI/ML techniques, the ML model and the AI/ML inference function need to be managed throughout their entire lifecycles.
ML models, once deployed into functions throughout the system, can directly affect the behaviour of the system, leading to improved or degraded performance. The capability and performance of an ML model are determined by every operational phase of its lifecycle (including the training, emulation, deployment, and inference phases); therefore, the AI/ML capabilities need to be operable, manageable, and accountable in each phase.
3GPP WG SA5 (in short, "SA5") initially started developing specifications for AI/ML management as part of the Rel-17 Management Data Analytics (MDA) work item, and concluded there that a generic mechanism is needed for managing ML training for any kind of AI/ML-enabled capability (i.e., not restricted to MDA). The Rel-17 MDA specifications are documented in TS 28.104, while the specifications for AI/ML management are documented separately in TS 28.105. In Rel-18, SA5 continued the AI/ML management specification work, starting with a dedicated, comprehensive study that has recently been completed and is documented in TR 28.908. The study describes the concepts and operational workflow, and addresses a wide range of use cases (capabilities) along with the corresponding potential requirements and solutions. Building on the completed study, SA5 has moved on to normative work, which is currently in progress.
SA5 has already made good progress in specifying management services for AI/ML capabilities. The specification work continues to be documented in TS 28.105 in Rel-18. The Rel-18 version of 3GPP TS 28.105 is planned to be completed and published during the first quarter of 2024.
Multi-vendor interaction with solution flexibility
Deploying an ML model from one vendor into a function from another vendor raises concerns and may require further investigation and discussion beyond the scope of the current SA5 specifications. Such a deployment may introduce unpredictable complexities, e.g., exposure of the proprietary details of the ML model and its associated algorithm, which could pose security risks to both the ML model and the AI/ML inference function and open the door to model re-engineering. Therefore, SA5 agreed to define a manageable entity in a more abstract manner around the ML model, called the "ML entity". The ML entity is manageable as a single composite entity: it is either an ML model, or an entity containing an ML model together with its related metadata. This abstraction brings implementation flexibility while still enabling the required management capabilities, as the sketch below illustrates.
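Purely as an illustration of this abstraction (the class and attribute names below are hypothetical and are not the normative information-model definitions of TS 28.105), an ML entity can be thought of as a model artefact bundled with its management metadata:

```python
from dataclasses import dataclass, field

@dataclass
class MLModel:
    """Opaque, vendor-proprietary model artefact (e.g., serialized weights)."""
    artefact: bytes

@dataclass
class MLEntity:
    """Hypothetical sketch of the SA5 'ML entity' abstraction: an ML model
    plus the metadata needed to manage it as one composite entity, without
    exposing the model's proprietary internals to other vendors' functions."""
    entity_id: str                                 # identifier used by management services
    model: MLModel                                 # the ML model itself, treated as a black box
    metadata: dict = field(default_factory=dict)   # e.g., version, inference type, training context

# Management services address the entity by identifier and metadata only.
entity = MLEntity("mle-001", MLModel(b"..."), {"version": "1.0", "inferenceType": "MDA"})
print(entity.entity_id, entity.metadata)
```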
AI/ML operational workflow
In the normative phase, SA5 agreed on the generic AI/ML operational workflow for an ML entity, depicted in Figure 2.
Figure 2: AI/ML operational workflow
The workflow involves four main phases: training, emulation, deployment, and inference. The main tasks in each phase are briefly described below, followed by a sketch of how the phases chain together.
Training phase:
- ML training: training, including initial training and re-training, of an ML entity or a group of ML entities. It also includes validation of the trained ML entity, i.e., evaluating the variance in performance when the ML entity performs on the training data versus the validation data. If the validation result does not meet the expectation (e.g., the variance is not acceptable), the ML entity needs to be re-trained. ML training is the initial phase of the workflow.
- ML testing: testing of the validated ML entity to evaluate its performance when it performs on testing data. If the testing result meets the expectation, the ML entity may proceed to the next phase; otherwise, the ML entity may need to be re-trained.
Emulation phase:
- ML emulation: running an ML entity or AI/ML inference function for inference in an emulation environment. The purpose is to evaluate the inference performance of the ML entity or AI/ML inference function in the emulation environment prior to applying it to the target operational network or system.
NOTE: The emulation phase is considered optional and can be skipped in the AI/ML operational workflow.
Deployment phase:
- ML entity loading: loading of a trained ML entity into the target AI/ML inference function which will use it for inference.
NOTE: The deployment phase may not be needed in some cases, for example when the training function and inference function are co-located.
Inference phase:
- AI/ML inference: performing inference using the ML entity by the AI/ML inference function.
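The sequencing described above can be summarized in a minimal sketch. This is not specified behaviour; it simply assumes boolean checks for the testing outcome and for the optional emulation and deployment steps, and all function names are illustrative:

```python
# Stub steps standing in for real management operations (all hypothetical).
def train_and_validate(entity): print(f"training and validating {entity}")
def test(entity): return True                  # testing result meets expectation
def emulation_acceptable(entity): return True  # emulated inference performance acceptable
def load_into_inference_function(entity): print(f"loading {entity}")
def start_inference(entity): print(f"running inference with {entity}")

def run_aiml_workflow(entity: str, emulate: bool = True, co_located: bool = False) -> None:
    """Illustrative sequencing of the four operational phases for one ML entity."""
    # Training phase: train and validate, then test; re-train until testing passes.
    while True:
        train_and_validate(entity)
        if test(entity):
            break  # otherwise loop back and re-train

    # Emulation phase (optional): evaluate inference performance in a sandbox
    # before touching the live network; may be skipped entirely.
    if emulate and not emulation_acceptable(entity):
        run_aiml_workflow(entity, emulate, co_located)  # poor results lead back to re-training
        return

    # Deployment phase: load the trained entity into the target inference
    # function; not needed when training and inference are co-located.
    if not co_located:
        load_into_inference_function(entity)

    # Inference phase: the AI/ML inference function uses the entity.
    start_inference(entity)

run_aiml_workflow("mle-001")
```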
AI/ML management capabilities
The AI/ML management study (TR 28.908) discussed and concluded more than 40 use cases related to AI/ML management, categorized into management capabilities for the four corresponding operational phases of the AI/ML workflow. These management capabilities are briefly described below for each operational phase, and the complete list of use cases is given in Table 1.
Management capabilities for training phase
- ML training management: allows the MnS consumer to request ML entity training, to consume and control producer-initiated training, and to set a policy for producer-initiated ML entity training (e.g., conditions that trigger ML (re-)training based on AI/ML inference performance). It also covers management of the ML entity training/re-training process, training performance management, and training data management (a sketch of a consumer-side training request follows this list).
- ML validation: the ML training capability also includes validation, i.e., evaluating the performance of the ML entity when performing on the validation data and identifying the variance of the performance on the training and validation data. If the variance is not acceptable, the ML entity needs to be tuned (re-trained) before being made available to the consumer and used for inference.
- ML testing management: allows the MnS consumer to request testing of a trained ML entity and to receive the testing results. It may also include capabilities for selecting the specific performance metrics to be used or reported by the ML testing function. The MnS consumer may also be allowed to set a policy for producer-initiated ML entity testing after training and validation, to receive a report on the outcome, or to trigger ML entity re-training based on the testing performance requirements.
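To give a feel for the consumer side of ML training management, the following sketch shows what a training request payload might look like. The attribute names only echo the flavour of the TS 28.105 information model and are simplified here; they are not the normative definitions:

```python
import json

# Hypothetical consumer-side request asking the producer to train an ML
# entity; all attribute names and values are illustrative.
training_request = {
    "mLTrainingRequest": {
        "inferenceType": "CoverageProblemAnalysis",      # what the trained entity will be used for
        "expectedRuntimeContext": {"area": "cell-123"},  # where/when the entity is expected to run
        "performanceRequirements": {"accuracy": 0.95},   # expectation the training must meet
        "trainingDataRef": "/data/events/2024-01",       # pointer to (pre-processed) training data
    }
}

print(json.dumps(training_request, indent=2))
```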
Management capabilities for emulation phase
- AI/ML inference emulation: a capability allowing an MnS consumer to request an ML inference emulation for a specific ML entity or entities (after the training, validation, and testing) to evaluate the inference performance in an emulation environment prior to applying it to the target network or system.
- ML inference emulation management: this capability allows an authorized MnS consumer (e.g., an operator) to manage, control, and monitor a specific ML inference emulation process, e.g., to start, suspend, or resume the inference emulation, and to receive the emulation reports (a sketch of such a controllable process follows this list).
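A minimal sketch of such a controllable emulation process is shown below, assuming a simple run/suspend/resume state model; the states, methods, and report format are illustrative, not taken from the specification:

```python
from enum import Enum

# Hypothetical control states for an inference-emulation process; the actual
# control model in TS 28.105 may differ.
class EmulationState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"

class EmulationProcess:
    """Sketch of an emulation process that an authorized MnS consumer controls."""
    def __init__(self, entity_id: str) -> None:
        self.entity_id = entity_id
        self.state = EmulationState.RUNNING   # started on creation

    def suspend(self) -> None:
        if self.state is EmulationState.RUNNING:
            self.state = EmulationState.SUSPENDED

    def resume(self) -> None:
        if self.state is EmulationState.SUSPENDED:
            self.state = EmulationState.RUNNING

    def report(self) -> dict:
        """Emulation report the MnS consumer would receive (illustrative)."""
        return {"entity": self.entity_id, "state": self.state.value}

proc = EmulationProcess("mle-001")
proc.suspend()
proc.resume()
print(proc.report())
```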
Management capabilities for deployment phase
- AI/ML deployment control and monitoring: capabilities for loading the ML entity into the target inference function. This includes informing the consumer when new entities are available, enabling the consumer to request loading of the ML entity or to set a policy for such deployment, and monitoring the deployment process (see the sketch after this list).
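As a sketch of what a consumer-set deployment policy could look like, the following assumes a simple threshold-based auto-loading rule; the field names and the check are hypothetical:

```python
# Hypothetical consumer-set policy allowing the producer to load a newly
# trained ML entity into the target inference function automatically once
# the stated conditions hold; every field name here is illustrative.
loading_policy = {
    "targetInferenceFunction": "aiml-inference-func-7",
    "autoLoad": True,
    "conditions": {
        "minTestedAccuracy": 0.95,            # only load entities that tested at least this well
        "allowedTimeWindow": "02:00-04:00",   # load during a low-traffic maintenance window
    },
}

def may_load(entity_metrics: dict, policy: dict) -> bool:
    """Monitoring step: does a candidate ML entity satisfy the loading policy?"""
    return (policy["autoLoad"]
            and entity_metrics.get("testedAccuracy", 0.0)
                >= policy["conditions"]["minTestedAccuracy"])

print(may_load({"testedAccuracy": 0.97}, loading_policy))  # -> True
```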
Management capabilities for inference phase
- AI/ML inference control: allows an MnS consumer to control the inference, i.e., to activate/deactivate the inference function and/or the ML entity/entities (including instant, partial, schedule-based, and policy-based activation), to configure the allowed ranges of the inference output parameters, and to set the context for performing inference (a sketch of these activation options follows this list).
- AI/ML inference performance evaluation: allowing the MnS consumer to monitor and evaluate the inference performance of an ML entity or an AI/ML inference function.
- AI/ML inference orchestration: enables the MnS consumer to orchestrate the AI/ML inference functions based on, e.g., knowledge of the capabilities of the inference functions, the expected and actual running context of the ML entity, the AI/ML inference performance, and the AI/ML inference trustworthiness. For example, the MnS consumer may set conditions to trigger specific inferences based on the expected outcomes of those inferences.
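The activation options above can be illustrated with a small sketch covering instant, partial, and schedule-based activation; the class and method names are invented for this example and do not appear in TS 28.105:

```python
from datetime import datetime, time

# Hypothetical inference activation controller illustrating instant, partial,
# and schedule-based activation; none of these names are normative.
class InferenceController:
    def __init__(self) -> None:
        self.active_capabilities: set = set()

    def activate(self, capabilities) -> None:
        """Instant activation; passing a subset gives partial activation."""
        self.active_capabilities.update(capabilities)

    def deactivate(self, capabilities) -> None:
        self.active_capabilities.difference_update(capabilities)

    def activate_on_schedule(self, capabilities, start: time, end: time) -> None:
        """Schedule-based activation: capabilities are active only inside the window."""
        now = datetime.now().time()
        if start <= now <= end:
            self.activate(capabilities)
        else:
            self.deactivate(capabilities)

ctrl = InferenceController()
ctrl.activate(["mobility-optimization"])                              # partial: one capability only
ctrl.activate_on_schedule(["energy-saving"], time(0, 0), time(6, 0))  # night-time window
print(ctrl.active_capabilities)
```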
Common management capabilities for all phases
- AI/ML trustworthiness management: allows the MnS consumer to configure, monitor, and evaluate the trustworthiness of an ML entity at every operational phase of the AI/ML workflow.
Table 1: List of use cases for AI/ML management in the operational phases:
| Category | Use cases |
|---|---|
| Management capabilities for ML training phase | |
| Event data for ML training | Pre-processed event data for ML training |
| ML entity validation | ML entity validation performance reporting |
| ML entity testing | Consumer-requested ML entity testing; Control of ML entity testing; Multiple ML entities joint testing |
| ML entity re-training | Producer-initiated threshold-based ML re-training; Efficient ML entity re-training; ML entities updating initiated by producer |
| ML entity joint training | Support for ML entity modularity – joint training of ML entities |
| Training data effectiveness | Training data effectiveness reporting; Training data effectiveness analytics; Measurement data correlation analytics for ML training |
| ML context management | ML context monitoring and reporting; Mobility of ML context; Standby mode for ML entity |
| ML entity capability discovery and mapping | Identifying capabilities of ML entities; Mapping of the capabilities of ML entities |
| Performance evaluation for ML training | Performance indicator selection for ML model training; Monitoring and control of AI/ML behavior; ML entity performance indicators query and selection for ML training; ML entity performance indicators selection based on MnS consumer policy for ML training |
| Configuration management for ML training | Control of producer-initiated ML training |
| ML knowledge transfer learning | Discovering sharable knowledge; Knowledge sharing and transfer learning |
| Management capabilities for ML emulation phase | |
| ML inference emulation | AI/ML inference emulation; Orchestrating ML inference emulation |
| Management capabilities for ML entity deployment phase | |
| ML entity loading | ML entity loading control and monitoring |
| Management capabilities for AI/ML inference phase | |
| AI/ML inference history | Tracking AI/ML inference decisions and context |
| Orchestrating AI/ML inference | Knowledge sharing on executed actions; Knowledge sharing on impacts of executed actions; Abstract information on impacts of executed actions; Triggering execution of AI/ML inference functions or ML entities; Orchestrating decisions of AI/ML inference functions or ML entities |
| Coordination between the ML capabilities | Alignment of the ML capability between 5GC/RAN and the 3GPP management system |
| Performance evaluation for AI/ML inference | AI/ML performance evaluation in inference phase; ML entity performance indicators query and selection for AI/ML inference; ML entity performance indicators selection based on MnS consumer policy for AI/ML inference; AI/ML abstract performance |
| Configuration management for AI/ML inference | ML entity configuration for RAN domain ES initiated by consumer; ML entity configuration for RAN domain ES initiated by producer; Partial activation of AI/ML inference capabilities; Configuration for AI/ML inference initiated by MnS consumer; Configuration for AI/ML inference initiated by producer; Enabling policy-based activation of AI/ML capabilities |
| AI/ML update control | Availability of new capabilities or ML entities; Triggering ML entity update |
| Common management capabilities for ML training and AI/ML inference phase | |
| Trustworthy machine learning | AI/ML trustworthiness indicators; AI/ML data trustworthiness; ML training trustworthiness; AI/ML inference trustworthiness; Assessment of AI/ML trustworthiness |
Further reading
AI/ML management related SIDs/WIDs in Rel-17 and Rel-18:
| Release | UID | Work item title | Report/Specification | Work item document | Acronym |
|---|---|---|---|---|---|
| Rel-17 | 850028 | Study on enhancement of Management Data Analytics Service | | SP-190930 | FS_eMDAS |
| Rel-17 | 910027 | Enhancements of Management Data Analytics | TS 28.105 | SP-210132 | eMDAS |
| Rel-18 | 940039 | Study on AI/ML management | TR 28.908 | SP-211443 | FS_AIML_MGMT |
| Rel-18 | 990119 | AI/ML management | TS 28.105 | SP-230335 | AIML_MGT |
Relevant specifications:
- 3GPP TS 28.105: "Management and orchestration; AI/ML management".
- 3GPP TR 28.908: "Study on AI/ML management".
- 3GPP TS 28.104: "Management and orchestration; Management Data Analytics".
- 3GPP TS 23.288: "Architecture enhancements for 5G System (5GS) to support network data analytics services".
- 3GPP TS 38.300: "NR; NR and NG-RAN Overall description; Stage-2".
- 3GPP TS 38.401: "NG-RAN; Architecture description".