Explanation of Common Terms in the Field of Data (First Batch)

Promulgated by: National Data Administration.
Issued on December 30, 2024 by the National Data Administration; drafted by the Drafting Expert Team for Explanation of Terms in the Field of Data. Effective December 30, 2024.

Editor’s Note — DCC. The National Data Administration (国家数据局) released this first batch of standardized term explanations on December 30, 2024 as part of building consensus on data-field vocabulary. The 40 terms below establish official Chinese government definitions for foundational data-economy concepts. We have preserved the official bilingual translation as-is; minor stylistic spacing in the source (“Semi- structured”, “Non- structured”) has been corrected.

Background

To build consensus, and with the strong support of all sectors of society, we have carefully studied and developed the Explanation of Common Terms in the Field of Data (First Batch). We will make iterative improvements in light of practice and development needs, and we welcome the continued attention of the community.

— Drafting Expert Team for Explanation of Terms in the Field of Data, December 30, 2024

Annex: Explanation of Common Terms in the Field of Data (First Batch)

1. 数据. “Data” refer to any recording of information in an electronic or other form. Data are referred to as primary data, derived data, data resources, data products and services, data assets, data elements, etc., under different perspectives.

2. 原始数据. “Primary data” refer to the data that are first generated or collected at the source and have not been processed.

3. 数据资源. “Data resources”, a general term for data with potential for value creation, usually refer to a collection of data recorded and saved in electronic form, readable by machine, and available for social reuse.

4. 数据要素. “Data elements” refer to the data resources that are invested into production and business activities and participate in value creation.

5. 数据产品和服务. “Data products and services” refer to the data processing products and data services that are formed on the basis of data processing and can meet specific needs.

6. 数据资产. “Data assets” refer to the data resources that are legally owned or controlled by specific subjects, can be measured in monetary terms, and can bring about economic benefits or social benefits.

7. 数据要素市场化配置. “Market-oriented allocation of data elements” refers to the allocation of data as a new type of production element under the market mechanism, in order to establish a more open, safe and efficient data circulation environment and continuously release the value of data elements.

8. 数据处理. “Data handling” includes the collection, storage, use, processing, transmission, provision and publication of data.

9. 数据处理者. “Data handler” refers to an individual or organization that independently determines the purpose and method of handling in the data handling activities.

10. 受托数据处理者. “Commissioned data handler” refers to an individual or organization that receives a commission from others to handle data.

11. 数据流通. “Data circulation” refers to the process of the flow of data between different subjects, including data opening, sharing, transaction, exchange, etc.

12. 数据交易. “Data transaction” refers to a transaction between a supplier and a demander in respect of data, in which data in a specific form are the subject matter and currency or another equivalent is the consideration.

13. 数据治理. “Data governance” refers to the process of improving the quality, security and compliance of data and promoting the effective use of data, including organizational data governance, industry data governance, social data governance, etc.

14. 数据安全. “Data security” refers to ensuring that data are in a state of effective protection and lawful use by taking necessary measures, as well as having the ability to maintain a continuous state of security.

15. 公共数据. “Public data” refer to the data generated by the Party and government organs at all levels, enterprises and public institutions in the process of performing their duties according to law or providing public services.

16. 数字产业化. “Digital industrialization” refers to the process of transforming digital technologies, such as mobile communication and artificial intelligence, into digital products and services and the transformation of data into resources and elements to form new digital industries, new business forms and new models.

17. 产业数字化. “Industrial digitalization” refers to the process in which traditional agriculture, industry, service industry and other industries apply digital technologies, collect and integrate data and mine the value of data resources to improve the efficiency of business operation, reduce the costs of production and operation, reconstruct the thinking and cognition, completely rebuild the mode of organization and management, systematically reform the process of production and operation, and constantly improve the total factor productivity.

18. 数字经济高质量发展. “The high-quality development of the digital economy” refers to the new stage of the development of the digital economy in which the reform of the market-oriented allocation of data elements is the main line, and in which the goal of making the digital economy stronger, better and larger is achieved by improving the basic data system and digital infrastructure in a coordinated manner, comprehensively promoting the deep integration of digital technologies and the real economy, and continuously improving the governance capacity of the digital economy and its level of international cooperation.

19. 数字消费. “Digital consumption” refers to the consumption activities and consumption patterns that are formed with digital technologies and application support, which include not only the consumption of digital intelligence technologies, products and services, but also the digitalization and intelligence of consumption contents, channels and environment, and the new consumption pattern with deep integration of online and offline.

20. 产业互联网. “Industrial Internet” refers to the process in which digital technologies and data elements are used to promote the data integration of the whole industry chain, enable the digitalized, network-based and intelligent development of the industry, promote the reorganization and reform of business processes, organizational structures and production modes etc., achieve the collaborative transformation of upstream and downstream of the industry chain, integrate online and offline development, reduce costs and increase efficiency and achieve high-quality development of the whole industry, and thus form a new system of industrial collaboration, resource allocation and value creation.

21. 城市全域数字化转型. “Citywide digital transformation” refers to the new mode of high-quality urban development in which cities reconstruct technical frameworks, reform urban management process and deeply integrate industries with cities by comprehensively deepening the data integration, development and utilization as the main line and comprehensively using digital technologies and institutional innovation tools, so as to promote the efficiency improvement in all areas of digital transformation, the all-round enhancement of support capability, and the optimization of the whole ecological process of transformation.

22. 东数西算工程. The “East Data and West Computing” project is a key project whereby data and demands arising from economic activities in eastern regions are computed and processed in western regions under overall planning for data centers in terms of layout, network, electric power, energy consumption, computing power and data, etc. For such business scenarios as the training and inference of artificial intelligence models and machine learning, eastern businesses may be relocated to areas with abundant wind, water, and electricity in western regions to achieve coordinated development of the eastern and western regions by way of “East Data and West Computing”. Accelerating the construction of the “East Data and West Computing” project will effectively stimulate the innovation vitality of data elements, speed up the process of digital industrialization and industrial digitalization, generate new technologies, new industries, new types of business and new models, and support high-quality economic development.

23. 高速数据网. “High-speed data network” refers to the provision of data transmission services with flexible bandwidth, security, reliability and efficient transmission by relying on network virtualization, software-defined networking (SDN) and other technologies for data circulation and utilization scenarios.

24. 全国一体化算力网. “Integrated national computing power network” refers to digital infrastructure that takes information network technology as its carrier and promotes the high-proportion, large-scale integrated scheduling and operation of various computing power resources nationwide. As the 2.0 version of the “East Data and West Computing” project, it has four typical characteristics: intensification, integration, synergy and value.

25. 元数据. “Metadata” refer to the data that define and describe specific data, which provide information about the structure, characteristics and relationships of data, and help to organize, search, understand and manage data.
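Editor's illustration (not part of the official text): a minimal Python sketch of how metadata describe the structure and characteristics of a dataset without repeating its contents. The `describe` helper, the field names and the sample values are all hypothetical and exist only for this example.

```python
import csv
import io


def infer_type(values):
    """Label a column "number" if every value parses as a float, else "text"."""
    try:
        for value in values:
            float(value)
        return "number"
    except ValueError:
        return "text"


def describe(csv_text):
    """Derive simple structural metadata from CSV text (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = list(reader.fieldnames or [])
    return {
        "fields": fields,
        "row_count": len(rows),
        "types": {f: infer_type([r[f] for r in rows]) for f in fields},
    }


# Invented sample data: two sensor readings.
sample = "sensor_id,reading\ns1,20.5\ns2,21.0\n"
metadata = describe(sample)
```

Here `metadata` records the field names, the row count and an inferred type per field, which is the kind of information that helps to organize, search and manage the underlying data.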

26. 结构化数据. “Structured data” refer to a form of data representation in which each record is a collection of data elements, the structure of every record is consistent, and the records can be effectively described with a relational model.

27. 半结构化数据. “Semi-structured data” refer to a form of data structure that does not conform to the structure of the data model associated with relational databases or other forms of data tables but contains relevant tags to separate semantic elements and hierarchies of records and fields.

28. 非结构化数据. “Non-structured data” refer to the data that do not have a predefined model or are not organized in a predefined manner.
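Editor's illustration (not part of the official text): the three categories defined in terms 26 to 28 can be contrasted with a short Python sketch. The sample values and the `same_fields` helper are invented for this example, and CSV and JSON are used only as familiar stand-ins for tabular and tagged formats.

```python
import csv
import io
import json

# Structured: every record has the same fields, so it maps onto a relational table.
structured_text = "sensor_id,reading\ns1,20.5\ns2,21.0\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))

# Semi-structured: keys act as tags that separate fields and hierarchy,
# but individual records are free to differ in shape.
semi_structured_text = (
    '[{"id": 1, "labels": ["a", "b"]}, {"id": 2, "owner": {"name": "x"}}]'
)
records = json.loads(semi_structured_text)

# Non-structured: free text with no predefined model.
unstructured_text = "Meeting notes: the team agreed to ship the report next week."


def same_fields(items):
    """Return True when every record exposes exactly the same set of field names."""
    field_sets = {frozenset(item.keys()) for item in items}
    return len(field_sets) <= 1
```

Under these assumptions `same_fields(rows)` holds for the tabular sample, while `same_fields(records)` does not, because the two tagged records carry different fields.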

29. 数据分析. “Data analysis” refers to the process of sorting, studying, reasoning and summarizing data with specific techniques and methods, so as to extract useful information, find rules and form conclusions from the data.

30. 数据挖掘. “Data mining” refers to a means of data analysis, which is the process of mining information or value hidden in data with statistical analysis, machine learning, pattern recognition, expert system and other technologies.

31. 数据可视化. “Data visualization” refers to the process of clearly and effectively conveying the useful information contained in the data by statistical charts, graphs, maps and other graphic means, so as to facilitate better understanding and analysis of data by data users.

32. 数据仓库. “Data warehouse” refers to a database that is used for permanent storage of data after data preparation.

33. 数据湖. “Data lake” refers to a highly expandable data storage architecture, which is specially used for the storage of large amounts of original data and derived data from various sources and existing in different formats, including structured, semi-structured and unstructured data.

34. 湖仓一体. “The integration of lake and warehouse” refers to a new and open storage architecture, which connects the data warehouse and the data lake, and integrates the high performance and management capability of the data warehouse with the flexibility of the data lake, in which the bottom layer supports multiple data types and can realize the mutual sharing of the data, and the upper layer can access through a uniform encapsulated interface and can support real-time query and analysis at the same time.

35. 隐私保护计算. “Privacy-protective computation” refers to a type of information technology used to analyze and compute data on the premise that the data provider will not divulge the original data, in order to ensure that data “may be available but may not be visible” in each link of the whole process of data flow, including data generation, storage, computation, application and destruction. The common technical schemes of privacy-protective computation include secure multi-party computing, federated learning, trusted execution environment, cryptographic computing and so on. The common underlying technologies include garbled circuits, oblivious transfer, secret sharing, homomorphic encryption and so on.

36. 安全多方计算. “Secure multi-party computing” refers to a setting in which multiple participating entities in a distributed network each hold secret data and want to use these data as inputs to jointly complete the computation of a certain function, while each participating entity is required to obtain no input information from other participating entities except the computation result and the information that is expected to be disclosed. Secure multi-party computing mainly studies the problem of secure multi-party collaborative computation without a trusted third party.
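Editor's illustration (not part of the official text): one textbook way to realize this idea for a simple function, a sum, is additive secret sharing. The sketch below simulates all parties inside a single Python process and assumes honest, non-colluding participants; `MODULUS`, `share` and `secure_sum` are hypothetical names chosen for this example, and a real deployment would need networking, authentication and stronger threat modeling.

```python
import secrets

# Hypothetical public modulus that all parties agree on in advance.
MODULUS = 2**61 - 1


def share(value, n_parties):
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_inputs):
    """Simulate a joint sum in which no party sees another party's raw input."""
    n = len(private_inputs)
    # Party i splits its input into n shares; share j is sent to party j.
    distributed = [share(value, n) for value in private_inputs]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums reveal only the total.
    return sum(partial_sums) % MODULUS


total = secure_sum([12, 30, 7])  # 49
```

Each individual share is a uniformly random number, so a single party's view conveys nothing about another party's input; only the combination of all partial sums reveals the result.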

37. 联邦学习. “Federated learning” refers to a mode in which multiple participants exchange intermediate computation results in a manner of protecting private data, so as to cooperate to complete a machine learning task, on the premise that their original private data do not go out of the trusted domain defined by the data provider.
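Editor's illustration (not part of the official text): a minimal Python sketch of federated averaging, one widely used form of federated learning, for the one-parameter model y = w * x. The function names, learning rate and sample data are assumptions made for this example. The sketch exchanges plain model weights; a real system would typically add protections such as secure aggregation so that the intermediate results themselves are protected.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """Run a few gradient steps on one participant's private data for y = w * x."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(global_w, participants, rounds=50):
    """Coordinate training; raw data never leave each participant's list."""
    sizes = [len(data) for data in participants]
    for _ in range(rounds):
        # Only the locally updated weight is sent back, not the data.
        local_ws = [local_update(global_w, data) for data in participants]
        # Average the local weights, weighted by local dataset size.
        global_w = sum(w * n for w, n in zip(local_ws, sizes)) / sum(sizes)
    return global_w


# Invented private datasets, all generated from y = 2 * x.
participants = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(4.0, 8.0), (5.0, 10.0)],
]
learned_w = federated_average(0.0, participants)
```

With these invented datasets the learned weight converges toward 2, even though no participant's raw (x, y) pairs are ever pooled.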

38. 可信执行环境. “Trusted execution environment” refers to a software running environment that is built to ensure the confidentiality, integrity, authenticity and non-repudiation of the data and code of security-sensitive applications, based on hardware-level isolation and secure boot mechanisms.

39. 密态计算. “Cryptographic computing” refers to a technique that, by making comprehensive use of cryptography, trusted hardware and system security technologies, allows data to be used while remaining invisible during computation and allows computation results to be kept in a cryptographic state, so as to support the construction of complex combined computations, achieve full-link security of computation, and prevent data leakage and abuse.

40. 区块链. “Blockchain” is a new type of database software that integrates distributed networking, encryption, smart contracts and other technologies. It is characterized by multi-centricity, consensus-based trust, tamper resistance and traceability, and is mainly used to solve trust and security problems in the process of data flow.
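Editor's illustration (not part of the official text): the tamper-evidence and traceability properties can be sketched with a simple hash chain in Python. This single-process example leaves out the distributed network, consensus and smart contracts that a real blockchain relies on; the function names and block layout are assumptions made for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, payload, prev_hash):
    """Hash a block's contents together with the hash of its predecessor."""
    body = json.dumps(
        {"index": index, "payload": payload, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a new block that commits to the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({
        "index": index,
        "payload": payload,
        "prev_hash": prev_hash,
        "hash": block_hash(index, payload, prev_hash),
    })


def is_valid(chain):
    """Recompute every hash and link; any edited block breaks the check."""
    prev_hash = GENESIS_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["payload"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, "record A")
append_block(ledger, "record B")
```

Because each block's hash covers the previous block's hash, changing an earlier payload after the fact causes `is_valid` to fail unless every later block is recomputed as well.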

