Activating massive "sleeping data": China's data industry to reach 7.5 trillion yuan by 2030

CCTV News 2025-05-18

On the 17th, reporters learned at the 2025 Data Security Development Conference that China will cultivate and expand a number of upstream and downstream enterprises in the data element industry chain. The scale of China's data industry is expected to reach 7.5 trillion yuan by 2030.

Open sharing of public data

Activate massive "sleeping data"

As the first country in the world to classify data as a factor of production, China has initially built a complete data industry chain. Figures show that China's annual data production reached 41.06 zettabytes in 2024, a year-on-year increase of 25%.

China currently has more than 190,000 data-related companies, and the scale of its data industry exceeds 2 trillion yuan. At an annual growth rate of more than 20%, the scale of China's data industry will reach 7.5 trillion yuan in 2030.
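The projection is a compound-growth calculation, and a rough check shows what "more than 20%" has to mean in practice. The sketch below is only illustrative: the 2 trillion yuan base is the lower bound quoted above, and the five- and six-year horizons (counting from 2025 or from 2024) are assumptions, since the report does not state the base year.

```python
def implied_annual_growth(start: float, target: float, years: int) -> float:
    """Constant annual growth rate that takes `start` to `target` in `years` years."""
    return (target / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` for `years` years."""
    return start * (1 + rate) ** years


# Figures from the article, in trillions of yuan. The article says the scale
# "exceeds 2 trillion", so 2.0 is a lower bound used here as an assumption.
BASE = 2.0
TARGET = 7.5

for years in (5, 6):  # assumed horizons: counting from 2025 or from 2024
    rate = implied_annual_growth(BASE, TARGET, years)
    at_twenty = project(BASE, 0.20, years)
    print(f"{years} years: implied growth {rate:.1%}; "
          f"exactly 20% growth would give {at_twenty:.2f} trillion yuan")
```

Under these assumptions the implied rate is roughly 25% to 30% a year, which is consistent with "more than 20%" but noticeably above 20% itself; a higher starting base would lower the required rate.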

Liu Liehong, Director of the National Data Administration: We are currently planning a national data infrastructure system that is horizontally interconnected, vertically integrated, and well coordinated, and we aim to have its main structure basically in place by 2029.

The opening and sharing of public data has become an important breakthrough in the market-oriented allocation of data elements. In 2024, the number of local public data open platforms at or above the municipal level increased by 7.5% nationwide, the amount of open data increased by 7.1%, and the number of high-quality data sets increased by 27.4% year-on-year.

To integrate data elements with industry, the country is accelerating the removal of barriers to public data sharing, promoting the deep integration of public data and enterprise data, and activating massive amounts of "sleeping data".

Build high-quality data sets

Accelerate the development of artificial intelligence

Data has now surpassed traditional factors of production to become the core driving force behind breakthroughs in artificial intelligence technology and industrial transformation. High-quality data sets are not only the cornerstone of leaps in AI model performance; they are also reshaping the entire industrial chain, from research and development to commercial deployment. So how are high-quality data sets built?

Wenzhou, Zhejiang, a "testing ground" for the national market-oriented reform of data elements, has built a data security and compliance system to ensure the large-scale flow of data elements, form a data trading ecosystem, and bring more data "to life".

Jin Chuanla, Deputy Director of the Data Bureau of Wenzhou City, Zhejiang Province: We have created 469 "practical, easy-to-use and safe" data products and built a batch of high-quality data sets in fields such as medical care, transportation, and the low-altitude economy.

Technical staff told reporters that building data sets for large models involves four core stages: data collection, data cleaning, data annotation, and quality evaluation. Each stage requires targeted technology development and adaptation, because the data is large in scale, highly diverse, and strongly specific to individual industry verticals.

Huang Tiejun, Professor at the School of Computer Science, Peking University: Most text data, such as literature, books, papers, and research reports, has already been used. In the future, more non-text data will be needed, such as images, video, and readings from various sensors. Such data is also an important source for large-model learning.

Data annotation and data cleaning are the key stages in building high-quality data sets.

Data annotation teaches artificial intelligence to "know the world" through labeling. Unlabeled data is like a garbled textbook: the AI cannot learn from it effectively.

Data cleaning purifies data by removing duplicates and correcting errors. Messy data directly undermines the effectiveness of AI training.
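As a rough illustration of these two stages, the following minimal Python sketch normalizes and deduplicates text records and then attaches labels. It is not the method used by any team quoted here: the function names, the sample records, and the keyword rules are all hypothetical, and the keyword lookup is only a stand-in for annotation that in practice is done by trained annotators or models.

```python
import unicodedata


def normalize(text: str) -> str:
    """Normalize the Unicode form and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.split())


def clean(records: list[str]) -> list[str]:
    """Drop empty records and case-insensitive duplicates, keeping first-seen order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in records:
        text = normalize(raw)
        key = text.casefold()
        if not text or key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned


def annotate(records: list[str], keywords: dict[str, str]) -> list[tuple[str, str]]:
    """Attach the first matching label to each record, or "unlabeled" if none match."""
    labeled = []
    for text in records:
        lowered = text.casefold()
        label = next((lab for kw, lab in keywords.items() if kw in lowered), "unlabeled")
        labeled.append((text, label))
    return labeled


# Hypothetical sample data and labeling rules, for illustration only.
raw = ["Bus route  12 delayed", "bus route 12 delayed", "", "Patient admitted to ward 3"]
rules = {"bus": "transportation", "patient": "medical"}
dataset = annotate(clean(raw), rules)
```

In this sketch the duplicate and the empty record are removed before labeling, so two labeled records remain. Correcting factual errors, the other half of cleaning, depends on domain knowledge and is not attempted here.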

Liu Quan, Deputy Chief Engineer of CCID Research Institute: Only when data covers a wide enough range of scenarios and is professionally annotated can AI models move beyond "laboratory accuracy", become truly deployable in industry, and drive the development of the digital economy.

The output value of China's data labeling industry exceeds 8 billion yuan

The "2025 High-Quality Data Set Research Report", released at the 2025 Data Security Development Conference, shows that as artificial intelligence and large-model technology have iterated, the output value of China's data labeling industry has exceeded 8 billion yuan, and the construction of high-quality data sets has entered a new stage of large-scale, standardized development.

In 2024, the number of enterprises in China developing or applying artificial intelligence increased by 36% year-on-year, and the number of high-quality data sets increased by 27.4% year-on-year, strongly supporting AI training and application. The numbers of data technology companies and data application companies that use large models increased by 57.21% and 37.14% year-on-year, respectively.

Liu Wenqiang, Vice President of CCID Research Institute: The parameter counts of China's large models have reached the hundreds of billions. The country has promoted the construction of seven data labeling bases and built 335 high-quality data sets in fields such as medical care, industry, and education, with a total labeled volume of 1.7 trillion TB, supporting the research and development of 121 domestic large models.

The report shows that China is accelerating the innovation and development of high-quality data sets but still faces problems such as small data stocks and low output, uneven data set quality, a lack of guidance toward mainstream high-value data, and low data utilization efficiency.

Liu Quan, Deputy Chief Engineer of CCID Research Institute: Data sources must be well controlled to ensure their reliability and integrity. Data privacy and security safeguards should be strengthened, and the capacity to assess data set security should be built up.

(CCTV reporters Wang Shiyu, Zhang Wei, Tang Zhijian, Zhang Yan, Han Dong)
