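1. School of Computer and Artificial Intelligence, Beijing Wuzi University (北京物资学院计算机与人工智能学院)
2. Beijing Lianhai Information System Co., Ltd. (北京联海信息系统有限公司)
3. China Information Technology Security Evaluation Center (中国信息安全测评中心)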
Print publication: 2026
张茜茜1, 杨光3. 数据工厂:国家数据基础设施的新兴业态[J]. 网络安全与数据治理, 2026,(4). DOI: 10.19358/j.issn.2097-1788.2026.04.001.
Zhang Qianqian1, Yin Hongyu2. Data Factory: an emerging form of national data infrastructure[J]. Cyber Security and Data Governance, 2026, (4). DOI: 10.19358/j.issn.2097-1788.2026.04.001.
数据要素化价值化面临“供不出、流不动、用不好”的普遍难题,其核心原因在于数据生产业态尚未成熟,高质量数据集仍以作坊式生产为主,无法满足人工智能大模型对数据的规模化需求。针对这一问题,提出“数据工厂”这一概念,将其界定为面向人工智能大模型应用,开展高质量数据集设施化、规模化、标准化生产的数据基础设施。通过梳理工业社会、信息社会和数智社会基础设施业态的演进规律,论证了数据工厂作为国家数据基础设施基本构成单元的理论逻辑。在此基础上,依据物理分布、组织方式和技术水平等特征,将数据工厂划分为集中式、半集中式和分布式三种类型,并归纳出多样化、设施化、规模化、标准化和人工智能化五大特点。研究认为,发展数据工厂能够有效突破人工智能数据供给瓶颈,推动数据产业链上下游协同,是打通数据赋能人工智能“最后一公里”的关键路径。
The valorization of data as a factor of production faces widespread challenges, including insufficient supply, restricted circulation, and ineffective utilization. The core reason lies in the immaturity of data production modes, where high-quality datasets still rely on workshop-style production that fails to meet the large-scale data demands of Artificial Intelligence (AI) large models. To address this problem, the concept of "Data Factory" is proposed and defined as a data infrastructure dedicated to the facility-based, large-scale, and standardized production of high-quality datasets for AI large model applications. By tracing the evolution of infrastructure forms across industrial society, information society, and data-intelligent society, the theoretical logic of Data Factory as a fundamental building block of national data infrastructure is established. Based on characteristics such as physical distribution, organizational structure, and technological sophistication, Data Factories are classified into three types: centralized, semi-centralized, and distributed. Five key features are identified: diversity, facility orientation, scalability, standardization, and AI integration. The study concludes that the development of Data Factories can effectively break through the data supply bottleneck in AI development, promote upstream and downstream collaboration in the data industry chain, and serve as a critical path to bridge the "last mile" gap between data and AI empowerment.