The valorization of data as a factor of production faces widespread challenges, including insufficient supply, restricted circulation, and ineffective utilization. The core reason lies in the immaturity of data production modes, where high-quality datasets still rely on workshop-style production that fails to meet the large-scale data demands of Artificial Intelligence (AI) large models. To address this problem, the concept of the "Data Factory" is proposed and defined as a data infrastructure dedicated to the facility-based, large-scale, and standardized production of high-quality datasets for AI large model applications. By tracing the evolution of infrastructure forms across industrial society, information society, and data-intelligent society, the theoretical logic of the Data Factory as a fundamental building block of national data infrastructure is established. Based on characteristics such as physical distribution, organizational structure, and technological sophistication, Data Factories are classified into three types: centralized, semi-centralized, and distributed. Five key features are identified: diversity, facility orientation, scalability, standardization, and AI integration. The study concludes that the development of Data Factories can effectively break through the data supply bottleneck in AI development, promote upstream and downstream collaboration in the data industry chain, and serve as a critical path to bridge the "last mile" gap between data and AI empowerment.