Research on semantic-enhanced spatial perception codec technology

Chen Mingkai; Liu Qinyan

doi:10.11959/j.issn.1000-436x.2026069



Papers | Updated: 2026-05-07

- Journal on Communications, Vol. 47, Issue 4, Pages: 192-203 (2026)
- Author affiliation: College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China
- Funding: National Natural Science Foundation of China (62001246); Key Research and Development Program of Jiangsu Province (BE2023035); Open Research Fund of Jiangsu Engineering Research Center of Communication and Network Technology (JERCCN202301)
- CLC: TN92
- Received: 26 November 2025; Revised: 14 March 2026; Accepted: 16 March 2026; Published: 20 April 2026

Chen Mingkai, Liu Qinyan. Research on semantic-enhanced spatial perception codec technology[J]. Journal on Communications, 2026, 47(04): 192-203. DOI: 10.11959/j.issn.1000-436x.2026069.


Abstract

To address the challenges of computational complexity, network transmission efficiency, and high-fidelity semantic reconstruction that special-purpose robots face when reconstructing large-scale scenes in real time, a semantic-enhanced spatial perception codec technology was proposed. Firstly, bird's-eye-view (BEV) feature mapping was adopted to convert point clouds into structured tensors. Secondly, binary semantic masks were generated from semantic information to localize key regions and sparsify the data. Thirdly, a hierarchical entropy coding framework was constructed to achieve efficient compression through learnable quantization and a hyper-prior probability model. Finally, the decoder performed semantic-alignment reconstruction for embodied-intelligence scenes to ensure high-fidelity restoration of both geometry and semantics. Experimental results demonstrate that the proposed method achieves efficient data transmission while maintaining good reconstruction quality, which validates its effectiveness and robustness.
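The BEV feature mapping step can be pictured with a minimal, dependency-free Python sketch in the spirit of pillar-style point-cloud encoders. It is not the authors' implementation: the function name `bev_map`, the grid parameters, and the three hand-crafted channels (point count, maximum height, mean height) are illustrative assumptions, since the abstract does not say which features the paper places in each cell.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z)


def bev_map(points: Sequence[Point],
            x_range: Tuple[float, float],
            y_range: Tuple[float, float],
            cell: float) -> List[List[List[float]]]:
    """Rasterize (x, y, z) points into a 3-channel BEV tensor indexed [C][H][W].

    Hypothetical channels chosen for illustration only:
      0: number of points in the cell
      1: maximum height (z) in the cell
      2: mean height (z) in the cell
    Empty cells stay 0.0 in every channel.
    """
    w = int(math.ceil((x_range[1] - x_range[0]) / cell))
    h = int(math.ceil((y_range[1] - y_range[0]) / cell))
    count = [[0.0] * w for _ in range(h)]
    zmax = [[0.0] * w for _ in range(h)]
    zsum = [[0.0] * w for _ in range(h)]
    for x, y, z in points:
        # Drop points outside the region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        col = int((x - x_range[0]) // cell)
        row = int((y - y_range[0]) // cell)
        # The first point in a cell sets the max, so negative heights work too.
        if count[row][col] == 0 or z > zmax[row][col]:
            zmax[row][col] = z
        count[row][col] += 1
        zsum[row][col] += z
    zmean = [[zsum[r][c] / count[r][c] if count[r][c] else 0.0
              for c in range(w)] for r in range(h)]
    return [count, zmax, zmean]


if __name__ == "__main__":
    pts = [(0.2, 0.1, 1.0), (0.4, 0.3, 3.0), (1.5, 0.2, 2.0), (5.0, 5.0, 9.0)]
    for name, channel in zip(("count", "max z", "mean z"),
                             bev_map(pts, (0.0, 2.0), (0.0, 1.0), 0.5)):
        print(name, channel)
```

With these inputs the grid is 2 rows by 4 columns, the out-of-range point is discarded, and cell (0, 0) ends up with count 2, maximum height 3.0, and mean height 2.0. The fixed grid is what turns an unordered point set into a dense tensor that convolutional encoders can consume.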
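The binary semantic mask and sparsification step can be sketched in the same style. In this sketch a BEV cell is marked 1 if it contains at least one point whose class belongs to a chosen set of key classes, and only the marked cells of a feature map are kept, as (row, column, value) triples. The names `semantic_mask` and `sparsify`, the key-class set, and the coordinate-list output format are assumptions for illustration; the abstract does not state how the paper derives or stores its masks.

```python
from typing import List, Sequence, Set, Tuple

LabeledPoint = Tuple[float, float, int]  # (x, y, semantic class id)


def semantic_mask(points: Sequence[LabeledPoint], key_classes: Set[int],
                  origin: Tuple[float, float], cell: float,
                  h: int, w: int) -> List[List[int]]:
    """Return an h x w mask: 1 where a cell holds a point of a key class, else 0."""
    mask = [[0] * w for _ in range(h)]
    for x, y, label in points:
        col = int((x - origin[0]) // cell)
        row = int((y - origin[1]) // cell)
        # Ignore points that fall outside the grid or belong to non-key classes.
        if 0 <= row < h and 0 <= col < w and label in key_classes:
            mask[row][col] = 1
    return mask


def sparsify(feature: List[List[float]],
             mask: List[List[int]]) -> List[Tuple[int, int, float]]:
    """Keep only masked cells of a 2-D feature map as (row, col, value) triples."""
    return [(r, c, feature[r][c])
            for r, mask_row in enumerate(mask)
            for c, m in enumerate(mask_row) if m]


if __name__ == "__main__":
    pts = [(0.2, 0.1, 1), (1.2, 0.1, 0), (1.7, 0.6, 1)]
    mask = semantic_mask(pts, key_classes={1}, origin=(0.0, 0.0),
                         cell=0.5, h=2, w=4)
    feature = [[5.0, 1.0, 2.0, 0.0], [0.0, 0.0, 0.0, 7.0]]
    kept = sparsify(feature, mask)
    print(mask, kept, f"kept {len(kept)} of {2 * 4} cells")
```

Here the class-0 point is ignored, the mask comes out as `[[1, 0, 0, 0], [0, 0, 0, 1]]`, and only 2 of the 8 cells would be passed on to the entropy coder. Storing coordinates alongside values only pays off when the mask is sparse, which is the situation semantic masking is meant to create.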
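For the entropy coding step, the sketch below shows the generic mechanism used by hyper-prior codecs such as Ballé et al.'s scale-hyperprior model: a scalar quantizer that adds uniform noise during training and rounds at inference, plus a rate estimate obtained by integrating a Gaussian over each quantization bin. In a hyper-prior model the per-element mean and scale come from a hyper-decoder applied to transmitted side information; here they are simply passed in as numbers. The function names, the fixed step size (a learnable quantizer would train it), and the probability floor are illustrative assumptions rather than details from the paper, and the bits spent on the side information itself are not counted.

```python
import math
import random
from typing import Optional, Sequence


def quantize(y: float, step: float, rng: Optional[random.Random] = None) -> float:
    """Quantize a latent value with step size `step`.

    With an rng (training), add uniform noise in [-step/2, step/2] as a
    gradient-friendly stand-in for rounding. Without one (inference), round
    to the nearest multiple of `step` (Python's round breaks ties to even).
    """
    if rng is not None:
        return y + rng.uniform(-0.5, 0.5) * step
    return step * round(y / step)


def normal_cdf(t: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def symbol_bits(y_hat: float, mu: float, sigma: float, step: float) -> float:
    """Bits for one quantized value: -log2 of the N(mu, sigma^2) mass in its bin."""
    upper = normal_cdf((y_hat + step / 2 - mu) / sigma)
    lower = normal_cdf((y_hat - step / 2 - mu) / sigma)
    p = max(upper - lower, 1e-9)  # floor keeps far-tail symbols finite
    return -math.log2(p)


def latent_bits(ys: Sequence[float], mus: Sequence[float],
                sigmas: Sequence[float], step: float) -> float:
    """Estimated bits for a latent vector, given per-element (mu, sigma)."""
    return sum(symbol_bits(quantize(y, step), m, s, step)
               for y, m, s in zip(ys, mus, sigmas))


if __name__ == "__main__":
    # Hypothetical latents and hyper-decoder outputs.
    ys = [0.1, 1.3, -2.4]
    print(latent_bits(ys, mus=[0.0, 1.0, -2.0], sigmas=[0.5, 1.0, 2.0], step=1.0))
```

Narrower predicted scales concentrate probability mass, so well-predicted latents cost fewer bits: with a unit step and mu = 0, a latent quantized to 0 costs about 1.38 bits when sigma = 1 and close to 0 bits when sigma = 0.01. This is why a good hyper-prior lowers the rate without touching the latents themselves.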


