



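School of Communications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, Jiangsu, China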
Received: 26 November 2025
Revised: 14 March 2026
Accepted: 16 March 2026
Published: 20 April 2026
Chen Mingkai, Liu Qinyan. Research on semantic-enhanced spatial perception codec technology[J]. Journal on Communications, 2026, 47(04): 192-203. DOI: 10.11959/j.issn.1000-436x.2026069.
To address the computational complexity, limited network transmission efficiency, and difficulty of high-fidelity semantic reconstruction that specialized robots face during real-time reconstruction of large-scale scenes, a semantic-enhanced spatial perception codec is proposed. First, bird's-eye-view (BEV) feature mapping converts point clouds into structured tensors. Second, binary semantic masks are generated from semantic information to localize key regions and sparsify the data. Third, a hierarchical entropy coding framework achieves efficient compression through learnable quantization and a hyperprior probability model. Finally, the decoder applies scene semantic alignment reconstruction for embodied intelligence to recover geometry and semantics with high fidelity. Experimental results show that the proposed method achieves efficient data transmission while maintaining good reconstruction quality, which validates its effectiveness and robustness.
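The first two stages of the pipeline (BEV mapping and semantic masking) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the BEV features, grid resolution, or which classes count as key regions. The function names `points_to_bev` and `semantic_mask`, the two-channel layout (maximum height and point count per cell), the 1 m cell size, and the choice of class `1` as the key class are all assumptions made for illustration.

```python
import numpy as np

def points_to_bev(points, x_range, y_range, cell):
    """Rasterise an (N, 3) point cloud into a (2, H, W) BEV tensor.

    Channel 0 holds the maximum height per cell and channel 1 the point
    count (both are assumed features, not taken from the paper).
    """
    points = np.asarray(points, dtype=np.float32)
    H = int(round((y_range[1] - y_range[0]) / cell))
    W = int(round((x_range[1] - x_range[0]) / cell))
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    inside = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)

    height = np.full((H, W), -np.inf, dtype=np.float32)
    count = np.zeros((H, W), dtype=np.float32)
    # Unbuffered scatter so that several points in one cell accumulate.
    np.maximum.at(height, (iy[inside], ix[inside]), points[inside, 2])
    np.add.at(count, (iy[inside], ix[inside]), 1.0)
    height[count == 0] = 0.0  # empty cells carry no height

    return np.stack([height, count]), ix, iy, inside

def semantic_mask(labels, ix, iy, inside, shape, key_classes):
    """Mark a cell True if any point inside it has a key semantic label."""
    mask = np.zeros(shape, dtype=bool)
    is_key = np.isin(labels, list(key_classes)) & inside
    mask[iy[is_key], ix[is_key]] = True
    return mask

# Toy input: four points with per-point semantic labels (hypothetical data).
points = np.array([[0.5, 0.5, 1.0],
                   [0.6, 0.4, 2.0],
                   [2.5, 1.5, 0.3],
                   [9.0, 9.0, 5.0]])  # last point lies outside the grid
labels = np.array([1, 1, 0, 1])

bev, ix, iy, inside = points_to_bev(points, (0.0, 4.0), (0.0, 4.0), cell=1.0)
mask = semantic_mask(labels, ix, iy, inside, bev.shape[1:], key_classes={1})

# Sparsification: zero out every cell that is not a key region.
sparse_bev = bev * mask
```

In this sketch only the masked tensor `sparse_bev` would be passed on to the quantization and entropy coding stages, which is where the bit-rate saving from sparsification would come from.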