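Virtual 3D models are widely used in everyday life today. Most are either drawn with computer graphics software or generated by scanning the target object with dedicated hardware. RGB-D cameras combine color images with depth information and can therefore produce models that are both colored and reasonably detailed, which has made mapping with RGB-D cameras an active research topic. Mapping with an RGB-D camera requires one essential piece of information, the camera pose: only with accurate camera poses can the information in different images be fused into a single map. Visual Simultaneous Localization and Mapping (vSLAM) is a common approach to pose estimation and mapping with RGB-D cameras, so this thesis combines an RGB-D camera with vSLAM with the goal of building highly detailed, colored 3D models. The overall algorithm framework consists of three parts. The first is camera pose estimation, for which this thesis uses the feature-based approach of vSLAM, building on and improving the ORB-SLAM2 framework. The second is loop detection and pose graph optimization. Unlike most vSLAM systems, which detect loops with a bag-of-words model, this thesis uses feature point matching combined with brute-force search, and then optimizes the pose graph according to the loop detection results. The last part is 3D point cloud reconstruction, which applies preprocessing such as ground point segmentation and point cloud filtering, and then uses algorithms such as Point-to-plane ICP and the Truncated Signed Distance Function (TSDF) to further refine the point cloud map and obtain the final point cloud model. An implementation with an Azure Kinect shows that the proposed method can reconstruct captured objects from RGB-D images into complete 3D point cloud models, and that the resulting point clouds can be 3D printed, fulfilling the goal of reverse engineering applications.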
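The loop-detection step described above replaces a bag-of-words vocabulary with direct feature matching and an exhaustive search over earlier keyframes. The Python sketch below illustrates that idea under stated assumptions: descriptors are 256-bit binary strings (as ORB produces) stored as Python integers, matching uses Hamming distance with a ratio test, and the names `detect_loop`, `min_gap`, `min_matches` and `ratio`, together with their values, are hypothetical rather than the thesis's actual criteria. Geometric verification of a candidate and the subsequent pose graph optimization are omitted.

```python
import random
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def match_brute_force(query: List[int], train: List[int],
                      ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Compare every query descriptor with every train descriptor (brute force)
    and keep the nearest neighbour only if it clearly beats the second best."""
    matches = []
    for qi, q in enumerate(query):
        dists = sorted((hamming(q, t), ti) for ti, t in enumerate(train))
        if not dists:
            continue
        best_d, best_ti = dists[0]
        if len(dists) == 1 or best_d < ratio * dists[1][0]:
            matches.append((qi, best_ti))
    return matches


def detect_loop(current: List[int], keyframes: List[List[int]],
                min_gap: int = 30, min_matches: int = 20) -> Optional[int]:
    """Exhaustively search every keyframe older than the most recent `min_gap`
    ones; return the index with the most matches, or None if none reaches
    `min_matches`. Both thresholds are hypothetical."""
    best_idx, best_count = None, 0
    for k in range(len(keyframes) - min_gap):
        count = len(match_brute_force(current, keyframes[k]))
        if count > best_count:
            best_idx, best_count = k, count
    return best_idx if best_count >= min_matches else None


# Synthetic data: 50 keyframes, each holding 50 random 256-bit descriptors.
rng = random.Random(0)
keyframes = [[rng.getrandbits(256) for _ in range(50)] for _ in range(50)]
# The camera revisits keyframe 5: same descriptors with two bits of "noise".
current = [d ^ (1 << rng.randrange(256)) ^ (1 << rng.randrange(256))
           for d in keyframes[5]]
print(detect_loop(current, keyframes, min_gap=10))  # prints 5
```

Skipping the most recent keyframes keeps the search from reporting the trivially similar neighbouring frames as loops; the cost of the exhaustive search grows linearly with the number of stored keyframes, which is the trade-off against a precomputed vocabulary.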
Virtual 3D models are widely used in modern life; most are created with computer graphics software or generated by scanning objects with specialized hardware. One popular device for this purpose is the RGB-D camera, which captures both color and depth images and thus enables the creation of detailed, colored models. Consequently, model reconstruction with RGB-D cameras has become a prominent research topic in recent years. Such reconstruction requires an accurate camera pose for every frame, since only then can the information from different images be merged into one map. Visual Simultaneous Localization and Mapping (vSLAM) is commonly used for pose estimation and mapping with RGB-D cameras, so this thesis combines an RGB-D camera with vSLAM to create highly detailed, color-rich 3D models. The algorithm framework consists of three main components: pose estimation, loop closing, and point cloud reconstruction. First, camera poses are estimated with the feature-based approach of vSLAM, using an improved version of the ORB-SLAM2 framework. Second, for loop closing, rather than the bag-of-words model used for loop detection in most vSLAM systems, this thesis employs feature point matching combined with brute-force search; pose graph optimization is then performed on the loop detection results to refine the camera poses. The final component, 3D point cloud reconstruction, applies preprocessing steps such as ground point segmentation and point cloud filtering, and then combines Point-to-plane ICP (Iterative Closest Point) and TSDF (Truncated Signed Distance Function) to further refine the point cloud map and obtain the final point cloud model.
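The TSDF step named above fuses many depth measurements into a volumetric signed-distance field whose zero level set is the reconstructed surface. As a minimal illustration, the Python sketch below reduces the idea to the voxels along a single camera ray and assumes a weighted running-average update with unit weights, of the kind used in volumetric fusion (e.g. Curless and Levoy; KinectFusion). The function names `integrate_ray` and `zero_crossing`, the 5 cm truncation distance, and the 1 cm voxel spacing are hypothetical choices, not the thesis's parameters; a full implementation would project every voxel of a 3D grid into each posed depth image.

```python
from typing import List, Optional


def integrate_ray(tsdf: List[float], weight: List[float], voxel_depths: List[float],
                  measured_depth: float, trunc: float, w_new: float = 1.0) -> None:
    """Fuse one depth measurement into the voxels along a single camera ray
    using a running weighted average of truncated signed distances."""
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z          # > 0 in front of the surface, < 0 behind
        if sdf < -trunc:
            continue                      # too far behind the surface: leave unobserved
        d = min(1.0, sdf / trunc)         # truncate and normalise to [-1, 1]
        tsdf[i] = (weight[i] * tsdf[i] + w_new * d) / (weight[i] + w_new)
        weight[i] += w_new


def zero_crossing(tsdf: List[float], weight: List[float],
                  voxel_depths: List[float]) -> Optional[float]:
    """Return the first positive-to-non-positive sign change of the fused TSDF,
    linearly interpolated between two neighbouring observed voxels."""
    for i in range(len(tsdf) - 1):
        if weight[i] > 0 and weight[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


# Voxels every 1 cm from 0 to 2 m along the ray; 5 cm truncation distance.
voxel_depths = [0.01 * i for i in range(201)]
tsdf = [0.0] * len(voxel_depths)
weight = [0.0] * len(voxel_depths)
for depth in (1.00, 1.02):                # two noisy depth readings of one surface
    integrate_ray(tsdf, weight, voxel_depths, depth, trunc=0.05)
surface = zero_crossing(tsdf, weight, voxel_depths)
print(round(surface, 3))                  # prints 1.01
```

Averaging the two readings places the fused surface midway between them, which is how TSDF fusion suppresses per-frame depth noise once the camera poses are accurate.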
Abstract
Extended Abstract
Acknowledgements
Table of Contents
List of Figures
Chapter 1 Introduction
1.1. Research Motivation and Objectives
1.2. Literature Review
1.3. Thesis Organization
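Chapter 2 Pose Estimation from RGB-D Images
2.1. RGB-D Camera Model
2.1.1. Imaging Principle of Depth Cameras
2.1.2. Pinhole Camera Model
2.2. Feature Point Extraction
2.2.1. Scale Invariant Feature Transform (SIFT)
2.2.2. Oriented FAST and Rotated BRIEF (ORB)
2.3. Bundle Adjustment
2.4. ORB-SLAM2 Framework
2.4.1. Tracking
2.4.2. Local Mapping
2.4.3. Modifications for High-Precision Point Cloud Model Reconstruction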
Chapter 3 Feature-Based Loop Detection without a Bag-of-Words Model
3.1. Loop Detection Method
3.2. Pose Graph Optimization
Chapter 4 High-Density Point Cloud Mapping
4.1. Point Cloud Preprocessing Methods
4.1.1. High-Density Point Cloud Generation
4.1.2. Depth Value Filtering
4.1.3. Ground Point Removal
4.1.4. Point Cloud Filters
4.2. Point-to-Plane ICP
4.3. Truncated Signed Distance Function
4.4. Point Cloud Reconstruction Pipeline
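Chapter 5 Experimental Results and Analysis
5.1. Experimental Algorithm Flow
5.2. Simulated Human Model Reconstruction in Unreal Engine
5.2.1. Pose Estimation in Unreal Engine
5.2.2. Loop Detection and Pose Graph Optimization
5.2.3. 3D Point Cloud Reconstruction
5.3. Human Model Reconstruction with Azure Kinect
5.3.1. Camera Pose Estimation
5.3.2. Loop Detection and Pose Graph Optimization
5.3.3. 3D Point Cloud Reconstruction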
Chapter 6 Conclusions and Future Work
6.1. Conclusions
6.2. Future Work
References
Electronic full text (publicly available online from 2028-08-09)