research-article

Image-based 3D model retrieval via disentangled feature learning and enhanced semantic alignment

Published: 01 March 2023

Abstract

With the development of 3D technology and the growing number of 3D models, 2D image-based 3D model retrieval has drawn increasing attention from scholars. Previous works align cross-domain features via adversarial domain alignment and semantic alignment. However, the features extracted by previous methods are disturbed by residual domain-specific information, and the lack of labels for 3D models makes semantic alignment challenging. We therefore propose disentangled feature learning combined with enhanced semantic alignment to address these problems. On the one hand, disentangled feature learning decouples the entangled raw features into isolated domain-invariant and domain-specific features; the domain-specific features are discarded while adversarial domain alignment and semantic alignment are performed to obtain domain-invariant features. On the other hand, we mine semantic consistency by compacting each 3D model sample with its nearest neighbors, further enhancing semantic alignment for the unlabeled 3D model domain. We conduct comprehensive experiments on two public datasets, and the results demonstrate the superiority of the proposed method. In particular, on the MI3DOR-2 dataset, our method outperforms current state-of-the-art methods by 2.88% on the strictest retrieval metric, NN.

Highlights

An end-to-end unsupervised 2D image-based 3D model retrieval framework.
Knowledge is transferred from labeled 2D images to unlabeled 3D models.
Domain-invariant features are disentangled from the original features.
Nearest-neighbor-enhanced semantic alignment for cross-domain feature learning.
Experiments on MI3DOR and MI3DOR-2 verify the superiority of the method.
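The highlights above center on two ideas: splitting raw features into domain-invariant and domain-specific parts, and compacting each unlabeled 3D model sample with its nearest neighbors in feature space. A minimal NumPy sketch of these two steps follows; the function names and the fixed-index split are illustrative assumptions (the paper learns the disentanglement with trained encoders and a discriminator, not a hard slice):

```python
import numpy as np

def split_features(raw, d_inv):
    """Toy disentanglement: treat the first d_inv dimensions as
    domain-invariant and the remainder as domain-specific.
    (Hypothetical stand-in; the actual method learns this split.)"""
    return raw[:, :d_inv], raw[:, d_inv:]

def nearest_neighbor_compactness(feats, k=2):
    """Semantic-consistency term: mean squared distance between each
    sample and its k nearest neighbors, which a training loop would
    minimize to pull semantically similar unlabeled samples together."""
    n = feats.shape[0]
    # pairwise squared Euclidean distances, shape (n, n)
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # exclude self-matches
    nn_idx = np.argsort(d2, axis=1)[:, :k]  # indices of k nearest neighbors
    return float(np.mean([d2[i, j] for i in range(n) for j in nn_idx[i]]))

# Usage: random stand-in features for 8 unlabeled 3D model samples.
feats = np.random.RandomState(0).randn(8, 4)
inv, spec = split_features(feats, d_inv=3)
loss = nearest_neighbor_compactness(inv, k=2)
```

Only the domain-invariant part `inv` would feed the adversarial and semantic alignment losses; `spec` is dropped, which is what shields the alignment from residual domain-specific information.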


Cited By

  • (2024) Why do variational autoencoders really promote disentanglement? Proceedings of the 41st International Conference on Machine Learning, pp. 3817–3849. doi:10.5555/3692070.3692224
  • (2024) Structured serialization semantic transfer network for unsupervised cross-domain recognition and retrieval. Information Processing and Management: an International Journal 61(1). doi:10.1016/j.ipm.2023.103565


Published In

Information Processing and Management: an International Journal, Volume 60, Issue 2 (March 2023), 1443 pages.

Publisher: Pergamon Press, Inc., United States.


        Author Tags

        1. 3D model retrieval
        2. Unsupervised domain adaptation
        3. Feature disentangle learning
