Abstract
A question that has long troubled the field of artificial intelligence is whether AI can be creative, or, put differently, whether an algorithm's reasoning process can itself be creative. This paper examines the problem of AI creativity from the perspective of thinking science. We first survey related research on reasoning with visual (imaginal) thinking; we then focus on a particular form of visual knowledge representation, the visual scene graph; finally, we discuss in detail how visual scene graphs are constructed and what their potential applications are. All the evidence suggests that visual knowledge and visual thinking can not only improve the performance of current AI tasks but also be put into practice for machine creativity.
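For readers unfamiliar with the representation, the sketch below illustrates the common formulation of a visual scene graph (as in Visual Genome): nodes are detected objects and directed, labeled edges are subject-predicate-object relations. This is a minimal illustration, not the authors' implementation; all names here (ObjectNode, SceneGraph, add_relation, triples) are hypothetical.

```python
# A minimal sketch of a visual scene graph: objects as nodes,
# subject-predicate-object relations as directed labeled edges.
# All class and function names are illustrative, not from the paper.
from dataclasses import dataclass, field


@dataclass
class ObjectNode:
    """A detected object: category name plus bounding box (x, y, w, h)."""
    name: str
    bbox: tuple[float, float, float, float]


@dataclass
class SceneGraph:
    """A set of objects and directed, labeled relations between them."""
    objects: list[ObjectNode] = field(default_factory=list)
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def add_relation(self, subj: int, predicate: str, obj: int) -> None:
        # subj and obj are indices into self.objects.
        self.relations.append((subj, predicate, obj))

    def triples(self) -> list[tuple[str, str, str]]:
        """Render relations as human-readable (subject, predicate, object) triples."""
        return [(self.objects[s].name, p, self.objects[o].name)
                for s, p, o in self.relations]


if __name__ == "__main__":
    g = SceneGraph()
    g.objects = [ObjectNode("person", (10, 20, 50, 120)),
                 ObjectNode("horse", (70, 40, 90, 80))]
    g.add_relation(0, "riding", 1)
    print(g.triples())  # [('person', 'riding', 'horse')]
```

Downstream tasks mentioned in the abstract, such as scene-graph-to-image generation or visual question answering, would consume exactly this kind of structure, typically with learned embeddings attached to each node and edge.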
Contributions
Yueting ZHUANG provided the main idea and outlined the manuscript. Siliang TANG drafted the manuscript. Yueting ZHUANG and Siliang TANG revised and finalized the paper.
Ethics declarations
Yueting ZHUANG and Siliang TANG declare that they have no conflict of interest.