Abstract
Over the last decade, the deluge of multimedia data has affected a wide range of research areas, including multimedia retrieval, 3D tracking, database management, data mining, machine learning, social media analysis, and medical imaging. In multimedia applications, machine learning is used largely to build models for tasks such as classification and regression, and the learning principle is to design these models from the information contained in the multimedia dataset. Although many machine learning paradigms exist and are widely used, most of them suffer from the ‘curse of dimensionality’: counterintuitive phenomena appear when data are represented in a high-dimensional space. Given the high dimensionality and complexity of multimedia data, it is important to investigate new machine learning algorithms that facilitate multimedia data analysis. An intuitive way to deal with high dimensionality is to reduce it; alternatively, some researchers have designed learning schemes that remain effective on high-dimensional data. In this survey, we cover three approaches that counter the consequences of the curse of dimensionality: feature transformation, feature selection, and feature encoding. Next, we briefly introduce recent progress in effective learning algorithms. Finally, we envisage promising future trends in multimedia learning.
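One well-known phenomenon behind the curse of dimensionality is distance concentration: as the dimension grows, the nearest and farthest neighbours of a query lie at almost the same distance, which weakens distance-based retrieval and learning. The sketch below is a minimal illustration, not an experiment from this survey. It assumes points drawn uniformly from the unit hypercube, and the function name `relative_contrast` is hypothetical. It measures the ratio (d_max - d_min) / d_min of Euclidean distances from one random query to a set of random points.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one random
    query to n_points random points.

    Assumption (illustrative only): all points, including the query, are
    drawn uniformly from the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast shrinks as the dimension grows: distances "concentrate".
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

In low dimensions the nearest point is much closer than the farthest one. In high dimensions the ratio drops towards zero, so a nearest-neighbour query becomes far less discriminative. Feature transformation, feature selection, and feature encoding each attack this effect by working in a smaller or more compact representation.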
Acknowledgments
The work of Lianli Gao has been partially supported by NSFC (Grant No. 61502080) and by the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2014J063). The work of Junming Shao has been partially supported by NSFC (Grant Nos. 61403062, 61433014) and by the Fundamental Research Funds for the Central Universities (Grant No. ZYGX2014J053).
Cite this article
Gao, L., Song, J., Liu, X. et al. Learning in high-dimensional multimedia data: the state of the art. Multimedia Systems 23, 303–313 (2017). https://doi.org/10.1007/s00530-015-0494-1
DOI: https://doi.org/10.1007/s00530-015-0494-1