Abstract
Most available image data are stored in a compressed format, of which JPEG is the most widespread. To feed this data to a convolutional neural network (CNN), a preliminary decoding step is required to obtain RGB pixels, which demands a high computational load and memory usage. For this reason, the design of CNNs that process JPEG compressed data directly has gained attention in recent years. In most existing works, typical CNN architectures are adapted to learn from DCT coefficients rather than RGB pixels. Although effective, these architectural changes either raise the computational costs or neglect relevant information from the DCT inputs. In this paper, we examine different ways of speeding up CNNs designed for DCT inputs, exploiting learning strategies that reduce the computational complexity while taking full advantage of the DCT inputs. Our experiments were conducted on the ImageNet dataset. Results show that learning how to combine all DCT inputs in a data-driven fashion is better than discarding them by hand, and that combining this strategy with a reduction in the number of layers is effective at reducing computational costs while retaining accuracy.
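As a rough illustration of the data-driven combination idea described in the abstract, the following is a minimal PyTorch sketch: instead of hand-picking a subset of DCT coefficients, all 64 luma and 2 x 64 chroma DCT channels (assuming the common 28x28x64 Y and 14x14x64 Cb/Cr layout used in the "straight from JPEG" literature) are concatenated and mixed by a learned 1x1 convolution before being fed to a shortened backbone. The module name, channel counts, and layer choices are illustrative assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class LearnedDCTCombiner(nn.Module):
    """Hypothetical sketch: upsample chroma DCT blocks, concatenate all
    192 DCT channels, and learn a 1x1 convolution that mixes them into a
    compact representation for a reduced CNN backbone."""
    def __init__(self, out_channels=192):
        super().__init__()
        # Bring Cb/Cr from 14x14 up to the 28x28 resolution of Y.
        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')
        # Data-driven combination of all 3 * 64 = 192 DCT channels.
        self.combine = nn.Sequential(
            nn.Conv2d(192, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, y_dct, cb_dct, cr_dct):
        # y_dct:  (N, 64, 28, 28) -- luma DCT coefficients
        # cb_dct: (N, 64, 14, 14) -- chroma DCT coefficients
        # cr_dct: (N, 64, 14, 14)
        chroma = torch.cat([self.upsample(cb_dct), self.upsample(cr_dct)], dim=1)
        x = torch.cat([y_dct, chroma], dim=1)  # (N, 192, 28, 28)
        return self.combine(x)                 # passed on to a reduced backbone

if __name__ == "__main__":
    combiner = LearnedDCTCombiner()
    y = torch.randn(2, 64, 28, 28)
    cb = torch.randn(2, 64, 14, 14)
    cr = torch.randn(2, 64, 14, 14)
    print(combiner(y, cb, cr).shape)  # torch.Size([2, 192, 28, 28])

The learned 1x1 convolution lets training decide how much weight each frequency component receives, which is the "less is more" alternative to manually discarding high-frequency channels.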
Acknowledgment
This research was supported by the FAPESP-Microsoft Research Virtual Institute (grant 2017/25908-6) and the Brazilian National Council for Scientific and Technological Development - CNPq (grant 314868/2020-8).
About this paper
Cite this paper
dos Santos, S.F., Almeida, J. (2021). Less Is More: Accelerating Faster Neural Networks Straight from JPEG. In: Tavares, J.M.R.S., Papa, J.P., González Hidalgo, M. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2021. Lecture Notes in Computer Science, vol 12702. Springer, Cham. https://doi.org/10.1007/978-3-030-93420-0_23
DOI: https://doi.org/10.1007/978-3-030-93420-0_23
Print ISBN: 978-3-030-93419-4
Online ISBN: 978-3-030-93420-0