Multimodal Human Recognition in Significantly Low Illumination Environment Using Modified EnlightenGAN
Figure 1. Overall procedure of the proposed method.
Figure 2. Architecture of modified EnlightenGAN: (a) generator and (b) discriminator.
Figure 3. Example of DFB-DB3 images obtained from (a) the Logitech C920 camera and (b) the Logitech BCC950 camera. (c) Converted low-illumination image of DFB-DB3.
Figure 4. Example images for the ChokePoint dataset. (a) Original images of the ChokePoint dataset [34]. (b) Converted low-illumination image of the ChokePoint dataset.
Figure 5. Method for data augmentation, including (a) cropping and image translation, and (b) horizontal flipping.
Figure 6. Graphs illustrating training accuracy and loss for DFB-DB3 (a–d) and the ChokePoint dataset (e–h). VGG face net-16 with respect to (a,e) the first fold and (b,f) the second fold. ResNet-50 with respect to (c,g) the first fold and (d,h) the second fold.
Figure 7. Comparisons of output images by the original EnlightenGAN and the modified EnlightenGAN. (a) Original normal-illumination image. (b) Low-illumination image. Output images by (c) the original EnlightenGAN and (d) the modified EnlightenGAN.
Figure 8. ROC curves of recognition accuracies with or without the modified EnlightenGAN. Results of (a) face and body recognition, and (b) various score-level fusions.
Figure 9. ROC curves acquired using our method and previous GAN-based techniques. (a,b) Recognition results for face and body images, and (c) score-level fusion result.
Figure 10. ROC curves acquired by the previous methods and the proposed method. (a) Recognition results for face images, and (b) recognition results for body and face images.
Figure 11. CMC curves of the proposed and state-of-the-art methods. (a) Face recognition results, and (b) face and body recognition results.
Figure 12. Graphs of t-test results between the second-best model and the proposed method with respect to average recognition accuracy. (a) Comparison between ResNet-50 and the proposed method, and (b) comparison between ResNet IDE + LIME and the proposed method.
Figure 13. Cases of FA, FR, and correct recognition. (a) Cases of FA, (b) cases of FR, and (c) correct cases. In (a–c), the left and right images are enrolled and recognized images, respectively.
Figure 14. Results of class activation feature maps on DFB-DB3. (a,b) Activation map results from face images; from left to right: original image, low-illumination image, enhanced image from the modified EnlightenGAN, and results from the 7th, 12th, and 13th ReLU layers of VGG face net-16. (c,d) Activation map results from body images; from left to right: original image, low-illumination image, enhanced image from the modified EnlightenGAN, and results from the 3rd batch-normalization layer and the 2nd and 3rd blocks of conv5 in ResNet-50.
Figure 15. ROC curves of recognition accuracies with and without the modified EnlightenGAN. Results of (a) face and body recognition, and (b) various score-level fusions.
Figure 16. ROC curves acquired by the proposed method and previous GAN-based methods. (a,b) Recognition results for face and body images, and (c) score-level fusion result.
Figure 17. ROC curves acquired using the proposed method and the previous methods. (a) Results for face recognition, and (b) results for body and face recognition.
Figure 18. CMC curves of the proposed and state-of-the-art methods. (a) Face recognition results, and (b) face and body recognition results.
Figure 19. Graphs of t-test results between the second-best model and the proposed method with regard to average recognition accuracy. (a) Comparison of VGG face net-16 [44] and the proposed method, and (b) comparison of ELF [48] and the proposed method.
Figure 20. Cases of FA, FR, and correct recognition on the ChokePoint database [34]. (a) Cases of FA, (b) cases of FR, and (c) correct cases. In (a–c), the left and right images are enrolled and recognized images, respectively.
Figure 21. Results of class activation feature maps on the ChokePoint dataset [34]. (a,b) Activation map results from face images; from left to right: original image, low-illumination image, enhanced image from the modified EnlightenGAN, and results from the 7th, 12th, and 13th ReLU layers of VGG face net-16. (c,d) Activation map results from body images; from left to right: original image, low-illumination image, enhanced image from the modified EnlightenGAN, and results from the 3rd batch-normalization layer and the 2nd and 3rd blocks of conv5 in ResNet-50.
Figure 22. Examples of original images captured in real low-light environments.
Figure 23. Comparisons of (a) ROC and (b) CMC curves obtained by the proposed method and state-of-the-art methods.
Figure 24. Example of an original image from an open database captured in a real low-light environment.
Figure 25. Comparisons of (a) ROC and (b) CMC curves obtained by the proposed method and state-of-the-art methods with an open database.
Figure 26. Jetson TX2 embedded system.
Abstract
1. Introduction
- This is the first study of face- and body-based human recognition in very low illumination images. Accordingly, a modified EnlightenGAN is newly proposed.
- The modified EnlightenGAN enlarges the input patch of the discriminator from 32 × 32 in the conventional EnlightenGAN to 40 × 40. In addition, it uses the features of the rectified linear unit (ReLU) 5-3 layer in the discriminator to compute the self-feature-preserving loss, whereas the conventional EnlightenGAN uses the features of the ReLU 5-1 layer. As a result, human recognition performance in very low illumination images is improved compared to the conventional EnlightenGAN (a sketch of this loss is given after this list).
- Structural complexity is reduced by separating the modified EnlightenGAN, which converts a low-illumination image into a normal-illumination one, from the CNNs used for human recognition.
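The layer change described in the second contribution can be illustrated with a short sketch. The snippet below is a minimal, hypothetical PyTorch implementation of a self-feature-preserving loss that compares deep features of the low-light input and the enhanced output; torchvision's pre-trained VGG-16 as the feature extractor, the layer indexing, the MSE criterion, and the omitted input normalization are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class SelfFeaturePreservingLoss(nn.Module):
    """Compare deep features of the enhanced output and the low-light input.

    The extractor (torchvision VGG-16) and the cut-off layer (ReLU 5-3 instead
    of the conventional ReLU 5-1) are illustrative assumptions."""
    def __init__(self, use_relu_5_3: bool = True):
        super().__init__()
        features = vgg16(pretrained=True).features
        # In torchvision's VGG-16, index 29 is the ReLU after conv5-3 and
        # index 25 is the ReLU after conv5-1.
        cut = 30 if use_relu_5_3 else 26
        self.extractor = nn.Sequential(*list(features.children())[:cut]).eval()
        for p in self.extractor.parameters():
            p.requires_grad = False        # frozen feature extractor
        self.criterion = nn.MSELoss()

    def forward(self, enhanced: torch.Tensor, low_light: torch.Tensor) -> torch.Tensor:
        return self.criterion(self.extractor(enhanced), self.extractor(low_light))

# Usage sketch: both tensors are (N, 3, 224, 224) images in [0, 1].
# loss_sfp = SelfFeaturePreservingLoss()(enhanced_batch, low_light_batch)
```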
2. Related Work
2.1. Recognition That Does Not Consider the Low-Illumination Condition
2.2. Recognition That Considers the Low-Illumination Condition
3. Proposed Methods
3.1. System Overview
3.2. Structure of Modified EnlightenGAN
3.3. Loss Function of Modified EnlightenGAN
3.4. Deep CNNs and Score-Level Fusion for Face and Body Recognition
4. Experimental Results and Analysis
4.1. Experimental Environment and Database
4.2. Training of Modified EnlightenGAN and CNN Models
4.3. Testing of Modified EnlightenGAN and CNN Models with DFB-DB3
4.3.1. Ablation Studies
4.3.2. Comparisons of Proposed Method with State-of-the-Art Methods
4.3.3. Class Activation Map
4.4. Testing of Modified EnlightenGAN and CNN Models with ChokePoint Dataset
4.4.1. Ablation Studies
4.4.2. Comparisons between the Proposed Method and the Previous Techniques
4.4.3. Class Activation Feature Map
4.5. Testing of Proposed Method with Low-Illuminated Images Captured in Real Environments
4.6. Comparisons of Desktop Computer and Jetson TX2 Processing Time
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Grgic, M.; Delac, K.; Grgic, S. SCface–surveillance cameras face database. Multimed. Tools Appl. 2011, 51, 863–879. [Google Scholar] [CrossRef]
- Banerjee, S.; Das, S. Domain adaptation with soft-margin multiple feature-kernel learning beats deep learning for surveillance face recognition. arXiv 2016, arXiv:1610.01374v2. [Google Scholar]
- Varior, R.R.; Haloi, M.; Wang, G. Gated siamese convolutional neural network architecture for human re-identification. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 791–808. [Google Scholar]
- Shi, H.; Yang, Y.; Zhu, X.; Liao, S.; Lei, Z.; Zheng, W.; Li, S.Z. Embedding deep metric for individual re-identification: A study against large variations. In Proceedings of the 14th European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 732–748. [Google Scholar]
- Han, J.; Bhanu, B. Statistical feature fusion for gait-based human recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Washington, DC, USA, 27 June–2 July 2004; pp. II-842–II-847. [Google Scholar]
- Liu, Z.; Sarkar, S. Outdoor recognition at a distance by fusing gait and face. Image Vision Comput. 2007, 25, 817–832. [Google Scholar] [CrossRef] [Green Version]
- Koo, J.H.; Cho, S.W.; Baek, N.R.; Kim, M.C.; Park, K.R. CNN-based multimodal human recognition in surveillance environments. Sensors 2018, 18, 3040. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015; pp. 1–14. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Kamenetsky, D.; Yiu, S.Y.; Hole, M. Image enhancement for face recognition in adverse environments. In Proceedings of the Digital Image Computing: Techniques and Applications, Canberra, Australia, 10–13 December 2018; pp. 1–6. [Google Scholar]
- Huang, Y.H.; Chen, H.H. Face recognition under low illumination via deep feature reconstruction network. In Proceedings of the IEEE International Conference on Image Processing, Abu Dhabi, United Arab Emirates, 25–28 October 2020; pp. 2161–2165. [Google Scholar]
- Poon, B.; Amin, M.A.; Yan, H. PCA based human face recognition with improved methods for distorted images due to illumination and color background. IAENG Intern. J. Comput. Sci. 2016, 43, 277–283. [Google Scholar]
- Zhang, T.; Tang, Y.Y.; Fang, B.; Shang, Z.; Liu, X. Face recognition under varying illumination using gradientfaces. IEEE Trans. Image Proc. 2009, 18, 2599–2606. [Google Scholar] [CrossRef] [PubMed]
- Zhao, M.; Wang, L. Face recognition based on a novel illumination normalization method. In Proceedings of the 5th International Congress on Image and Signal Processing, Chongqing, China, 16–18 October 2012; pp. 434–438. [Google Scholar]
- Vu, N.S.; Caplier, A. Illumination-robust face recognition using retina modeling. In Proceedings of the 16th IEEE International Conference on Image Processing, Cairo, Egypt, 7–10 November 2009; pp. 3289–3292. [Google Scholar]
- Kang, Y.; Pan, W. A novel approach of low-light image denoising for face recognition. Adv. Mech. Eng. 2014, 6, 1–13. [Google Scholar] [CrossRef]
- Ren, D.; Ma, H.; Sun, L.; Yan, T. A novel approach of low-light image used for face recognition. In Proceedings of the 4th International Conference on Computer Science and Network Technology, Harbin, China, 19–20 December 2015; pp. 790–793. [Google Scholar]
- Viola, P.; Jones, M.J. Robust Real-Time Face Detection. Int. J. Comput. Vis. 2004, 57, 137–154. [Google Scholar] [CrossRef]
- Kazemi, V.; Sullivan, J. One Millisecond Face Alignment with an Ensemble of Regression Trees. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1867–1874. [Google Scholar]
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. arXiv 2014, arXiv:1406.2661. [Google Scholar]
- Kupyn, O.; Budzan, V.; Mykhailych, M.; Mishkin, D.; Matas, J. DeblurGAN: Blind motion deblurring using conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8183–8192. [Google Scholar]
- Isola, P.; Zhu, J.-Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 5967–5976. [Google Scholar]
- Zhu, J.-Y.; Park, T.S.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2242–2251. [Google Scholar]
- Jiang, Y.; Gong, X.; Liu, D.; Cheng, Y.; Fang, C.; Shen, X.; Yang, J.; Zhou, P.; Wang, Z. EnlightenGAN: Deep light enhancement without paired supervision. IEEE Trans. Image Process. 2021, 30, 2340–2349. [Google Scholar] [CrossRef] [PubMed]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 7–9 July 2015; pp. 448–456. [Google Scholar]
- Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.K.; Wang, Z.; Smolley, S.P. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2813–2821. [Google Scholar]
- Wolf, L.; Hassner, T.; Maoz, I. Face recognition in unconstrained videos with matched background similarity. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, USA, 20–25 June 2011; pp. 529–534. [Google Scholar]
- Huang, G.B.; Ramesh, M.; Berg, T.; Learned-miller, E. Labeled faces in the wild: A database for studying face recognition in unconstrained environments. In Proceedings of the Workshop on Faces in ‘Real-Life’ Images: Detection, Alignment, and Recognition, Marseille, France, 17 October 2008; pp. 1–11. [Google Scholar]
- Mateo, J.R.S.C. Weighted Sum Method and Weighted Product Method. In Multi Criteria Analysis in the Renewable Energy Industry; Green Energy and Technology; Springer: London, UK, 2012. [Google Scholar] [CrossRef]
- Vapnik, V. Statistical Learning Theory; Wiley: Hoboken, NJ, USA, 1998. [Google Scholar]
- Logitech BCC950 Camera. Available online: https://www.logitech.com/en-roeu/product/conferencecam-bcc950?crid=1689 (accessed on 1 March 2021).
- Logitech C920 Camera. Available online: https://www.logitech.com/en-us/product/hd-pro-webcam-c920?crid=34 (accessed on 1 March 2021).
- ChokePoint Dataset. Available online: http://arma.sourceforge.net/chokepoint/ (accessed on 26 February 2021).
- CUDA. Available online: https://developer.nvidia.com/cuda-10.0-download-archive (accessed on 18 April 2021).
- NVIDIA GeForce GTX 1070 Card. Available online: https://www.nvidia.com/en-in/geforce/products/10series/geforce-gtx-1070/ (accessed on 12 May 2021).
- Pytorch. Available online: https://pytorch.org/get-started/previous-versions (accessed on 18 April 2021).
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678. [Google Scholar]
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016, arXiv:1609.04747. [Google Scholar]
- Oxford Face Database. Available online: https://www.robots.ox.ac.uk/~vgg/data/vgg_face/ (accessed on 18 April 2021).
- Stathaki, T. Image Fusion: Algorithms and Applications; Academic Press: Cambridge, MA, USA, 2008. [Google Scholar]
- Salomon, D. Data Compression: The Complete Reference, 4th ed.; Springer: New York, NY, USA, 2006. [Google Scholar]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the British Machine Vision Conference, Swansea, UK, 7–10 September 2015; pp. 1–12. [Google Scholar]
- Gruber, I.; Hlaváč, M.; Železný, M.; Karpov, A. Facing face recognition with ResNet: Round one. In Proceedings of the International Conference on Interaction Collaborative Robotics, Hatfield, UK, 12–16 September 2017; pp. 67–74. [Google Scholar]
- Martínez-Díaz, Y.; Méndez-Vázquez, H.; López-Avila, L.; Chang, L.; Enrique Sucar, L.; Tistarelli, M. Toward more realistic face recognition evaluation protocols for the YouTube Faces database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 526–534. [Google Scholar]
- Guo, X.; Li, Y.; Ling, H. LIME: Low-Light Image Enhancement via Illumination Map Estimation. IEEE Trans. Image Process. 2017, 26, 982–993. [Google Scholar] [CrossRef] [PubMed]
- Gray, D.; Tao, H. Viewpoint invariant pedestrian recognition with an ensemble of localized features. In Proceedings of the 10th European Conference on Computer Vision, Marseille, France, 12–18 October 2008; pp. 262–275. [Google Scholar]
- Livingston, E.H. Who was Student and why do we care so much about his t-test? J. Surg. Res. 2004, 118, 58–65. [Google Scholar] [CrossRef] [PubMed]
- Cohen, J. A power primer. Psychol. Bull. 1992, 112, 155–159. [Google Scholar] [CrossRef] [PubMed]
- Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
- Cho, S.W.; Baek, N.R.; Kim, M.C.; Koo, J.H.; Kim, J.H.; Park, K.R. Face Detection in Nighttime Images Using Visible-Light Camera Sensors with Two-Step Faster Region-Based Convolutional Neural Network. Sensors 2018, 18, 2995. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Open Database of Fudan University. Available online: https://cv.fudan.edu.cn/_upload/tpl/06/f4/1780/template1780/humandetection.htm (accessed on 26 May 2021).
- Jetson TX2 Module. Available online: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-tx2/ (accessed on 12 December 2020).
- Tensorflow: The Python Deep Learning Library. Available online: https://www.tensorflow.org/ (accessed on 12 December 2020).
- Keras: The Python Deep Learning Library. Available online: https://keras.io/ (accessed on 12 December 2020).
- CUDNN. Available online: https://developer.nvidia.com/cudnn (accessed on 11 January 2021).
- CUDA. Available online: https://developer.nvidia.com/cuda-90-download-archive (accessed on 11 January 2021).
- Dongguk Face and Body Database Version 3 (DFB-DB3), Modified EnlightenGAN, and CNN Models for Face & Body Recognition. Available online: http://dm.dgu.edu/link.html (accessed on 11 March 2021).
Type | Category | Techniques | Strength | Weakness
---|---|---|---|---
Not considering low-illumination condition | Face recognition | PCA [1]; SML-MKFC with DA [2] | Good recognition performance when images are captured up close | Recognition performance may be degraded owing to external light
Not considering low-illumination condition | Texture- and shape-based body recognition | S-CNN [3]; CNN+DDML [4] | Requires less data for recognition compared to gait-based recognition | Recognition performance may be degraded in a low-illumination environment
Not considering low-illumination condition | Gait-based body recognition | Synthetic GEI, PCA + MDA [5] | Recognition performance less affected by low illumination | Requires an extended period of time to obtain gait data
Not considering low-illumination condition | Gait-based body and face recognition | HMM/Gabor feature-based EBGM [6] | Recognition performance less affected by low illumination | Requires an extended period of time to obtain gait data
Not considering low-illumination condition | Body and face recognition based on texture and shape | ResNet-50 and VGG face net-16 [7] | Takes relatively less time to acquire data compared to gait-based recognition | Possibility of clothes color being changed owing to lighting changes or losing important facial features
Considering low-illumination condition | Face recognition | MPEf and fMPE [10]; FRN [11]; Gradientfaces [12,13]; DCT and local normalized method [14]; DoG filter and adaptive nonlinear function [15]; DeLFN [16]; homomorphic filter and image multiplication [17] | Able to perform face recognition in a low-illumination condition | Did not consider face and body recognition in a very low illumination environment
Considering low-illumination condition | Face and body recognition | Proposed method | Recognition possible in a very low illumination environment | Requires more time to process face and body recognition data
Type of Layer | Feature Map Sizes (Height × Width × Channel) | Number of Filters | Filter Sizes | Number of Strides | Number of Paddings |
---|---|---|---|---|---|
Input image layer | 224 × 224 × 3 | ||||
Attention map layer | 224 × 224 × 1 | ||||
Concatenated layer | 224 × 224 × 4 | ||||
Convolution block1 | 224 × 224 × 32 | 32 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block2 | 224 × 224 × 32 | 32 | 3 × 3 | 1 × 1 | 1 × 1 |
Maxpooling layer1 | 112 × 112 × 32 | | 2 × 2 | 2 × 2 | |
Convolution block3 | 112 × 112 × 64 | 64 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block4 | 112 × 112 × 64 | 64 | 3 × 3 | 1 × 1 | 1 × 1 |
Maxpooling layer2 | 56 × 56 × 64 | | 2 × 2 | 2 × 2 | |
Convolution block5 | 56 × 56 × 128 | 128 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block6 | 56 × 56 × 128 | 128 | 3 × 3 | 1 × 1 | 1 × 1 |
Maxpooling layer3 | 28 × 28 × 128 | | 2 × 2 | 2 × 2 | |
Convolution block7 | 28 × 28 × 256 | 256 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block8 | 28 × 28 × 256 | 256 | 3 × 3 | 1 × 1 | 1 × 1 |
Maxpooling layer4 | 14 × 14 × 256 | | 2 × 2 | 2 × 2 | |
Convolution block9 | 14 × 14 × 512 | 512 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block10 | 14 × 14 × 512 | 512 | 3 × 3 | 1 × 1 | 1 × 1 |
Deconvolution block1 | 28 × 28 × 256 | 256 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block11 | 28 × 28 × 256 | 256 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block12 | 28 × 28 × 256 | 256 | 3 × 3 | 1 × 1 | 1 × 1 |
Deconvolution block2 | 56 × 56 × 128 | 128 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block13 | 56 × 56 × 128 | 128 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block14 | 56 × 56 × 128 | 128 | 3 × 3 | 1 × 1 | 1 × 1 |
Deconvolution block3 | 112 × 112 × 64 | 64 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block15 | 112 × 112 × 64 | 64 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block16 | 112 × 112 × 64 | 64 | 3 × 3 | 1 × 1 | 1 × 1 |
Deconvolution block4 | 224 × 224 × 32 | 32 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block17 | 224 × 224 × 32 | 32 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution block18 | 224 × 224 × 32 | 32 | 3 × 3 | 1 × 1 | 1 × 1 |
Convolution layer (Output layer) | 224 × 224 × 3 | 3 | 1 × 1 | 1 × 1 |
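To illustrate how the "Convolution block" rows of the generator table compose, the following PyTorch sketch builds one such block and the first encoder stage (input image concatenated with the attention map, two blocks, then max pooling). The use of batch normalization and LeakyReLU inside each block is an assumption based on the cited EnlightenGAN and batch-normalization references [24,26], not something the table itself specifies.

```python
import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """One 'Convolution block' row of the generator table: 3x3 convolution,
    stride 1, padding 1 (spatial size preserved). BatchNorm + LeakyReLU are
    illustrative assumptions, not specified by the table."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

# First encoder stage: the 224x224x3 input image is concatenated with a
# 224x224x1 attention map (4 channels total), passed through two blocks,
# then downsampled by 2x2 max pooling.
image = torch.randn(1, 3, 224, 224)
attention_map = torch.randn(1, 1, 224, 224)
x = torch.cat([image, attention_map], dim=1)      # concatenated layer, 4 channels
x = conv_block(4, 32)(x)                          # Convolution block1
x = conv_block(32, 32)(x)                         # Convolution block2
x = nn.MaxPool2d(kernel_size=2, stride=2)(x)      # Maxpooling layer1
print(x.shape)                                    # torch.Size([1, 32, 112, 112])
```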
Type of Layer | Feature Map Sizes (Height × Width × Channel) | Number of Filters | Filter Sizes | Number of Strides | Number of Paddings |
---|---|---|---|---|---|
Input image layer | 224 × 224 × 3 | ||||
Target image layer | 224 × 224 × 3 | ||||
Convolution block1 | 113 × 113 × 64 | 64 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block2 | 58 × 58 × 128 | 128 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block3 | 30 × 30 × 256 | 256 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block4 | 16 × 16 × 512 | 512 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block5 | 9 × 9 × 512 | 512 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block6 | 10 × 10 × 512 | 512 | 4 × 4 | 1 × 1 | 2 × 2 |
Convolution layer | 11 × 11 × 1 | 1 | 4 × 4 | 1 × 1 | 2 × 2 |
Type of Layer | Feature Map Sizes (Height × Width × Channel) | Number of Filters | Filter Sizes | Number of Strides | Number of Paddings |
---|---|---|---|---|---|
Input image layer | 40 × 40 × 3 | | | | |
Target image layer | 40 × 40 × 3 | | | | |
Convolution block1 | 21 × 21 × 64 | 64 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block2 | 12 × 12 × 128 | 128 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block3 | 7 × 7 × 256 | 256 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block4 | 5 × 5 × 512 | 512 | 4 × 4 | 2 × 2 | 2 × 2 |
Convolution block5 | 6 × 6 × 512 | 512 | 4 × 4 | 1 × 1 | 2 × 2 |
Convolution layer | 7 × 7 × 1 | 1 | 4 × 4 | 1 × 1 | 2 × 2 |
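The key change in this discriminator is the enlarged 40 × 40 input patch (32 × 32 in the original EnlightenGAN). The sketch below shows one way such patches could be cropped at random before being fed to the patch discriminator; the patch count and the uniform sampling strategy are illustrative assumptions rather than the authors' exact procedure.

```python
import torch

def sample_patches(img: torch.Tensor, patch_size: int = 40, n_patches: int = 8) -> torch.Tensor:
    """Randomly crop square patches from a (C, H, W) image tensor for the
    patch discriminator. The patch count of 8 and uniform sampling are
    illustrative assumptions."""
    _, h, w = img.shape
    patches = []
    for _ in range(n_patches):
        top = torch.randint(0, h - patch_size + 1, (1,)).item()
        left = torch.randint(0, w - patch_size + 1, (1,)).item()
        patches.append(img[:, top:top + patch_size, left:left + patch_size])
    return torch.stack(patches)   # (n_patches, C, 40, 40)

# Usage sketch: crops of the enhanced output and of a real normal-illumination
# image are scored by the 40x40 patch discriminator.
patches = sample_patches(torch.randn(3, 224, 224))
print(patches.shape)              # torch.Size([8, 3, 40, 40])
```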
Modality | Sub-Dataset | DFB-DB3: Classes per Fold | DFB-DB3: Testing Images | DFB-DB3: Augmented Training Images | ChokePoint: Classes per Fold | ChokePoint: Testing Images | ChokePoint: Augmented Training Images
---|---|---|---|---|---|---|---
Face | Sub-Dataset1 | 11 | 827 | 200,134 | 14 | 10,381 | 332,192
Face | Sub-Dataset2 | 11 | 989 | 239,338 | 14 | 10,269 | 328,608
Body | Sub-Dataset1 | 11 | 827 | 200,134 | 14 | 10,381 | 332,192
Body | Sub-Dataset2 | 11 | 989 | 239,338 | 14 | 10,269 | 328,608
Metric | Number of Patches | Face | Body
---|---|---|---
SNR | 5 | 15.532 | 14.172
SNR | 8 | 19.294 | 15.343
SNR | 11 | 11.237 | 11.2
PSNR | 5 | 24.42 | 23.533
PSNR | 8 | 28.181 | 24.704
PSNR | 11 | 20.125 | 20.61
SSIM | 5 | 0.78 | 0.703
SSIM | 8 | 0.896 | 0.769
SSIM | 11 | 0.727 | 0.697
Metric | Original EnlightenGAN [24] (Face) | Original EnlightenGAN [24] (Body) | Modified EnlightenGAN (Face) | Modified EnlightenGAN (Body)
---|---|---|---|---
SNR | 19.59 | 18.852 | 19.294 | 15.343
PSNR | 28.478 | 28.213 | 28.181 | 24.704
SSIM | 0.895 | 0.849 | 0.896 | 0.769
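The SNR, PSNR, and SSIM values above compare the enhanced output of each GAN with the corresponding normal-illumination image. The sketch below shows how PSNR and SSIM could be computed with scikit-image; it is only an illustration, since the authors' measurement code is not given, and a recent scikit-image version is assumed for the channel_axis argument.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(original: np.ndarray, enhanced: np.ndarray):
    """PSNR and SSIM between the original normal-illumination image and the
    GAN-enhanced image (both uint8 RGB arrays of identical shape)."""
    psnr = peak_signal_noise_ratio(original, enhanced, data_range=255)
    ssim = structural_similarity(original, enhanced, channel_axis=-1, data_range=255)
    return psnr, ssim

# Example with random data, only to show the call signature.
orig = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
enh = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
print(image_quality(orig, enh))
```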
Method | Fold | Face | Body
---|---|---|---
Without modified EnlightenGAN | 1st fold | 20.22 | 22.08
Without modified EnlightenGAN | 2nd fold | 29.32 | 21.32
Without modified EnlightenGAN | Average | 24.77 | 21.7
With modified EnlightenGAN | 1st fold | 11.21 | 19
With modified EnlightenGAN | 2nd fold | 8.44 | 19.76
With modified EnlightenGAN | Average | 9.825 | 19.38
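The recognition errors reported in these ablation tables appear to be error rates in percent (lower is better), consistent with equal error rates (EERs) derived from the ROC analysis shown in the figures. The sketch below illustrates one way an EER could be computed from genuine and impostor matching distances; the thresholding scheme is an assumption and is not the authors' evaluation code.

```python
import numpy as np

def equal_error_rate(genuine: np.ndarray, impostor: np.ndarray) -> float:
    """EER (%) from distance-style matching scores (smaller = better match):
    sweep a threshold and return the point where FAR and FRR are closest."""
    thresholds = np.unique(np.concatenate([genuine, impostor]))
    far = np.array([np.mean(impostor <= t) for t in thresholds])  # false acceptance rate
    frr = np.array([np.mean(genuine > t) for t in thresholds])    # false rejection rate
    idx = np.argmin(np.abs(far - frr))
    return 100.0 * (far[idx] + frr[idx]) / 2.0

# Toy example: well-separated genuine/impostor distances give a low EER.
rng = np.random.default_rng(0)
genuine = rng.normal(0.3, 0.1, 1000)
impostor = rng.normal(0.8, 0.1, 1000)
print(f"EER = {equal_error_rate(genuine, impostor):.2f}%")
```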
Method | Fold | SVM | Weighted Product | Weighted Sum
---|---|---|---|---
Without modified EnlightenGAN | 1st fold | 33.62 | 16.979 | 17.077
Without modified EnlightenGAN | 2nd fold | 25.73 | 21.605 | 21.345
Without modified EnlightenGAN | Average | 29.675 | 19.292 | 19.211
With modified EnlightenGAN | 1st fold | 11.552 | 8.805 | 8.848
With modified EnlightenGAN | 2nd fold | 11.51 | 7.735 | 7.812
With modified EnlightenGAN | Average | 11.531 | 8.27 | 8.33
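Weighted-sum and weighted-product fusion [30] combine the face and body matching scores into a single score before the accept/reject decision, while SVM fusion feeds both scores to a support vector machine [31]. A minimal sketch of the two weighted rules is shown below; the weight value and the assumption of normalized distance scores are illustrative, and in practice the weights would be tuned on training data.

```python
import numpy as np

def weighted_sum_fusion(face_score: np.ndarray, body_score: np.ndarray, w: float = 0.7) -> np.ndarray:
    """Weighted-sum fusion of two matching distances; the weight w = 0.7 is an
    illustrative assumption, not the paper's tuned value."""
    return w * face_score + (1.0 - w) * body_score

def weighted_product_fusion(face_score: np.ndarray, body_score: np.ndarray, w: float = 0.7) -> np.ndarray:
    """Weighted-product fusion of two matching distances, assuming the scores
    are normalized to (0, 1]."""
    return (face_score ** w) * (body_score ** (1.0 - w))

# The fused score is then thresholded (or fed to an SVM in the SVM-fusion
# variant) to accept or reject the identity claim.
face = np.array([0.25, 0.62])
body = np.array([0.40, 0.55])
print(weighted_sum_fusion(face, body), weighted_product_fusion(face, body))
```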
Type | Method | 1st Fold | 2nd Fold | Average
---|---|---|---|---
Face | Modified EnlightenGAN | 11.21 | 8.44 | 9.825
Face | Original EnlightenGAN [24] | 12.51 | 8.53 | 10.52
Face | CycleGAN [23] | 14.2 | 6.47 | 10.335
Face | Pix2pix [22] | 12.26 | 7.7 | 9.98
Body | Modified EnlightenGAN | 19 | 19.76 | 19.38
Body | Original EnlightenGAN [24] | 18.52 | 19.47 | 18.995
Body | CycleGAN [23] | 21.89 | 21.38 | 21.635
Body | Pix2pix [22] | 17.04 | 18.39 | 17.715
Weighted sum | Modified EnlightenGAN (face) + Pix2pix (body) | 7.819 | 7.034 | 7.427
Weighted sum | Modified EnlightenGAN (face and body) | 8.848 | 7.812 | 8.33
Weighted sum | CycleGAN [23] | 12.067 | 5.866 | 8.967
Weighted sum | Pix2pix [22] | 10.466 | 5.506 | 7.986
Weighted product | Modified EnlightenGAN (face) + Pix2pix (body) | 7.623 | 6.958 | 7.291
Weighted product | Modified EnlightenGAN (face and body) | 8.805 | 7.735 | 8.27
Weighted product | CycleGAN [23] | 10.767 | 5.659 | 8.213
Weighted product | Pix2pix [22] | 9.35 | 5.399 | 7.375
SVM | Modified EnlightenGAN (face) + Pix2pix (body) | 9.72 | 7.12 | 8.42
SVM | Modified EnlightenGAN (face and body) | 11.552 | 11.51 | 11.531
SVM | CycleGAN [23] | 13.32 | 6.57 | 9.945
SVM | Pix2pix [22] | 9.08 | 5.9 | 7.49
Method | 1st Fold | 2nd Fold | Average |
---|---|---|---|
Proposed method | 7.623 | 6.958 | 7.291 |
VGG face net-16 [44] | 25.21 | 35.59 | 30.4 |
ResNet-50 [45,46] | 22.92 | 30.13 | 26.525 |
Method | 1st Fold | 2nd Fold | Average |
---|---|---|---|
ResNet IDE + LIME [47] | 20.3 | 29.26 | 24.78 |
ELF [48] | 28.02 | 30.07 | 29.045 |
Proposed method | 7.623 | 6.958 | 7.291 |
Method | Fold | Face | Body
---|---|---|---
Without modified EnlightenGAN | 1st fold | 25.5 | 38.06
Without modified EnlightenGAN | 2nd fold | 28.49 | 32.07
Without modified EnlightenGAN | Average | 26.995 | 35.065
With modified EnlightenGAN | 1st fold | 13.25 | 28.96
With modified EnlightenGAN | 2nd fold | 11.47 | 25.94
With modified EnlightenGAN | Average | 12.36 | 27.45
Method | Fold | Weighted Sum | Weighted Product | SVM
---|---|---|---|---
Without modified EnlightenGAN | 1st fold | 24.44 | 24.4 | 40.16
Without modified EnlightenGAN | 2nd fold | 23.52 | 23.47 | 40.02
Without modified EnlightenGAN | Average | 23.98 | 23.935 | 40.09
With modified EnlightenGAN | 1st fold | 12.59 | 12.542 | 23.82
With modified EnlightenGAN | 2nd fold | 19.83 | 19.928 | 17.16
With modified EnlightenGAN | Average | 16.21 | 16.235 | 20.49
Type | Method | 1st Fold | 2nd Fold | Average
---|---|---|---|---
Face | Modified EnlightenGAN | 13.25 | 11.47 | 12.36
Face | Original EnlightenGAN [24] | 12.88 | 15.17 | 14.025
Face | CycleGAN [23] | 15.87 | 11.11 | 13.49
Body | Modified EnlightenGAN | 28.96 | 25.94 | 27.45
Body | Original EnlightenGAN [24] | 25.01 | 23.86 | 24.435
Body | CycleGAN [23] | 27.16 | 29.56 | 28.36
Weighted sum | Modified EnlightenGAN (face) + Pix2pix (body) | 11.821 | 9.54 | 10.681
Weighted sum | Modified EnlightenGAN (face and body) | 12.59 | 9.92 | 11.255
Weighted sum | CycleGAN [23] | 13.69 | 10.32 | 12.005
Weighted sum | Pix2pix [22] | 9.835 | 13.43 | 11.633
Weighted product | Modified EnlightenGAN (face) + Pix2pix (body) | 11.689 | 9.49 | 10.59
Weighted product | Modified EnlightenGAN (face and body) | 12.542 | 9.87 | 11.206
Weighted product | CycleGAN [23] | 13.59 | 10.23 | 11.91
Weighted product | Pix2pix [22] | 9.843 | 14.31 | 12.08
SVM | Modified EnlightenGAN (face) + Pix2pix (body) | 17.17 | 15.51 | 16.34
SVM | Modified EnlightenGAN (face and body) | 23.82 | 17.16 | 20.49
SVM | CycleGAN [23] | 21.15 | 16.83 | 18.99
SVM | Pix2pix [22] | 15.86 | 8.76 | 12.31
Method | 1st Fold | 2nd Fold | Average |
---|---|---|---|
Proposed method | 11.689 | 9.49 | 10.59 |
VGG face net-16 [44] | 32.72 | 39.61 | 36.165 |
ResNet-50 [45,46] | 39.94 | 41.21 | 40.575 |
Method | 1st Fold | 2nd Fold | Average |
---|---|---|---|
ResNet IDE + LIME [47] | 30.88 | 49.47 | 40.175 |
ELF [48] | 36.27 | 30.6 | 33.435 |
Proposed method | 11.689 | 9.49 | 10.59 |
GAN Model | Processing Time on Jetson TX2 | Processing Time on Desktop Computer
---|---|---
Modified EnlightenGAN | 288.1 | 18.8
CNN Model | Processing Time on Desktop Computer | Processing Time on Jetson TX2
---|---|---
VGG face net-16 | 24.6 | 91.7
ResNet-50 | 18.93 | 40.9
Total | 43.53 | 132.6
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Koo, J.H.; Cho, S.W.; Baek, N.R.; Park, K.R. Multimodal Human Recognition in Significantly Low Illumination Environment Using Modified EnlightenGAN. Mathematics 2021, 9, 1934. https://doi.org/10.3390/math9161934