An Integrated Framework Based on GAN and RBI for Learning with Insufficient Datasets
Figure 1. The membership function MF_0 of the original MTD method, whose corresponding value of CL_j is set to 1 (the highest plausibility of this value occurring) [17].
Figure 2. The training flow and architecture of WGAN_MTD2.
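The figure caption above refers to the MTD membership function, which assigns plausibility 1 at the value CL_j and falls off toward the diffused bounds. MTD is commonly described with a triangular membership function; the sketch below illustrates that shape under this assumption. The function name and parameter names (`lower`, `upper`, `peak`) are illustrative, not taken from the paper.

```python
import numpy as np

def mtd_membership(x, lower, upper, peak):
    """Triangular MTD-style membership function: 0 at the diffused
    bounds `lower`/`upper`, rising linearly to 1 at `peak` (the value
    with the highest plausibility, i.e. MF_0(peak) = 1)."""
    x = np.asarray(x, dtype=float)
    left = (x - lower) / (peak - lower)    # rising edge
    right = (upper - x) / (upper - peak)   # falling edge
    return np.clip(np.minimum(left, right), 0.0, 1.0)

# The peak gets membership 1; the diffused bounds get 0.
print(mtd_membership([2.0, 5.0, 8.0], lower=2.0, upper=8.0, peak=5.0))
# -> [0. 1. 0.]
```

Virtual-sample generators built on MTD draw candidate values within `[lower, upper]` and weight (or accept) them according to this membership value.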
Abstract
1. Introduction
2. Literature Review
2.1. Virtual Sample Generation
2.2. Generative Adversarial Networks
2.3. Robust Bayesian Inference
3. Learning Framework of Integrating RBI and GAN
3.1. Modified MTD with RBI
3.2. The Architecture of Modified WGAN_MTD
4. Experimental Studies
4.1. Evaluation Criterion
4.2. Experiment Environment and Datasets
4.3. Experiment Results
5. Conclusions and Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Chao, G.Y.; Tsai, T.I.; Lu, T.J.; Hsu, H.C.; Bao, B.Y.; Wu, W.Y.; Lin, M.T.; Lu, T.L. A new approach to prediction of radiotherapy of bladder cancer cells in small dataset analysis. Expert Syst. Appl. 2011, 38, 7963–7969. [Google Scholar] [CrossRef]
- Ivǎnescu, V.C.; Bertrand, J.W.M.; Fransoo, J.C.; Kleijnen, J.P.C. Bootstrapping to solve the limited data problem in production control: An application in batch process industries. J. Oper. Res. Soc. 2006, 57, 2–9. [Google Scholar] [CrossRef] [Green Version]
- Kuo, Y.; Yang, T.; Peters, B.A.; Chang, I. Simulation metamodel development using uniform design and neural networks for automated material handling systems in semiconductor wafer fabrication. Simul. Model. Pract. Theory 2007, 15, 1002–1015. [Google Scholar] [CrossRef]
- Lanouette, R.; Thibault, J.; Valade, J.L. Process modeling with neural networks using small experimental datasets. Comput. Chem. Eng. 1999, 23, 1167–1176. [Google Scholar] [CrossRef]
- Oniśko, A.; Druzdzel, M.J.; Wasyluk, H. Learning Bayesian network parameters from small data sets: Application of Noisy-OR gates. Int. J. Approx. Reason. 2001, 27, 165–182. [Google Scholar] [CrossRef] [Green Version]
- Huang, C.J.; Wang, H.F.; Chiu, H.J.; Lan, T.H.; Hu, T.M.; Loh, E.W. Prediction of the period of psychotic episode in individual schizophrenics by simulation-data construction approach. J. Med. Syst. 2010, 34, 799–808. [Google Scholar] [CrossRef]
- Li, D.C.; Lin, W.K.; Chen, C.C.; Chen, H.Y.; Lin, L.S. Rebuilding sample distributions for small dataset learning. Decis. Support Syst. 2018, 105, 66–76. [Google Scholar] [CrossRef]
- Liu, Y.; Zhou, Y.; Liu, X.; Dong, F.; Wang, C.; Wang, Z. Wasserstein GAN-Based Small-Sample Augmentation for New-Generation Artificial Intelligence: A Case Study of Cancer-Staging Data in Biology. Engineering 2019, 5, 156–163. [Google Scholar] [CrossRef]
- Gonzalez-Abril, L.; Angulo, C.; Ortega, J.A.; Lopez-Guerra, J.L. Generative Adversarial Networks for Anonymized Healthcare of Lung Cancer Patients. Electronics 2021, 10, 2220. [Google Scholar] [CrossRef]
- Ali-Gombe, A.; Elyan, E. MFC-GAN: Class-imbalanced dataset classification using multiple fake class generative adversarial network. Neurocomputing 2019, 361, 212–221. [Google Scholar] [CrossRef]
- Shamsolmoali, P.; Zareapoor, M.; Shen, L.; Sadka, A.H.; Yang, J. Imbalanced data learning by minority class augmentation using capsule adversarial networks. Neurocomputing 2021, 459, 481–493. [Google Scholar] [CrossRef]
- Vuttipittayamongkol, P.; Elyan, E. Improved overlap-based undersampling for imbalanced dataset classification with application to epilepsy and parkinson’s disease. Int. J. Neural Syst. 2020, 30, 2050043. [Google Scholar] [CrossRef] [PubMed]
- Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; CRC Press: New York, NY, USA, 1994. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. arXiv 2017, arXiv:1701.07875. [Google Scholar]
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
- Li, D.-C.; Chen, S.-C.; Lin, Y.-S.; Huang, K.-C. A Generative Adversarial Network Structure for Learning with Small Numerical Data Sets. Appl. Sci. 2021, 11, 10823. [Google Scholar] [CrossRef]
- Niyogi, P.; Girosi, F.; Poggio, T. Incorporating prior information in machine learning by creating virtual examples. Proc. IEEE 1998, 86, 2196–2208. [Google Scholar] [CrossRef] [Green Version]
- Li, D.C.; Chen, L.S.; Lin, Y.S. Using functional virtual population as assistance to learn scheduling knowledge in dynamic manufacturing environments. Int. J. Prod. Res. 2003, 41, 4011–4024. [Google Scholar] [CrossRef]
- Li, D.C.; Lin, Y.S. Using virtual sample generation to build up management knowledge in the early manufacturing stages. Eur. J. Oper. Res. 2006, 175, 413–434. [Google Scholar] [CrossRef]
- Huang, C.F. Principle of information diffusion. Fuzzy Sets Syst. 1997, 91, 69–90. [Google Scholar]
- Huang, C.; Moraga, C. A diffusion-neural-network for learning from small samples. Int. J. Approx. Reason. 2004, 35, 137–161. [Google Scholar] [CrossRef] [Green Version]
- Li, D.C.; Wu, C.S.; Tsai, T.I.; Lin, Y.S. Using mega-trend-diffusion and artificial samples in small data set learning for early flexible manufacturing system scheduling knowledge. Comput. Oper. Res. 2007, 34, 966–982. [Google Scholar] [CrossRef]
- Khot, L.; Panigrahi, S.; Woznica, S. Neural-network-based classification of meat: Evaluation of techniques to overcome small dataset problems. Biol. Eng. Trans. 2008, 1, 127–143. [Google Scholar] [CrossRef]
- Yamashita, R.; Nishio, M.; Do, R.K.G.; Togashi, K. Convolutional neural networks: An overview and application in radiology. Insights Imaging 2018, 9, 611–629. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Bland, J.M.; Altman, D.G. Bayesians and frequentists. Br. Med. J. 1998, 317, 1151–1160. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- De Finetti, B. Theory of Probability: A Critical Introductory Treatment; John Wiley & Sons, Ltd: Chichester, UK, 2017; ISBN 9781119286370. [Google Scholar]
- Avila, L.; Martínez, E. An active inference approach to on-line agent monitoring in safety–critical systems. Adv. Eng. Inform. 2015, 29, 1083–1095. [Google Scholar] [CrossRef]
- Chen, P.; Wu, K.; Ghattas, O. Bayesian inference of heterogeneous epidemic models: Application to COVID-19 spread accounting for long-term care facilities. Comput. Methods Appl. Mech. Eng. 2021, 385, 114020. [Google Scholar] [CrossRef] [PubMed]
- Huang, Y.; Shao, C.; Wu, B.; Beck, J.L.; Li, H. State-of-the-art review on Bayesian inference in structural system identification and damage assessment. Adv. Struct. Eng. 2019, 22, 1329–1351. [Google Scholar] [CrossRef]
- Snihur, Y.; Wiklund, J. Searching for innovation: Product, process, and business model innovations and search behavior in established firms. Long Range Planning 2019, 52, 305–325. [Google Scholar] [CrossRef]
- Berger, J.O.; Moreno, E.; Pericchi, L.R.; Bayarri, M.J.; Bernardo, J.M.; Cano, J.A.; De la Horra, J.; Martín, J.; Ríos-Insúa, D.; Betrò, B.; et al. An overview of robust Bayesian analysis. Test 1994, 3, 5–124. [Google Scholar]
- Lin, Y.S. Modeling with insufficient data to increase prediction stability. In Proceedings of the 2016 5th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Kumamoto, Japan, 10–14 July 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 719–724. [Google Scholar]
- Lin, Y.S. Small sample regression: Modeling with insufficient data. In Proceedings of the 40th International Conference on Computers & Industrial Engineering, Awaji Island, Japan, 26–28 July 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–7. [Google Scholar]
Datasets | Total Samples | Input Attributes | Output Attributes | Class 1 Samples | Class 2 Samples | Class 3 Samples
---|---|---|---|---|---|---
Wine | 178 | 13 | 1 | 59 | 71 | 48 |
Seeds | 210 | 6 | 1 | 70 | 70 | 70 |
Cervical Cancer | 72 | 18 | 1 | 21 | 51 | - |
Lung Cancer | 32 | 55 | 1 | 9 | 13 | 10 |
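The Wine dataset in the table above (178 samples, 13 attributes, class sizes 59/71/48) matches the copy bundled with scikit-learn, which makes the small-data-set (SDS) baseline implied by the result tables easy to reproduce in outline: train a classifier on only 10 real samples and evaluate on the rest. The sketch below assumes SVM_poly denotes a polynomial-kernel SVM; hyperparameters and the split seed are illustrative, so the resulting accuracy will not match the paper's figures exactly.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Wine: 178 samples, 13 attributes, 3 classes (59/71/48), as in the table.
X, y = load_wine(return_X_y=True)

# SDS protocol sketch: keep only 10 real training samples (stratified),
# evaluate on the remaining 168.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=10, stratify=y, random_state=0)

clf = SVC(kernel="poly")  # stand-in for the SVM_poly learner
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"SDS baseline accuracy with 10 training samples: {acc:.3f}")
```

The WGAN_MTD variants would augment `X_tr` with 100 virtual samples before fitting; that generation step is the subject of Section 3 and is not reproduced here.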
Averaged Accuracy | SVM_poly | SVM_rbf | DT | NBC
---|---|---|---|---
SDS | 55.323% | 57.684% | 68.171% | 61.905% |
WGAN_MTD | 77.673% * | 78.323% * | 85.632% | 74.271% |
WGAN_MTD2 | 83.119% **,++ | 79.160% *,+ | 86.342% **,+ | 79.104% ***,++ |
Accuracy (%) | Learning Model | SDS | WGAN_MTD | WGAN_MTD2
---|---|---|---|---
10 with 100 virtual samples | SVM_poly | 55.323 | 77.673 * | 83.119 **,++
10 with 100 virtual samples | SVM_rbf | 57.684 | 78.323 * | 79.160 **,+
10 with 100 virtual samples | DT | 68.171 | 85.632 | 86.342 **,+
10 with 100 virtual samples | NBC | 61.905 | 74.271 | 79.104 ***,++
15 with 100 virtual samples | SVM_poly | 70.319 | 80.849 ** | 86.421 **,++
15 with 100 virtual samples | SVM_rbf | 72.438 | 80.077 | 82.109 *,+
15 with 100 virtual samples | DT | 73.119 | 84.371 ** | 86.231 ***,+
15 with 100 virtual samples | NBC | 75.125 | 81.903 ns | 80.111 *,+
20 with 100 virtual samples | SVM_poly | 75.369 | 82.263 ** | 85.157 ***,++
20 with 100 virtual samples | SVM_rbf | 77.069 | 85.512 ns | 86.287 *,+
20 with 100 virtual samples | DT | 74.731 | 83.132 ** | 84.781 ***,++
20 with 100 virtual samples | NBC | 79.709 | 80.739 ns | 82.246 *,+
Accuracy (%) | Learning Model | SDS | WGAN_MTD | WGAN_MTD2
---|---|---|---|---
10 with 100 virtual samples | SVM_poly | 69.403 | 74.532 * | 80.619 **,++
10 with 100 virtual samples | SVM_rbf | 83.527 | 85.720 ns | 84.760 *,+
10 with 100 virtual samples | DT | 79.797 | 79.180 ns | 81.275 +
10 with 100 virtual samples | NBC | 71.895 | 74.907 * | 78.944 **,++
15 with 100 virtual samples | SVM_poly | 72.785 | 80.171 *** | 82.219 ***,++
15 with 100 virtual samples | SVM_rbf | 87.303 | 87.711 ns | 86.809 ns
15 with 100 virtual samples | DT | 82.018 | 82.191 ns | 84.233 *,+
15 with 100 virtual samples | NBC | 81.837 | 82.328 ns | 82.191 *,+
20 with 100 virtual samples | SVM_poly | 76.789 | 81.981 ** | 82.578 ***,++
20 with 100 virtual samples | SVM_rbf | 88.762 | 89.563 ns | 87.876 *
20 with 100 virtual samples | DT | 84.705 | 85.561 ns | 85.192 ns
20 with 100 virtual samples | NBC | 86.004 | 85.372 ns | 84.286 *,+
Accuracy (%) | Learning Model | SDS | WGAN_MTD | WGAN_MTD2
---|---|---|---|---
10 with 100 virtual samples | SVM_poly | 70.593 | 75.391 * | 82.953 **,++
10 with 100 virtual samples | SVM_rbf | 84.771 | 84.953 * | 83.701 *,++
10 with 100 virtual samples | DT | 73.184 | 76.453 ns | 80.372 *,+
10 with 100 virtual samples | NBC | 70.462 | 72.682 * | 80.944 **,++
15 with 100 virtual samples | SVM_poly | 74.963 | 81.224 *** | 84.829 ***,+++
15 with 100 virtual samples | SVM_rbf | 86.921 | 88.761 * | 87.139 ns
15 with 100 virtual samples | DT | 80.253 | 81.741 ns | 83.723 *,++
15 with 100 virtual samples | NBC | 79.937 | 81.592 * | 82.143 *,+
20 with 100 virtual samples | SVM_poly | 75.971 | 82.891 ** | 83.816 ***,++
20 with 100 virtual samples | SVM_rbf | 87.267 | 88.938 * | 87.943 *
20 with 100 virtual samples | DT | 83.535 | 86.171 ns | 84.982 ns
20 with 100 virtual samples | NBC | 80.922 | 82.782 ns | 83.736 *,++
Accuracy (%) | Learning Model | SDS | WGAN_MTD | WGAN_MTD2
---|---|---|---|---
10 with 100 virtual samples | SVM_poly | 55.443 | 63.102 * | 71.528 **,++
10 with 100 virtual samples | SVM_rbf | 60.173 | 69.040 * | 73.811 *,+
10 with 100 virtual samples | DT | 61.053 | 72.610 | 75.732 *,+
10 with 100 virtual samples | NBC | 61.231 | 70.432 * | 74.567 **,++
15 with 100 virtual samples | SVM_poly | 60.683 | 70.214 ** | 75.289 ***,+++
15 with 100 virtual samples | SVM_rbf | 61.034 | 70.671 * | 77.019 **,+
15 with 100 virtual samples | DT | 64.431 | 73.341 | 77.873 *,+
15 with 100 virtual samples | NBC | 65.387 | 71.052 * | 76.431 *,++
20 with 100 virtual samples | SVM_poly | 71.251 | 76.981 ** | 78.386 **,++
20 with 100 virtual samples | SVM_rbf | 70.426 | 74.898 * | 78.413 **,+
20 with 100 virtual samples | DT | 72.139 | 76.711 ** | 80.802 **,+
20 with 100 virtual samples | NBC | 70.112 | 72.892 ** | 81.076 **,++
Learning Accuracy from SVM_poly | Wine | Seeds | Cervical Cancer | Lung Cancer |
---|---|---|---|---|
SDS | 75.369% | 76.789% | 75.971% | 71.251% |
WGAN_MTD | 82.263% ** | 81.981% ** | 82.891% ** | 76.981% ** |
WGAN_MTD2 | 85.157% ***,++ | 82.578% ***,++ | 83.816% ***,++ | 78.386% **,++ |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Lin, Y.-S.; Lin, L.-S.; Chen, C.-C. An Integrated Framework Based on GAN and RBI for Learning with Insufficient Datasets. Symmetry 2022, 14, 339. https://doi.org/10.3390/sym14020339