Abstract
A random foam trains several fuzzy-rule-foam function approximators and then combines them into a single rule-based approximator. The foam systems train independently on bootstrapped random samples from a trained neural classifier. The foam systems convert the neural black box into an interpretable set of rules. The fuzzy rule-based systems have an underlying probability mixture structure that gives rise to an interpretable Bayesian posterior over the rules for each input. A rule foam also measures the uncertainty in its outputs through the conditional variance of the generalized probability mixture. A random foam combines the learned additive fuzzy systems by averaging their throughputs or rule structure. A random foam is also interpretable in terms of its rules, its posterior of its rules, and its conditional variance. Thirty 1000-rule foams trained on random subsets of the MNIST digit data set. Each such foam system had about 93.5% classification accuracy. The random foam that averaged throughputs achieved \(96.80\%\) accuracy while the random foam that averaged only their outputs achieved 96.06% accuracy. The throughput-averaged random foam also slightly outperformed a standard random forest that output-averaged 30 classification trees. Thirty 1000-rule foams also trained on a deep neural classifier that had 96.26% accuracy. The random foam that averaged these foam throughputs was itself 96.14% accurate. The random foam that averaged their outputs was just 95.6% accurate. The appendix proves a Gaussian combined-foam version of the uniform approximation theorem for additive fuzzy systems.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
A. Adadi, M. Berrada, Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
A. Buzo, R.M. Gray et al., An algorithm for vector quantizer design. IEEE Trans. Commun. 28(1), 84–95 (1980)
B. Efron, T. Hastie, Computer Age Statistical Inference, vol. 5 (Cambridge University Press, 2016)
S. Grossberg, Competitive learning: from interactive activation to adaptive resonance. Cogn. Sci. 11(1), 23–63 (1987)
R. Guimerà, I. Reichardt, A. Aguilar-Mogas, F.A. Massucci, M. Miranda, J. Pallarès, M. Sales-Pardo, A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6(5) (2020)
F. Horst, S. Lapuschkin, W. Samek, K.-R. Müller, W.I. Schöllhorn, Explaining the unique nature of individual gait patterns with deep learning. Sci. Rep. 9(1), 1–13 (2019)
A.K. Jain, Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
T. Kohonen, The self-organizing map. Proc. IEEE 78(9), 1464–1480 (1990)
B. Kosko, Stochastic competitive learning. IEEE Trans. Neural Netw. 2(5), 522–529 (1991)
B. Kosko, Fuzzy systems as universal approximators. IEEE Trans. Comput. 43(11), 1329–1333 (1994). November
B. Kosko, Fuzzy Engineering (Prentice-Hall, 1996)
B. Kosko, Additive fuzzy systems: from generalized mixtures to rule Continua. Int. J. Intell. Syst. 33(8), 1573–1623 (2018)
V. Kreinovich, G.C. Mouzouris, H.T. Nguyen, Fuzzy rule based modeling as a universal approximation tool, in Fuzzy Systems (Springer, 1998), pp. 135–195
Y. LeCun, C. Cortes, MNIST handwritten digit database (2010)
J.B. MacQueen, Some methods for classification and analysis of multivariate observations. Western Management Science Institute University. Technical report, of California Working Paper (1966)
A.K. Panda, B. Kosko, Converting neural networks to rule foam, pp. 519–525 (2019)
W. Rudin, Real and Complex Analysis (McGraw-hill education, 2006)
W. Samek, Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, vol. 11700 (Springer Nature, 2019)
F. Watkins, The representation problem for additive fuzzy systems, in Proceedings of the International Conference on Fuzzy Systems (IEEE FUZZ-95) (pp. 117–122) (1995)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
7 Appendix : Fuzzy Approximation Theorem for a Combination of Gaussian SAMs
7 Appendix : Fuzzy Approximation Theorem for a Combination of Gaussian SAMs
Theorem 1 extends a version of the fuzzy approximation theorem [10] to the case of combined additive systems. The Gaussian structure of the if-part sets gives insight into the uniform approximation. The approximation may need ab exponentially growing total number of rules mq with respect to the number of dimensions n.
Theorem 1
A combination of q Gaussian SAMs can uniformly approximate a continuous function f over a compact space.
Proof
We combine q SAMs to approximate f(x) with F(x) by mixing their then-part sets. We want to show that the approximation error obeys \(|f(x)-F(x)| < \varepsilon \) for any \(\varepsilon >0\) using a finite number of rules. We use Gaussian if-part sets with set functions \(a_j^k(x) = \exp \{-||x-\hat{x}_j^k||^2/(d_j^k)^2\}\). We assign equal weights, equal volumes, and equal dispersions to all the rules: \(v^k = v\), \(w_j^k = w\), \(V_j^k = V\), and \(d_j^k = d\). This reduces the generalized mixing weights \(p_j(x)\) to softmax activation functions:
Multiply f(x) by \(\sum _{k=1}^q\sum _{j=1}^m p_j^k(x)=1\) and then use (7):
Rearrange the terms and use the triangle inequality:
because we center the then-part set around \(f(\hat{x}_j^k)\). So \(c_j^k = f(\hat{x}_j^k)\). The target function f is uniformly continuous because it is continuous and X is compact. So for all \(\varepsilon > 0\) there is a \(\delta > 0\) such that \(|f(x) - f(\hat{x}_j^k)| < \varepsilon /2\) if \(||x - \hat{x}_j^k|| < \delta \) . This splits the rules into the two sets \(J_1= \{ (j,k) : ||x - \hat{x}_j^k||<\delta \}\) and \(J_2= \{ (j,k) : ||x - \hat{x}_j^k||\ge \delta \}\). We assume that \(J_1 \ne \emptyset \). Then (18) becomes
Consider the first term on the right side of (19). The absolute differences obey \(\big |(f(x)-f(\hat{x}_j^k))\big | < \varepsilon /2\) because \((j,k) \in J_1\)implies that \(||x - \hat{x}_j^k||<\delta \). Add these terms over all \((j,k) \in J_1\):
because \(\sum _{(j,k)\in J_1} p_j^k(x) \le \sum _{k=1}^q\sum _{j=1}^m p_j^k(x) = 1\). Now pick a real number C large enough that \(|f(x) - f(\hat{x}_j^k)|<C\) for all j and k. Pick \((j,k) \in J_2\) and pick \((j_1, k_1) \in J_1\). Then \(||x-\hat{x}_j^k||\ge \delta \) and \(||x - \hat{x}_{j_1}^{k_1}||<\delta \). Pick a real number D small enough that \(||x - \hat{x}_j^k||^2 - ||x - \hat{x}_{j_1}^{k_1}||^2 \ge D > 0\). Choose the rule if-part dispersion d small enough that \(\exp (D/d^2)>2mqC/\varepsilon \). Then the definition of D gives
Add \(\sum \limits _{i\ne j_1}\sum \limits _{l\ne k_1}\frac{\exp {(-||x - \hat{x}_i^l||^2/d^2)}}{\exp {(-||x - \hat{x}_j^k||^2/d^2)}}\) to both sides of the above inequality:
because \(\sum \limits _{i\ne j_1}\sum \limits _{l\ne k_1}\frac{\exp {(-||x - \hat{x}_i^l||^2/d^2)}}{\exp {(-||x - \hat{x}_j^k||^2/d^2)}}>0\). Equation (16) and the definition of C give
Sum both sides over all \((j,k) \in J_2\):
because \(J_2\) has fewer than mq elements. Then (20) and (24) add to give the approximation-error bound: \(|f(x) {-} F(x)| \le \varepsilon /2 +\varepsilon /2 <\varepsilon \). So the combined SAM with mq rules uniformly approximates f. \(\square \)
The proof gives insight into a foam’s rule explosion. The proof assumes that \(J_1 \ne \emptyset \). So for all \(x \in X\) there is a \(\hat{x}_j^k \) such that \(||x - \hat{x}_j||<\delta \). Let \(N_{\delta }(\hat{x}_j^k) = \{ x : ||x - \hat{x}_j^k||<\delta \}\) denote the \(\delta \)-neighborhood around \(\hat{x}_j^k\). Then \(\forall \; x \in X :\; \exists \; \hat{x}_j^k : x \in N_{\delta }(\hat{x}_j^k)\). This gives
X is bounded because X is compact by hypothesis. Let an n-dimensional sphere of radius R contain X. The volume of this sphere is \(cR^n\) for some constant c. Each \(N_{\delta }(\hat{x}_j^k)\)’s is an n-dimensional sphere with radius \(\delta \) and volume \(c\delta ^n\). Use the \(\delta \)-neighborhoods to cover the R-sphere that contains X. The \(N_{\delta }(\hat{x}_j^k)\)’s are not disjoint. So the volume of the R-sphere is less than the volume of the mq \(N_{\delta }(\hat{x}_j^k)\)’s combined: \(cR^n \le mqc\delta ^n\). Then \( mq \ge (R/\delta )^n\). This inequality shows that the required total number of rules grows exponentially with the dimension n.
Rights and permissions
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Panda, A.K., Kosko, B. (2022). Random Fuzzy-Rule Foams for Explainable AI. In: Bede, B., Ceberio, M., De Cock, M., Kreinovich, V. (eds) Fuzzy Information Processing 2020. NAFIPS 2020. Advances in Intelligent Systems and Computing, vol 1337. Springer, Cham. https://doi.org/10.1007/978-3-030-81561-5_21
Download citation
DOI: https://doi.org/10.1007/978-3-030-81561-5_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-81560-8
Online ISBN: 978-3-030-81561-5
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)