
Random Fuzzy-Rule Foams for Explainable AI

  • Conference paper
  • First Online:
Fuzzy Information Processing 2020 (NAFIPS 2020)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1337)


Abstract

A random foam trains several fuzzy-rule-foam function approximators and then combines them into a single rule-based approximator. The foam systems train independently on bootstrapped random samples from a trained neural classifier. The foam systems convert the neural black box into an interpretable set of rules. The fuzzy rule-based systems have an underlying probability mixture structure that gives rise to an interpretable Bayesian posterior over the rules for each input. A rule foam also measures the uncertainty in its outputs through the conditional variance of the generalized probability mixture. A random foam combines the learned additive fuzzy systems by averaging their throughputs or rule structure. A random foam is also interpretable in terms of its rules, its posterior of its rules, and its conditional variance. Thirty 1000-rule foams trained on random subsets of the MNIST digit data set. Each such foam system had about 93.5% classification accuracy. The random foam that averaged throughputs achieved 96.80% accuracy while the random foam that averaged only their outputs achieved 96.06% accuracy. The throughput-averaged random foam also slightly outperformed a standard random forest that output-averaged 30 classification trees. Thirty 1000-rule foams also trained on a deep neural classifier that had 96.26% accuracy. The random foam that averaged these foam throughputs was itself 96.14% accurate. The random foam that averaged their outputs was just 95.6% accurate. The appendix proves a Gaussian combined-foam version of the uniform approximation theorem for additive fuzzy systems.
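The two combination modes can be illustrated with a minimal numerical sketch. This is not the authors' implementation: it assumes Gaussian if-part sets with equal rule weights, volumes, and dispersion, scalar then-part centroids, and a toy regression target rather than the MNIST classifier. The rule counts, the target function, and the dispersion d are hypothetical choices. The sketch contrasts averaging the q foams' defuzzified outputs with pooling all mq rules into one combined SAM, as in Eq. (16) of the appendix.

```python
import numpy as np

def sam_output(x, centers, centroids, d):
    """Defuzzified output of one Gaussian SAM: softmax-weighted sum of
    then-part centroids (equal rule weights, volumes, and dispersion d)."""
    a = np.exp(-np.sum((x - centers) ** 2, axis=1) / d ** 2)  # if-part fits a_j(x)
    p = a / a.sum()                                           # mixing weights p_j(x)
    return p @ centroids                                      # F(x) = sum_j p_j(x) c_j

def output_average(x, foams, d):
    """Mode 1: average the q foams' defuzzified outputs."""
    return np.mean([sam_output(x, C, c, d) for C, c in foams])

def pooled_rule_output(x, foams, d):
    """Mode 2: pool all mq rules and normalize jointly, as in Eq. (16)."""
    centers = np.vstack([C for C, _ in foams])
    centroids = np.concatenate([c for _, c in foams])
    return sam_output(x, centers, centroids, d)

# Toy demo: q = 3 foams, each with m = 50 rules sampled from f(x) = sin(x0) + x1.
rng = np.random.default_rng(0)
f = lambda X: np.sin(X[:, 0]) + X[:, 1]
foams = []
for _ in range(3):
    C = rng.uniform(-2, 2, size=(50, 2))       # rule if-part centers
    foams.append((C, f(C)))                    # then-part centroids c_j = f(center)
x = np.array([0.3, -1.0])
print(f(x[None])[0], output_average(x, foams, 0.5), pooled_rule_output(x, foams, 0.5))
```

Both modes approximate the target; the pooled form keeps a single rule base and so keeps a single Bayesian posterior over all mq rules.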


References

  1. A. Adadi, M. Berrada, Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018)
  2. Y. Linde, A. Buzo, R.M. Gray, An algorithm for vector quantizer design. IEEE Trans. Commun. 28(1), 84–95 (1980)
  3. B. Efron, T. Hastie, Computer Age Statistical Inference, vol. 5 (Cambridge University Press, 2016)
  4. S. Grossberg, Competitive learning: from interactive activation to adaptive resonance. Cogn. Sci. 11(1), 23–63 (1987)
  5. R. Guimerà, I. Reichardt, A. Aguilar-Mogas, F.A. Massucci, M. Miranda, J. Pallarès, M. Sales-Pardo, A Bayesian machine scientist to aid in the solution of challenging scientific problems. Sci. Adv. 6(5) (2020)
  6. F. Horst, S. Lapuschkin, W. Samek, K.-R. Müller, W.I. Schöllhorn, Explaining the unique nature of individual gait patterns with deep learning. Sci. Rep. 9(1), 1–13 (2019)
  7. A.K. Jain, Data clustering: 50 years beyond K-means. Pattern Recogn. Lett. 31(8), 651–666 (2010)
  8. T. Kohonen, The self-organizing map. Proc. IEEE 78(9), 1464–1480 (1990)
  9. B. Kosko, Stochastic competitive learning. IEEE Trans. Neural Netw. 2(5), 522–529 (1991)
  10. B. Kosko, Fuzzy systems as universal approximators. IEEE Trans. Comput. 43(11), 1329–1333 (1994)
  11. B. Kosko, Fuzzy Engineering (Prentice-Hall, 1996)
  12. B. Kosko, Additive fuzzy systems: from generalized mixtures to rule continua. Int. J. Intell. Syst. 33(8), 1573–1623 (2018)
  13. V. Kreinovich, G.C. Mouzouris, H.T. Nguyen, Fuzzy rule based modeling as a universal approximation tool, in Fuzzy Systems (Springer, 1998), pp. 135–195
  14. Y. LeCun, C. Cortes, MNIST handwritten digit database (2010)
  15. J.B. MacQueen, Some methods for classification and analysis of multivariate observations. Western Management Science Institute, University of California, Working Paper (1966)
  16. A.K. Panda, B. Kosko, Converting neural networks to rule foam, pp. 519–525 (2019)
  17. W. Rudin, Real and Complex Analysis (McGraw-Hill Education, 2006)
  18. W. Samek, Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, vol. 11700 (Springer Nature, 2019)
  19. F. Watkins, The representation problem for additive fuzzy systems, in Proceedings of the International Conference on Fuzzy Systems (IEEE FUZZ-95), pp. 117–122 (1995)

Author information

Correspondence to Bart Kosko.

7 Appendix: Fuzzy Approximation Theorem for a Combination of Gaussian SAMs

Theorem 1 extends a version of the fuzzy approximation theorem [10] to the case of combined additive systems. The Gaussian structure of the if-part sets gives insight into the uniform approximation. The approximation may need an exponentially growing total number of rules mq as the dimension n of the input space grows.

Theorem 1

A combination of q Gaussian SAMs can uniformly approximate a continuous function f over a compact space.

Proof

We combine q SAMs to approximate f(x) with F(x) by mixing their then-part sets. We want to show that the approximation error obeys \(|f(x)-F(x)| < \varepsilon \) for any \(\varepsilon >0\) using a finite number of rules. We use Gaussian if-part sets with set functions \(a_j^k(x) = \exp \{-||x-\hat{x}_j^k||^2/(d_j^k)^2\}\). We assign equal weights, equal volumes, and equal dispersions to all the rules: \(v^k = v\), \(w_j^k = w\), \(V_j^k = V\), and \(d_j^k = d\). This reduces the generalized mixing weights \(p_j^k(x)\) to softmax activation functions:

$$\begin{aligned} p_j^k(x) = \frac{vwa_j^k(x)V}{\sum _{k=1}^q\sum _{j=1}^m vwa_j^k(x)V}= \frac{\exp \{-||x-\hat{x}_j^k||^2/d^2\}}{\sum _{k=1}^q\sum _{j=1}^m \exp \{-||x-\hat{x}_j^k||^2/d^2\}}. \end{aligned}$$
(16)

Multiply f(x) by \(\sum _{k=1}^q\sum _{j=1}^m p_j^k(x)=1\) and then use (7):

$$\begin{aligned} |f(x) - F(x)| = \bigg |\bigg (\sum _{k=1}^q\sum _{j=1}^m p_j^k(x)\bigg )f(x) - \sum _{k=1}^q\sum _{j=1}^m p_j^k(x) c_j^k\bigg |. \end{aligned}$$
(17)

Rearrange the terms and use the triangle inequality:

$$\begin{aligned} |f(x) - F(x)| \le \sum _{k=1}^q\sum _{j=1}^m p_j^k(x)\big |f(x)-c_j^k\big |=\sum _{k=1}^q\sum _{j=1}^m p_j^k(x)\big |f(x)-f(\hat{x}_j^k)\big | \end{aligned}$$
(18)

because we center the then-part set around \(f(\hat{x}_j^k)\). So \(c_j^k = f(\hat{x}_j^k)\). The target function f is uniformly continuous because it is continuous and X is compact. So for all \(\varepsilon > 0\) there is a \(\delta > 0\) such that \(|f(x) - f(\hat{x}_j^k)| < \varepsilon /2\) if \(||x - \hat{x}_j^k|| < \delta \). This splits the rules into the two sets \(J_1= \{ (j,k) : ||x - \hat{x}_j^k||<\delta \}\) and \(J_2= \{ (j,k) : ||x - \hat{x}_j^k||\ge \delta \}\). We assume that \(J_1 \ne \emptyset \). Then (18) becomes

$$\begin{aligned} |f(x) - F(x)| \le \sum _{(j,k)\in J_1} p_j^k(x)\big |(f(x)-f(\hat{x}_j^k))\big |+ \sum _{(j,k)\in J_2} p_j^k(x)\big |(f(x)-f(\hat{x}_j^k))\big |. \end{aligned}$$
(19)

Consider the first term on the right side of (19). The absolute differences obey \(\big |(f(x)-f(\hat{x}_j^k))\big | < \varepsilon /2\) because \((j,k) \in J_1\) implies that \(||x - \hat{x}_j^k||<\delta \). Add these terms over all \((j,k) \in J_1\):

$$\begin{aligned} \sum _{(j,k) \in J_1} p_j^k(x)\big |(f(x)-f(\hat{x}_j^k))\big |< \sum _{(j,k)\in J_1} p_j^k(x)\frac{\varepsilon }{2} < \frac{\varepsilon }{2} \end{aligned}$$
(20)

because \(\sum _{(j,k)\in J_1} p_j^k(x) \le \sum _{k=1}^q\sum _{j=1}^m p_j^k(x) = 1\). Now pick a real number C large enough that \(|f(x) - f(\hat{x}_j^k)|<C\) for all j and k. Pick \((j,k) \in J_2\) and pick \((j_1, k_1) \in J_1\). Then \(||x-\hat{x}_j^k||\ge \delta \) and \(||x - \hat{x}_{j_1}^{k_1}||<\delta \). Pick a real number D small enough that \(||x - \hat{x}_j^k||^2 - ||x - \hat{x}_{j_1}^{k_1}||^2 \ge D > 0\). Choose the rule if-part dispersion d small enough that \(\exp (D/d^2)>2mqC/\varepsilon \). Then the definition of D gives

$$\begin{aligned} \frac{\exp {(-||x - \hat{x}_{j_1}^{k_1}||^2/d^2)}}{\exp {(-||x - \hat{x}_j^k||^2/d^2)}} = \exp {\bigg (\frac{||x - \hat{x}_j^k||^2 - ||x - \hat{x}_{j_1}^{k_1}||^2}{d^2}\bigg )}> \frac{2mqC}{\varepsilon }. \end{aligned}$$
(21)

Add \(\sum \limits _{i\ne j_1}\sum \limits _{l\ne k_1}\frac{\exp {(-||x - \hat{x}_i^l||^2/d^2)}}{\exp {(-||x - \hat{x}_j^k||^2/d^2)}}\) to both sides of the above inequality:

$$\begin{aligned} \frac{\sum \limits _{j_1=1}^m\sum \limits _{k_1=1}^q\exp {(-||x - \hat{x}_{j_1}^{k_1}||^2/d^2)}}{\exp {(-||x - \hat{x}_j^k||^2/d^2)}}> \frac{2mqC}{\varepsilon } + \frac{\sum \limits _{i\ne j_1}\sum \limits _{l\ne k_1}\exp {(-||x - \hat{x}_{i}^{l}||^2/d^2)}}{\exp {(-||x - \hat{x}_j^k||^2/d^2)}}>\frac{2mqC}{\varepsilon } \end{aligned}$$
(22)

because \(\sum \limits _{i\ne j_1}\sum \limits _{l\ne k_1}\frac{\exp {(-||x - \hat{x}_i^l||^2/d^2)}}{\exp {(-||x - \hat{x}_j^k||^2/d^2)}}>0\). Equation (16) and the definition of C give

$$\begin{aligned} p_j^k(x)\big |f(x)-f(\hat{x}_j^k)\big |< \frac{\varepsilon }{2mqC}\big |f(x)-f(\hat{x}_j^k)\big |< \frac{\varepsilon }{2mq}. \end{aligned}$$
(23)

Sum both sides over all \((j,k) \in J_2\):

$$\begin{aligned} \sum _{(j,k)\in J_2} p_j^k(x)|f(x)-f(\hat{x}_j^k)| < \sum _{(j,k)\in J_2} \frac{\varepsilon }{2mq}\le mq\frac{\varepsilon }{2mq} = \frac{\varepsilon }{2} \end{aligned}$$
(24)

because \(J_2\) has fewer than mq elements. Then (20) and (24) add to give the approximation-error bound: \(|f(x) {-} F(x)| \le \varepsilon /2 +\varepsilon /2 <\varepsilon \). So the combined SAM with mq rules uniformly approximates f. \(\square \)
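The construction in the proof can be checked numerically. The sketch below is an illustration under assumed values (a 1-dimensional target f on the compact set [0, 3], grid-spaced rule centers, and dispersion d = δ/2), not part of the paper. It sets each then-part centroid to \(c_j^k = f(\hat{x}_j^k)\), pools all rules with the softmax mixing weights of (16), and shows the worst-case error shrinking as the center spacing δ and the dispersion d shrink.

```python
import numpy as np

def combined_sam(x, centers, centroids, d):
    """Combined Gaussian SAM of Eq. (16): one softmax over all pooled rules."""
    a = np.exp(-(x[:, None] - centers[None, :]) ** 2 / d ** 2)  # a_j^k(x)
    p = a / a.sum(axis=1, keepdims=True)                        # p_j^k(x)
    return p @ centroids                                        # F(x) = sum p_j^k(x) c_j^k

f = lambda x: np.sin(2 * x) + 0.5 * x          # target function on [0, 3]
x_test = np.linspace(0, 3, 2000)

# Halving the grid spacing delta (more rules) and the dispersion d shrinks the
# worst-case error, as the proof's choice of a small d suggests.
for delta in (0.5, 0.25, 0.125, 0.0625):
    centers = np.arange(0, 3 + delta, delta)   # rule centers spaced delta apart
    F = combined_sam(x_test, centers, f(centers), d=delta / 2)
    print(f"delta = {delta:7.4f}  rules = {centers.size:3d}  "
          f"max |f - F| = {np.max(np.abs(F - f(x_test))):.4f}")
```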

The proof gives insight into a foam’s rule explosion. The proof assumes that \(J_1 \ne \emptyset \). So for all \(x \in X\) there is a \(\hat{x}_j^k \) such that \(||x - \hat{x}_j^k||<\delta \). Let \(N_{\delta }(\hat{x}_j^k) = \{ x : ||x - \hat{x}_j^k||<\delta \}\) denote the \(\delta \)-neighborhood around \(\hat{x}_j^k\). Then \(\forall \; x \in X :\; \exists \; \hat{x}_j^k : x \in N_{\delta }(\hat{x}_j^k)\). This gives

$$\begin{aligned} X\subseteq \bigcup \limits ^q_{k=1}\bigcup \limits ^m_{j=1} N_{\delta }(\hat{x}_j^k). \end{aligned}$$
(25)

X is bounded because X is compact by hypothesis. Let an n-dimensional sphere of radius R contain X. The volume of this sphere is \(cR^n\) for some constant c. Each \(N_{\delta }(\hat{x}_j^k)\) is an n-dimensional sphere with radius \(\delta \) and volume \(c\delta ^n\). Use the \(\delta \)-neighborhoods to cover the R-sphere that contains X. The \(N_{\delta }(\hat{x}_j^k)\)’s are not disjoint. So the volume of the R-sphere is at most the combined volume of the mq neighborhoods \(N_{\delta }(\hat{x}_j^k)\): \(cR^n \le mqc\delta ^n\). Then \( mq \ge (R/\delta )^n\). This inequality shows that the required total number of rules grows exponentially with the dimension n.
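A short computation makes the growth rate concrete. The covering ratio R/δ = 10 below is a hypothetical value chosen only for illustration:

```python
# Lower bound mq >= (R/delta)^n on the total rule count for a fixed covering
# ratio. R/delta = 10 is a hypothetical value used only to show the growth rate.
R_over_delta = 10
for n in (2, 5, 10, 20, 50):
    print(f"n = {n:2d}  ->  mq >= {R_over_delta ** n:,}")
# At the raw MNIST pixel dimension n = 784 the bound is already 10^784 rules.
```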


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Panda, A.K., Kosko, B. (2022). Random Fuzzy-Rule Foams for Explainable AI. In: Bede, B., Ceberio, M., De Cock, M., Kreinovich, V. (eds) Fuzzy Information Processing 2020. NAFIPS 2020. Advances in Intelligent Systems and Computing, vol 1337. Springer, Cham. https://doi.org/10.1007/978-3-030-81561-5_21
