Article

On a Hybridization of Deep Learning and Rough Set Based Granular Computing

Faculty of Mathematics and Computer Science, University of Warmia and Mazury in Olsztyn, 10-710 Olsztyn, Poland
*
Author to whom correspondence should be addressed.
Algorithms 2020, 13(3), 63; https://doi.org/10.3390/a13030063
Submission received: 20 February 2020 / Revised: 6 March 2020 / Accepted: 7 March 2020 / Published: 11 March 2020
(This article belongs to the Special Issue Algorithms for Pattern Recognition)
Figure 1. Scheme of the experimental part. The exact design of the neural network is shown in Figure 2. The data fed into the neural network are normalized after granulation to the range $[0, 1]$.
Figure 2. Diagram of the neural network used to learn the Australian credit system. The networks for the other two systems differ only in the number of inputs, which is determined by the number of conditional attributes.
Figure 3. Results for 10 learning cycles, using 10 splits, for the Australian credit data set. The 'percentage of objects' axis shows the size of the granulated data as a percentage of the original training set; the 'Accuracy' axis shows the classification accuracy. The results do not fall at exactly the same points on the x-axis because the size reduction levels of the training systems varied.
Figure 4. Mean result for 10 learning cycles, using 10 splits, for the Australian credit data set. The average values from the experiments could only be computed per granulation radius, so the x-axis shows the granulation radii (approximation levels). The figure shows the results from Table 5.
Figure 5. Results for 10 learning cycles, using 10 splits, for the Heart Disease data set. The 'percentage of objects' axis shows the size of the granulated data as a percentage of the original training set; the 'Accuracy' axis shows the classification accuracy. The results do not fall at exactly the same points on the x-axis because the size reduction levels of the training systems varied.
Figure 6. Mean result for 10 learning cycles, using 10 splits, for the Heart Disease data set. The average values from the experiments could only be computed per granulation radius, so the x-axis shows the granulation radii (approximation levels). The figure shows the results from Table 6.
Figure 7. Results for 10 learning cycles, using 10 splits, for the Pima Indians Diabetes data set. The 'percentage of objects' axis shows the size of the granulated data as a percentage of the original training set; the 'Accuracy' axis shows the classification accuracy. The results do not fall at exactly the same points on the x-axis because the size reduction levels of the training systems varied.
Figure 8. Mean result for 10 learning cycles, using 10 splits, for the Pima Indians Diabetes data set. The average values from the experiments could only be computed per granulation radius, so the x-axis shows the granulation radii (approximation levels). The figure shows the results from Table 7.
Figure 9. Results for 10 learning cycles, using 10 splits, for the Australian credit data set converted to dummy variables (35 attributes after the conversion). The 'percentage of objects' axis shows the size of the granulated data as a percentage of the original training set; the 'Accuracy' axis shows the classification accuracy. The results do not fall at exactly the same points on the x-axis because the size reduction levels of the training systems varied.
Figure 10. Dummy variables: mean result for 10 learning cycles, using 10 splits, for the Australian credit data set (35 attributes after conversion to dummy variables).

Abstract

The set of heuristics constituting the methods of deep learning has proved very efficient in complex problems of artificial intelligence such as pattern recognition and speech recognition, solving them with better accuracy than previously applied methods. Our aim in this work has been to integrate the concept of the rough set into the repository of tools applied in deep learning, in the form of rough mereological granular computing. In our previous research we demonstrated the high efficiency of our decision system approximation techniques (creating granular reflections of systems), which, with a large reduction in the size of the training systems, maintained the internal knowledge of the original data. The current research asks whether granular reflections of decision systems can be effectively learned by neural networks and whether deep learning is able to extract the knowledge from the approximated decision systems. Our results show that granulated data sets perform well when mined by deep learning tools. We performed exemplary experiments using data from the UCI repository; the PyTorch and TensorFlow libraries were used to build the neural network and to carry out the classification process. It turns out that the deep learning method works effectively on reduced training sets. Approximating decision systems before neural network learning can be an important step that makes learning feasible in reasonable time.

1. Introduction

This paper is divided into parts dedicated to deep learning, rough sets, and granular computing by means of rough mereology. Deep learning is a collection of techniques rooted in artificial neural networks (ANNs) [1,2,3]. The idea of a neural network is that of an acyclic directed graph whose nodes are computing units (neurons) joined by edges labelled with weights. Nodes with input degree zero are called inputs, while nodes with output degree zero are called outputs. The flow of information is forward: from input nodes to output nodes. Nodes are classified into layers, each layer defined recursively starting from the layer of input nodes. Computation by a neural net begins with the input vector; in the simple case of a sigmoidal perceptron with input $x$, the output is given as $f(x)$, $f$ being a sigmoidal activation function. The result of the computation is the vector produced by the output layer of neurons.
The learning procedure for ANNs is a series of computations on sequences of training vectors $x_i$, which stops when the output vector is sufficiently close to the target vector on each input vector. Theoretically justified by the Perceptron Learning Theorem [4], the method of learning by changing weights according to the delta rule became effective when the backpropagation technique came into use [5]. Deep learning proceeds further by enhancing the neural net with many filters that expose many local features. Some variants, such as LSTM, can reach deep back into the memory of the process, which makes such networks especially effective in, for example, speech processing [6]. For a general introduction, please consult [7].
Interesting research in the field of granular computing with the use of neural network techniques can be found in [8,9,10]. To the best of our knowledge, none of these works is similar to our research in this context, so a direct comparison is difficult. In addition, the aim of our work is to check whether the data prepared by our granulation techniques can be learned by deep neural networks. It is not our goal to show that we have the best technique for reducing training systems.
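The forward computation described above can be illustrated with a minimal sketch. The weighted-sum form $f(w \cdot x + b)$ is the usual textbook formulation and is an assumption here, as are the function names and the weights, which are chosen only for illustration.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def perceptron_output(x, weights, bias=0.0):
    """Output of one sigmoidal neuron: f(w . x + b)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def layer_output(x, weight_rows, biases):
    """A layer applies each of its neurons to the same input vector."""
    return [perceptron_output(x, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical weights: a hidden layer of two neurons and one output neuron.
hidden = layer_output([1.0, 0.0], [[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0])
output = layer_output(hidden, [[1.0, -1.0]], [0.0])
```

Information flows strictly forward: the output layer sees only the values produced by the hidden layer.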
Rough set theory [11] approaches data in set-theoretical terms by assuming that on each collection of vectors representing some objects, a partition is obtained, its classes representing distinct concepts/categories pertaining to those objects. A general concept, i.e., a set of objects in a given collection (the universe) is perceived through categories: some concepts can be expressed by categories in a deterministic way and some may not. The former are exact concepts (modulo the given partition into categories) while the latter are inexact (rough) concepts. Each rough concept can only be expressed in terms of its relation to categories by approximations: the lower approximation of a concept consists of categories (or, exact concepts) contained in the concept whereas the upper approximation consists of categories intersecting the given concept.
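The two approximations can be sketched in a few lines. This is a minimal illustration, assuming the partition is given as a list of sets; the object names and the example concept are hypothetical.

```python
def lower_approximation(partition, concept):
    """Union of the categories fully contained in the concept."""
    concept = set(concept)
    result = set()
    for category in partition:
        if set(category) <= concept:
            result |= set(category)
    return result

def upper_approximation(partition, concept):
    """Union of the categories that intersect the concept."""
    concept = set(concept)
    result = set()
    for category in partition:
        if set(category) & concept:
            result |= set(category)
    return result

# Hypothetical universe of five objects split into three categories.
partition = [{"u1", "u2"}, {"u3"}, {"u4", "u5"}]
concept = {"u1", "u2", "u4"}
lower = lower_approximation(partition, concept)
upper = upper_approximation(partition, concept)
is_rough = lower != upper  # the concept is exact only when both coincide
```

Here the category containing `u4` intersects the concept without being contained in it, so the concept is rough.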
A means for dealing with data is provided by the notion of an information system (see Pawlak [11]), which is a tuple $(U, A, V, f)$ where $U$ is a universe of objects, $A$ is a set of attributes, $V$ is a set of attribute values, and $f$ is a mapping which assigns to each object $x \in U$ and each attribute $a \in A$ the value $f(x, a) \in V$. Categories obtained in this case are classes of the indiscernibility relation: $IND_B(x, y) = \mathrm{true}$ if and only if $a(x) = a(y)$ for each $a \in B$, where $B$ is a subset of the set $A$ of attributes. A special case of an information system is a decision system with signature $(U, A, V, f, d)$, where $d$ is a new attribute not in $A$, called the decision. A relation between the sets $IND_B$ and $IND_d$ for some $B$ is called a decision algorithm over $B$. For algorithmic methods of inducing decision rules, please see [12,13]. A far-reaching extension of rough set theory is rough mereology [14]. Rough mereology takes as its primitive notion that of a part to a degree [15]. Parts to a degree are subject to a few basic restrictions which reflect properties of partial containment: each object is a part of itself to the degree of 1; if an object $x$ is a part of an object $y$ to the degree of 1, then for each object $z$, the degree to which $z$ is contained in $x$ is not greater than the degree to which $z$ is contained in $y$. Rough mereology in turn was applied in a formal definition of granules of knowledge [16,17]. Formally, given a measure $m$ of partial containment (called a rough inclusion by Polkowski and Skowron), a granule $g(x, r)$ of the radius $r$ about an object $x$ is the collection of all objects which are parts of $x$ to degrees of at least $r$. Consult [15] for a deeper discussion of computing with granules. On the basis of computing with rough mereological granules, an approach to data mining was proposed [16].
This approach consists in transforming a given decision system (data set) $(U, A, V, f, d)$ into a granular decision system $(G, A, V, f, d, r)$, where: $G$ is a set of granules of radius $r$ about objects in $U$ which provides a covering of $U$; the attributes $a \in A$ are extended to granules, each $a$ mapping a granule $g$ into the value set $V$ according to the formula $a(g) = S(\{a(u) : u \in g\})$, where $S$ is a selected strategy such as majority voting with random tie resolution; $V$ is the value set, unchanged; $f(a, g) = a(g)$; and $d$ is defined on granules in the same manner as $a$. Any standard algorithm for rule induction can be applied to the granular decision system $(G, A, V, f, d, r)$ for all plausible values of the radius $r$. This ends our introduction of the main ingredients in our approach. In the following sections, we give details of our approach and we present results.
In this work we focus on the effectiveness of deep learning on reduced decision systems: we check to what extent the internal knowledge is maintained, measured by classification effectiveness.
The rest of the paper is organized as follows. Section 2 gives a detailed description of the granulation technique used in the experimental part. Section 3 describes the architecture of the artificial neural network. Section 4 presents the experimental part. We conclude our work in Section 5.
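As an illustration of the indiscernibility relation defined above, the following minimal sketch groups objects into the classes of $IND_B$. The dictionary representation of the information system, the attribute names, and the values are assumptions made for this example.

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group objects whose values agree on every attribute in `attributes`."""
    classes = defaultdict(list)
    for obj, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].append(obj)
    return list(classes.values())

# Hypothetical information system: four objects, two attributes.
table = {
    "u1": {"a1": 1, "a2": 0},
    "u2": {"a1": 1, "a2": 1},
    "u3": {"a1": 1, "a2": 0},
    "u4": {"a1": 0, "a2": 1},
}
classes_a1 = indiscernibility_classes(table, ["a1"])
classes_all = indiscernibility_classes(table, ["a1", "a2"])
```

Enlarging the attribute subset $B$ can only refine the partition: with both attributes, `u2` is separated from `u1` and `u3`.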

2. Reducing the Size of Decision-Making Systems Based on Their Granular Reflections

As a reference technique, we have chosen one of our best methods for the approximation of decision systems, concept-dependent granulation. It works analogously to the baseline procedure described in this section, except that granule formation takes place separately within decision classes.
Granulation consists in reducing the size of the training decision system by creating granular reflections of the data.
The concept-dependent granule is defined in Section 2.2.
We begin with the basic technique. Our methods are based on rough inclusions. An introduction to rough inclusions in the framework of rough mereology is available in [16,18]; a detailed, extensive discussion can be found in [15].
In Polkowski's granulation procedure, we can distinguish three basic steps.
  • First step: granulation. We begin by computing granules around each training object using the selected method. In the method used in this article, each object of the training system is surrounded by the objects of its decision class that are indiscernible from it to the degree determined by the granulation radius.
  • Second step: covering. The training decision system is covered by selected granules: after the granules have been computed in step 1, we search for a group of granules whose objects cover the entire training system.
  • Third step: building the granular reflection. The granular reflection of the original training decision system is derived from the granules selected in step 2. We form new objects by converting granules using majority voting.
We start with a detailed description of the basic method; see [16].

2.1. Standard Granulation

For the sake of simplicity, we use the following definition of a decision system: it is a triple $(U, A, d)$, where $U$ is the universe of objects, $A$ is the set of conditional attributes, $d \notin A$ is the decision attribute, and $r_{gran}$ is the granulation radius from the set $\{0, \frac{1}{|A|}, \frac{2}{|A|}, \ldots, 1\}$.
The standard rough inclusion $\mu$, for $u, v \in U$ and for a selected $r_{gran}$, is defined as
$$\mu(v, u, r_{gran}) \iff \frac{|IND(u, v)|}{|A|} \ge r_{gran},$$
where
$$IND(u, v) = \{a \in A : a(u) = a(v)\}.$$
For each object $u \in U$ and a selected $r_{gran}$, we compute the standard granule $g_{r_{gran}}(u)$ as follows:
$$g_{r_{gran}}(u) = \{v \in U : \mu(v, u, r_{gran})\}.$$
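The standard rough inclusion and the standard granule can be sketched as follows. This is a minimal illustration, assuming that objects are tuples of conditional attribute values; the sample objects are hypothetical, and exact fractions are used only to avoid floating-point surprises when comparing against radii such as $\frac{3}{4}$.

```python
from fractions import Fraction

def ind(u, v):
    """Indices of the conditional attributes on which u and v agree."""
    return [i for i, (a, b) in enumerate(zip(u, v)) if a == b]

def rough_inclusion(v, u, r_gran):
    """True when v agrees with u on at least a fraction r_gran of attributes."""
    return Fraction(len(ind(u, v)), len(u)) >= r_gran

def standard_granule(u_index, objects, r_gran):
    """Indices of all objects included in objects[u_index] to degree r_gran."""
    u = objects[u_index]
    return [j for j, v in enumerate(objects) if rough_inclusion(v, u, r_gran)]

# Hypothetical universe: four objects with four conditional attributes each.
objects = [(1, 0, 1, 1), (1, 0, 0, 1), (0, 1, 0, 0), (1, 1, 1, 1)]
r = Fraction(3, 4)
granules = [standard_granule(i, objects, r) for i in range(len(objects))]
```

Every granule contains its own centre, because each object agrees with itself on all attributes.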
In the next step we use a selected strategy to cover the training decision set $U$ by the computed granules; random choice is the simplest among the most effective strategies studied in [19]. All studied methods are available in [19] (pp. 105–220).
In the last step, the granular reflection of the training decision set is computed with the use of the majority voting procedure. Ties are resolved randomly.
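The covering and majority voting steps can be sketched as below. This is one plausible reading of random-choice covering (granules are visited in random order and kept when they cover at least one new object), not necessarily the exact variant studied in [19]; the function names and the sample data are assumptions.

```python
import random
from collections import Counter

def random_covering(granules, n_objects, rng):
    """Visit granules in random order; keep those that cover new objects."""
    order = list(range(len(granules)))
    rng.shuffle(order)
    covered, chosen = set(), []
    for g in order:
        if not set(granules[g]) <= covered:
            chosen.append(g)
            covered |= set(granules[g])
        if len(covered) == n_objects:
            break
    return chosen

def majority_vote(values, rng):
    """Most frequent value; ties are resolved randomly."""
    counts = Counter(values)
    top = max(counts.values())
    return rng.choice([v for v, c in counts.items() if c == top])

def granular_reflection(objects, granules, chosen, rng):
    """One new object per chosen granule, voted attribute by attribute."""
    return [
        tuple(majority_vote(column, rng)
              for column in zip(*(objects[i] for i in granules[g])))
        for g in chosen
    ]

# Hypothetical data: three conditional attributes followed by the decision.
objects = [(1, 0, 1, "yes"), (1, 0, 0, "yes"), (0, 1, 0, "no"), (1, 1, 1, "yes")]
granules = [[0, 1, 3], [0, 1], [2], [0, 3]]  # granules[i] is centred on object i
rng = random.Random(0)
chosen = random_covering(granules, len(objects), rng)
reflection = granular_reflection(objects, granules, chosen, rng)
```

The reflection has one object per chosen granule, so it is never larger than the original training set.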
The process of granulation can be tuned with the help of the triangular part of the granular indiscernibility matrix $[c_{ij}]_{i,j=1}^{|U|}$, where
$$c_{ij} = \begin{cases} 1, & \text{if } \frac{|IND(u_i, u_j)|}{|A|} \ge r_{gran}, \\ 0, & \text{otherwise.} \end{cases}$$
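A minimal sketch of this matrix follows, assuming objects are tuples of conditional attribute values (the sample objects are hypothetical). Only the upper triangle is computed, since the agreement count is symmetric; the entry is then mirrored so that row $i$ directly lists the members of the granule about $u_i$.

```python
from fractions import Fraction

def granular_indiscernibility_matrix(objects, r_gran):
    """c[i][j] = 1 when objects i and j agree on at least r_gran of attributes."""
    n = len(objects)
    n_attr = len(objects[0])
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # upper triangle only; the relation is symmetric
            agree = sum(1 for a, b in zip(objects[i], objects[j]) if a == b)
            if Fraction(agree, n_attr) >= r_gran:
                c[i][j] = c[j][i] = 1
    return c

def granule_from_matrix(c, i):
    """Read the granule about object i from row i of the matrix."""
    return [j for j, flag in enumerate(c[i]) if flag == 1]

objects = [(1, 0, 1, 1), (1, 0, 0, 1), (0, 1, 0, 0), (1, 1, 1, 1)]
matrix = granular_indiscernibility_matrix(objects, Fraction(3, 4))
```

Computing the agreement counts once and thresholding them for each radius avoids repeating the pairwise comparison for every value of $r_{gran}$.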

2.2. Concept Dependent Granulation

A concept-dependent (cd) granule $g^{cd}_{r_{gran}}(u)$ of the radius $r_{gran}$ about $u$ is defined as follows:
$$v \in g^{cd}_{r_{gran}}(u) \iff \mu(v, u, r_{gran}) \text{ and } d(u) = d(v).$$
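The concept-dependent variant adds only the decision check to the standard granule. The sketch below assumes conditional attributes and decisions are stored in two parallel lists; the data are hypothetical and are not taken from Table 1.

```python
from fractions import Fraction

def cd_granule(u_index, conditions, decisions, r_gran):
    """Objects included in u to degree r_gran that share u's decision value."""
    u = conditions[u_index]
    granule = []
    for j, v in enumerate(conditions):
        agree = sum(1 for a, b in zip(u, v) if a == b)
        similar = Fraction(agree, len(u)) >= r_gran
        if similar and decisions[j] == decisions[u_index]:
            granule.append(j)
    return granule

# Hypothetical decision system: four objects, four conditional attributes.
conditions = [(1, 0, 1, 1), (1, 0, 0, 1), (0, 1, 0, 0), (1, 1, 1, 1)]
decisions = ["yes", "no", "no", "yes"]
r = Fraction(3, 4)
cd_granules = [cd_granule(i, conditions, decisions, r) for i in range(4)]
```

Object 1 is similar enough to object 0 at this radius, but it is excluded from the granule about object 0 because the two belong to different decision classes.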

2.2.1. Toy Example of Concept Dependent Granulation

For the decision system from Table 1, we have found concept-dependent granules.
For the granulation radius $r_{gran} = 0.25$, the granular concept-dependent indiscernibility matrix (gcdm) is shown in Table 2.
Hence, the granules in this case are
$g^{cd}_{0.25}(u_1) = \{u_1\}$;
$g^{cd}_{0.25}(u_2) = \{u_2\}$;
$g^{cd}_{0.25}(u_3) = \{u_3\}$;
$g^{cd}_{0.25}(u_4) = \{u_4\}$;
$g^{cd}_{0.25}(u_5) = \{u_5, u_6\}$;
$g^{cd}_{0.25}(u_6) = \{u_5, u_6\}$.
Considering the random choice, the covering can be $\{g^{cd}_{0.25}(u_1), g^{cd}_{0.25}(u_2), g^{cd}_{0.25}(u_3), g^{cd}_{0.25}(u_4), g^{cd}_{0.25}(u_6)\}$.
The concept-dependent granular decision system formed from this covering is shown in Table 3.
Majority voting was applied only to the granule $g^{cd}_{0.25}(u_6)$.

3. Design of the Experimental Part

A general scheme of the experimental design is shown in Figure 1. The neural network architecture was chosen experimentally. For each tabular data set, we ran our experiment using the same network architecture to make the results comparable. We conducted 10 series of tests for each system tested. Since the data sets used are small, we built a network with a simple architecture. In addition, we selected data sets that have two decision classes at the output. In subsequent experiments, the diversity of data sets will be increased.
Our network consists of an input layer, two hidden linear layers, and an output layer. The input layer is the only layer whose size varies, as it depends on the data set used. The first hidden layer consists of 30 neurons and the second of 20 neurons. The output layer consists of only two neurons, as the decision is binary in all data sets used.
We used the hyperbolic tangent as the activation function in layers 1 and 2 and the softmax function in layer 3. The network is trained with the Adam optimizer and a learning rate of 0.001. Cross entropy is used as the loss function. Each learning run is performed over 500 epochs.
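The forward pass of the described architecture can be sketched with the Python standard library alone. This is not the PyTorch or TensorFlow code used in the experiments: training with Adam over 500 epochs is omitted, and the weight initialisation range, the choice of 14 inputs, and all function names are assumptions made for illustration.

```python
import math
import random

LAYER_SIZES = (30, 20, 2)  # two hidden layers and the two-class output layer

def init_network(n_inputs, rng):
    """Random weights for each layer; each neuron row ends with its bias."""
    sizes = (n_inputs,) + LAYER_SIZES
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]

def affine(layer, x):
    """Weighted sum plus bias for every neuron in the layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + row[-1] for row in layer]

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(network, x):
    """tanh in the two hidden layers, softmax in the output layer."""
    h = x
    for layer in network[:-1]:
        h = [math.tanh(v) for v in affine(layer, h)]
    return softmax(affine(network[-1], h))

def cross_entropy(probs, target_class):
    """Cross-entropy loss for a single object with a known decision class."""
    return -math.log(probs[target_class])

rng = random.Random(0)
net = init_network(14, rng)        # 14 inputs is an assumption for this sketch
probs = forward(net, [0.5] * 14)   # inputs are assumed normalized to [0, 1]
loss = cross_entropy(probs, 1)
```

Only the first layer depends on the data set: changing `n_inputs` leaves the 30-20-2 structure untouched.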

4. Procedure for Performed Experiments

The general scheme of the tests and the detailed neural network architecture are shown in Figure 1 and Figure 2. The combined procedure of our experiments is as follows:
1. Data input (original decision system),
2. Random split of the data into training (TRN) and test (TST) sets in the ratio 70:30,
3. Granulation step, covering step, and generation of new objects (see Section 2.2.1),
4. Neural network learning step for each set of objects (for each granulation radius); see Figure 1,
5. Classification step for each test set, based on approximated data; see Figure 2,
6. Computation of accuracy, time, and number of objects, and comparison with the first set.
The whole procedure is repeated 10 times.
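The outer loop of this procedure can be sketched as follows. The `evaluate` callback is hypothetical: it stands in for steps 3 to 5 (granulation, learning, classification) and is expected to return the accuracy for one split and one radius. The function names and the fixed seed are likewise assumptions.

```python
import random

def split_70_30(n_objects, rng):
    """Randomly split object indices into 70% training and 30% test."""
    indices = list(range(n_objects))
    rng.shuffle(indices)
    cut = int(round(0.7 * n_objects))
    return indices[:cut], indices[cut:]

def run_experiment(n_objects, radii, evaluate, repetitions=10, seed=0):
    """Average the score returned by `evaluate` over random splits, per radius."""
    rng = random.Random(seed)
    totals = {r: 0.0 for r in radii}
    for _ in range(repetitions):
        train, test = split_70_30(n_objects, rng)
        for r in radii:
            totals[r] += evaluate(train, test, r)
    return {r: totals[r] / repetitions for r in radii}

# Placeholder evaluation that just returns the radius, to show the call shape.
means = run_experiment(10, [0.5, 1.0], lambda train, test, r: r)
```

Averaging per radius mirrors the way the mean results are reported, since the size of the granulated training set differs from split to split.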

The Results of Experiments

In this section we show exemplary results for our selected technique in order to demonstrate the effectiveness of deep learning in classification based on reduced training data. The results were obtained with the Monte Carlo Cross Validation method with 10 repetitions for selected data sets (see Table 4) from the UCI repository [20].
The internal knowledge of the original training decision systems, measured by the ability to classify, seems to be sufficiently preserved: the accuracy of classification is comparable with the nil case, in which no reduction takes place. The nil case corresponds to radius 1.
The results of the experiments showed the usefulness of learning neural networks on granular data.
In addition to the accuracy of classification versus the percentage of training size (the size after granulation) for 10 iterations of learning (see Figure 3, Figure 5, Figure 7 and Figure 9), we present average results from 10 experiments, with the granulation radii on the x-axis (see Figure 4, Figure 6, Figure 8 and Figure 10). The latter figures visualize the classification accuracy results presented in Table 5, Table 6, Table 7 and Table 8.
For the Heart Disease data set (see Figure 5, Figure 6 and Table 6), for a radius of 0.756, with a reduction in the number of training objects of up to 67 percent, we get an accuracy of 0.756, compared to 0.825 on the original system. For a radius of 0.643, with a reduction of nearly 42 percent, we get an accuracy of 0.781, while for a radius of 0.714, where granulation removes 16 percent of the objects, we get 0.823.
For the Australian Credit system (see Figure 3, Figure 4 and Table 5), for a radius of 0.6 we get a reduction of about 70 percent and a classification accuracy of 0.813, compared to 0.856 on the original system. For a radius of 0.667, with a reduction of 41 percent, we get an accuracy of 0.84. For a radius of 0.733, with a reduction in training system size of about 14 percent, the accuracy is 0.853.
In our next experiment, for the Pima Indians Diabetes system (see Figure 7, Figure 8 and Table 7), for a radius of 0.333 and a reduction in the number of objects of about 73 percent, we get an accuracy of 0.7, compared to 0.772 on non-granular data. For a radius of 0.444 and a reduction in the number of objects of about 38 percent, we get an accuracy of 0.756, and finally, for a radius of 0.55 with an 11 percent reduction, we obtain an accuracy of 0.772.
As an additional result, we include the effect of learning on the Australian credit data set after the conversion of its symbolic attributes to dummy variables. From Figure 9, Figure 10 and Table 8 we see that the classification after the conversion is comparable. As an exemplary result, for a radius of 0.775, with an 83 percent reduction in training set size, the accuracy is 0.789. For a radius of 0.885, the accuracy is 0.817 with a 55 percent reduction. For a radius of 0.85, with a 31 percent reduction, we reach an accuracy of 0.832. For a radius of 0.875, with an 11 percent reduction in training size, the accuracy is 0.839. In the nil case for the dummy variant, the accuracy is 0.839.
Although the classification results are not the best among the techniques we have previously applied to granular data (among others, the Naive Bayes classifier [19], SVM [21], and rough set based classifiers [19]), we are pleased that neural networks are able to maintain high classification efficiency when working on granular data. We treat these results as a starting point for future intensive research on the application of granular computing techniques in the context of learning neural networks.
Table 5, Table 6, Table 7 and Table 8 present more information about our experiments. The column labels are explained below:
  • gran_rad—granulation radius as a percentage value,
  • no_of_gran_objects—number of new objects in tested decision system after the granulation process,
  • percentage_of_objects—percentage of objects in tested decision system comparing to the primary decision system size,
  • time_to_learn—time that was needed to complete the learning process using given data,
  • accuracy—classification accuracy for given neural network.

5. Conclusions

This paper presents results showing how granular reflections of decision systems can be used in deep learning. For experimental purposes, we selected the most effective method among those we have studied, the concept-dependent variant, and performed learning on selected data from the UCI repository using the TensorFlow library. It turned out that the designed neural network works effectively on approximated data, as measured by classification accuracy. Patterns contained in the granulated data seem to be preserved in the neural network structures. Our experiments have shown that our approximation techniques for tabular decision systems can be an effective pre-processing step before learning with deep neural networks. Reduced data, while retaining internal knowledge, give the opportunity for faster learning of networks. In future work we plan to examine a range of neural network architectures for use with our approximation methods. We are also considering the use of granular structures in the convolutional part of the preparation of data for learning by means of neural networks.

Author Contributions

Conceptualization, K.R. and P.A.; Methodology, K.R. and P.A.; Software, K.R. and P.A.; Validation, K.R. and P.A.; Formal analysis, K.R. and P.A.; Investigation, K.R. and P.A.; Resources, K.R. and P.A.; Data curation, K.R. and P.A.; Writing—original draft preparation, K.R. and P.A.; Writing—review and editing, K.R. and P.A.; Visualization, K.R. and P.A.; Supervision, K.R. and P.A.; Project administration, K.R. and P.A.; Funding acquisition, K.R. and P.A. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been fully supported by the grant from Ministry of Science and Higher Education of the Republic of Poland under the project number 23.610.007-000.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Haykin, S.S. Neural Networks: A Comprehensive Foundation; Prentice Hall: Upper Saddle River, NJ, USA, 1999; ISBN 978-0-13-273350-2.
  2. Połap, D.; Woźniak, M.; Wei, W.; Damaševičius, R. Multi-threaded learning control mechanism for neural networks. Future Gener. Comput. Syst. 2018, 87, 16–34.
  3. Woźniak, M.; Połap, D. Intelligent Home Systems for Ubiquitous User Support by Using Neural Networks and Rule-Based Approach. IEEE Trans. Ind. Inform. 2020, 16, 2651–2658.
  4. Novikoff, A.B. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, New York, NY, USA, 24–26 April 1962; Polytechnic Institute of Brooklyn: Brooklyn, NY, USA, 1962; Volume 12, pp. 615–622.
  5. Bryson, A.E.; Ho, Y.-C. Applied Optimal Control: Optimization, Estimation, and Control; Blaisdell Publishing Company: Waltham, MA, USA; Xerox College Publishing: Lexington, MA, USA, 1969; p. 481.
  6. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  7. Nielsen, M. Neural Networks and Deep Learning; Determination Press: San Francisco, CA, USA, 2015.
  8. Dick, S.; Kandel, A. Granular Computing in Neural Networks. In Granular Computing. Studies in Fuzziness and Soft Computing; Pedrycz, W., Ed.; Physica: Heidelberg, Germany, 2001; Volume 70.
  9. Leng, J.; Chen, Q.; Mao, N.; Jiang, P. Combining granular computing technique with deep learning for service planning under social manufacturing contexts. Knowl.-Based Syst. 2018, 143, 295–306, ISSN 0950-7051.
  10. Ghiasi, B.; Sheikhian, H.; Zeynolabedin, A.; Niksokhan, M.H. Granular computing-neural network model for prediction of longitudinal dispersion coefficients in rivers. Water Sci. Technol. 2020.
  11. Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning about Data; Kluwer: Alphen aan den Rijn, The Netherlands, 1991.
  12. Skowron, A.; Rauszer, C. The discernibility matrices and functions in information systems. In Intelligent Decision Support. Handbook of Applications and Advances of Rough Set Theory; Słowiński, R., Ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1992; pp. 331–362.
  13. Pawlak, Z.; Skowron, A. A rough set approach for decision rules generation. In Proceedings of the IJCAI’93 Workshop W12: The Management of Uncertainty in AI, Chambery Savoie, France, 30 August 1993; ICS Research Report 23/93. Warsaw University of Technology: Warsaw, Poland, 1993.
  14. Polkowski, L.; Skowron, A. Rough mereology. In Proceedings of the ISMIS’94, Charlotte, NC, USA, 16–19 October 1994. LNCS 867.
  15. Polkowski, L. Approximate Reasoning by Parts. An Introduction to Rough Mereology; Springer: Berlin, Germany, 2011.
  16. Polkowski, L. A model of granular computing with applications. In Proceedings of the 2006 IEEE International Conference on Granular Computing, Atlanta, GA, USA, 10–12 May 2006.
  17. Polkowski, L. A unified approach to granulation of knowledge and granular computing based on rough mereology: A survey. In Handbook of Granular Computing; John Wiley and Sons: New York, NY, USA, 2008; pp. 375–401.
  18. Polkowski, L. Formal granular calculi based on rough inclusions. In Proceedings of the IEEE 2005 Conference on Granular Computing GrC05, Beijing, China, 25–27 July 2005; IEEE Press: New York, NY, USA; pp. 57–62.
  19. Polkowski, L.; Artiemjew, P. Granular Computing in Decision Approximation—An Application of Rough Mereology. In Intelligent Systems Reference Library 77; Springer: Berlin/Heidelberg, Germany, 2015; pp. 1–422. ISBN 978-3-319-12879-5.
  20. University of California. Irvine Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 5 March 2020).
  21. Szypulski, J.; Artiemjew, P. The Rough Granular Approach to Classifier Synthesis by Means of SVM. In Proceedings of the International Joint Conference on Rough Sets, IJCRS’15, Tianjin, China, 20–23 November 2015; Lecture Notes in Computer Science (LNCS). Springer: Heidelberg, Germany, 2015; pp. 256–263.
Figure 1. The diagram shows a scheme of our experimental part. The exact design of the neural network is in Figure 3. The data that is fed into the neural network is normalized after granulation to a range of < 0 , 1 > .
Figure 1. The diagram shows a scheme of our experimental part. The exact design of the neural network is in Figure 3. The data that is fed into the neural network is normalized after granulation to a range of < 0 , 1 > .
Algorithms 13 00063 g001
Figure 2. A diagram of the neural network used to learn the Australian credit system. Neural networks for the other two systems differ only in the number of inputs determined by the number of conditional attributes.
Figure 2. A diagram of the neural network used to learn the Australian credit system. Neural networks for the other two systems differ only in the number of inputs determined by the number of conditional attributes.
Algorithms 13 00063 g002
Figure 3. Results for 10 learning cycles, using 10 splits; for Australian credit data set; In ‘percentage of objects’ ax, we have the percentage size of granulated data vs accuracy of classification in ‘Accuracy’ ax; in. The results are not perfectly evenly matched or at the same points on the x-axis, due to the fact that the size reduction levels of the training systems varied.
Figure 3. Results for 10 learning cycles, using 10 splits; for Australian credit data set; In ‘percentage of objects’ ax, we have the percentage size of granulated data vs accuracy of classification in ‘Accuracy’ ax; in. The results are not perfectly evenly matched or at the same points on the x-axis, due to the fact that the size reduction levels of the training systems varied.
Algorithms 13 00063 g003
Figure 4. Mean result for 10 learning cycles, using 10 splits; for Australian credit data set; The only way to show the average values from the experiments was to calculate the average accuracy for specific granulation radii. Hence, on the x-axis we have the granulation radii (approximation levels). The figure shows the result from Table 5.
Figure 5. Results for 10 learning cycles, using 10 splits, for the Heart Disease data set. The ‘percentage of objects’ axis gives the percentage size of the granulated data, and the ‘Accuracy’ axis gives the classification accuracy. The results are not evenly spaced or aligned at the same points on the x-axis because the size reduction levels of the training systems varied.
Figure 6. Mean result for 10 learning cycles, using 10 splits, for the Heart disease data set. The only way to show the average values from the experiments was to calculate the average accuracy for specific granulation radii. Hence, the x-axis shows the granulation radii (approximation levels). The figure shows the results from Table 6.
Figure 7. Results for 10 learning cycles, using 10 splits, for the Pima Indians Diabetes data set. The ‘percentage of objects’ axis gives the percentage size of the granulated data, and the ‘Accuracy’ axis gives the classification accuracy. The results are not evenly spaced or aligned at the same points on the x-axis because the size reduction levels of the training systems varied.
Figure 8. Mean results for 10 learning cycles, using 10 splits, for the Pima Indians Diabetes data set. The only way to show the average values from the experiments was to calculate the average accuracy for specific granulation radii. Hence, the x-axis shows the granulation radii (approximation levels). The figure shows the results from Table 7.
Figure 9. Results for 10 learning cycles, using 10 splits, for the Australian credit data set converted to dummy variables (after the conversion it has 35 attributes). The ‘percentage of objects’ axis gives the percentage size of the granulated data, and the ‘Accuracy’ axis gives the classification accuracy. The results are not evenly spaced or aligned at the same points on the x-axis because the size reduction levels of the training systems varied.
Figure 10. Dummy variables: mean result for 10 learning cycles, using 10 splits, for the Australian credit data set, which has 35 attributes after conversion to dummy variables.
Table 1. The training decision system (U, A, d).

| | a1 | a2 | a3 | a4 | d |
|---|---|---|---|---|---|
| u1 | 2 | 1 | 2 | 1 | 1 |
| u2 | 3 | 2 | 3 | 3 | 1 |
| u3 | 1 | 5 | 1 | 2 | 1 |
| u4 | 6 | 2 | 3 | 8 | 2 |
| u5 | 4 | 5 | 8 | 6 | 2 |
| u6 | 5 | 1 | 8 | 1 | 2 |
Table 2. gcdm (1 when u_i is indiscernible to a degree r_gran = 0.25 from u_j, 0 otherwise, provided d(u_i) is equal to d(u_j); x marks pairs of objects with different decision values).

| | u1 | u2 | u3 | u4 | u5 | u6 |
|---|---|---|---|---|---|---|
| u1 | 1 | 0 | 0 | x | x | x |
| u2 | 0 | 1 | 0 | x | x | x |
| u3 | 0 | 0 | 1 | x | x | x |
| u4 | x | x | x | 1 | 0 | 0 |
| u5 | x | x | x | 0 | 1 | 1 |
| u6 | x | x | x | 0 | 1 | 1 |
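The matrix in Table 2 can be recomputed from Table 1 with a short sketch. It assumes that two objects are indiscernible to degree r when they agree on at least a fraction r of the conditional attributes; this reading is consistent with the entries of Table 2, but the function names `indiscernibility_ratio` and `gcdm` are illustrative and not taken from the authors' code.

```python
# Table 1: conditional attributes (a1..a4) and decision d for objects u1..u6.
TABLE_1 = {
    "u1": ((2, 1, 2, 1), 1),
    "u2": ((3, 2, 3, 3), 1),
    "u3": ((1, 5, 1, 2), 1),
    "u4": ((6, 2, 3, 8), 2),
    "u5": ((4, 5, 8, 6), 2),
    "u6": ((5, 1, 8, 1), 2),
}


def indiscernibility_ratio(a, b):
    """Fraction of conditional attributes on which two objects agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def gcdm(system, radius):
    """Concept-dependent matrix: 1/0 within a decision class, 'x' across classes."""
    names = list(system)
    matrix = []
    for ui in names:
        attrs_i, d_i = system[ui]
        row = []
        for uj in names:
            attrs_j, d_j = system[uj]
            if d_i != d_j:
                row.append("x")  # different decisions: pair is never compared
            elif indiscernibility_ratio(attrs_i, attrs_j) >= radius:
                row.append(1)
            else:
                row.append(0)
        matrix.append(row)
    return names, matrix


names, matrix = gcdm(TABLE_1, 0.25)
for name, row in zip(names, matrix):
    print(name, "".join(str(cell) for cell in row))
```

With four conditional attributes, a radius of 0.25 means agreement on at least one attribute. Only u5 and u6 (which share a3 = 8) fall into a common granule; u2 and u4 also share attribute values but belong to different decision classes, so their cell is x.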
Table 3. Concept dependent granular decision system for (U, A, d) and radius 0.25.

| | a1 | a2 | a3 | a4 | d |
|---|---|---|---|---|---|
| g^cd_0.25(u1) | 2 | 1 | 2 | 1 | 1 |
| g^cd_0.25(u2) | 3 | 2 | 3 | 3 | 1 |
| g^cd_0.25(u3) | 1 | 5 | 1 | 2 | 1 |
| g^cd_0.25(u4) | 6 | 2 | 3 | 8 | 2 |
| g^cd_0.25(u6) | 5 | 5 | 8 | 6 | 2 |
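A granular decision system like the one in Table 3 can be sketched as below. The sketch makes two assumptions that are labeled in the code: granules are picked for the covering in a fixed object order, and each granule is collapsed by majority voting with ties going to the first member. Because the merged granule {u5, u6} is tied on a1, a2, and a4, this sketch yields 4 5 8 6 for it, while Table 3 lists 5 5 8 6; both are valid outcomes of an arbitrary tie resolution. All function names are hypothetical.

```python
from collections import Counter

# Table 1: conditional attributes (a1..a4) and decision d for objects u1..u6.
TABLE_1 = {
    "u1": ((2, 1, 2, 1), 1),
    "u2": ((3, 2, 3, 3), 1),
    "u3": ((1, 5, 1, 2), 1),
    "u4": ((6, 2, 3, 8), 2),
    "u5": ((4, 5, 8, 6), 2),
    "u6": ((5, 1, 8, 1), 2),
}


def granule(system, center, radius):
    """Objects with the same decision that agree with `center` on >= radius of attributes."""
    attrs_c, d_c = system[center]
    members = []
    for name, (attrs, d) in system.items():
        agree = sum(x == y for x, y in zip(attrs_c, attrs))
        if d == d_c and agree / len(attrs_c) >= radius:
            members.append(name)
    return members


def majority_vote(system, members):
    """Collapse a granule into one object; ties go to the first-seen value (assumption)."""
    columns = zip(*(system[m][0] for m in members))
    attrs = tuple(Counter(col).most_common(1)[0][0] for col in columns)
    return attrs, system[members[0]][1]


def granular_reflection(system, radius):
    """Cover the universe with granules and replace each chosen granule by its vote."""
    uncovered = set(system)
    reflection = {}
    for center in system:  # assumption: fixed-order covering, not the authors' strategy
        if center in uncovered:
            members = granule(system, center, radius)
            uncovered -= set(members)
            reflection[center] = majority_vote(system, members)
    return reflection


reflection = granular_reflection(TABLE_1, 0.25)
for center, (attrs, d) in reflection.items():
    print(center, attrs, d)
```

The six training objects are reduced to five granular objects, matching the row count of Table 3; only the granule containing u5 and u6 actually merges anything at this radius.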
Table 4. Exemplary decision systems from the UCI Machine Learning Repository [20]. We have chosen binary systems.

| name | attr no. | obj no. |
|---|---|---|
| Australian-credit | 15 | 690 |
| Diabetes | 9 | 768 |
| Heart disease | 14 | 270 |
Table 5. Results for Australian credit dataset (mean from 10 experiments).

| gran_rad | no_of_gran_objects (mean) | percentage_of_objects (mean) | time_to_learn (mean) | accuracy (mean) |
|---|---|---|---|---|
| 0.0667 | 2.0 | 0.4149 | 0.3666 | 0.5646 |
| 0.1333 | 2.0 | 0.4149 | 0.3607 | 0.5337 |
| 0.2000 | 3.4 | 0.7054 | 0.3691 | 0.5423 |
| 0.2667 | 5.1 | 1.0581 | 0.3685 | 0.5154 |
| 0.3333 | 8.2 | 1.7012 | 0.3696 | 0.5192 |
| 0.4000 | 16.0 | 3.3195 | 0.3778 | 0.5577 |
| 0.4667 | 31.6 | 6.5560 | 0.3777 | 0.6236 |
| 0.5333 | 65.3 | 13.5477 | 0.3916 | 0.7764 |
| 0.6000 | 145.3 | 30.1452 | 0.4287 | 0.8125 |
| 0.6667 | 283.8 | 58.8797 | 0.7464 | 0.8399 |
| 0.7333 | 412.9 | 85.6639 | 0.8210 | 0.8534 |
| 0.8000 | 468.8 | 97.2614 | 0.8585 | 0.8587 |
| 0.8667 | 477.9 | 99.1494 | 0.8532 | 0.8553 |
| 0.9333 | 479.3 | 99.4398 | 0.8817 | 0.8553 |
| 1.0000 | 482.0 | 100.0000 | 0.8995 | 0.8562 |
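The relation between the object count and the percentage column in Table 5 can be checked with a small sketch. It assumes that the percentage is the mean number of granular objects divided by the training-set size, and that the training-set size is the 482.0 objects reported in the 100% row; the name `percentage_of_objects` mirrors the column header but the function itself is illustrative.

```python
# (no_of_gran_objects, percentage_of_objects) pairs copied from Table 5.
TABLE_5_ROWS = [
    (2.0, 0.4149),
    (16.0, 3.3195),
    (145.3, 30.1452),
    (412.9, 85.6639),
    (482.0, 100.0),
]

# Assumption: the row with 100% of objects gives the mean training-set size.
TRAIN_SIZE = 482.0


def percentage_of_objects(n_granular, n_train):
    """Size of the granulated training set as a percentage of the original one."""
    return 100.0 * n_granular / n_train


for n_granular, reported in TABLE_5_ROWS:
    computed = percentage_of_objects(n_granular, TRAIN_SIZE)
    print(f"{n_granular:6.1f} -> {computed:8.4f} (reported {reported:.4f})")
```

Under this reading, the row at radius 0.6000 keeps about 30% of the training objects while reaching an accuracy of 0.8125, against 0.8562 for the full training set.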
Table 6. Results for Heart disease dataset (mean from 10 experiments).

| gran_rad | no_of_gran_objects (mean) | percentage_of_objects (mean) | time_to_learn (mean) | accuracy (mean) |
|---|---|---|---|---|
| 0.0714 | 2.0 | 0.9434 | 0.3702 | 0.6801 |
| 0.1429 | 2.3 | 1.0849 | 0.3786 | 0.6505 |
| 0.2143 | 2.8 | 1.3208 | 0.3708 | 0.6231 |
| 0.2857 | 4.5 | 2.1226 | 0.3758 | 0.7132 |
| 0.3571 | 8.8 | 4.1509 | 0.3928 | 0.7110 |
| 0.4286 | 17.0 | 8.0189 | 0.4064 | 0.7187 |
| 0.5000 | 35.7 | 16.8396 | 0.4209 | 0.7198 |
| 0.5714 | 70.4 | 33.2075 | 0.4299 | 0.7560 |
| 0.6429 | 122.7 | 57.8774 | 0.4571 | 0.7813 |
| 0.7143 | 177.9 | 83.9151 | 0.4872 | 0.8231 |
| 0.7857 | 204.6 | 96.5094 | 0.4942 | 0.8297 |
| 0.8571 | 211.4 | 99.7170 | 0.4995 | 0.8220 |
| 0.9286 | 211.4 | 99.7170 | 0.5023 | 0.8209 |
| 1.0000 | 211.4 | 99.7170 | 0.5010 | 0.8253 |
Table 7. Results for Pima Indians Diabetes dataset (mean from 10 experiments).

| gran_rad | no_of_gran_objects (mean) | percentage_of_objects (mean) | time_to_learn (mean) | accuracy (mean) |
|---|---|---|---|---|
| 0.1111 | 2.0 | 0.3724 | 0.3679 | 0.5392 |
| 0.2222 | 32.6 | 6.0708 | 0.4544 | 0.5584 |
| 0.3333 | 145.3 | 27.0577 | 0.4899 | 0.7009 |
| 0.4444 | 331.0 | 61.6387 | 0.7895 | 0.7563 |
| 0.5556 | 477.8 | 88.9758 | 0.8457 | 0.7723 |
| 0.6667 | 533.0 | 99.2551 | 0.8643 | 0.7714 |
| 0.7778 | 537.0 | 100.0000 | 0.8882 | 0.7684 |
| 0.8889 | 537.0 | 100.0000 | 0.9313 | 0.7671 |
| 1.0000 | 537.0 | 100.0000 | 0.9417 | 0.7680 |
Table 8. Dummy variables: results for Australian credit dataset (mean from 10 experiments; after conversion to dummy variables it has 35 attributes).

| gran_rad | no_of_gran_objects (mean) | percentage_of_objects (mean) | time_to_learn (mean) | accuracy (mean) |
|---|---|---|---|---|
| 0.025 | 2.0 | 0.4149 | 0.3598 | 0.5534 |
| 0.050 | 2.0 | 0.4149 | 0.4274 | 0.5647 |
| 0.075 | 2.0 | 0.4149 | 0.4314 | 0.4836 |
| 0.100 | 2.0 | 0.4149 | 0.4373 | 0.4744 |
| 0.125 | 2.0 | 0.4149 | 0.4391 | 0.5536 |
| 0.150 | 2.0 | 0.4149 | 0.4327 | 0.5841 |
| 0.175 | 2.0 | 0.4149 | 0.4305 | 0.5778 |
| 0.200 | 2.0 | 0.4149 | 0.4349 | 0.4928 |
| 0.225 | 2.0 | 0.4149 | 0.4404 | 0.4826 |
| 0.250 | 2.0 | 0.4149 | 0.4348 | 0.5048 |
| 0.275 | 2.0 | 0.4149 | 0.4342 | 0.5082 |
| 0.300 | 2.0 | 0.4149 | 0.4586 | 0.5261 |
| 0.325 | 2.0 | 0.4149 | 0.4494 | 0.5652 |
| 0.350 | 2.0 | 0.4149 | 0.4476 | 0.5130 |
| 0.375 | 2.0 | 0.4149 | 0.4369 | 0.4797 |
| 0.400 | 2.0 | 0.4149 | 0.4608 | 0.5329 |
| 0.425 | 2.0 | 0.4149 | 0.4444 | 0.5256 |
| 0.450 | 2.0 | 0.4149 | 0.4443 | 0.5179 |
| 0.475 | 2.0 | 0.4149 | 0.4516 | 0.5135 |
| 0.500 | 2.0 | 0.4149 | 0.4436 | 0.5855 |
| 0.525 | 2.0 | 0.4149 | 0.4417 | 0.5034 |
| 0.550 | 2.2 | 0.4564 | 0.4517 | 0.4638 |
| 0.575 | 2.9 | 0.6017 | 0.4431 | 0.5063 |
| 0.600 | 3.7 | 0.7676 | 0.4476 | 0.5546 |
| 0.625 | 5.2 | 1.0788 | 0.4571 | 0.4947 |
| 0.650 | 8.7 | 1.8050 | 0.4506 | 0.5594 |
| 0.675 | 12.1 | 2.5104 | 0.4754 | 0.5237 |
| 0.700 | 18.0 | 3.7344 | 0.4782 | 0.6222 |
| 0.725 | 29.4 | 6.0996 | 0.5029 | 0.6961 |
| 0.750 | 49.9 | 10.3527 | 0.5087 | 0.7140 |
| 0.775 | 80.4 | 16.6805 | 0.5228 | 0.7889 |
| 0.800 | 134.8 | 27.9668 | 0.7577 | 0.7903 |
| 0.825 | 214.3 | 44.4606 | 0.8252 | 0.8169 |
| 0.850 | 331.3 | 68.7344 | 0.8719 | 0.8319 |
| 0.875 | 427.3 | 88.6515 | 0.9257 | 0.8386 |
| 0.900 | 470.7 | 97.6556 | 0.9505 | 0.8319 |
| 0.925 | 478.2 | 99.2116 | 0.9665 | 0.8309 |
| 0.950 | 479.5 | 99.4813 | 0.9729 | 0.8377 |
| 0.975 | 482.0 | 100.0000 | 0.9688 | 0.8329 |
| 1.000 | 482.0 | 100.0000 | 0.9697 | 0.8386 |
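The conversion to dummy variables mentioned in the caption of Table 8 can be illustrated with a minimal one-hot encoding sketch. Which attributes the authors encoded and which tool they used is not stated here, so the function `to_dummy_variables` and the toy rows are hypothetical; the sketch only shows why the attribute count grows after the conversion.

```python
def to_dummy_variables(rows, categorical_columns):
    """One-hot encode the given column indices; other columns pass through unchanged."""
    # Collect the distinct values of each categorical column in a stable order.
    categories = {
        col: sorted({row[col] for row in rows}) for col in categorical_columns
    }
    encoded = []
    for row in rows:
        out = []
        for col, value in enumerate(row):
            if col in categories:
                # One 0/1 indicator per distinct value of this column.
                out.extend(1 if value == cat else 0 for cat in categories[col])
            else:
                out.append(value)
        encoded.append(out)
    return encoded


# Hypothetical toy rows: column 0 is numeric, column 1 is categorical.
toy = [[0.5, "a"], [1.2, "b"], [0.7, "c"], [0.1, "a"]]
encoded = to_dummy_variables(toy, categorical_columns=[1])
for row in encoded:
    print(row)
```

A categorical column with k distinct values becomes k indicator columns, so two objects that differ in that one attribute now differ in two indicators while agreeing on the remaining k - 2. This changes how many attributes count as matching for a given granulation radius.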

Share and Cite

MDPI and ACS Style

Ropiak, K.; Artiemjew, P. On a Hybridization of Deep Learning and Rough Set Based Granular Computing. Algorithms 2020, 13, 63. https://doi.org/10.3390/a13030063
