Abstract
Significant advances have been made in automatically constructing knowledge bases of relational facts derived from web corpora. These relational facts are linguistic in nature and are represented as ordered pairs of nouns, such as (Winnipeg, Canada), belonging to a category, such as City_Country. One major problem is that such facts are abundant but mostly unlabeled. Semi-supervised learning approaches have therefore been successful in building knowledge bases: a small number of labeled examples serve as seed (training) instances, and a large number of unlabeled instances are then labeled iteratively. In this paper, we propose a novel fuzzy rough set-based semi-supervised learning algorithm (FRL) for categorizing relational facts derived from a given corpus. FRL is compared with a tolerance rough set-based learner (TPL) and with the coupled pattern learner (CPL). All experiments use the same ontology, derived from a subset of the corpus of the Never-Ending Language Learner (NELL) system. We demonstrate that FRL outperforms both TPL and CPL in terms of precision. The paper also addresses the concept drift problem by using mutual exclusion constraints. The contributions of this paper are: (i) a formal fuzzy rough model for relations, (ii) a semi-supervised learning algorithm, (iii) an experimental comparison with two other machine learning algorithms, TPL and CPL, and (iv) a novel application of fuzzy rough sets.
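To give a concrete feel for the fuzzy rough machinery mentioned above, here is a minimal Python sketch of the standard fuzzy rough lower approximation (implicator-based) and upper approximation (t-norm-based) of a category over a few noun pairs. It is not the FRL algorithm itself. The noun pairs, their pattern vectors (`FEATURES`), the seed memberships (`CITY_COUNTRY`, with an assumed 0.5 for the one unlabeled pair), the similarity relation, the Łukasiewicz implicator and the minimum t-norm are all hypothetical choices made only for this example.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]

# Hypothetical co-occurrence scores of each noun pair with three contextual
# patterns (e.g. "arg1 is a city in arg2"); values are illustrative only.
FEATURES: Dict[Pair, Tuple[float, ...]] = {
    ("Winnipeg", "Canada"): (1.0, 0.8, 0.0),
    ("Toronto", "Canada"): (0.9, 0.7, 0.1),
    ("Paris", "France"): (0.8, 0.9, 0.0),
    ("Apple", "iPhone"): (0.0, 0.1, 1.0),
}

# Hypothetical fuzzy membership in the category City_Country: seeds get
# 1.0 (positive) or 0.0 (negative); the unlabeled pair gets an assumed 0.5.
CITY_COUNTRY: Dict[Pair, float] = {
    ("Winnipeg", "Canada"): 1.0,
    ("Toronto", "Canada"): 1.0,
    ("Paris", "France"): 0.5,
    ("Apple", "iPhone"): 0.0,
}


def similarity(x: Pair, y: Pair) -> float:
    """Assumed fuzzy relation R(x, y): 1 minus the mean absolute feature difference."""
    fx, fy = FEATURES[x], FEATURES[y]
    return 1.0 - sum(abs(a - b) for a, b in zip(fx, fy)) / len(fx)


def lukasiewicz_implicator(a: float, b: float) -> float:
    """Lukasiewicz implicator I(a, b) = min(1, 1 - a + b)."""
    return min(1.0, 1.0 - a + b)


def lower_approximation(x: Pair, category: Dict[Pair, float]) -> float:
    """(R lower A)(x) = min over y of I(R(x, y), A(y)): degree of certain membership."""
    return min(lukasiewicz_implicator(similarity(x, y), a_y)
               for y, a_y in category.items())


def upper_approximation(x: Pair, category: Dict[Pair, float]) -> float:
    """(R upper A)(x) = max over y of min(R(x, y), A(y)): degree of possible membership."""
    return max(min(similarity(x, y), a_y) for y, a_y in category.items())


if __name__ == "__main__":
    for pair in FEATURES:
        lo = lower_approximation(pair, CITY_COUNTRY)
        up = upper_approximation(pair, CITY_COUNTRY)
        print(f"{pair}: lower={lo:.2f} upper={up:.2f}")
```

With these toy values, the unlabeled pair (Paris, France) receives lower=0.50 and upper=0.90, the upper value coming from its similarity to the seed (Winnipeg, Canada), while (Apple, iPhone) gets lower=0.00 and upper=0.20. Because R is reflexive, every pair satisfies lower ≤ A(x) ≤ upper. An iterative learner could, for example, promote candidates whose approximations pass a threshold in each round; that promotion rule is an assumption for illustration, not a description of how FRL works.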
Acknowledgements
Special thanks to Cenker Sengoz for sharing the dataset and for discussions regarding TPL. We are very grateful to Prof. Estevam R. Hruschka Jr. for the NELL dataset and Prof. Andrzej Skowron for helpful suggestions.
Additional information
This research has been supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant.
About this article
Cite this article
Bharadwaj, A., Ramanna, S. Categorizing relational facts from the web with fuzzy rough sets. Knowl Inf Syst 61, 1695–1713 (2019). https://doi.org/10.1007/s10115-018-1250-6