Inductive Logic Programming for Multiple-Part Data: Applications on Structure-Activity Relationship Studies

Abstract

Inductive Logic Programming (ILP) is attractive because the expressive power of first-order representation makes learned results comprehensible and makes it possible to handle complex data that include relations among their components. The bottleneck in learning first-order theories, however, is the enormous hypothesis search space, which makes existing learning approaches inefficient compared with propositional approaches. This paper introduces an improved ILP approach that handles more efficiently a kind of data called multiple-part data, in which each instance consists of several parts together with relations among those parts. The approach searches for hypotheses that describe the class of each training example using both the individual characteristics of its parts and the relations among them, which is similar to finding common substructures among complex relational instances. Multiple-part data can be found in various domains, notably Structure-Activity Relationship (SAR) studies, which aim to generate hypotheses describing the activities or characteristics of chemical compounds from their structures. Each compound is composed of atoms as parts and various kinds of bonds as relations among atoms. We apply the proposed algorithm to SAR studies through experiments on two real-world datasets: mutagenicity of nitroaromatic compounds and dopamine antagonist compounds. The experimental results are compared with those of previous approaches to assess the performance of the proposed approach.
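
To make the notion of multiple-part data concrete, the following minimal Python sketch represents a compound as a set of atoms (parts) and bonds (relations among parts) and collects the bonded-element patterns shared by a group of compounds. It is only an illustration of the data layout and of the "common substructure" intuition, not the algorithm proposed in the paper. The class names, the function names, and the two toy compounds are all hypothetical and do not come from the mutagenicity or dopamine antagonist datasets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """One part of an instance: an atom with an individual characteristic."""
    atom_id: str
    element: str


@dataclass(frozen=True)
class Bond:
    """A relation between two parts: a bond of a given kind."""
    first: str
    second: str
    kind: str


@dataclass(frozen=True)
class Compound:
    """One multiple-part instance: several atoms plus the bonds among them."""
    name: str
    atoms: tuple
    bonds: tuple


def bonded_element_patterns(compound):
    """Return the set of (element, bond kind, element) triples in a compound."""
    element_of = {atom.atom_id: atom.element for atom in compound.atoms}
    patterns = set()
    for bond in compound.bonds:
        # Sort the two elements so the pattern does not depend on bond direction.
        e1, e2 = sorted((element_of[bond.first], element_of[bond.second]))
        patterns.add((e1, bond.kind, e2))
    return patterns


def common_patterns(compounds):
    """Return the patterns that occur in every compound of the group."""
    pattern_sets = [bonded_element_patterns(c) for c in compounds]
    return set.intersection(*pattern_sets) if pattern_sets else set()


# Hypothetical toy instances, invented for illustration only.
compound_a = Compound(
    name="toy_a",
    atoms=(Atom("a1", "C"), Atom("a2", "N"), Atom("a3", "O")),
    bonds=(Bond("a1", "a2", "single"), Bond("a2", "a3", "double")),
)
compound_b = Compound(
    name="toy_b",
    atoms=(Atom("b1", "N"), Atom("b2", "O"), Atom("b3", "H")),
    bonds=(Bond("b1", "b2", "double"), Bond("b1", "b3", "single")),
)

shared = common_patterns([compound_a, compound_b])
print(shared)
```

In this toy case the only pattern shared by both compounds is a nitrogen atom double-bonded to an oxygen atom. A first-order learner would express such a pattern as a clause over atom and bond predicates and would search a much larger space of candidate clauses.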
