Discovering structural association of semistructured data

K Wang, H Liu - IEEE Transactions on Knowledge and Data Engineering, 2000 - ieeexplore.ieee.org
Many semistructured objects are similarly, though not identically, structured. We study the problem of discovering "typical" substructures of a collection of semistructured objects. The discovered structures can serve the following purposes: 1) a "table of contents" for gaining general information about a source, 2) a road map for browsing and querying information sources, 3) a basis for clustering documents, 4) partial schemas for providing standard database access methods, and 5) user/customer interests and browsing patterns. The discovery task is affected by structural features of semistructured data in a nontrivial way, and traditional data mining frameworks are inapplicable. We define this discovery problem and propose a solution.