Structural Generalizability: The Case of Similarity Search

Authors:

Yodsawalai Chodpathumwan,

Zheng Liu

SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data

Pages 326 - 338

https://doi.org/10.1145/3448016.3457316

Published: 18 June 2021

Abstract

Supervised and unsupervised ML algorithms are widely used over graphs. They use the structural properties of the data to deliver effective results. It is known that the same information can be represented under various graph structures. Thus, these algorithms may be effective on some structural variations of the data and ineffective on others. One would like to have an algorithm that is effective and generalizes to all structural variations of a data graph. We define the concept of structural generalizability for algorithms over graphs. We focus on the problem of similarity search, which is a popular task and a building block of many ML algorithms on graphs, and propose a structurally generalizable similarity search algorithm. As this algorithm may require users to specify features in a rather complex language, we modify it so that it requires only simple guidance from the user. Our extensive empirical study shows that our algorithms are structurally generalizable while being efficient and more effective than current algorithms.
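
To make "structural variation" concrete, here is a hypothetical toy illustration (not taken from the paper). It assumes a small bibliographic graph in which each proceedings volume belongs to exactly one venue, so the venue can be stored either on the proceedings (representation A) or directly on each paper (representation B). A purely structural similarity, the common-neighbor count, is then computed over both. All node names and helper functions are invented for illustration.

```python
from itertools import combinations

def build_graph(edges):
    """Build an undirected graph as a dict mapping each node to its neighbor set."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def common_neighbors(adj, a, b):
    """Purely structural similarity: number of neighbors shared by a and b."""
    return len(adj.get(a, set()) & adj.get(b, set()))

# Hypothetical toy data. Assumed constraint: each proceedings volume has one venue,
# so the venue can be attached to the proceedings (A) or to each paper (B).
REP_A = [("p1", "SIGMOD2020"), ("p2", "SIGMOD2021"), ("p3", "VLDB2021"),
         ("SIGMOD2020", "SIGMOD"), ("SIGMOD2021", "SIGMOD"), ("VLDB2021", "VLDB")]
REP_B = [("p1", "SIGMOD2020"), ("p1", "SIGMOD"), ("p2", "SIGMOD2021"), ("p2", "SIGMOD"),
         ("p3", "VLDB2021"), ("p3", "VLDB")]

PAPERS = ["p1", "p2", "p3"]

def pairwise_scores(edges):
    """Common-neighbor score for every pair of papers in one representation."""
    adj = build_graph(edges)
    return {(a, b): common_neighbors(adj, a, b) for a, b in combinations(PAPERS, 2)}

scores_a = pairwise_scores(REP_A)
scores_b = pairwise_scores(REP_B)
print("A:", scores_a)
print("B:", scores_b)
```

Under these assumptions, representation A gives every pair a score of 0, while representation B scores (p1, p2) as 1 and the other pairs as 0. The same facts therefore produce different rankings depending only on the chosen structure, which is the kind of sensitivity that structural generalizability rules out.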

Supplementary Material

MP4 File (3448016.3457316.mp4)

Graph similarity search algorithms usually leverage the structural properties of a database. Hence, these algorithms are effective only on some structural variations of the data and are ineffective on other forms, which makes them hard to use. Ideally, one would like to design a data analytics algorithm that is structurally robust, i.e., it returns essentially the same accurate results over all possible structural variations of a dataset. We propose a novel approach to create a structurally robust similarity search algorithm over graph databases. We leverage the classic insight in the database literature that schematic variations are caused by having constraints in the database. We then present the RelSim algorithm, which is provably structurally robust under these variations. Our empirical studies show that our proposed algorithms are structurally robust while being efficient and as effective as or more effective than the state-of-the-art similarity search algorithms.
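
One way to picture why constraints matter: if similarity is defined over a feature expressed as a query (here a hypothetical "venue of a paper" feature), a constraint such as "each proceedings volume has exactly one venue" tells us how to rewrite that query when the structure changes, and the resulting scores stop depending on the structure. The sketch below illustrates only that general idea, using Jaccard similarity over feature values on invented toy data; it is not RelSim, whose actual definition is given in the paper.

```python
from itertools import combinations

# Hypothetical toy data: the same facts stored in two structures.
# Assumed constraint: each proceedings volume has exactly one venue, so
# representation B can attach the venue directly to each paper.
PROCEEDINGS = {"SIGMOD2020", "SIGMOD2021", "VLDB2021"}
VENUES = {"SIGMOD", "VLDB"}
REP_A = [("p1", "SIGMOD2020"), ("p2", "SIGMOD2021"), ("p3", "VLDB2021"),
         ("SIGMOD2020", "SIGMOD"), ("SIGMOD2021", "SIGMOD"), ("VLDB2021", "VLDB")]
REP_B = [("p1", "SIGMOD2020"), ("p1", "SIGMOD"), ("p2", "SIGMOD2021"), ("p2", "SIGMOD"),
         ("p3", "VLDB2021"), ("p3", "VLDB")]
PAPERS = ["p1", "p2", "p3"]

def venue_of_paper_a(edges, paper):
    """Feature query for A: follow paper -> proceedings -> venue."""
    procs = {v for u, v in edges if u == paper and v in PROCEEDINGS}
    return {v for u, v in edges if u in procs and v in VENUES}

def venue_of_paper_b(edges, paper):
    """Feature query for B: follow paper -> venue directly."""
    return {v for u, v in edges if u == paper and v in VENUES}

def jaccard(x, y):
    """Jaccard similarity of two feature-value sets (0.0 if both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def feature_scores(edges, feature):
    """Score every pair of papers by the overlap of their feature values."""
    values = {p: feature(edges, p) for p in PAPERS}
    return {(a, b): jaccard(values[a], values[b]) for a, b in combinations(PAPERS, 2)}

scores_a = feature_scores(REP_A, venue_of_paper_a)
scores_b = feature_scores(REP_B, venue_of_paper_b)
print(scores_a == scores_b)
```

In both representations (p1, p2) scores 1.0 and the other pairs score 0.0, so the comparison prints True. The per-structure query rewriting is done by hand here; it stands in for the constraint-aware reasoning the abstract describes.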

Recommendations

Index Structures for Fast Similarity Search for Binary Vectors

Distance-Based Index Structures for Fast Similarity Search

Index Structures for Fast Similarity Search for Real Vectors. II*