Project Seminar "Similarity Search Algorithms"

Dustin Lange¹, Tobias Vogel¹, Uwe Draisbach¹ & Felix Naumann¹

Abstract

Similarity search techniques make it possible to retrieve not only exact matches for a query against a data set but also similar objects, e.g., images with motifs similar to those in the query image. The seminar "Similarity Search Algorithms", which we present in this report, dealt with current research approaches from this area.

The goal of the seminar was a broad comparison of well-known indexing algorithms on data sets from various domains. Each group of students worked with two similarity measures for data sets from one of five different domains and with one of six different index structures for similarity search in metric spaces. In this report, we evaluate the combinations of similarity measures and index structures with respect to index construction and kNN queries. We also describe how the seminar was conducted and take a look at the lessons learned.
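
To make the notion of a kNN query in a metric space concrete, the following is a minimal sketch of the linear-scan baseline that an index structure tries to improve on. It is not taken from the seminar: the choice of edit distance as the metric, the function names `levenshtein` and `knn_linear_scan`, and the sample names are assumptions made purely for illustration.

```python
import heapq


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions); a metric on strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def knn_linear_scan(query, objects, k, distance=levenshtein):
    """Return the k objects closest to query as sorted (distance, object) pairs.

    Hypothetical baseline: it computes the distance to every object.
    """
    return heapq.nsmallest(k, ((distance(query, o), o) for o in objects))


# Illustrative data only (assumed, not from the seminar data sets).
names = ["meyer", "maier", "mueller", "schmidt"]
nearest = knn_linear_scan("meier", names, k=2)
```

The linear scan evaluates the distance function once per stored object. Index structures for metric spaces generally aim to reduce that number, typically by using the triangle inequality to rule out objects without computing their distance to the query.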

Notes

MAGIX AG. freedb.org. http://www.freedb.org, January 2011.
C. Sadowski and G. Levin. SimHash: Hash-based similarity detection. http://simhash.googlecode.com/svn/trunk/paper/SimHashWithBib.pdf, December 2007.
http://www.hpi-web.de/naumann/sites/SimSearch2010/.

Acknowledgments

We would like to thank all the students who took part in our seminar successfully and with dedication.

Author information

Authors and Affiliations

Hasso-Plattner-Institut, Universität Potsdam, Prof.-Dr.-Helmert-Str. 2–3, 14482 Potsdam, Germany
Dustin Lange, Tobias Vogel, Uwe Draisbach & Felix Naumann

Corresponding author

Correspondence to Dustin Lange.

About this article

Cite this article

Lange, D., Vogel, T., Draisbach, U. et al. Projektseminar „Similarity Search Algorithms“. Datenbank Spektrum 11, 51–57 (2011). https://doi.org/10.1007/s13222-011-0046-6

Received: 14 January 2011
Accepted: 04 February 2011
Published: 12 February 2011
Issue Date: April 2011
DOI: https://doi.org/10.1007/s13222-011-0046-6
