Meyerson et al., 2004 - Google Patents

A k-median algorithm with running time independent of data size

Meyerson et al., 2004

Document ID: 10722351027856968133
Author: Meyerson A; O'callaghan L; Plotkin S
Publication year: 2004
Publication venue: Machine Learning

External Links

Cited by

Snippet

We give a sampling-based algorithm for the k-Median problem, with running time O (k (k^ 2 ∈\log k)^ 2 log (k ∈\log k)), where k is the desired number of clusters and∈ is a confidence parameter. This is the first k-Median algorithm with fully polynomial running time that is …

Continue reading at link.springer.com (PDF) (other versions)

238000005070 sampling 0 abstract description 17

Classifications

- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30533—Other types of queries
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30477—Query execution
- G06F17/30483—Query execution of query operations
- G06F17/30486—Unary operations; data partitioning operations
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30244—Information retrieval; Database structures therefor; File system structures therefor in image databases
- G06F17/30247—Information retrieval; Database structures therefor; File system structures therefor in image databases based on features automatically derived from the image data
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes

Similar Documents

Publication	Publication Date	Title
Meyerson et al.	2004	A k-median algorithm with running time independent of data size
Cunningham et al.	2020	k-Nearest neighbour classifiers: (with Python examples)
US7797323B1 (en)	2010-09-14	Producing representative hashes for segments of a file
US6507669B1 (en)	2003-01-14	Method of selecting clusters of items using a fuzzy histogram analysis
Dohnal et al.	2003	D-index: Distance searching index for metric data sets
Nguyen et al.	2007	Consensus clusterings
Lejsek et al.	2008	NV-Tree: An efficient disk-based index for approximate search in very large high-dimensional collections
Law et al.	2005	An adaptive nearest neighbor classification algorithm for data streams
US7475071B1 (en)	2009-01-06	Performing a parallel nearest-neighbor matching operation using a parallel hybrid spill tree
KR101266358B1 (en)	2013-05-22	A distributed index system based on multi-length signature files and method thereof
US7603370B2 (en)	2009-10-13	Method for duplicate detection and suppression
US20100106713A1 (en)	2010-04-29	Method for performing efficient similarity search
US6760718B2 (en)	2004-07-06	Database operation processor
WO2000028441A2 (en)	2000-05-18	A density-based indexing method for efficient execution of high-dimensional nearest-neighbor queries on large databases
JP2004199472A (en)	2004-07-15	Computer system for generating data structure for information retrieval, its method, computer executable program for generating data structure for information retrieval, computer readable storage medium storing computer executable program, information retrieval system, and graphical user interface
US7120624B2 (en)	2006-10-10	Optimization based method for estimating the results of aggregate queries
Tavenard et al.	2015	Improving the efficiency of traditional DTW accelerators
US20080065682A1 (en)	2008-03-13	Search index generation apparatus
Cayton et al.	2007	A learning framework for nearest neighbor search
Han et al.	2019	Ranking the big sky: efficient top-k skyline computation on massive data
Bansal et al.	2008	Ad-hoc aggregations of ranked lists in the presence of hierarchies
Andoni et al.	2021	Approximate nearest neighbors beyond space partitions
Erdinç et al.	2024	MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data
Zhang et al.	2023	PARROT: pattern-based correlation exploitation in big partitioned data series
Aslam et al.	2000	Scalable information organization