Meyerson et al., 2004 - Google Patents
A k-median algorithm with running time independent of data sizeMeyerson et al., 2004
View PDF- Document ID
- 10722351027856968133
- Author
- Meyerson A
- O'callaghan L
- Plotkin S
- Publication year
- Publication venue
- Machine Learning
External Links
Snippet
We give a sampling-based algorithm for the k-Median problem, with running time O (k (k^ 2 ∈\log k)^ 2 log (k ∈\log k)), where k is the desired number of clusters and∈ is a confidence parameter. This is the first k-Median algorithm with fully polynomial running time that is …
- 238000005070 sampling 0 abstract description 17
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30533—Other types of queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30477—Query execution
- G06F17/30483—Query execution of query operations
- G06F17/30486—Unary operations; data partitioning operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30705—Clustering or classification
- G06F17/3071—Clustering or classification including class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30312—Storage and indexing structures; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30244—Information retrieval; Database structures therefor; File system structures therefor in image databases
- G06F17/30247—Information retrieval; Database structures therefor; File system structures therefor in image databases based on features automatically derived from the image data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/3061—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F17/30613—Indexing
- G06F17/30619—Indexing indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30017—Multimedia data retrieval; Retrieval of more than one type of audiovisual media
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
- G06K9/00—Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
- G06K9/62—Methods or arrangements for recognition using electronic means
- G06K9/6267—Classification techniques
- G06K9/6279—Classification techniques relating to the number of classes
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Meyerson et al. | A k-median algorithm with running time independent of data size | |
Cunningham et al. | k-Nearest neighbour classifiers: (with Python examples) | |
US7797323B1 (en) | Producing representative hashes for segments of a file | |
US6507669B1 (en) | Method of selecting clusters of items using a fuzzy histogram analysis | |
Dohnal et al. | D-index: Distance searching index for metric data sets | |
Nguyen et al. | Consensus clusterings | |
Lejsek et al. | NV-Tree: An efficient disk-based index for approximate search in very large high-dimensional collections | |
Law et al. | An adaptive nearest neighbor classification algorithm for data streams | |
US7475071B1 (en) | Performing a parallel nearest-neighbor matching operation using a parallel hybrid spill tree | |
KR101266358B1 (en) | A distributed index system based on multi-length signature files and method thereof | |
US7603370B2 (en) | Method for duplicate detection and suppression | |
US20100106713A1 (en) | Method for performing efficient similarity search | |
US6760718B2 (en) | Database operation processor | |
WO2000028441A2 (en) | A density-based indexing method for efficient execution of high-dimensional nearest-neighbor queries on large databases | |
JP2004199472A (en) | Computer system for generating data structure for information retrieval, its method, computer executable program for generating data structure for information retrieval, computer readable storage medium storing computer executable program, information retrieval system, and graphical user interface | |
US7120624B2 (en) | Optimization based method for estimating the results of aggregate queries | |
Tavenard et al. | Improving the efficiency of traditional DTW accelerators | |
US20080065682A1 (en) | Search index generation apparatus | |
Cayton et al. | A learning framework for nearest neighbor search | |
Han et al. | Ranking the big sky: efficient top-k skyline computation on massive data | |
Bansal et al. | Ad-hoc aggregations of ranked lists in the presence of hierarchies | |
Andoni et al. | Approximate nearest neighbors beyond space partitions | |
Erdinç et al. | MCMSTStream: applying minimum spanning tree to KD-tree-based micro-clusters to define arbitrary-shaped clusters in streaming data | |
Zhang et al. | PARROT: pattern-based correlation exploitation in big partitioned data series | |
Aslam et al. | Scalable information organization |