Feature selection methods for text classification
We consider feature selection for text classification both theoretically and empirically. Our
main result is an unsupervised feature selection strategy for which we give worst-case …
Sampling Algorithms and Coresets for Regression
The $\ell_p$ regression problem takes as input a matrix $A\in\mathbb{R}^{n\times d}$, a
vector $b\in\mathbb{R}^n$, and a number $p\in[1,\infty)$, and it returns as output a number ${\…
[PDF] Applying webtables in practice
We started investigating the collection of HTML tables on the Web and developed the
WebTables system a few years ago [4]. Since then, our work has been motivated by applying …
Wavelet synopsis for data streams: minimizing non-euclidean error
S Guha, B Harb - Proceedings of the eleventh ACM SIGKDD …, 2005 - dl.acm.org
We consider the wavelet synopsis construction problem for data streams where given n
numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much …
[BOOK] Algorithms for linear and nonlinear approximation of large data
B Harb - 2007 - search.proquest.com
A central problem in approximation theory is the concise representation of functions. Given a
function or signal described as a vector in high-dimensional space, the goal is to represent …
Approximation algorithms for wavelet transform coding of data streams
S Guha, B Harb - IEEE Transactions on Information Theory, 2008 - ieeexplore.ieee.org
This paper addresses the problem of finding a $B$-term wavelet representation of a given
discrete function $f\in\mathbb{R}^n$ whose distance from $f$ is minimized. The problem is well understood …
[PDF] Weighted isotonic regression under the L1 norm
Isotonic regression, the problem of finding values that best fit given observations and conform
to specific ordering constraints, has found many applications in biomedical research and …
Query language modeling for voice search
…, J Schalkwyk, T Brants, V Ha, B Harb… - 2010 IEEE Spoken …, 2010 - ieeexplore.ieee.org
The paper presents an empirical exploration of google.com query stream language modeling.
We describe the normalization of the typed query stream resulting in out-of-vocabulary (…
Approximating the Best-Fit Tree Under $L_p$ Norms
B Harb, S Kannan, A McGregor - … 2005 and 9th International Workshop on …, 2005 - Springer
We consider the problem of fitting an $n\times n$ distance matrix $M$ by a tree metric $T$. We give a
factor $O(\min\{n^{1/p},(k\log n)^{1/p}\})$ approximation algorithm for finding the closest ultrametric $T$ …
[PDF] Back-off language model compression.
With the availability of large amounts of training data relevant to speech recognition scenarios,
scalability becomes a very productive way to improve language model performance. We …