Feature selection methods for text classification

A Dasgupta, P Drineas, B Harb, V Josifovski… - Proceedings of the 13th …, 2007 - dl.acm.org
We consider feature selection for text classification both theoretically and empirically. Our
main result is an unsupervised feature selection strategy for which we give worst-case …

Sampling Algorithms and Coresets for Regression

A Dasgupta, P Drineas, B Harb, R Kumar… - SIAM Journal on …, 2009 - SIAM
The $\ell_p$ regression problem takes as input a matrix $A\in\mathbb{R}^{n\times d}$, a
vector $b\in\mathbb{R}^n$, and a number $p\in[1,\infty)$, and it returns as output a number ${\…

Applying webtables in practice

S Balakrishnan, A Halevy, B Harb, H Lee, J Madhavan… - 2015 - cidrdb.org
We started investigating the collection of HTML tables on the Web and developed the
WebTables system a few years ago [4]. Since then, our work has been motivated by applying …

Wavelet synopsis for data streams: minimizing non-euclidean error

S Guha, B Harb - Proceedings of the eleventh ACM SIGKDD …, 2005 - dl.acm.org
We consider the wavelet synopsis construction problem for data streams where given n
numbers we wish to estimate the data by constructing a synopsis, whose size, say B is much …

Algorithms for linear and nonlinear approximation of large data

B Harb - 2007 - search.proquest.com
A central problem in approximation theory is the concise representation of functions. Given a
function or signal described as a vector in high-dimensional space, the goal is to represent …

Approximation algorithms for wavelet transform coding of data streams

S Guha, B Harb - IEEE Transactions on Information Theory, 2008 - ieeexplore.ieee.org
This paper addresses the problem of finding a $B$-term wavelet representation of a given
discrete function $f\in\mathbb{R}^n$ whose distance from $f$ is minimized. The problem is well understood …

Weighted isotonic regression under the L1 norm

S Angelov, B Harb, S Kannan… - Proceedings of the …, 2006 - researchgate.net
Isotonic regression, the problem of finding values that best fit given observations and conform
to specific ordering constraints, has found many applications in biomedical research and …

Query language modeling for voice search

…, J Schalkwyk, T Brants, V Ha, B Harb… - 2010 IEEE Spoken …, 2010 - ieeexplore.ieee.org
The paper presents an empirical exploration of google.com query stream language modeling.
We describe the normalization of the typed query stream resulting in out-of-vocabulary (…

Approximating the Best-Fit Tree Under $L_p$ Norms

B Harb, S Kannan, A McGregor - … 2005 and 9th International Workshop on …, 2005 - Springer
We consider the problem of fitting an $n\times n$ distance matrix $M$ by a tree metric $T$. We give a
factor $O(\min\{n^{1/p},(k\log n)^{1/p}\})$ approximation algorithm for finding the closest ultrametric $T$ …

Back-off language model compression.

B Harb, C Chelba, J Dean, S Ghemawat - INTERSPEECH, 2009 - isca-archive.org
With the availability of large amounts of training data relevant to speech recognition scenarios,
scalability becomes a very productive way to improve language model performance. We …