### Results for "David Woodruff"

**Sketching as a Tool for Numerical Linear Algebra** (Nov 17 2014; Feb 10 2015). This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random ...

**(1+eps)-approximate Sparse Recovery** (Oct 19 2011; Dec 26 2011). The problem central to sparse recovery and compressive sensing is that of stable sparse recovery: we want a distribution of matrices $A \in \mathbb{R}^{m\times n}$ such that, for any $x \in \mathbb{R}^n$ and with probability at least 2/3 over $A$, there is an algorithm to recover ...

**Distributed Statistical Estimation of Matrix Products with Applications** (Jul 02 2018). We consider statistical estimations of a matrix product over the integers in a distributed setting, where we have two parties Alice and Bob; Alice holds a matrix $A$ and Bob holds a matrix $B$, and they want to estimate statistics of $A \cdot B$. We focus ...

**Separating k-Player from t-Player One-Way Communication, with Applications to Data Streams** (May 17 2019). In a $k$-party communication problem, the $k$ players with inputs $x_1, x_2, \ldots, x_k$, respectively, want to evaluate a function $f(x_1, x_2, \ldots, x_k)$ using as little communication as possible. We consider the message-passing model, in which ...

**Low Rank Approximation with Entrywise $\ell_1$-Norm Error** (Nov 03 2016). We study the $\ell_1$-low rank approximation problem, where for a given $n \times d$ matrix $A$ and approximation factor $\alpha \geq 1$, the goal is to output a rank-$k$ matrix $\widehat{A}$ for which $\|A-\widehat{A}\|_1 \leq \alpha \cdot \min_{\textrm{rank-}k}$ ...

**Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel k-means Clustering** (May 15 2019). We present tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel k-means clustering (KKMC) on $n$ input points. For KRR, our bound for relative error approximation to the minimizer ...

**Faster Algorithms for High-Dimensional Robust Covariance Estimation** (Jun 11 2019). We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal ...

**Dimensionality Reduction for Tukey Regression** (May 14 2019). We give the first dimensionality reduction methods for the overconstrained Tukey regression problem. The Tukey loss function $\|y\|_M = \sum_i M(y_i)$ has $M(y_i) \approx |y_i|^p$ for residual errors $y_i$ smaller than a prescribed threshold $\tau$, but ...

**Conditional Sparse $\ell_p$-norm Regression With Optimal Probability** (Jun 26 2018). We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and $f$ minimizes ...

**Sample-Optimal Low-Rank Approximation of Distance Matrices** (Jun 02 2019). A distance matrix $A \in \mathbb{R}^{n \times m}$ represents all pairwise distances, $A_{ij}=\mathrm{d}(x_i,y_j)$, between two point sets $x_1,\ldots,x_n$ and $y_1,\ldots,y_m$ in an arbitrary metric space $(\mathcal{Z}, \mathrm{d})$. Such matrices arise in various ...

**Communication Efficient Distributed Kernel Principal Component Analysis** (Mar 23 2015; Feb 13 2016). Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very costly to communicate ...

**New Algorithms for Heavy Hitters in Data Streams** (Mar 05 2016). An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-$k$, most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem, which quantify ...

**Lower Bounds for Adaptive Sparse Recovery** (May 15 2012; Oct 21 2012). We give lower bounds for the problem of stable sparse recovery from *adaptive* linear measurements. In this problem, one would like to estimate a vector $x \in \mathbb{R}^n$ from $m$ linear measurements $A_1x, \ldots, A_mx$. One may choose each vector $A_i$ based ...

**Optimal CUR Matrix Decompositions** (May 30 2014; Jul 16 2014). The CUR decomposition of an $m \times n$ matrix $A$ finds an $m \times c$ matrix $C$ with a subset of $c < n$ columns of $A$, together with an $r \times n$ matrix $R$ with a subset of $r < m$ rows of $A$, as well as a $c \times r$ low-rank matrix $U$ ...

**Subspace Embeddings and $\ell_p$-Regression Using Exponential Random Variables** (May 23 2013; Mar 17 2014). Oblivious low-distortion subspace embeddings are a crucial building block for numerical linear algebra problems. We show for any real $p$, $1 \leq p < \infty$, given a matrix $M \in \mathbb{R}^{n \times d}$ with $n \gg d$, with constant probability we can ...

**When Distributed Computation is Communication Expensive** (Apr 16 2013; Jul 26 2013). We consider a number of fundamental statistical and graph problems in the message-passing model, where we have $k$ machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the $k$ data sets. ...

**On Approximating Functions of the Singular Values in a Stream** (Apr 29 2016). For any real number $p > 0$, we nearly completely characterize the space complexity of estimating $\|A\|_p^p = \sum_{i=1}^n \sigma_i^p$ for $n \times n$ matrices $A$ in which each row and each column has $O(1)$ non-zero entries and whose entries are presented ...

**Distributed Low Rank Approximation of Implicit Functions of a Matrix** (Jan 28 2016). We study distributed low rank approximation in which the matrix to be approximated is only implicitly represented across the different servers. For example, each of $s$ servers may have an $n \times d$ matrix $A^t$, and we may be interested in computing ...

**Strong Coresets for k-Median and Subspace Approximation: Goodbye Dimension** (Sep 09 2018). We obtain the first strong coresets for the $k$-median and subspace approximation problems with sum of distances objective function, on $n$ points in $d$ dimensions, with a number of weighted points that is independent of both $n$ and $d$; namely, our ...

**How Robust are Linear Sketches to Adaptive Inputs?** (Nov 05 2012). Linear sketches are powerful algorithmic tools that turn an $n$-dimensional input into a concise lower-dimensional representation via a linear transformation. Such sketches have seen a wide range of applications including norm estimation over data streams, ...

**A Near-Optimal Algorithm for L1-Difference** (Apr 13 2009). We give the first $L_1$-sketching algorithm for integer vectors which produces nearly optimal sized sketches in nearly linear time. This answers the first open problem in the list of open problems from the 2006 IITK Workshop on Algorithms for Data Streams. ...

**Tight Bounds for Distributed Functional Monitoring** (Dec 21 2011; Jun 12 2013). We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008). In this model there are $k$ sites each tracking their input and communicating with a central coordinator ...

**Tight Bounds for $\ell_p$ Oblivious Subspace Embeddings** (Jan 13 2018; Apr 06 2018). An $\ell_p$ oblivious subspace embedding is a distribution over $r \times n$ matrices $\Pi$ such that for any fixed $n \times d$ matrix $A$, $\Pr_{\Pi}[\textrm{for all }x,\ \|Ax\|_p \leq \|\Pi Ax\|_p \leq \kappa \|Ax\|_p] \geq 9/10$, where $r$ is the ...

**Optimal Random Sampling from Distributed Streams Revisited** (Mar 28 2019). We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator ...

**The Round Complexity of Small Set Intersection** (Apr 05 2013; Apr 09 2013). The set disjointness problem is one of the most fundamental and well-studied problems in communication complexity. In this problem Alice and Bob hold sets $S, T \subseteq [n]$, respectively, and the goal is to decide if $S \cap T = \emptyset$. Reductions ...

**Input Sparsity and Hardness for Robust Subspace Approximation** (Oct 20 2015). In the subspace approximation problem, we seek a $k$-dimensional subspace $F$ of $\mathbb{R}^d$ that minimizes the sum of $p$-th powers of Euclidean distances to a given set of $n$ points $a_1, \ldots, a_n$ in $\mathbb{R}^d$, for $p \geq 1$. More generally than minimizing $\sum_i \mathrm{dist}(a_i,F)^p$, we ...

**Learning Two Layer Rectified Neural Networks in Polynomial Time** (Nov 05 2018). Consider the following fundamental learning problem: given input examples $x \in \mathbb{R}^d$ and their vector-valued labels, as defined by an underlying generative neural network, recover the weight matrices of this network. We consider two-layer networks, ...

**Leveraging Well-Conditioned Bases: Streaming & Distributed Summaries in Minkowski $p$-Norms** (Jul 06 2018). Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$. We study other $\ell_p$ ...

**Approximation Algorithms for $\ell_0$-Low Rank Approximation** (Oct 30 2017; Oct 01 2018). We study the $\ell_0$-Low Rank Approximation Problem, where the goal is, given an $m \times n$ matrix $A$, to output a rank-$k$ matrix $A'$ for which $\|A'-A\|_0$ is minimized. Here, for a matrix $B$, $\|B\|_0$ denotes the number of its non-zero entries. ...

**Low Rank Approximation and Regression in Input Sparsity Time** (Jul 26 2012; Apr 05 2013). We design a new distribution over $\mathrm{poly}(r \varepsilon^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \varepsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$. ...

**Applications of Uniform Sampling: Densest Subgraph and Beyond** (Jun 15 2015; Jul 29 2015). Recently [Bhattacharya et al., STOC 2015] provide the first non-trivial algorithm for the densest subgraph problem in the streaming model with additions and deletions to its edges, i.e., for dynamic graph streams. They present a $(0.5-\epsilon)$-approximation ...

**An Optimal Algorithm for l1-Heavy Hitters in Insertion Streams and Related Problems** (Mar 01 2016). We give the first optimal bounds for returning the $\ell_1$-heavy hitters in a data stream of insertions, together with their approximate frequencies, closing a long line of work on this problem. For a stream of $m$ items in $\{1, 2, \dots, n\}$ and parameters ...

**Towards a Zero-One Law for Entrywise Low Rank Approximation** (Nov 04 2018). There are a number of approximation algorithms for NP-hard versions of low rank approximation, such as finding a rank-$k$ matrix $B$ minimizing the sum of absolute values of differences to a given matrix $A$, $\min_{\textrm{rank-}k~B}\|A-B\|_1$, or more ...

**Optimal Principal Component Analysis in Distributed and Streaming Models** (Apr 25 2015; Jul 12 2016). We study the Principal Component Analysis (PCA) problem in the distributed and streaming models of computation. Given a matrix $A \in \mathbb{R}^{m \times n}$, a rank parameter $k < \mathrm{rank}(A)$, and an accuracy parameter $0 < \epsilon < 1$, we want to output an $m$ ...

**Weighted Maximum Independent Set of Geometric Objects in Turnstile Streams** (Feb 27 2019). We study the Maximum Independent Set problem for geometric objects given in the data stream model. A set of geometric objects is said to be independent if the objects are pairwise disjoint. We consider geometric objects in one and two dimensions, i.e., ...

**On Deterministic Sketching and Streaming for Sparse Recovery and Norm Estimation** (Jun 25 2012). We study classic streaming and sparse recovery problems using deterministic linear sketches, including $\ell_1/\ell_1$ and $\ell_\infty/\ell_1$ sparse recovery problems (the latter also being known as $\ell_1$-heavy hitters), norm estimation, and approximate inner product. We focus ...

**On the Power of Adaptivity in Sparse Recovery** (Oct 17 2011). The goal of (stable) sparse recovery is to recover a $k$-sparse approximation $x^*$ of a vector $x$ from linear measurements of $x$. Specifically, the goal is to recover $x^*$ such that $\|x-x^*\|_p \leq C \min_{k\text{-sparse } x'} \|x-x'\|_q$ for some constant $C$ ...

**The Sketching Complexity of Graph Cuts** (Mar 27 2014; Nov 10 2014). We study the problem of sketching an input graph, so that given the sketch, one can estimate the weight of any cut in the graph within factor $1+\epsilon$. We present lower and upper bounds on the size of a randomized sketch, focusing on the dependence ...

**Near Optimal Sketching of Low-Rank Tensor Regression** (Sep 20 2017). We study the least squares regression problem $\min_{\Theta \in \mathcal{S}_{\odot D,R}} \|A\Theta-b\|_2$, where $\mathcal{S}_{\odot D,R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots$ ...

**On Sketching the q to p norms** (Jun 17 2018). We initiate the study of data dimensionality reduction, or sketching, for the $q\to p$ norms. Given an $n \times d$ matrix $A$, the $q\to p$ norm, denoted $\|A\|_{q \to p} = \sup_{x \in \mathbb{R}^d \setminus \{\vec{0}\}} \frac{\|Ax\|_p}{\|x\|_q}$, is a ...

**Low-Rank Approximation from Communication Complexity** (Apr 22 2019). In low-rank approximation with missing entries, given $A\in \mathbb{R}^{n\times n}$ and binary $W \in \{0,1\}^{n\times n}$, the goal is to find a rank-$k$ matrix $L$ for which $\mathrm{cost}(L)=\sum_{i=1}^{n} \sum_{j=1}^{n} W_{i,j}\cdot (A_{i,j} - L_{i,j})^2 \le$ ...

**Tight Bounds for the Subspace Sketch Problem with Applications** (Apr 11 2019). In the subspace sketch problem one is given an $n\times d$ matrix $A$ with $O(\log(nd))$ bit entries, and would like to compress it in an arbitrary way to build a small space data structure $Q_p$, so that for any given $x \in \mathbb{R}^d$, with probability ...

**Principal Component Analysis and Higher Correlations for Distributed Data** (Apr 10 2013; Jun 29 2014). We consider algorithmic problems in the setting in which the input data has been partitioned arbitrarily on many servers. The goal is to compute a function of all the data, and the bottleneck is the communication used by the algorithm. We present algorithms ...

**Fast Regression with an $\ell_\infty$ Guarantee** (May 30 2017). Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an $n \times d$ matrix $A$, with $n \gg d$, as well as an $n \times 1$ vector ...

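
The overconstrained setting above ($A$ is $n \times d$ with $n \gg d$) is where the basic sketch-and-solve idea applies: compress $A$ and $b$ with a random $m \times n$ matrix $S$, then solve the smaller problem $\min_x \|SAx - Sb\|_2$. The Python sketch below is a minimal illustration under stated assumptions: it uses a dense Gaussian $S$ and a hypothetical sketch size `m`, targets the ordinary $\ell_2$ objective, and is not the $\ell_\infty$-guarantee algorithm of the paper.

```python
import numpy as np


def sketch_and_solve(A, b, m, rng):
    """Approximately solve min_x ||Ax - b||_2 via a Gaussian sketch.

    S has i.i.d. N(0, 1/m) entries; `m` is a hypothetical sketch size chosen
    by the caller, not a bound taken from the paper.
    """
    n = A.shape[0]
    S = rng.standard_normal((m, n)) / np.sqrt(m)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x


rng = np.random.default_rng(0)
n, d = 2000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
x_sketch = sketch_and_solve(A, b, m=400, rng=rng)

exact_cost = np.linalg.norm(A @ x_exact - b)
sketch_cost = np.linalg.norm(A @ x_sketch - b)
print(sketch_cost / exact_cost)  # typically slightly above 1 when m >> d
```

A dense Gaussian sketch is the simplest to analyze, but forming $SA$ costs $O(mnd)$ time, which is why faster structured or sparse sketches are used when that cost matters.
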
**Relative Error Tensor Low Rank Approximation** (Apr 26 2017; Mar 29 2018). We consider relative error low rank approximation of *tensors* with respect to the Frobenius norm: given an order-$q$ tensor $A \in \mathbb{R}^{\prod_{i=1}^q n_i}$, output a rank-$k$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)\,\mathrm{OPT}$, where OPT = ...

**Sublinear Optimization for Machine Learning** (Oct 21 2010). We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, ...

**Improved Distributed Principal Component Analysis** (Aug 25 2014; Dec 23 2014). We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the ...

**Fast Moment Estimation in Data Streams in Optimal Space** (Jul 23 2010). We give a space-optimal algorithm with update time $O(\log^2(1/\epsilon)\log\log(1/\epsilon))$ for $(1+\epsilon)$-approximating the $p$th frequency moment, $0 < p < 2$, of a length-$n$ vector updated in a data stream. This provides a nearly exponential improvement in the update ...

**Faster Kernel Ridge Regression Using Sketching and Preconditioning** (Nov 10 2016). Random feature maps, such as random Fourier features, have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods such as kernel ridge regression. However, random feature maps only provide crude approximations ...

**Improved Algorithms for Adaptive Compressed Sensing** (Apr 25 2018). In the problem of adaptive compressed sensing, one wants to estimate an approximately $k$-sparse vector $x\in\mathbb{R}^n$ from $m$ linear measurements $A_1 x, A_2 x,\ldots, A_m x$, where $A_i$ can be chosen based on the outcomes $A_1 x,\ldots, A_{i-1}$ ...

**Beating CountSketch for Heavy Hitters in Insertion Streams** (Nov 02 2015). Given a stream $p_1, \ldots, p_m$ of items from a universe $\mathcal{U}$, which, without loss of generality we identify with the set of integers $\{1, 2, \ldots, n\}$, we consider the problem of returning all $\ell_2$-heavy hitters, i.e., those items ...

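
For context on the baseline named in this title, here is a minimal textbook CountSketch in Python: each of `rows` hash rows maps an item to one of `width` buckets and adds a random $\pm 1$ sign times the update, and the frequency estimate is the median of the signed bucket values. The hash family $((a i + b) \bmod P) \bmod r$, the parameters, and the reporting threshold are illustrative assumptions; this is not the improved algorithm of the paper.

```python
import random
from statistics import median


class CountSketch:
    """Textbook CountSketch over integer item ids (illustrative only)."""

    P = (1 << 61) - 1  # prime modulus for the hash family ((a*i + b) mod P)

    def __init__(self, rows=5, width=64, seed=0):
        rnd = random.Random(seed)

        def draw():
            return rnd.randrange(1, self.P), rnd.randrange(self.P)

        self.width = width
        self.table = [[0] * width for _ in range(rows)]
        self.bucket_hash = [draw() for _ in range(rows)]
        self.sign_hash = [draw() for _ in range(rows)]

    def _bucket(self, r, item):
        a, b = self.bucket_hash[r]
        return ((a * item + b) % self.P) % self.width

    def _sign(self, r, item):
        a, b = self.sign_hash[r]
        return 1 if ((a * item + b) % self.P) % 2 == 0 else -1

    def update(self, item, delta=1):
        # Add the signed update to one bucket in every row.
        for r, row in enumerate(self.table):
            row[self._bucket(r, item)] += self._sign(r, item) * delta

    def estimate(self, item):
        # Median over rows of the sign-corrected bucket value.
        return median(self._sign(r, item) * row[self._bucket(r, item)]
                      for r, row in enumerate(self.table))


cs = CountSketch()
stream = [7] * 1000 + list(range(100, 1100))  # one heavy item, 1000 light items
for item in stream:
    cs.update(item)

# Report candidates whose estimate clears a hypothetical threshold.
heavy = [i for i in (7, 100, 555) if cs.estimate(i) >= 500]
print(heavy, cs.estimate(7))
```
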
**Revisiting Frequency Moment Estimation in Random Order Streams** (Mar 06 2018). We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying $n$-dimensional vector presented as a sequence of additive updates in a stream. It is well-known ...

**Optimal approximate matrix product in terms of stable rank** (Jul 08 2015; Mar 02 2016). We prove, using the subspace embedding guarantee in a black box way, that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$ rows. Here $\tilde{r}$ ...

**On Coresets for Logistic Regression** (May 22 2018; Sep 13 2018). Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear sized ...

**A Sketching Algorithm for Spectral Graph Sparsification** (Dec 28 2014). We study the problem of compressing a weighted graph $G$ on $n$ vertices, building a "sketch" $H$ of $G$, so that given any vector $x \in \mathbb{R}^n$, the value $x^T L_G x$ can be approximated up to a multiplicative $1+\epsilon$ factor from only $H$ ...

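
The quantity being approximated, $x^T L_G x$, has a simple combinatorial form: for a weighted undirected graph with Laplacian $L_G = D - W$, $x^T L_G x = \sum_{(u,v) \in E} w_{uv}(x_u - x_v)^2$, and for the indicator vector of a vertex set $S$ it equals the total weight of the edges crossing the cut $(S, V \setminus S)$. The small Python check below (a hypothetical 4-vertex graph; no sparsification is performed) computes both sides of this standard identity.

```python
def laplacian(n, edges):
    """Dense weighted Laplacian L = D - W of an undirected graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L


def quad_form(L, x):
    """x^T L x computed directly from the matrix."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))


def edge_sum(edges, x):
    """sum over edges of w_uv * (x_u - x_v)^2."""
    return sum(w * (x[u] - x[v]) ** 2 for u, v, w in edges)


edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 3, 3.0), (0, 3, 0.5)]  # (u, v, weight)
L = laplacian(4, edges)

x = [1.0, -2.0, 0.5, 3.0]
print(quad_form(L, x), edge_sum(edges, x))  # both equal 45.0

indicator = [1.0, 1.0, 0.0, 0.0]  # S = {0, 1}; crossing edges (1,2) and (0,3)
print(quad_form(L, indicator))    # cut weight 1.0 + 0.5 = 1.5
```

Because every cut value is a quadratic form evaluated at a 0/1 vector, a sketch that preserves all quadratic forms to $1 \pm \epsilon$ also preserves all cuts to the same factor.
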
**How to Fake Multiply by a Gaussian Matrix** (Jun 18 2016). Have you ever wanted to multiply an $n \times d$ matrix $X$, with $n \gg d$, on the left by an $m \times n$ matrix $\tilde G$ of i.i.d. Gaussian random variables, but could not afford to do it because it was too slow? In this work we propose a new randomized ...

**Frequent Directions: Simple and Deterministic Matrix Sketching** (Jan 08 2015; Apr 21 2015). We describe a new algorithm called Frequent Directions for deterministic matrix sketching in the row-updates model. The algorithm is presented an arbitrary input matrix $A \in \mathbb{R}^{n \times d}$ one row at a time. It performs $O(d \times \ell)$ operations ...

**Communication-Optimal Distributed Clustering** (Feb 01 2017). Clustering large datasets is a fundamental problem with a number of applications in machine learning. Data is often collected on different sites and clustering needs to be performed in a distributed manner with low communication. We would like the quality ...

**Sketching for Kronecker Product Regression and P-splines** (Dec 27 2017). TensorSketch is an oblivious linear sketch introduced in Pagh'13 and later used in Pham, Pagh'13 in the context of SVMs for polynomial kernels. It was shown in Avron, Nguyen, Woodruff'14 that TensorSketch provides a subspace embedding, and therefore can ...

**Weighted Reservoir Sampling from Distributed Streams** (Apr 08 2019). We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights are equal, ...

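
As a centralized, single-stream point of reference (not the distributed, message-efficient protocol studied in the paper), weighted sampling without replacement can be done in one pass with the A-Res scheme of Efraimidis and Spirakis: give each item the key $u^{1/w}$ with $u$ uniform on $[0,1)$ and keep the $k$ largest keys. For $k = 1$ the chosen item has probability exactly proportional to its weight; for larger $k$ the sample behaves like $k$ successive weight-proportional draws without replacement. The function name and the skipping of non-positive weights below are assumptions.

```python
import heapq
import random


def weighted_reservoir_sample(stream, k, seed=0):
    """One-pass weighted sample of k items (A-Res: keep the k largest u**(1/w))."""
    rnd = random.Random(seed)
    heap = []  # (key, arrival index, item); heap[0] has the smallest kept key
    for idx, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # assumption: non-positive weights are never sampled
        key = rnd.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, idx, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, idx, item))
    return [item for _, _, item in heap]


sample = weighted_reservoir_sample(
    [("a", 1.0), ("b", 2.0), ("c", 100.0), ("d", 0.5)], k=2)
print(sample)

trials = 2000
hits = sum(
    weighted_reservoir_sample([("light", 1.0), ("heavy", 9.0)], k=1, seed=s) == ["heavy"]
    for s in range(trials)
)
print(hits / trials)  # should be near 9 / (1 + 9) = 0.9
```

The arrival index in each heap entry only breaks ties between equal keys, so items never need to be comparable.
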
**The Communication Complexity of Optimization** (Jun 13 2019). We consider the communication complexity of a number of distributed optimization problems. We start with the problem of solving a linear system. Suppose there is a coordinator together with $s$ servers $P_1, \ldots, P_s$, the $i$-th of which holds a subset ...

**Querying a Matrix through Matrix-Vector Products** (Jun 13 2019). We consider algorithms with access to an unknown matrix $M\in\mathbb{F}^{n \times d}$ via matrix-vector products, namely, the algorithm chooses vectors $\mathbf{v}^1, \ldots, \mathbf{v}^q$, and observes $M\mathbf{v}^1,\ldots, M\mathbf{v}^q$. Here the ...

**Lower Bounds for Sparse Recovery** (Jun 02 2011; Jun 03 2011). We consider the following $k$-sparse recovery problem: design an $m \times n$ matrix $A$, such that for any signal $x$, given $Ax$ we can efficiently recover $x'$ satisfying $\|x-x'\|_1 \leq C \min_{k\text{-sparse } x''} \|x-x''\|_1$. It is known that there exist matrices $A$ with this ...

**Revisiting Norm Estimation in Data Streams** (Nov 21 2008; Apr 09 2009). The problem of estimating the $p$th moment $F_p$ ($p$ nonnegative and real) in data streams is as follows. There is a vector $x$ which starts at 0, and many updates of the form $x_i \leftarrow x_i + v$ come sequentially in a stream. The algorithm also receives an error ...

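
The turnstile model just described (a vector $x$ starting at 0, updated by $x_i \leftarrow x_i + v$) can be illustrated for the special case $p = 2$ with the classic AMS sketch of Alon, Matias, and Szegedy, which is not the algorithm of this paper. Each counter keeps $z = \sum_i s(i)\,x_i$ for random signs $s(i) = \pm 1$, so $z^2$ is an unbiased estimate of $F_2 = \|x\|_2^2$, and a median of group means reduces the variance. The Python sketch below assumes signs from a random degree-3 polynomial hash (4-wise independent up to a negligible parity bias) and hypothetical `groups`/`per_group` parameters.

```python
import random
from statistics import mean, median

P = (1 << 61) - 1  # Mersenne prime used by the polynomial sign hash


class AMSF2Sketch:
    """Median-of-means AMS estimator for F_2 = sum_i x_i**2 (illustrative)."""

    def __init__(self, groups=7, per_group=50, seed=0):
        rnd = random.Random(seed)
        count = groups * per_group
        # One random degree-3 polynomial per counter gives its sign function.
        self.coeffs = [[rnd.randrange(P) for _ in range(4)] for _ in range(count)]
        self.z = [0] * count  # z_k = sum_i s_k(i) * x_i
        self.groups, self.per_group = groups, per_group

    @staticmethod
    def _sign(c, i):
        h = ((c[0] * i + c[1]) * i + c[2]) * i + c[3]  # Horner evaluation
        return 1 if (h % P) % 2 == 0 else -1

    def update(self, i, v):
        """Apply the stream update x_i <- x_i + v (v may be negative)."""
        for k, c in enumerate(self.coeffs):
            self.z[k] += self._sign(c, i) * v

    def estimate(self):
        squares = [zk * zk for zk in self.z]
        g = self.per_group
        return median(mean(squares[j * g:(j + 1) * g]) for j in range(self.groups))


sketch = AMSF2Sketch()
x = {}
for i in range(1, 201):
    for v in (5, -2):  # an insertion and a deletion; net x_i = 3
        sketch.update(i, v)
        x[i] = x.get(i, 0) + v

exact = sum(val * val for val in x.values())
print(exact, sketch.estimate())
```

Because every counter is a linear function of $x$, an update followed by its exact negation leaves the sketch unchanged, which is what lets linear sketches handle deletions.
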
**Faster Kernel Ridge Regression Using Sketching and Preconditioning** (Nov 10 2016; Nov 26 2016). Kernel Ridge Regression (KRR) is a simple yet powerful technique for non-parametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are ...

**Sharper Bounds for Regression and Low-Rank Approximation with Regularization** (Nov 10 2016). The technique of matrix sketching, such as the use of random projections, has been shown in recent years to be a powerful tool for accelerating many important statistical learning techniques. Research has so far focused largely on using sketching for ...

**On The Communication Complexity of Linear Algebraic Problems in the Message Passing Model** (Jul 17 2014). We study the communication complexity of linear algebraic problems over finite fields in the multi-player message passing model, proving a number of tight lower bounds. Specifically, for a matrix which is distributed among a number of players, we consider ...

**Testing Matrix Rank, Optimally** (Oct 18 2018). We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ ...

**Cell2Fire: A Cell Based Forest Fire Growth Model** (May 22 2019). Cell2Fire is a new cell-based forest and wildland landscape fire growth simulator that is open-source and exploits parallelism to support the modelling of fire growth across large spatial and temporal scales in a timely manner. The fire environment is ...

**EMFS: Repurposing SMTP and IMAP for Data Storage and Synchronization** (Jan 29 2016). Cloud storage has become a massive and lucrative business, with companies like Apple, Microsoft, Google, and Dropbox providing hundreds of millions of clients with synchronized and redundant storage. These services often command price-to-storage ratios ...

**Robust Communication-Optimal Distributed Clustering Algorithms** (Mar 02 2017; Mar 06 2019). In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been a lot of work on these problems for worst-case instances, we focus on gaining a finer ...

**On Sketching Quadratic Forms** (Nov 19 2015). We undertake a systematic study of sketching a quadratic form: given an $n \times n$ matrix $A$, create a succinct sketch $\textbf{sk}(A)$ which can produce (without further access to $A$) a multiplicative $(1+\epsilon)$-approximation to $x^T A x$ for ...

**Streaming Space Complexity of Nearly All Functions of One Variable on Frequency Vectors** (Jan 27 2016). A central problem in the theory of algorithms for data streams is to determine which functions on a stream can be approximated in sublinear, and especially sub-polynomial or poly-logarithmic, space. Given a function $g$, we study the space complexity ...

**BPTree: an $\ell_2$ heavy hitters algorithm using constant memory** (Mar 02 2016; Mar 08 2016). The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In sub-polynomial space, the strongest guarantee available is the $\ell_2$ guarantee, which requires finding all items that occur at least ...

**Transitive-Closure Spanners** (Aug 13 2008). Given a directed graph $G = (V,E)$ and an integer $k \geq 1$, a $k$-transitive-closure-spanner ($k$-TC-spanner) of $G$ is a directed graph $H = (V, E_H)$ that has (1) the same transitive-closure as $G$ and (2) diameter at most $k$. These spanners were implicitly studied ...

**Studying Neutral Current Elastic Scattering and the Strange Axial Form Factor in MicroBooNE** (Jan 13 2019). One of the least constrained contributions to the neutral current (NC) elastic neutrino-proton cross section is the strange axial form factor, which represents the strange quark spin contribution to the spin structure of the proton. This becomes the net ...

**Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and Hardness** (Apr 13 2017; Jan 03 2019). Understanding the singular value spectrum of a matrix $A \in \mathbb{R}^{n \times n}$ is a fundamental task in countless applications. In matrix multiplication time, it is possible to perform a full SVD and directly compute the singular values $\sigma_1,\ldots,\sigma_n$. ...

**Optimal lower bounds for universal relation, and for samplers and finding duplicates in streams** (Apr 03 2017). In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x, y \in\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We ...

**Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing Inequality** (Jun 24 2015; May 10 2016). We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points ...

**The Fast Cauchy Transform and Faster Robust Linear Regression** (Jul 19 2012; Apr 05 2014). We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem ...

**Nearly-optimal bounds for sparse recovery in generic norms, with applications to $k$-median sketching** (Apr 05 2015). We initiate the study of trade-offs between sparsity and the number of measurements in sparse recovery schemes for generic norms. Specifically, for a norm $\|\cdot\|$, sparsity parameter $k$, approximation factor $K>0$, and probability of failure $P>0$, ...

**The One-Way Communication Complexity of Dynamic Time Warping Distance** (Mar 08 2019). We resolve the randomized one-way communication complexity of Dynamic Time Warping (DTW) distance. We show that there is an efficient one-way communication protocol using $\widetilde{O}(n/\alpha)$ bits for the problem of computing an $\alpha$-approximation ...

**Algorithms for $\ell_p$ Low Rank Approximation** (May 18 2017). We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem. We obtain the first provably good approximation ...

**Fast approximation of matrix coherence and statistical leverage** (Sep 18 2011; Dec 05 2012). The statistical leverage scores of a matrix $A$ are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix ...

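
The definition above translates directly into code. The NumPy sketch below computes exact leverage scores from a thin SVD; it assumes $A$ has full column rank, so all $d$ left singular vectors are kept, and it takes $O(nd^2)$ time. The paper's subject is approximating these quantities faster, which this sketch does not attempt. For a full-rank $A$ the scores lie in $[0, 1]$, sum to $d$, and equal the diagonal of the hat matrix $A(A^TA)^{-1}A^T$.

```python
import numpy as np


def leverage_scores(A):
    """Squared row norms of U from the thin SVD A = U diag(s) V^T.

    Assumes A has full column rank, so U spans the column space of A.
    """
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)


rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
A[0] = 100.0  # hypothetical outlying row, expected to get high leverage

lev = leverage_scores(A)
coherence = lev.max()  # the coherence is the largest leverage score
print(lev.sum(), coherence, int(lev.argmax()))
```
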
**Steiner Transitive-Closure Spanners of d-Dimensional Posets** (Nov 28 2010). Given a directed graph $G$ and an integer $k \geq 1$, a $k$-transitive-closure-spanner ($k$-TC-spanner) of $G$ is a directed graph $H$ that has (1) the same transitive-closure as $G$ and (2) diameter at most $k$. In some applications, the shortcut paths added to the graph ...

**Nearly Optimal Distinct Elements and Heavy Hitters on Sliding Windows** (May 01 2018; Aug 03 2018). We study the distinct elements and $\ell_p$-heavy hitters problems in the sliding window model, where only the most recent $n$ elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential ...

**A PTAS for $\ell_p$-Low Rank Approximation** (Jul 16 2018; Nov 19 2018). A number of recent works have studied algorithms for entrywise $\ell_p$-low rank approximation, namely, algorithms which given an $n \times d$ matrix $A$ (with $n \geq d$), output a rank-$k$ matrix $B$ minimizing $\|A-B\|_p^p=\sum_{i,j}|A_{i,j}-B_{i,j}|^p$ ...

**Sustainability and the Astrobiological Perspective: Framing Human Futures in a Planetary Context** (Oct 14 2013). We explore how questions related to developing a sustainable human civilization can be cast in terms of astrobiology. In particular we show how ongoing astrobiological studies of the coupled relationship between life, planets and their co-evolution can ...

**An optical mechanism for aberration of starlight** (Oct 18 2011). We present a physical-optics based theory of the physical mechanism for aberration of starlight. We apply non-relativistic and relativistic theories for wavefront image formation and include the effects of optically transmitting media within the sensor. ...

**Aberration of starlight experiment** (Jul 27 2013). We propose an experiment using a conventional optical telescope to determine whether aberration of starlight results from special relativistic effects external to a measurement sensor or from optical effects within a sensor. The proposed measurements ...

**Critical points of master functions and mKdV hierarchy of type $A^{(2)}_{2n}$** (Feb 20 2017). We consider the population of critical points generated from the critical point of the master function with no variables, which is associated with the trivial representation of the twisted affine Lie algebra $A^{(2)}_{2n}$. The population is naturally ...

**Michelson interferometer null may confirm transverse Doppler Effect** (Dec 17 2013; Jun 09 2014). We analyze fringe formation within Michelson-like experiments as viewed by relativistic inertial observers. Our analysis differs from previous work because we include optical misalignment of the beamsplitter of the interferometer due to the anamorphic ...

**Critical points of master functions and mKdV hierarchy of type $C^{(1)}_{n}$** (Nov 03 2018). We consider the population of critical points, generated from the critical point of the master function with no variables, which is associated with the trivial representation of the twisted affine Lie algebra $C_n^{(1)}$. The population is naturally partitioned ...

**Putting Fairness Principles into Practice: Challenges, Metrics, and Improvements** (Jan 14 2019). As more researchers have become aware of and passionate about algorithmic fairness, there has been an explosion in papers laying out new metrics, suggesting algorithms to address issues, and calling attention to issues in existing applications of machine ...

**Automated proton track identification in MicroBooNE using gradient boosted decision trees** (Oct 02 2017). MicroBooNE is a liquid argon time projection chamber (LArTPC) neutrino experiment that is currently running in the Booster Neutrino Beam at Fermilab. LArTPC technology allows for high-resolution, three-dimensional representations of neutrino interactions. ...

**Detecting User Engagement in Everyday Conversations** (Oct 13 2004). This paper presents a novel application of speech emotion recognition: estimation of the level of conversational engagement between users of a voice communication system. We begin by using machine learning techniques, such as the support vector machine ...

**Tap Tips: Lightweight Discovery of Touchscreen Targets** (Jan 26 2001). We describe tap tips, a technique for providing touchscreen target location hints. Tap tips are lightweight in that they are non-modal, appear only when needed, require a minimal number of user gestures, and do not add to the standard touchscreen gesture ...

**Sabbath Day Home Automation: "It's Like Mixing Technology and Religion"** (Apr 27 2007). We present a qualitative study of 20 American Orthodox Jewish families' use of home automation for religious purposes. These lead users offer insight into real-life, long-term experience with home automation technologies. We discuss how automation was ...

**Exploring nucleon spin structure through neutrino neutral-current interactions in MicroBooNE** (Feb 02 2017). The net contribution of the strange quark spins to the proton spin, $\Delta s$, can be determined from neutral current elastic neutrino-proton interactions at low momentum transfer combined with data from electron-proton scattering. The probability of ...