Sketching as a Tool for Numerical Linear AlgebraNov 17 2014Feb 10 2015This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random ... More

(1+eps)-approximate Sparse RecoveryOct 19 2011Dec 26 2011The problem central to sparse recovery and compressive sensing is that of stable sparse recovery: we want a distribution of matrices A in R^{m\times n} such that, for any x \in R^n and with probability at least 2/3 over A, there is an algorithm to recover ... More

Distributed Statistical Estimation of Matrix Products with ApplicationsJul 02 2018We consider statistical estimations of a matrix product over the integers in a distributed setting, where we have two parties Alice and Bob; Alice holds a matrix $A$ and Bob holds a matrix $B$, and they want to estimate statistics of $A \cdot B$. We focus ... More

Separating k-Player from t-Player One-Way Communication, with Applications to Data StreamsMay 17 2019In a $k$-party communication problem, the $k$ players with inputs $x_1, x_2, \ldots, x_k$, respectively, want to evaluate a function $f(x_1, x_2, \ldots, x_k)$ using as little communication as possible. We consider the message-passing model, in which ... More

Low Rank Approximation with Entrywise $\ell_1$-Norm ErrorNov 03 2016We study the $\ell_1$-low rank approximation problem, where for a given $n \times d$ matrix $A$ and approximation factor $\alpha \geq 1$, the goal is to output a rank-$k$ matrix $\widehat{A}$ for which $$\|A-\widehat{A}\|_1 \leq \alpha \cdot \min_{\textrm{rank-}k\textrm{ ... More

Tight Kernel Query Complexity of Kernel Ridge Regression and Kernel $k$-means ClusteringMay 15 2019We present tight lower bounds on the number of kernel evaluations required to approximately solve kernel ridge regression (KRR) and kernel $k$-means clustering (KKMC) on $n$ input points. For KRR, our bound for relative error approximation to the minimizer ... More

Faster Algorithms for High-Dimensional Robust Covariance EstimationJun 11 2019We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal ... More

Dimensionality Reduction for Tukey RegressionMay 14 2019We give the first dimensionality reduction methods for the overconstrained Tukey regression problem. The Tukey loss function $\|y\|_M = \sum_i M(y_i)$ has $M(y_i) \approx |y_i|^p$ for residual errors $y_i$ smaller than a prescribed threshold $\tau$, but ... More

Conditional Sparse $\ell_p$-norm Regression With Optimal ProbabilityJun 26 2018We consider the following conditional linear regression problem: the task is to identify both (i) a $k$-DNF condition $c$ and (ii) a linear rule $f$ such that the probability of $c$ is (approximately) at least some given bound $\mu$, and $f$ minimizes ... More

Sample-Optimal Low-Rank Approximation of Distance MatricesJun 02 2019A distance matrix $A \in \mathbb R^{n \times m}$ represents all pairwise distances, $A_{ij}=\mathrm{d}(x_i,y_j)$, between two point sets $x_1,...,x_n$ and $y_1,...,y_m$ in an arbitrary metric space $(\mathcal Z, \mathrm{d})$. Such matrices arise in various ... More

Communication Efficient Distributed Kernel Principal Component AnalysisMar 23 2015Feb 13 2016Kernel Principal Component Analysis (KPCA) is a key machine learning algorithm for extracting nonlinear features from data. In the presence of a large volume of high dimensional data collected in a distributed fashion, it becomes very costly to communicate ... More

New Algorithms for Heavy Hitters in Data StreamsMar 05 2016An old and fundamental problem in databases and data streams is that of finding the heavy hitters, also known as the top-$k$, most popular items, frequent items, elephants, or iceberg queries. There are several variants of this problem, which quantify ... More

Lower Bounds for Adaptive Sparse RecoveryMay 15 2012Oct 21 2012We give lower bounds for the problem of stable sparse recovery from adaptive linear measurements. In this problem, one would like to estimate a vector $x \in \mathbb{R}^n$ from $m$ linear measurements $A_1x, \ldots, A_mx$. One may choose each vector $A_i$ based ... More

Optimal CUR Matrix DecompositionsMay 30 2014Jul 16 2014The CUR decomposition of an $m \times n$ matrix $A$ finds an $m \times c$ matrix $C$ with a subset of $c < n$ columns of $A,$ together with an $r \times n$ matrix $R$ with a subset of $r < m$ rows of $A,$ as well as a $c \times r$ low-rank matrix $U$ ... More

Subspace Embeddings and $\ell_p$-Regression Using Exponential Random VariablesMay 23 2013Mar 17 2014Oblivious low-distortion subspace embeddings are a crucial building block for numerical linear algebra problems. We show for any real $p, 1 \leq p < \infty$, given a matrix $M \in \mathbb{R}^{n \times d}$ with $n \gg d$, with constant probability we can ... More

When Distributed Computation is Communication ExpensiveApr 16 2013Jul 26 2013We consider a number of fundamental statistical and graph problems in the message-passing model, where we have $k$ machines (sites), each holding a piece of data, and the machines want to jointly solve a problem defined on the union of the $k$ data sets. ... More

On Approximating Functions of the Singular Values in a StreamApr 29 2016For any real number $p > 0$, we nearly completely characterize the space complexity of estimating $\|A\|_p^p = \sum_{i=1}^n \sigma_i^p$ for $n \times n$ matrices $A$ in which each row and each column has $O(1)$ non-zero entries and whose entries are presented ... More

Distributed Low Rank Approximation of Implicit Functions of a MatrixJan 28 2016We study distributed low rank approximation in which the matrix to be approximated is only implicitly represented across the different servers. For example, each of $s$ servers may have an $n \times d$ matrix $A^t$, and we may be interested in computing ... More

Strong Coresets for k-Median and Subspace Approximation: Goodbye DimensionSep 09 2018We obtain the first strong coresets for the $k$-median and subspace approximation problems with sum of distances objective function, on $n$ points in $d$ dimensions, with a number of weighted points that is independent of both $n$ and $d$; namely, our ... More

How Robust are Linear Sketches to Adaptive Inputs?Nov 05 2012Linear sketches are powerful algorithmic tools that turn an n-dimensional input into a concise lower-dimensional representation via a linear transformation. Such sketches have seen a wide range of applications including norm estimation over data streams, ... More

A Near-Optimal Algorithm for L1-DifferenceApr 13 2009We give the first L_1-sketching algorithm for integer vectors which produces nearly optimal sized sketches in nearly linear time. This answers the first open problem in the list of open problems from the 2006 IITK Workshop on Algorithms for Data Streams. ... More

Tight Bounds for Distributed Functional MonitoringDec 21 2011Jun 12 2013We resolve several fundamental questions in the area of distributed functional monitoring, initiated by Cormode, Muthukrishnan, and Yi (SODA, 2008). In this model there are $k$ sites each tracking their input and communicating with a central coordinator ... More

Tight Bounds for $\ell_p$ Oblivious Subspace EmbeddingsJan 13 2018Apr 06 2018An $\ell_p$ oblivious subspace embedding is a distribution over $r \times n$ matrices $\Pi$ such that for any fixed $n \times d$ matrix $A$, $$\Pr_{\Pi}[\textrm{for all }x, \ \|Ax\|_p \leq \|\Pi Ax\|_p \leq \kappa \|Ax\|_p] \geq 9/10,$$ where $r$ is the ... More

Optimal Random Sampling from Distributed Streams RevisitedMar 28 2019We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator ... More

The Round Complexity of Small Set IntersectionApr 05 2013Apr 09 2013The set disjointness problem is one of the most fundamental and well-studied problems in communication complexity. In this problem Alice and Bob hold sets $S, T \subseteq [n]$, respectively, and the goal is to decide if $S \cap T = \emptyset$. Reductions ... More

Input Sparsity and Hardness for Robust Subspace ApproximationOct 20 2015In the subspace approximation problem, we seek a k-dimensional subspace F of R^d that minimizes the sum of p-th powers of Euclidean distances to a given set of n points a_1, ..., a_n in R^d, for p >= 1. More generally than minimizing sum_i dist(a_i,F)^p, we ... More

Learning Two Layer Rectified Neural Networks in Polynomial TimeNov 05 2018Consider the following fundamental learning problem: given input examples $x \in \mathbb{R}^d$ and their vector-valued labels, as defined by an underlying generative neural network, recover the weight matrices of this network. We consider two-layer networks, ... More

Leveraging Well-Conditioned Bases: Streaming & Distributed Summaries in Minkowski $p$-NormsJul 06 2018Work on approximate linear algebra has led to efficient distributed and streaming algorithms for problems such as approximate matrix multiplication, low rank approximation, and regression, primarily for the Euclidean norm $\ell_2$. We study other $\ell_p$ ... More

Approximation Algorithms for $\ell_0$-Low Rank ApproximationOct 30 2017Oct 01 2018We study the $\ell_0$-Low Rank Approximation Problem, where the goal is, given an $m \times n$ matrix $A$, to output a rank-$k$ matrix $A'$ for which $\|A'-A\|_0$ is minimized. Here, for a matrix $B$, $\|B\|_0$ denotes the number of its non-zero entries. ... More

Low Rank Approximation and Regression in Input Sparsity TimeJul 26 2012Apr 05 2013We design a new distribution over $\mathrm{poly}(r \varepsilon^{-1}) \times n$ matrices $S$ so that for any fixed $n \times d$ matrix $A$ of rank $r$, with probability at least 9/10, $\|SAx\|_2 = (1 \pm \varepsilon)\|Ax\|_2$ simultaneously for all $x \in \mathbb{R}^d$. ... More

Applications of Uniform Sampling: Densest Subgraph and BeyondJun 15 2015Jul 29 2015Recently [Bhattacharya et al., STOC 2015] provide the first non-trivial algorithm for the densest subgraph problem in the streaming model with additions and deletions to its edges, i.e., for dynamic graph streams. They present a $(0.5-\epsilon)$-approximation ... More

An Optimal Algorithm for l1-Heavy Hitters in Insertion Streams and Related ProblemsMar 01 2016We give the first optimal bounds for returning the $\ell_1$-heavy hitters in a data stream of insertions, together with their approximate frequencies, closing a long line of work on this problem. For a stream of $m$ items in $\{1, 2, \dots, n\}$ and parameters ... More

Towards a Zero-One Law for Entrywise Low Rank ApproximationNov 04 2018There are a number of approximation algorithms for NP-hard versions of low rank approximation, such as finding a rank-$k$ matrix $B$ minimizing the sum of absolute values of differences to a given matrix $A$, $\min_{\textrm{rank-}k~B}\|A-B\|_1$, or more ... More

Optimal Principal Component Analysis in Distributed and Streaming ModelsApr 25 2015Jul 12 2016We study the Principal Component Analysis (PCA) problem in the distributed and streaming models of computation. Given a matrix $A \in R^{m \times n},$ a rank parameter $k < rank(A)$, and an accuracy parameter $0 < \epsilon < 1$, we want to output an $m ... More

Weighted Maximum Independent Set of Geometric Objects in Turnstile StreamsFeb 27 2019We study the Maximum Independent Set problem for geometric objects given in the data stream model. A set of geometric objects is said to be independent if the objects are pairwise disjoint. We consider geometric objects in one and two dimensions, i.e., ... More

On Deterministic Sketching and Streaming for Sparse Recovery and Norm EstimationJun 25 2012We study classic streaming and sparse recovery problems using deterministic linear sketches, including l1/l1 and linf/l1 sparse recovery problems (the latter also being known as l1-heavy hitters), norm estimation, and approximate inner product. We focus ... More

On the Power of Adaptivity in Sparse RecoveryOct 17 2011The goal of (stable) sparse recovery is to recover a $k$-sparse approximation $x^*$ of a vector $x$ from linear measurements of $x$. Specifically, the goal is to recover $x^*$ such that $\|x-x^*\|_p \leq C \min_{k\text{-sparse } x'} \|x-x'\|_q$ for some constant $C$ ... More

The Sketching Complexity of Graph CutsMar 27 2014Nov 10 2014We study the problem of sketching an input graph, so that given the sketch, one can estimate the weight of any cut in the graph within factor $1+\epsilon$. We present lower and upper bounds on the size of a randomized sketch, focusing on the dependence ... More

Near Optimal Sketching of Low-Rank Tensor RegressionSep 20 2017We study the least squares regression problem \begin{align*} \min_{\Theta \in \mathcal{S}_{\odot D,R}} \|A\Theta-b\|_2, \end{align*} where $\mathcal{S}_{\odot D,R}$ is the set of $\Theta$ for which $\Theta = \sum_{r=1}^{R} \theta_1^{(r)} \circ \cdots ... More

On Sketching the $q$ to $p$ normsJun 17 2018We initiate the study of data dimensionality reduction, or sketching, for the $q\to p$ norms. Given an $n \times d$ matrix $A$, the $q\to p$ norm, denoted $\|A\|_{q \to p} = \sup_{x \in \mathbb{R}^d \backslash \vec{0}} \frac{\|Ax\|_p}{\|x\|_q}$, is a ... More

Low-Rank Approximation from Communication ComplexityApr 22 2019In low-rank approximation with missing entries, given $A\in \mathbb{R}^{n\times n}$ and binary $W \in \{0,1\}^{n\times n}$, the goal is to find a rank-$k$ matrix $L$ for which: $$cost(L)=\sum_{i=1}^{n} \sum_{j=1}^{n}W_{i,j}\cdot (A_{i,j} - L_{i,j})^2\le ... More
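
The cost function in the entry above can be evaluated directly. The following is a minimal sketch, assuming matrices are stored as lists of rows; the function name `weighted_cost` and the example matrices are hypothetical and not taken from the paper.

```python
def weighted_cost(A, W, L):
    """Return sum over i, j of W[i][j] * (A[i][j] - L[i][j])**2.

    Entries with W[i][j] == 0 are treated as missing and do not
    contribute to the cost.
    """
    return sum(
        W[i][j] * (A[i][j] - L[i][j]) ** 2
        for i in range(len(A))
        for j in range(len(A[i]))
    )


# Hypothetical 2 x 2 example: entry (0, 1) is masked out by W.
A = [[1.0, 2.0], [3.0, 4.0]]
W = [[1, 0], [1, 1]]
L = [[1.0, 2.0], [2.0, 4.0]]  # rank 1: second row is twice the first

cost = weighted_cost(A, W, L)  # only entry (1, 0) differs, by 1
```

Only the observed entries (where `W` is 1) enter the sum, so a candidate `L` may take any value at masked positions without changing its cost.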

Tight Bounds for the Subspace Sketch Problem with ApplicationsApr 11 2019In the subspace sketch problem one is given an $n\times d$ matrix $A$ with $O(\log(nd))$ bit entries, and would like to compress it in an arbitrary way to build a small space data structure $Q_p$, so that for any given $x \in \mathbb{R}^d$, with probability ... More

Principal Component Analysis and Higher Correlations for Distributed DataApr 10 2013Jun 29 2014We consider algorithmic problems in the setting in which the input data has been partitioned arbitrarily on many servers. The goal is to compute a function of all the data, and the bottleneck is the communication used by the algorithm. We present algorithms ... More

Fast Regression with an $\ell_\infty$ GuaranteeMay 30 2017Sketching has emerged as a powerful technique for speeding up problems in numerical linear algebra, such as regression. In the overconstrained regression problem, one is given an $n \times d$ matrix $A$, with $n \gg d$, as well as an $n \times 1$ vector ... More

Relative Error Tensor Low Rank ApproximationApr 26 2017Mar 29 2018We consider relative error low rank approximation of tensors with respect to the Frobenius norm: given an order-$q$ tensor $A \in \mathbb{R}^{\prod_{i=1}^q n_i}$, output a rank-$k$ tensor $B$ for which $\|A-B\|_F^2 \leq (1+\epsilon)$OPT, where OPT $= ... More

Sublinear Optimization for Machine LearningOct 21 2010We give sublinear-time approximation algorithms for some optimization problems arising in machine learning, such as training linear classifiers and finding minimum enclosing balls. Our algorithms can be extended to some kernelized versions of these problems, ... More

Improved Distributed Principal Component AnalysisAug 25 2014Dec 23 2014We study the distributed computing setting in which there are multiple servers, each holding a set of points, who wish to compute functions on the union of their point sets. A key task in this setting is Principal Component Analysis (PCA), in which the ... More

Fast Moment Estimation in Data Streams in Optimal SpaceJul 23 2010We give a space-optimal algorithm with update time O(log^2(1/eps)loglog(1/eps)) for (1+eps)-approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream. This provides a nearly exponential improvement in the update ... More

Faster Kernel Ridge Regression Using Sketching and PreconditioningNov 10 2016Random feature maps, such as random Fourier features, have recently emerged as a powerful technique for speeding up and scaling the training of kernel-based methods such as kernel ridge regression. However, random feature maps only provide crude approximations ... More

Improved Algorithms for Adaptive Compressed SensingApr 25 2018In the problem of adaptive compressed sensing, one wants to estimate an approximately $k$-sparse vector $x\in\mathbb{R}^n$ from $m$ linear measurements $A_1 x, A_2 x,\ldots, A_m x$, where $A_i$ can be chosen based on the outcomes $A_1 x,\ldots, A_{i-1} ... More

Beating CountSketch for Heavy Hitters in Insertion StreamsNov 02 2015Given a stream $p_1, \ldots, p_m$ of items from a universe $\mathcal{U}$, which, without loss of generality we identify with the set of integers $\{1, 2, \ldots, n\}$, we consider the problem of returning all $\ell_2$-heavy hitters, i.e., those items ... More

Revisiting Frequency Moment Estimation in Random Order StreamsMar 06 2018We revisit one of the classic problems in the data stream literature, namely, that of estimating the frequency moments $F_p$ for $0 < p < 2$ of an underlying $n$-dimensional vector presented as a sequence of additive updates in a stream. It is well-known ... More

Optimal approximate matrix product in terms of stable rankJul 08 2015Mar 02 2016We prove, using the subspace embedding guarantee in a black box way, that one can achieve the spectral norm guarantee for approximate matrix multiplication with a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$ rows. Here $\tilde{r}$ ... More

On Coresets for Logistic RegressionMay 22 2018Sep 13 2018Coresets are one of the central methods to facilitate the analysis of large data sets. We continue a recent line of research applying the theory of coresets to logistic regression. First, we show a negative result, namely, that no strongly sublinear sized ... More

A Sketching Algorithm for Spectral Graph SparsificationDec 28 2014We study the problem of compressing a weighted graph $G$ on $n$ vertices, building a "sketch" $H$ of $G$, so that given any vector $x \in \mathbb{R}^n$, the value $x^T L_G x$ can be approximated up to a multiplicative $1+\epsilon$ factor from only $H$ ... More

How to Fake Multiply by a Gaussian MatrixJun 18 2016Have you ever wanted to multiply an $n \times d$ matrix $X$, with $n \gg d$, on the left by an $m \times n$ matrix $\tilde G$ of i.i.d. Gaussian random variables, but could not afford to do it because it was too slow? In this work we propose a new randomized ... More

Frequent Directions : Simple and Deterministic Matrix SketchingJan 08 2015Apr 21 2015We describe a new algorithm called Frequent Directions for deterministic matrix sketching in the row-updates model. The algorithm is presented with an arbitrary input matrix $A \in R^{n \times d}$ one row at a time. It performs $O(d \times \ell)$ operations ... More

Communication-Optimal Distributed ClusteringFeb 01 2017Clustering large datasets is a fundamental problem with a number of applications in machine learning. Data is often collected on different sites and clustering needs to be performed in a distributed manner with low communication. We would like the quality ... More

Sketching for Kronecker Product Regression and P-splinesDec 27 2017TensorSketch is an oblivious linear sketch introduced in Pagh'13 and later used in Pham, Pagh'13 in the context of SVMs for polynomial kernels. It was shown in Avron, Nguyen, Woodruff'14 that TensorSketch provides a subspace embedding, and therefore can ... More

Weighted Reservoir Sampling from Distributed StreamsApr 08 2019We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights are equal, ... More

The Communication Complexity of OptimizationJun 13 2019We consider the communication complexity of a number of distributed optimization problems. We start with the problem of solving a linear system. Suppose there is a coordinator together with $s$ servers $P_1, \ldots, P_s$, the $i$-th of which holds a subset ... More

Querying a Matrix through Matrix-Vector ProductsJun 13 2019We consider algorithms with access to an unknown matrix $M\in\mathbb{F}^{n \times d}$ via matrix-vector products, namely, the algorithm chooses vectors $\mathbf{v}^1, \ldots, \mathbf{v}^q$, and observes $M\mathbf{v}^1,\ldots, M\mathbf{v}^q$. Here the ... More

Lower Bounds for Sparse RecoveryJun 02 2011Jun 03 2011We consider the following k-sparse recovery problem: design an m x n matrix A, such that for any signal x, given Ax we can efficiently recover x' satisfying ||x-x'||_1 <= C min_{k-sparse x"} ||x-x"||_1. It is known that there exist matrices A with this ... More

Revisiting Norm Estimation in Data StreamsNov 21 2008Apr 09 2009The problem of estimating the pth moment F_p (p nonnegative and real) in data streams is as follows. There is a vector x which starts at 0, and many updates of the form x_i <-- x_i + v come sequentially in a stream. The algorithm also receives an error ... More
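
The update model in the entry above can be illustrated with an exact baseline that stores the whole vector. This is only a sketch, assuming the standard definition F_p = sum_i |x_i|^p; it is not the small-space algorithm the paper studies, and the function names are hypothetical.

```python
from collections import defaultdict


def apply_updates(updates):
    """Apply stream updates (i, v), each meaning x_i <- x_i + v, to a zero vector."""
    x = defaultdict(int)
    for i, v in updates:
        x[i] += v
    return x


def frequency_moment(x, p):
    """Exact F_p = sum of |x_i|**p over nonzero coordinates (F_0 counts them)."""
    return sum(abs(value) ** p for value in x.values() if value != 0)


# Hypothetical stream: coordinate 0 is incremented and then cancelled.
stream = [(0, 3), (1, -2), (0, -3), (2, 5)]
x = apply_updates(stream)

f0 = frequency_moment(x, 0)  # number of nonzero coordinates
f1 = frequency_moment(x, 1)  # |-2| + |5|
f2 = frequency_moment(x, 2)  # 4 + 25
```

This baseline uses space proportional to the number of distinct coordinates touched, which is the cost a streaming algorithm aims to avoid.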

Faster Kernel Ridge Regression Using Sketching and PreconditioningNov 10 2016Nov 26 2016Kernel Ridge Regression (KRR) is a simple yet powerful technique for non-parametric regression whose computation amounts to solving a linear system. This system is usually dense and highly ill-conditioned. In addition, the dimensions of the matrix are ... More

Sharper Bounds for Regression and Low-Rank Approximation with RegularizationNov 10 2016The technique of matrix sketching, such as the use of random projections, has been shown in recent years to be a powerful tool for accelerating many important statistical learning techniques. Research has so far focused largely on using sketching for ... More

On The Communication Complexity of Linear Algebraic Problems in the Message Passing ModelJul 17 2014We study the communication complexity of linear algebraic problems over finite fields in the multi-player message passing model, proving a number of tight lower bounds. Specifically, for a matrix which is distributed among a number of players, we consider ... More

Testing Matrix Rank, OptimallyOct 18 2018We show that for the problem of testing if a matrix $A \in F^{n \times n}$ has rank at most $d$, or requires changing an $\epsilon$-fraction of entries to have rank at most $d$, there is a non-adaptive query algorithm making $\widetilde{O}(d^2/\epsilon)$ ... More

Cell2Fire: A Cell Based Forest Fire Growth ModelMay 22 2019Cell2Fire is a new cell-based forest and wildland landscape fire growth simulator that is open-source and exploits parallelism to support the modelling of fire growth across large spatial and temporal scales in a timely manner. The fire environment is ... More

EMFS: Repurposing SMTP and IMAP for Data Storage and SynchronizationJan 29 2016Cloud storage has become a massive and lucrative business, with companies like Apple, Microsoft, Google, and Dropbox providing hundreds of millions of clients with synchronized and redundant storage. These services often command price-to-storage ratios ... More

Robust Communication-Optimal Distributed Clustering AlgorithmsMar 02 2017Mar 06 2019In this work, we study the $k$-median and $k$-means clustering problems when the data is distributed across many servers and can contain outliers. While there has been a lot of work on these problems for worst-case instances, we focus on gaining a finer ... More

On Sketching Quadratic FormsNov 19 2015We undertake a systematic study of sketching a quadratic form: given an $n \times n$ matrix $A$, create a succinct sketch $\textbf{sk}(A)$ which can produce (without further access to $A$) a multiplicative $(1+\epsilon)$-approximation to $x^T A x$ for ... More

Streaming Space Complexity of Nearly All Functions of One Variable on Frequency VectorsJan 27 2016A central problem in the theory of algorithms for data streams is to determine which functions on a stream can be approximated in sublinear, and especially sub-polynomial or poly-logarithmic, space. Given a function $g$, we study the space complexity ... More

BPTree: an $\ell_2$ heavy hitters algorithm using constant memoryMar 02 2016Mar 08 2016The task of finding heavy hitters is one of the best known and well studied problems in the area of data streams. In sub-polynomial space, the strongest guarantee available is the $\ell_2$ guarantee, which requires finding all items that occur at least ... More

Transitive-Closure SpannersAug 13 2008Given a directed graph G = (V,E) and an integer k>=1, a k-transitive-closure-spanner (k-TC-spanner) of G is a directed graph H = (V, E_H) that has (1) the same transitive-closure as G and (2) diameter at most k. These spanners were implicitly studied ... More
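
The two conditions in the definition above can be checked by brute force on small graphs. The following is a minimal sketch, assuming vertices are numbered 0 to n-1 and that "diameter at most k" means every reachable pair is joined by a path of at most k edges in H; the function names are hypothetical.

```python
from collections import deque


def bfs_distances(n, edges, source):
    """Return {vertex: hop distance} for all vertices reachable from source."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_k_tc_spanner(n, edges_g, edges_h, k):
    """Check that H has the same transitive closure as G and diameter at most k."""
    for s in range(n):
        reach_g = set(bfs_distances(n, edges_g, s))
        dist_h = bfs_distances(n, edges_h, s)
        if set(dist_h) != reach_g:  # condition (1): same reachability
            return False
        if any(d > k for d in dist_h.values()):  # condition (2): short paths
            return False
    return True


# Hypothetical example: a directed path 0 -> 1 -> 2 -> 3 plus two shortcuts.
path = [(0, 1), (1, 2), (2, 3)]
with_shortcuts = path + [(0, 2), (1, 3)]

ok_with_shortcuts = is_k_tc_spanner(4, path, with_shortcuts, 2)
ok_without = is_k_tc_spanner(4, path, path, 2)
```

The path alone fails for k = 2 because vertex 3 is three hops from vertex 0, while the two shortcut edges bring every reachable pair within two hops without changing reachability.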

Studying Neutral Current Elastic Scattering and the Strange Axial Form Factor in MicroBooNEJan 13 2019One of the least constrained contributions to the neutral current (NC) elastic neutrino-proton cross section is the strange axial form factor, which represents the strange quark spin contribution to the spin structure of the proton. This becomes the net ... More

Spectrum Approximation Beyond Fast Matrix Multiplication: Algorithms and HardnessApr 13 2017Jan 03 2019Understanding the singular value spectrum of a matrix $A \in \mathbb{R}^{n \times n}$ is a fundamental task in countless applications. In matrix multiplication time, it is possible to perform a full SVD and directly compute the singular values $\sigma_1,...,\sigma_n$. ... More

Optimal lower bounds for universal relation, and for samplers and finding duplicates in streamsApr 03 2017In the communication problem $\mathbf{UR}$ (universal relation) [KRW95], Alice and Bob respectively receive $x, y \in\{0,1\}^n$ with the promise that $x\neq y$. The last player to receive a message must output an index $i$ such that $x_i\neq y_i$. We ... More

Communication Lower Bounds for Statistical Estimation Problems via a Distributed Data Processing InequalityJun 24 2015May 10 2016We study the tradeoff between the statistical error and communication cost of distributed statistical estimation problems in high dimensions. In the distributed sparse Gaussian mean estimation problem, each of the $m$ machines receives $n$ data points ... More

The Fast Cauchy Transform and Faster Robust Linear RegressionJul 19 2012Apr 05 2014We provide fast algorithms for overconstrained $\ell_p$ regression and related problems: for an $n\times d$ input matrix $A$ and vector $b\in\mathbb{R}^n$, in $O(nd\log n)$ time we reduce the problem $\min_{x\in\mathbb{R}^d} \|Ax-b\|_p$ to the same problem ... More

Nearly-optimal bounds for sparse recovery in generic norms, with applications to $k$-median sketchingApr 05 2015We initiate the study of trade-offs between sparsity and the number of measurements in sparse recovery schemes for generic norms. Specifically, for a norm $\|\cdot\|$, sparsity parameter $k$, approximation factor $K>0$, and probability of failure $P>0$, ... More

The One-Way Communication Complexity of Dynamic Time Warping DistanceMar 08 2019We resolve the randomized one-way communication complexity of Dynamic Time Warping (DTW) distance. We show that there is an efficient one-way communication protocol using $\widetilde{O}(n/\alpha)$ bits for the problem of computing an $\alpha$-approximation ... More

Algorithms for $\ell_p$ Low Rank ApproximationMay 18 2017We consider the problem of approximating a given matrix by a low-rank matrix so as to minimize the entrywise $\ell_p$-approximation error, for any $p \geq 1$; the case $p = 2$ is the classical SVD problem. We obtain the first provably good approximation ... More

Fast approximation of matrix coherence and statistical leverageSep 18 2011Dec 05 2012The statistical leverage scores of a matrix $A$ are the squared row-norms of the matrix containing its (top) left singular vectors and the coherence is the largest leverage score. These quantities are of interest in recently-popular problems such as matrix ... More
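
The definitions of leverage scores and coherence above can be computed exactly for a small matrix. This is a minimal sketch, assuming the matrix has full column rank; it uses Gram-Schmidt in place of an SVD, since the squared row norms are the same for any orthonormal basis of the column space. It is not the fast approximation algorithm of the paper, and the function names are hypothetical.

```python
import math


def orthonormal_basis(A):
    """Modified Gram-Schmidt on the columns of A (a list of rows).

    Assumes A has full column rank, so no column reduces to zero.
    Returns a list of orthonormal column vectors.
    """
    n, d = len(A), len(A[0])
    columns = [[A[i][j] for i in range(n)] for j in range(d)]
    basis = []
    for col in columns:
        v = col[:]
        for q in basis:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis


def leverage_scores(A):
    """Squared row norms of an orthonormal basis for the column space of A."""
    basis = orthonormal_basis(A)
    return [sum(q[i] ** 2 for q in basis) for i in range(len(A))]


# Hypothetical 3 x 2 example: rows 1 and 2 are identical and share leverage.
A = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
scores = leverage_scores(A)
coherence = max(scores)  # the largest leverage score
```

The leverage scores sum to the rank of the matrix (here 2), which gives a quick sanity check on any implementation.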

Steiner Transitive-Closure Spanners of d-Dimensional PosetsNov 28 2010Given a directed graph G and an integer k >= 1, a k-transitive-closure-spanner (k-TC-spanner) of G is a directed graph H that has (1) the same transitive-closure as G and (2) diameter at most k. In some applications, the shortcut paths added to the graph ... More

Nearly Optimal Distinct Elements and Heavy Hitters on Sliding WindowsMay 01 2018Aug 03 2018We study the distinct elements and $\ell_p$-heavy hitters problems in the sliding window model, where only the most recent $n$ elements in the data stream form the underlying set. We first introduce the composable histogram, a simple twist on the exponential ... More

A PTAS for $\ell_p$-Low Rank ApproximationJul 16 2018Nov 19 2018A number of recent works have studied algorithms for entrywise $\ell_p$-low rank approximation, namely, algorithms which given an $n \times d$ matrix $A$ (with $n \geq d$), output a rank-$k$ matrix $B$ minimizing $\|A-B\|_p^p=\sum_{i,j}|A_{i,j}-B_{i,j}|^p$ ... More

Sustainability and the Astrobiological Perspective: Framing Human Futures in a Planetary ContextOct 14 2013We explore how questions related to developing a sustainable human civilization can be cast in terms of astrobiology. In particular we show how ongoing astrobiological studies of the coupled relationship between life, planets and their co-evolution can ... More

An optical mechanism for aberration of starlightOct 18 2011We present a physical-optics based theory of the physical mechanism for aberration of starlight. We apply non-relativistic and relativistic theories for wavefront image formation and include the effects of optically transmitting media within the sensor. ... More

Aberration of starlight experimentJul 27 2013We propose an experiment using a conventional optical telescope to determine whether aberration of starlight results from special relativistic effects external to a measurement sensor or from optical effects within a sensor. The proposed measurements ... More

Critical points of master functions and mKdV hierarchy of type $A^{(2)}_{2n}$Feb 20 2017We consider the population of critical points generated from the critical point of the master function with no variables, which is associated with the trivial representation of the twisted affine Lie algebra $A^{(2)}_{2n}$. The population is naturally ... More

Michelson interferometer null may confirm transverse Doppler EffectDec 17 2013Jun 09 2014We analyze fringe formation within Michelson-like experiments as viewed by relativistic inertial observers. Our analysis differs from previous work because we include optical misalignment of the beamsplitter of the interferometer due to the anamorphic ... More

Critical points of master functions and mKdV hierarchy of type $C^{(1)}_{n}$Nov 03 2018We consider the population of critical points, generated from the critical point of the master function with no variables, which is associated with the trivial representation of the twisted affine Lie algebra $C_n^{(1)}$. The population is naturally partitioned ... More

Putting Fairness Principles into Practice: Challenges, Metrics, and ImprovementsJan 14 2019As more researchers have become aware of and passionate about algorithmic fairness, there has been an explosion in papers laying out new metrics, suggesting algorithms to address issues, and calling attention to issues in existing applications of machine ... More

Automated proton track identification in MicroBooNE using gradient boosted decision treesOct 02 2017MicroBooNE is a liquid argon time projection chamber (LArTPC) neutrino experiment that is currently running in the Booster Neutrino Beam at Fermilab. LArTPC technology allows for high-resolution, three-dimensional representations of neutrino interactions. ... More

Detecting User Engagement in Everyday ConversationsOct 13 2004This paper presents a novel application of speech emotion recognition: estimation of the level of conversational engagement between users of a voice communication system. We begin by using machine learning techniques, such as the support vector machine ... More

Tap Tips: Lightweight Discovery of Touchscreen TargetsJan 26 2001We describe tap tips, a technique for providing touch-screen target location hints. Tap tips are lightweight in that they are non-modal, appear only when needed, require a minimal number of user gestures, and do not add to the standard touchscreen gesture ... More

Sabbath Day Home Automation: "It's Like Mixing Technology and Religion"Apr 27 2007We present a qualitative study of 20 American Orthodox Jewish families' use of home automation for religious purposes. These lead users offer insight into real-life, long-term experience with home automation technologies. We discuss how automation was ... More

Exploring nucleon spin structure through neutrino neutral-current interactions in MicroBooNEFeb 02 2017The net contribution of the strange quark spins to the proton spin, $\Delta s$, can be determined from neutral current elastic neutrino-proton interactions at low momentum transfer combined with data from electron-proton scattering. The probability of ... More