
Sharp thresholds for high-dimensional and noisy recovery of sparsity (May 30 2006): The problem of consistently estimating the sparsity pattern of a vector $\beta^* \in \mathbb{R}^p$ based on observations contaminated by noise arises in various contexts, including subset selection in regression, structure estimation in graphical models, ...

Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning (May 15 2019): Motivated by the study of $Q$-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We ...

Variance-reduced $Q$-learning is minimax optimal (Jun 11 2019): We introduce and analyze a form of variance-reduced $Q$-learning. For $\gamma$-discounted MDPs with finite state space $\mathcal{X}$ and action space $\mathcal{U}$, we prove that it yields an $\epsilon$-accurate estimate of the optimal $Q$-function in ...
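The tabular $Q$-learning iteration studied in the two entries above can be illustrated with a minimal sketch. This is our own toy example, not the papers' algorithm: synchronous damped Bellman updates on a deterministic two-state, two-action MDP, so the iterates converge to the Bellman fixed point $Q^*$.

```python
# Minimal sketch of synchronous tabular Q-learning (toy example, not the
# papers' algorithm). Transitions and rewards are deterministic, so the
# damped update converges to the Bellman fixed point Q*.
GAMMA = 0.9
STEP = 0.5  # damping / step size

transition = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}  # transition[s][a] = s'
reward = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}

Q = {s: {a: 0.0 for a in (0, 1)} for s in (0, 1)}
for _ in range(2000):
    new_Q = {}
    for s in Q:
        new_Q[s] = {}
        for a in Q[s]:
            target = reward[s][a] + GAMMA * max(Q[transition[s][a]].values())
            new_Q[s][a] = (1 - STEP) * Q[s][a] + STEP * target
    Q = new_Q
```

With discount $\gamma = 0.9$, always playing action 1 from state 1 earns reward 2 forever, so the fixed point satisfies $Q^*(1,1) = 2/(1-\gamma) = 20$, which the iterates approach geometrically.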

Discussion: Latent variable graphical model selection via convex optimization (Nov 05 2012): Discussion of "Latent variable graphical model selection via convex optimization" by Venkat Chandrasekaran, Pablo A. Parrilo and Alan S. Willsky [arXiv:1008.1290].

Variance-reduced $Q$-learning is minimax optimal (Jun 11 2019; revised Aug 08 2019): We introduce and analyze a form of variance-reduced $Q$-learning. For $\gamma$-discounted MDPs with finite state space $\mathcal{X}$ and action space $\mathcal{U}$, we prove that it yields an $\epsilon$-accurate estimate of the optimal $Q$-function in ...

Stochastic approximation with cone-contractive operators: Sharp $\ell_\infty$-bounds for $Q$-learning (May 15 2019; revised Jun 24 2019): Motivated by the study of $Q$-learning algorithms in reinforcement learning, we study a class of stochastic approximation procedures based on operators that satisfy monotonicity and quasi-contractivity conditions with respect to an underlying cone. We ...

Inconsistent parameter estimation in Markov random fields: Benefits in the computation-limited setting (Feb 27 2006): Consider the problem of joint parameter estimation and prediction in a Markov random field: i.e., the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, ...

Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting (Feb 11 2007; revised Feb 20 2007): The problem of recovering the sparsity pattern of a fixed but unknown vector $\beta^* \in \mathbb{R}^p$ based on a set of $n$ noisy observations arises in a variety of settings, including subset selection in regression, graphical model selection, signal denoising, ...

Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees (Sep 10 2015): Optimization problems with rank constraints arise in many applications, including matrix regression, structured PCA, matrix completion and matrix decomposition problems. An attractive heuristic for solving such problems is to factorize the low-rank matrix, ...

Estimation of (near) low-rank matrices with noise and high-dimensional scaling (Dec 27 2009): High-dimensional inference refers to problems of statistical estimation in which the ambient dimension of the data may be comparable to or possibly even larger than the sample size. We study an instance of high-dimensional inference in which the goal ...

Randomized Sketches of Convex Programs with Sharp Guarantees (Apr 29 2014): Random projection (RP) is a classical technique for reducing storage and computational costs. We analyze RP-based approximations of convex programs, in which the original optimization problem is approximated by the solution of a lower-dimensional problem. ...

Network-based consensus averaging with general noisy channels (May 04 2008): This paper focuses on the consensus averaging problem on graphs under general noisy channels. We study a particular class of distributed consensus algorithms based on damped updates, and using the ordinary differential equation method, we prove that the ...
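The damped consensus updates mentioned in the entry above can be sketched on a small graph. This is our own noiseless toy example (the paper's setting adds channel noise): each node repeatedly moves toward its neighbors' values, $x_i \leftarrow x_i + \eta \sum_{j \sim i} (x_j - x_i)$, which drives all nodes to the network average.

```python
# Damped consensus averaging on a 5-node path graph (noiseless channels,
# hypothetical example). Each node nudges its value toward its neighbors;
# the iteration x <- (I - step * L) x converges to the global average.
step = 0.3
x = [1.0, 4.0, 2.0, 8.0, 5.0]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
target = sum(x) / len(x)  # consensus value = 4.0, preserved at every step

for _ in range(500):
    x = [xi + step * sum(x[j] - xi for j in neighbors[i])
         for i, xi in enumerate(x)]
```

The step size must be small enough that $I - \eta L$ is a contraction on the disagreement subspace; here $\eta = 0.3$ works because the path graph's Laplacian eigenvalues lie below $4$.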

Restricted strong convexity and weighted matrix completion: Optimal bounds with noise (Sep 10 2010; revised May 15 2011): We consider the matrix completion problem under a form of row/column weighted entrywise sampling, including the case of uniform entrywise sampling as a special case. We analyze the associated random observation operator, and prove that with high probability, ...

Low-density constructions can achieve the Wyner-Ziv and Gelfand-Pinsker bounds (May 21 2006): We describe and analyze sparse graphical code constructions for the problems of source coding with decoder side information (the Wyner-Ziv problem), and channel coding with encoder side information (the Gelfand-Pinsker problem). Our approach relies on ...

Low density codes achieve the rate-distortion bound (Jan 30 2006): We propose a new construction for low-density source codes with multiple parameters that can be tuned to optimize the performance of the code. In addition, we introduce a set of analysis techniques for deriving upper bounds for the expected distortion ...

Convergence guarantees for a class of non-convex and non-smooth optimization problems (Apr 25 2018): We consider the problem of finding critical points of functions that are non-convex and non-smooth. Studying a fairly broad class of such problems, we analyze the behavior of three gradient-based methods (gradient descent, proximal update, and Frank-Wolfe ...

High-dimensional subset recovery in noise: Sparsified measurements without loss of statistical efficiency (May 20 2008): We consider the problem of estimating the support of a vector $\beta^* \in \mathbb{R}^{p}$ based on observations contaminated by noise. A significant body of work has studied behavior of $\ell_1$-relaxations when applied to measurement matrices drawn ...

Universal Quantile Estimation with Feedback in the Communication-Constrained Setting (Jun 05 2007): We consider the following problem of decentralized statistical inference: given i.i.d. samples from an unknown distribution, estimate an arbitrary quantile subject to limits on the number of bits exchanged. We analyze a standard fusion-based architecture, ...

The local geometry of testing in ellipses: Tight control via localized Kolmogorov widths (Dec 03 2017; revised Jan 03 2018): We study the local geometry of testing a mean vector within a high-dimensional ellipse against a compound alternative. Given samples of a Gaussian random vector, the goal is to distinguish whether the mean is equal to a known vector within an ellipse, ...

Stochastic Belief Propagation: A Low-Complexity Alternative to the Sum-Product Algorithm (Nov 04 2011; revised May 25 2012): The sum-product or belief propagation (BP) algorithm is a widely-used message-passing algorithm for computing marginal distributions in graphical models with discrete variables. At the core of the BP message updates, when applied to a graphical model ...

Information-theoretic limits of selecting binary graphical models in high dimensions (May 16 2009): The problem of graphical model selection is to correctly estimate the graph structure of a Markov random field given samples from the underlying distribution. We analyze the information-theoretic limitations of the problem of graph selection for binary ...

Low-density graph codes that are optimal for source/channel coding and binning (Apr 13 2007): We describe and analyze the joint source/channel coding properties of a class of sparse graphical codes based on compounding a low-density generator matrix (LDGM) code with a low-density parity check (LDPC) code. Our first pair of theorems establish that ...

Newton Sketch: A Linear-time Optimization Algorithm with Linear-Quadratic Convergence (May 09 2015): We propose a randomized second-order method for optimization known as the Newton Sketch: it is based on performing an approximate Newton step using a randomly projected or sub-sampled Hessian. For self-concordant functions, we prove that the algorithm ...

Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees (Dec 16 2012): The sum-product or belief propagation (BP) algorithm is a widely used message-passing technique for computing approximate marginals in graphical models. We introduce a new technique, called stochastic orthogonal series message-passing (SOSMP), for computing ...

Lossy source encoding via message-passing and decimation over generalized codewords of LDGM codes (Aug 15 2005): We describe message-passing and decimation approaches for lossy source coding using low-density generator matrix (LDGM) codes. In particular, this paper addresses the problem of encoding a Bernoulli(0.5) source: for randomly generated LDGM codes with ...

Analysis of LDGM and compound codes for lossy compression and binning (Feb 13 2006): Recent work has suggested that low-density generator matrix (LDGM) codes are likely to be effective for lossy source coding problems. We derive rigorous upper bounds on the effective rate-distortion function of LDGM codes for the binary symmetric source, ...

Iterative Hessian sketch: Fast and accurate solution approximation for constrained least-squares (Nov 03 2014): We study randomized sketching methods for approximately solving least-squares problems with a general convex constraint. The quality of a least-squares approximation can be assessed in different ways: either in terms of the value of the quadratic objective ...
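As a rough sketch of the idea behind the entry above (our notation, reconstructed from the abstract rather than quoted from the paper): instead of sketching the data once, each iteration applies a fresh random sketching matrix $S^{t}$ to the Hessian term only, while the gradient term is computed exactly:

```latex
x^{t+1} \in \arg\min_{x \in \mathcal{C}}
  \Big\{ \tfrac{1}{2} \big\| S^{t} A \, (x - x^{t}) \big\|_2^2
         \;-\; \big\langle A^{\top} (y - A x^{t}),\; x - x^{t} \big\rangle \Big\}
```

Here $A$ and $y$ are the least-squares data, $\mathcal{C}$ is the convex constraint set, and each subproblem is low-dimensional because $S^{t}$ has far fewer rows than $A$.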

High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity (Sep 16 2011; revised Sep 25 2012): Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependence, as well. We study these issues in the context ...

Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices (Jun 03 2008): We study the information-theoretic limits of exactly recovering the support of a sparse signal using noisy projections defined by various classes of measurement matrices. Our analysis is high-dimensional in nature, in which the number of observations ...

Minimax-optimal rates for sparse additive models over kernel classes via convex programming (Aug 21 2010; revised Dec 18 2011): Sparse additive models are families of $d$-variate functions that have the additive decomposition $f^* = \sum_{j \in S} f^*_j$, where $S$ is an unknown subset of cardinality $s \ll d$. In this paper, we consider the case where each univariate component ...

Convexified Convolutional Neural Networks (Sep 04 2016): We describe the class of convexified convolutional neural networks (CCNNs), which capture the parameter sharing of convolutional neural networks in a convex manner. By representing the nonlinear convolutional filters as vectors in a reproducing kernel ...

Guessing Facets: Polytope Structure and Improved LP Decoder (Aug 05 2006; revised Aug 11 2006): A new approach for decoding binary linear codes by solving a linear program (LP) over a relaxed codeword polytope was recently proposed by Feldman et al. In this paper we investigate the structure of the polytope used in the LP relaxation decoding. We ...

Randomized sketches for kernels: Fast and optimal non-parametric regression (Jan 25 2015): Kernel ridge regression (KRR) is a standard method for performing non-parametric regression over reproducing kernel Hilbert spaces. Given $n$ samples, the time and space complexity of computing the KRR estimate scale as $\mathcal{O}(n^3)$ and $\mathcal{O}(n^2)$ ...

From Gauss to Kolmogorov: Localized Measures of Complexity for Ellipses (Mar 21 2018): The Gaussian width is a fundamental quantity in probability, statistics and geometry, known to underlie the intrinsic difficulty of estimation and hypothesis testing. In this work, we show how the Gaussian width, when localized to any given point of an ...

Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions (Jul 18 2012): We develop and analyze stochastic optimization algorithms for problems in which the expected loss is strongly convex, and the optimum is (approximately) sparse. Previous approaches are able to exploit only one of these two structures, yielding an $\mathcal{O}(p/T)$ ...

Distance-based and continuum Fano inequalities with applications to statistical estimation (Nov 12 2013; revised Dec 31 2013): In this technical note, we give two extensions of the classical Fano inequality in information theory. The first extends Fano's inequality to the setting of estimation, providing lower bounds on the probability that an estimator of a discrete quantity ...

Statistical guarantees for the EM algorithm: From population to sample-based analysis (Aug 09 2014): We develop a general framework for proving rigorous guarantees on the performance of the EM algorithm and a variant known as gradient EM. Our analysis is divided into two parts: a treatment of these algorithms at the population level (in the limit of ...
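The EM iteration analyzed in the entry above can be illustrated with a minimal sketch. This is our own example, far simpler than the paper's general setting: a two-component, unit-variance, equal-weight Gaussian mixture in one dimension, where only the two means are estimated.

```python
import math

# EM for a two-component 1-D Gaussian mixture with unit variances and
# known equal weights (a minimal sketch, not the paper's general setting).
data = [-2.1, -1.9, -2.3, -1.7, 1.8, 2.2, 2.0, 1.6]
mu = [-0.5, 0.5]  # initial guesses for the two means

def density(x, m):
    # Unnormalized N(m, 1) density; the constant cancels in the E-step ratio.
    return math.exp(-0.5 * (x - m) ** 2)

for _ in range(50):
    # E-step: posterior responsibility of component 1 for each point.
    r = [density(x, mu[1]) / (density(x, mu[0]) + density(x, mu[1]))
         for x in data]
    # M-step: responsibility-weighted means.
    mu = [sum((1 - ri) * x for ri, x in zip(r, data)) / sum(1 - ri for ri in r),
          sum(ri * x for ri, x in zip(r, data)) / sum(ri for ri in r)]
```

With this well-separated data the iterates settle near the two cluster averages, $-2.0$ and $1.9$; EM's behavior from poor initializations is exactly the kind of question the population-to-sample analysis addresses.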

Towards Optimal Estimation of Bivariate Isotonic Matrices with Unknown Permutations (Jun 25 2018): Many applications, including rank aggregation, crowd-labeling, and graphon estimation, can be modeled in terms of a bivariate isotonic matrix with unknown permutations acting on its rows and columns. We consider the problem of estimating such a matrix ...

Simple, Robust and Optimal Ranking from Pairwise Comparisons (Dec 30 2015; revised Apr 27 2016): We consider data in the form of pairwise comparisons of $n$ items, with the goal of precisely identifying the top $k$ items for some value of $k < n$, or alternatively, recovering a ranking of all the items. We analyze the Copeland counting algorithm that ranks ...
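The Copeland counting idea named in the entry above is simple to state: score each item by its total number of wins in the observed comparisons and sort by score. A minimal sketch on hypothetical data (the paper's analysis concerns noisy comparisons; the counting step itself is the same):

```python
from collections import Counter

# Copeland-style counting: rank items by total wins in observed pairwise
# comparisons. The data below is a hypothetical list of (winner, loser) pairs.
comparisons = [(0, 1), (0, 2), (1, 2), (0, 3), (2, 3), (1, 3), (0, 2)]

wins = Counter(winner for winner, _ in comparisons)
items = {i for pair in comparisons for i in pair}
ranking = sorted(items, key=lambda i: -wins[i])  # most wins first
```

On this data item 0 wins four times, item 1 twice, item 2 once, and item 3 never, so the recovered order is `[0, 1, 2, 3]`.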

Minimax rates of estimation for high-dimensional linear regression over $\ell_q$-balls (Oct 11 2009): Consider the standard linear regression model $y = X \beta^* + w$, where $y \in \mathbb{R}^{n}$ is an observation vector, $X \in \mathbb{R}^{n \times p}$ is a design matrix, $\beta^* \in \mathbb{R}^{p}$ is the unknown regression vector, ...

Statistical and Computational Guarantees for the Baum-Welch Algorithm (Dec 27 2015): The Hidden Markov Model (HMM) is one of the mainstays of statistical modeling of discrete time series, with applications including speech recognition, computational biology, computer vision and econometrics. Estimating an HMM from its observation process ...

Sampled forms of functional PCA in reproducing kernel Hilbert spaces (Sep 15 2011; revised Feb 13 2013): We consider the sampling problem for functional PCA (fPCA), where the simplest example is the case of taking time samples of the underlying functional components. More generally, we model the sampling operation as a continuous linear map from $\mathcal{H}$ ...

Early stopping for kernel boosting algorithms: A general analysis with localized complexities (Jul 05 2017; revised Mar 13 2018): Early stopping of iterative algorithms is a widely-used form of regularization in statistics, commonly used in conjunction with boosting and related gradient-type algorithms. Although consistency results have been established in some settings, such estimators ...

Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses (Dec 03 2012; revised Jan 06 2014): We investigate the relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator ...

Breaking the $1/\sqrt{n}$ Barrier: Faster Rates for Permutation-based Models in Polynomial Time (Feb 27 2018; revised Jun 05 2018): Many applications, including rank aggregation and crowd-labeling, can be modeled in terms of a bivariate isotonic matrix with unknown permutations acting on its rows and columns. We consider the problem of estimating such a matrix based on noisy observations ...

Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima (May 10 2013; revised Jan 01 2015): We provide novel theoretical results regarding local optima of regularized $M$-estimators, allowing for nonconvexity in both loss and penalty functions. Under restricted strong convexity on the loss and suitable regularity conditions on the penalty, we ...

Approximation properties of certain operator-induced norms on Hilbert spaces (May 31 2011): We consider a class of operator-induced norms, acting as finite-dimensional surrogates to the $L^2$ norm, and study their approximation properties over Hilbert subspaces of $L^2$. The class includes, as a special case, the usual empirical norm encountered, ...

The geometry of kernelized spectral clustering (Apr 29 2014; revised Apr 07 2015): Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover ...

Early stopping and non-parametric regression: An optimal data-dependent stopping rule (Jun 15 2013): The strategy of early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Focusing on non-parametric regression in a reproducing kernel Hilbert space, we analyze the early stopping strategy for a form of ...

Support recovery without incoherence: A case for nonconvex regularization (Dec 17 2014): We demonstrate that the primal-dual witness proof method may be used to establish variable selection consistency and $\ell_\infty$-bounds for sparse regression problems, even when the loss function and/or regularizer are nonconvex. Using this method, ...

The geometry of hypothesis testing over convex cones: Generalized likelihood tests and minimax radii (Mar 20 2017; revised Mar 26 2018): We consider a compound testing problem within the Gaussian sequence model in which the null and alternative are specified by a pair of closed, convex cones. Such cone testing problems arise in various applications, including detection of treatment effects, ...

High-dimensional analysis of semidefinite relaxations for sparse principal components (Mar 27 2008; revised Aug 26 2009): Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the "large $p$, small $n$" setting, in ...

A More Powerful Two-Sample Test in High Dimensions using Random Projection (Aug 11 2011; revised Sep 13 2015): We consider the hypothesis testing problem of detecting a shift between the means of two multivariate normal distributions in the high-dimensional setting, allowing for the data dimension $p$ to exceed the sample size $n$. Specifically, we propose a new test ...

Linear Regression with an Unknown Permutation: Statistical and Computational Limits (Aug 09 2016): Consider a noisy linear observation model with an unknown permutation, based on observing $y = \Pi^* A x^* + w$, where $x^* \in \mathbb{R}^d$ is an unknown vector, $\Pi^*$ is an unknown $n \times n$ permutation matrix, and $w \in \mathbb{R}^n$ is additive ...

Fast MCMC sampling algorithms on polytopes (Oct 23 2017; revised Mar 06 2019): We propose and analyze two new MCMC sampling algorithms, the Vaidya walk and the John walk, for generating samples from the uniform distribution over a polytope. Both random walks are sampling algorithms derived from interior point methods. The former ...

A framework for Multi-A(rmed)/B(andit) testing with online FDR control (Jun 16 2017; revised Nov 18 2017): We propose an alternative framework to existing setups for controlling false alarms when multiple A/B tests are run over time. This setup arises in many practical applications, e.g. when pharmaceutical companies test new treatment options against control ...

A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness (Jun 30 2016): The aggregation and denoising of crowd labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd labeled data that is a ...

Learning to Explain: An Information-Theoretic Perspective on Model Interpretation (Feb 21 2018; revised Jun 14 2018): We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize ...

DAGGER: A sequential algorithm for FDR control on DAGs (Sep 29 2017; revised Dec 04 2018): We propose a linear-time, single-pass, top-down algorithm for multiple testing on directed acyclic graphs (DAGs), where nodes represent hypotheses and edges specify a partial ordering in which hypotheses must be tested. The procedure is guaranteed to ...

Lower bounds on the performance of polynomial-time algorithms for sparse linear regression (Feb 09 2014; revised May 21 2014): Under a standard assumption in complexity theory ($\mathrm{NP} \not\subset \mathrm{P/poly}$), we demonstrate a gap between the minimax prediction risk for sparse linear regression that can be achieved by polynomial-time algorithms, and that achieved by optimal algorithms. In ...

Fast mixing of Metropolized Hamiltonian Monte Carlo: Benefits of multi-step gradients (May 29 2019): Hamiltonian Monte Carlo (HMC) is a state-of-the-art Markov chain Monte Carlo sampling algorithm for drawing samples from smooth probability densities over continuous spaces. We study the variant most widely used in practice, Metropolized HMC with the ...

High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence (Nov 21 2008): Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance matrix $\Sigma^*$, and its inverse covariance or concentration matrix $\Theta^* = (\Sigma^*)^{-1}$. We estimate $\Theta^*$ by minimizing ...
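The $\ell_1$-penalized log-determinant program referred to in the entry above takes, in one standard formulation (our notation, with $\hat{\Sigma}$ the sample covariance and the penalty applied off-diagonal), the form

```latex
\hat{\Theta} \in \arg\min_{\Theta \succ 0}
  \Big\{ \operatorname{trace}\big(\hat{\Sigma}\,\Theta\big)
         \;-\; \log\det(\Theta)
         \;+\; \lambda \sum_{i \neq j} |\Theta_{ij}| \Big\}
```

The unpenalized part is, up to constants, the negative Gaussian log-likelihood in $\Theta$, so the $\ell_1$ term promotes sparsity in the estimated concentration matrix and hence in the underlying graph.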

A New Look at Survey Propagation and its Generalizations (Sep 08 2004; revised Oct 31 2005): This paper provides a new conceptual perspective on survey propagation, which is an iterative algorithm recently introduced by the statistical physics community that is very effective in solving random k-SAT problems even with densities close to the satisfiability ...

Geographic Gossip: Efficient Averaging for Sensor Networks (Sep 25 2007): Gossip algorithms for distributed computation are attractive due to their simplicity, distributed nature, and robustness in noisy and uncertain environments. However, using standard gossip algorithms can lead to a significant waste in energy by repeatedly ...

Estimating divergence functionals and the likelihood ratio by convex risk minimization (Sep 04 2008; revised Apr 22 2009): We develop and analyze $M$-estimation methods for divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a non-asymptotic variational characterization of $f$-divergences, which allows the problem of estimating ...

Active Ranking from Pairwise Comparisons and when Parametric Assumptions Don't Help (Jun 28 2016; revised Sep 23 2016): We consider sequential or active ranking of a set of $n$ items based on noisy pairwise comparisons. Items are ranked according to the probability that a given item beats a randomly chosen item, and ranking refers to partitioning the items into sets of pre-specified ...

Stochastically Transitive Models for Pairwise Comparisons: Statistical and Computational Issues (Oct 19 2015; revised Sep 28 2016): There are various parametric models for analyzing pairwise comparison data, including the Bradley-Terry-Luce (BTL) and Thurstone models, but their reliance on strong parametric assumptions is limiting. In this work, we study a flexible model for pairwise ...

Randomized Smoothing for Stochastic Optimization (Mar 22 2011; revised Apr 07 2012): We analyze convergence rates of stochastic optimization procedures for non-smooth convex optimization problems. By combining randomized smoothing techniques with accelerated gradient methods, we obtain convergence rates of stochastic optimization procedures, ...

Function-Specific Mixing Times and Concentration Away from Equilibrium (May 06 2016; revised Sep 30 2016): Slow mixing is the central hurdle when working with Markov chains, especially those used for Monte Carlo approximations (MCMC). In many applications, it is only of interest to estimate the stationary expectations of a small set of functions, and so the ...

Divide and Conquer Kernel Ridge Regression: A Distributed Algorithm with Minimax Optimal Rates (May 22 2013; revised Apr 29 2014): We establish optimal convergence rates for a decomposition-based scalable approach to kernel ridge regression. The method is simple to describe: it randomly partitions a dataset of size $N$ into $m$ subsets of equal size, computes an independent kernel ridge ...
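The partition-and-average principle in the entry above can be sketched in a deliberately minimal setting. Our own toy example fits a one-parameter ridge regression on each of $m$ subsets and averages the estimates; full kernel ridge regression follows the same pattern with each local fit replaced by a KRR solve.

```python
import random

# Divide-and-conquer averaging, illustrated on scalar ridge regression
# (a toy stand-in for kernel ridge regression): split the data into M
# subsets, fit each one independently, and average the local estimates.
random.seed(0)
TRUE_W, LAM, M = 2.0, 0.1, 4
data = [(x, TRUE_W * x + random.gauss(0, 0.1))
        for x in (random.uniform(-1, 1) for _ in range(400))]

def ridge(subset):
    # Closed-form 1-D ridge estimate: sum(x*y) / (sum(x^2) + lambda).
    sxy = sum(x * y for x, y in subset)
    sxx = sum(x * x for x, _ in subset)
    return sxy / (sxx + LAM)

size = len(data) // M
estimates = [ridge(data[i * size:(i + 1) * size]) for i in range(M)]
w_bar = sum(estimates) / M  # averaged estimate, close to TRUE_W = 2.0
```

Averaging keeps each local fit cheap while reducing the variance of the combined estimate; the paper's contribution is showing when this loses nothing in statistical rate for KRR.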

High-dimensional Ising model selection using $\ell_1$-regularized logistic regression (Oct 02 2010): We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on $\ell_1$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic ...

HopSkipJumpAttack: A Query-Efficient Decision-Based Attack (Apr 03 2019; revised Jun 03 2019): The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate ...

Guessing Facets: Polytope Structure and Improved LP Decoding (Sep 25 2007): In this paper we investigate the structure of the fundamental polytope used in the Linear Programming decoding introduced by Feldman, Karger and Wainwright. We begin by showing that for expander codes, every fractional pseudocodeword always has at least ...

Approximate Ranking from Pairwise Comparisons (Jan 04 2018): A common problem in machine learning is to rank a set of $n$ items based on pairwise comparisons. Here ranking refers to partitioning the items into sets of pre-specified sizes according to their scores, which includes identification of the top-$k$ items ...

Optimal Rates and Tradeoffs in Multiple Testing (May 15 2017): Multiple hypothesis testing is a central topic in statistics, but despite abundant work on the false discovery rate (FDR) and the corresponding Type-II error concept known as the false non-discovery rate (FNR), a fine-grained understanding of the fundamental ...

Distributed Estimation of Generalized Matrix Rank: Efficient Algorithms and Lower Bounds (Feb 05 2015; revised Feb 06 2015): We study the following generalized matrix rank estimation problem: given an $n \times n$ matrix and a constant $c \geq 0$, estimate the number of eigenvalues that are greater than $c$. In the distributed setting, the matrix of interest is the sum of $m$ ...

Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation (May 26 2013): We provide a detailed study of the estimation of probability distributions, discrete and continuous, in a stringent setting in which data is kept private even from the statistician. We give sharp minimax rates of convergence for estimation in these ...

Support union recovery in high-dimensional multivariate regression (Aug 05 2008; revised Mar 07 2011): In multivariate regression, a $K$-dimensional response vector is regressed upon a common set of $p$ covariates, with a matrix $B^*\in\mathbb{R}^{p\times K}$ of regression coefficients. We study the behavior of the multivariate group Lasso, in which block ...

Kernel Feature Selection via Conditional Covariance Minimization (Jul 04 2017; revised Oct 20 2018): We propose a method for feature selection that employs kernel-based measures of independence to find a subset of covariates that is maximally predictive of the response. Building on past work in kernel dimension reduction, we show how to perform feature ...

A Unified Framework for High-Dimensional Analysis of M-Estimators with Decomposable Regularizers (Oct 13 2010; revised Mar 12 2013): High-dimensional statistical inference deals with models in which the number of parameters $p$ is comparable to or larger than the sample size $n$. Since it is usually impossible to obtain consistent procedures unless $p/n\rightarrow0$, a line of recent ...

MAP estimation via agreement on (hyper)trees: Message-passing and linear programming (Aug 15 2005): We develop and analyze methods for computing provably optimal maximum a posteriori (MAP) configurations for a subclass of Markov random fields defined on graphs with cycles. By decomposing the original distribution into a convex combination of tree-structured ...

Log-concave sampling: Metropolis-Hastings algorithms are fast (Jan 08 2018; revised Mar 06 2019): We consider the problem of sampling from a strongly log-concave density in $\mathbb{R}^d$, and prove a non-asymptotic upper bound on the mixing time of the Metropolis-adjusted Langevin algorithm (MALA). The method draws samples by simulating a Markov ...
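MALA, the algorithm analyzed in the entry above, is short to write down. Here is a minimal pure-Python version targeting a one-dimensional standard Gaussian; the step size and chain length are arbitrary choices of ours, not the tuned settings the paper studies.

```python
import math
import random

# Metropolis-adjusted Langevin algorithm (MALA) targeting a 1-D standard
# Gaussian (a minimal sketch). Proposal: x' = x + h*grad(log p)(x) + sqrt(2h)*Z.
random.seed(1)
h = 0.5  # step size (hand-picked, not tuned)

def log_p(x):
    return -0.5 * x * x  # log density of N(0, 1), up to a constant

def grad_log_p(x):
    return -x

def log_q(x_to, x_from):
    # Log density (up to a constant) of the proposal N(x_from + h*grad, 2h).
    mean = x_from + h * grad_log_p(x_from)
    return -((x_to - mean) ** 2) / (4 * h)

x, samples = 0.0, []
for _ in range(20000):
    prop = x + h * grad_log_p(x) + math.sqrt(2 * h) * random.gauss(0, 1)
    # Metropolis-Hastings correction makes the chain exactly invariant for p.
    log_alpha = log_p(prop) - log_p(x) + log_q(x, prop) - log_q(prop, x)
    if math.log(random.random()) < log_alpha:
        x = prop
    samples.append(x)

mean = sum(samples) / len(samples)
second_moment = sum(s * s for s in samples) / len(samples)
```

The Metropolis accept/reject step is what distinguishes MALA from the unadjusted Langevin algorithm: it removes the discretization bias, at the cost of occasional rejections.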

L-Shapley and C-Shapley: Efficient Model Interpretation for Structured Data (Aug 08 2018): We study instancewise feature importance scoring as a method for model interpretation. Any such method yields, for each predicted instance, a vector of importance scores associated with the feature vector. Methods based on the Shapley score have been ...

On the Computational Complexity of High-Dimensional Bayesian Variable Selection (May 29 2015): We study the computational complexity of Markov chain Monte Carlo (MCMC) methods for high-dimensional Bayesian linear regression under sparsity constraints. We first show that a Bayesian approach can achieve variable-selection consistency under relatively ...

Privacy Aware Learning (Oct 07 2012; revised Oct 10 2013): We study statistical risk minimization problems under a privacy model in which the data is kept confidential even from the learner. In this local privacy framework, we establish sharp upper and lower bounds on the convergence rates of statistical estimation ...

Information-theoretic lower bounds on the oracle complexity of stochastic convex optimization (Sep 03 2010; revised Nov 20 2011): Relative to the large literature on upper bounds on complexity of convex optimization, lesser attention has been paid to the fundamental hardness of these problems. Given the extensive use of convex optimization in machine learning and statistics, gaining ...

Fast global convergence of gradient methods for high-dimensional statistical recovery (Apr 25 2011; revised Jul 25 2012): Many statistical $M$-estimators are based on convex optimization problems formed by the combination of a data-dependent loss function with a norm-based regularizer. We analyze the convergence rates of projected gradient and composite gradient methods ...

High-Dimensional Graphical Model Selection Using $\ell_1$-Regularized Logistic Regression (Apr 26 2008): We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on $\ell_1$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic ...

Geographic Gossip: Efficient Aggregation for Sensor Networks (Feb 20 2006): Gossip algorithms for aggregation have recently received significant attention for sensor network applications because of their simplicity and robustness in noisy and uncertain environments. However, gossip algorithms can waste significant energy by essentially ...

Online control of the false discovery rate with decaying memory (Oct 02 2017): In the online multiple testing problem, p-values corresponding to different null hypotheses are observed one by one, and the decision of whether or not to reject the current hypothesis must be made immediately, after which the next p-value is observed. ...

Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators (Mar 11 2015; revised Nov 30 2015): For the problem of high-dimensional sparse linear regression, it is known that an $\ell_0$-based estimator can achieve a $1/n$ "fast" rate on the prediction error without any conditions on the design matrix, whereas in absence of restrictive conditions ...

Local Privacy, Data Processing Inequalities, and Statistical Minimax Rates (Feb 13 2013; revised Aug 27 2014): Working under a model of privacy in which data remains private even from the statistician, we study the tradeoff between privacy guarantees and the utility of the resulting statistical estimators. We prove bounds on information-theoretic quantities, including ...

On optimal quantization rules for some problems in sequential decentralized detection (Aug 22 2006; revised Nov 26 2008): We consider the design of systems for sequential decentralized detection, a problem that entails several interdependent choices: the choice of a stopping rule (specifying the sample size), a global decision function (a choice between two competing hypotheses), ...

Feeling the Bern: Adaptive Estimators for Bernoulli Probabilities of Pairwise Comparisons (Mar 22 2016): We study methods for aggregating pairwise comparison data in order to estimate outcome probabilities for future comparisons among a collection of $n$ items. Working within a flexible framework that imposes only a form of strong stochastic transitivity (SST), ...

Low Permutation-rank Matrices: Structural Properties and Noisy Completion (Sep 01 2017): We consider the problem of noisy matrix completion, in which the goal is to reconstruct a structured matrix whose entries are partially observed in noise. Standard approaches to this underdetermined inverse problem are based on assuming that the underlying ...

HopSkipJumpAttack: A Query-Efficient Decision-Based Attack (Apr 03 2019; revised Jun 10 2019): The goal of a decision-based adversarial attack on a trained model is to generate adversarial examples based solely on observing output labels returned by the targeted model. We develop HopSkipJumpAttack, a family of algorithms based on a novel estimate ...

On surrogate loss functions and $f$-divergences (Oct 25 2005; revised Apr 01 2009): The goal of binary classification is to estimate a discriminant function $\gamma$ from observations of covariate vectors and corresponding binary labels. We consider an elaboration of this problem in which the covariates are not available directly but ...

Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions (Feb 23 2011; revised Mar 06 2012): We analyze a class of estimators based on convex relaxation for solving high-dimensional matrix decomposition problems. The observations are noisy realizations of a linear transformation $\mathfrak{X}$ of the sum of an (approximately) low-rank matrix $\Theta^\star$ ...

Improved Bounds for Discretization of Langevin Diffusions: Near-Optimal Rates without Convexity (Jul 25 2019): We present an improved analysis of the Euler-Maruyama discretization of the Langevin diffusion. Our analysis does not require global contractivity, and yields polynomial dependence on the time horizon. Compared to existing approaches, we make an additional ...

Non-Asymptotic Analysis of an Optimal Algorithm for Network-Constrained Averaging with Noisy Links (Feb 08 2013): The problem of network-constrained averaging is to compute the average of a set of values distributed throughout a graph $G$ using an algorithm that can pass messages only along graph edges. We study this problem in the noisy setting, in which the communication ...