Latest in math.st

Maximum Likelihood Estimation of Toric Fano Varieties (May 17 2019). We study the maximum likelihood estimation problem for several classes of toric Fano models. We start by exploring the maximum likelihood degree for all 2-dimensional Gorenstein toric Fano varieties. We show that the ML degree is equal to the degree of ...
Pair Matching: When bandits meet stochastic block model (May 17 2019). The pair-matching problem appears in many applications where one wants to discover good matches between pairs of individuals. Formally, the set of individuals is represented by the nodes of a graph where the edges, unobserved at first, represent the good ...
Analytic Basis Expansions for Functional Snippets (May 16 2019). Estimation of mean and covariance functions is fundamental for functional data analysis. While this topic has been studied extensively in the literature, a key assumption is that there are enough data in the domain of interest to estimate both the mean ...
NANUQ: A method for inferring species networks from gene trees under the coalescent model (May 16 2019). Species networks generalize the notion of species trees to allow for hybridization or other lateral gene transfer. Under the Network Multispecies Coalescent Model, individual gene trees arising from a network can have any topology, but arise with frequencies ...
Simplicial splines for representation of density functions (May 16 2019). In the context of functional data analysis, probability density functions as non-negative functions are characterized by specific properties of scale invariance and relative scale which make it possible to represent them with the unit integral constraint without ...
Stochastic precedence and minima among dependent variables. A study based on the multivariate conditional hazard rates (May 16 2019). The notion of stochastic precedence between two random variables emerges as a relevant concept in several fields of applied probability. When one considers a vector of random variables $X_1,...,X_n$, this notion has a preeminent role in the analysis of ...
When random initializations help: a study of variational inference for community detection (May 16 2019). Variational approximation has been widely used in large-scale Bayesian inference recently, the simplest kind of which involves imposing a mean field assumption to approximate complicated latent structures. Despite the computational scalability of mean ...
Adaptive estimation in the linear random coefficients model when regressors have limited variation (May 16 2019). We consider a linear model where the coefficients (intercept and slopes) are random and independent of regressors whose support is a proper subset. When the density has finite weighted $L^2$ norm, for well-chosen weights, the joint density of the random ...
Moment-based Estimation of Mixtures of Regression Models (May 15 2019). Finite mixtures of regression models provide a flexible modeling framework for many phenomena. Using moment-based estimation of the regression parameters, we develop unbiased estimators with a minimum of assumptions on the mixture components. In particular, ...
Compound Dirichlet Processes (May 15 2019). The compound Poisson process and the Dirichlet process are the pillar structures of renewal theory and Bayesian nonparametric theory, respectively. Both processes have many useful extensions to fulfill the practitioners' needs to model the particularities ...
Transfer Entropy in Continuous Time (May 15 2019). Transfer entropy (TE) was introduced by Schreiber in 2000 as a measurement of the predictive capacity of one stochastic process with respect to another. Originally stated for discrete time processes, we expand the theory of TE to stochastic processes ...
Which principal components are most sensitive to distributional changes? (May 15 2019). PCA is often used in anomaly detection and statistical process control tasks. For bivariate data, we prove that the minor projection (the least varying projection) of the PCA-rotated data is the most sensitive to distributional changes, where sensitivity ...
Revisiting High Dimensional Bayesian Model Selection for Gaussian Regression (May 15 2019). Model selection for regression problems with an increasing number of covariates continues to be an important problem both theoretically and in applications. Model selection consistency and mean structure reconstruction depend on the interplay between ...
A New Confidence Interval for the Mean of a Bounded Random Variable (May 15 2019). We present a new method for constructing a confidence interval for the mean of a bounded random variable from samples of the random variable. We conjecture that the confidence interval has guaranteed coverage, i.e., that it contains the mean with high ...
Robust change point tests by bounded transformations (May 15 2019). Classical moment-based change point tests like the cusum test are very powerful in the case of Gaussian time series with one change point but behave poorly under heavy-tailed distributions and corrupted data. A new class of robust change point tests based ...
Iterative Alpha Expansion for estimating gradient-sparse signals from linear measurements (May 15 2019). We consider estimating a piecewise-constant image, or a gradient-sparse signal on a general graph, from noisy linear measurements. We propose and study an iterative algorithm to minimize a penalized least-squares objective, with a penalty given by the ...
Information criteria for non-normalized models (May 15 2019). Many statistical models are given in the form of non-normalized densities with an intractable normalization constant. Since maximum likelihood estimation is computationally intensive for these models, several estimation methods have been developed which ...
Measuring Bayesian Robustness Using Rényi's Divergence and Relationship with Prior-Data Conflict (May 15 2019). This paper deals with measuring the Bayesian robustness of classes of contaminated priors. Two different classes of priors in the neighbourhood of the elicited prior are considered. The first one is the well-known $\epsilon$-contaminated class, while ...
Minimax rates of estimation for smooth optimal transport maps (May 14 2019). Brenier's theorem is a cornerstone of optimal transport that guarantees the existence of an optimal transport map $T$ between two probability distributions $P$ and $Q$ over $\mathbb{R}^d$ under certain regularity conditions. The main goal of this work ...
Approximation of Optimal Transport problems with marginal moments constraints (May 14 2019). Optimal Transport (OT) problems arise in a wide range of applications, from physics to economics. Obtaining approximate numerical solutions to these problems is a challenging issue of practical importance. In this work, we investigate the relaxation of the ...
Sample Efficient Toeplitz Covariance Estimation (May 14 2019, May 15 2019). We study the sample complexity of estimating the covariance matrix $T$ of a distribution $\mathcal{D}$ over $d$-dimensional vectors, under the assumption that $T$ is Toeplitz. This assumption arises in many signal processing problems, where the covariance ...
Sample Efficient Toeplitz Covariance Estimation (May 14 2019). We study the query complexity of estimating the covariance matrix $T$ of a distribution $\mathcal{D}$ over $d$-dimensional vectors, under the assumption that $T$ is Toeplitz. This assumption arises in many signal processing problems, where the covariance ...
Multivariate Ranks and Quantiles using Optimal Transportation and Applications to Goodness-of-fit Testing (May 14 2019). In this paper we study multivariate ranks and quantiles, defined using the theory of optimal transportation, and build on the work of Chernozhukov et al. (2017) and del Barrio et al. (2018). We study the characterization and properties of these multivariate ...
Modeling failures times with dependent renewal type models via exchangeability (May 13 2019). Failure times of machinery cannot always be assumed independent and identically distributed, e.g. if after repairs the machinery is not restored to a same-as-new condition. Framed within the renewal processes approach, a generalization that considers ...
Moment Identifiability of Homoscedastic Gaussian Mixtures (May 13 2019). We consider the problem of identifying a mixture of Gaussian distributions with the same unknown covariance matrix by their sequence of moments up to a certain order. Our approach rests on studying the moment varieties obtained by taking special secants to ...
Exact high-dimensional asymptotics for support vector machine (May 13 2019). The support vector machine (SVM) is one of the most widely used classification methods. In this paper, we consider the soft margin support vector machine used on data points with independent features, where the sample size $n$ and the feature dimension $p$ grow ...
Partially Specified Space Time Autoregressive Model with Artificial Neural Network (May 13 2019). The space time autoregressive model has been widely applied in science, in areas such as economics, public finance, political science, agricultural economics, environmental studies and transportation analyses. The classical space time autoregressive model ...
Sub-Weibull distributions: generalizing sub-Gaussian and sub-Exponential properties to heavier-tailed distributions (May 13 2019). We propose the notion of sub-Weibull distributions, which are characterised by tails lighter than (or equally light as) the right tail of a Weibull distribution. This novel class generalises the sub-Gaussian and sub-Exponential to potentially heavier-tailed ...
Is Volatility Rough? (May 13 2019). Rough volatility models are continuous time stochastic volatility models where the volatility process is driven by a fractional Brownian motion with the Hurst parameter less than half, and have attracted much attention since a seminal paper titled "Volatility ...
Functional Correlations in the Pursuit of Performance Assessment of Classifiers (May 12 2019). In statistical classification, machine learning, social and other sciences, a number of measures of association have been developed and used for assessing and comparing individual classifiers, raters, and their groups. Among the measures, we find the ...
ACF estimation via difference schemes for a semiparametric model with $m$-dependent errors (May 11 2019). In this manuscript, we discuss a class of difference-based estimators of the autocovariance structure in a semiparametric regression model where the signal is discontinuous and the errors are serially correlated. The signal in this model consists of a ...
Prediction and outlier detection: a distribution-free prediction set with a balanced objective (May 10 2019, May 14 2019). We consider the multi-class classification problem when the training data and the out-of-sample test data may have different distributions and propose a method called BCOPS (balanced and conformal optimized prediction set) that constructs a prediction ...
Hyperparameter Estimation in Bayesian MAP Estimation: Parameterizations and Consistency (May 10 2019). The Bayesian formulation of inverse problems is attractive for three primary reasons: it provides a clear modelling framework; means for uncertainty quantification; and it allows for principled learning of hyperparameters. The posterior distribution may ...
Robust high dimensional learning for Lipschitz and convex losses (May 10 2019). We establish risk bounds for Regularized Empirical Risk Minimizers (RERM) when the loss is Lipschitz and convex and the regularization function is a norm. We obtain these results in the i.i.d. setup under subgaussian assumptions on the design. In a second ...
Why scoring functions cannot assess tail properties (May 10 2019). Motivated by the growing interest in sound forecast evaluation techniques with an emphasis on distribution tails rather than average behaviour, we investigate a fundamental question arising in this context: Can statistical features of distribution tails ...
Large scale in transit computation of quantiles for ensemble runs (May 10 2019). The classical approach to quantile computation requires the full sample to be available before ranking it. In uncertainty quantification of numerical simulation models, this approach is not suitable at exascale as large ensembles of simulation runs ...
Illumination depth (May 10 2019). The concept of illumination bodies studied in convex geometry is used to amend the halfspace depth for multivariate data. The proposed notion of illumination enables finer resolution of the sample points, naturally breaks ties in the associated depth-based ...
On limit theorems for persistent Betti numbers from dependent data (May 10 2019). We study persistent Betti numbers and persistence diagrams obtained from a time series and random fields. It is well known that the persistent Betti function is an efficient descriptor of the topology of a point cloud. So far, convergence results for ...
Optimal rates for F-score binary classification (May 10 2019). We study the minimax settings of binary classification with F-score under the $\beta$-smoothness assumptions on the regression function $\eta(x) = \mathbb{P}(Y = 1|X = x)$ for $x \in \mathbb{R}^d$. We propose a classification procedure which under the ...
Extreme events evaluation using CRPS distributions (May 10 2019). Verification of ensemble forecasts for extreme events remains a challenging question. The general public as well as the media naturally pay particular attention to extreme events and draw conclusions about the global predictive performance of ensembles, which ...
Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs (May 09 2019). This paper establishes that optimistic algorithms attain gap-dependent and non-asymptotic logarithmic regret for episodic MDPs. In contrast to prior work, our bounds do not suffer a dependence on diameter-like quantities or ergodicity, and smoothly interpolate ...
On Semi-parametric Bernstein-von Mises Theorems for BART (May 09 2019). Few methods in Bayesian non-parametric statistics/machine learning have received as much attention as Bayesian Additive Regression Trees (BART). While BART is now routinely performed for prediction tasks, its theoretical properties began to be understood ...
Stein Point Markov Chain Monte Carlo (May 09 2019). An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising ...
Conformal prediction for exponential families and generalized linear models (May 09 2019). Conformal prediction methods construct prediction regions for iid data that are valid in finite samples. Distribution-free conformal prediction methods have been proposed for regression. Generalized linear models (GLMs) are a widely used class of regression ...
Double-calibration estimators accounting for under-coverage and nonresponse in socio-economic surveys (May 09 2019). Under-coverage and nonresponse problems are jointly present in most socio-economic surveys. The purpose of this paper is to propose a completely design-based estimation strategy that accounts for both problems without resorting to models but simply performing ...
Non-Asymptotic Sequential Tests for Overlapping Hypotheses and application to near optimal arm identification in bandit models (May 09 2019). In this paper, we study sequential testing problems with overlapping hypotheses. We first focus on the simple problem of assessing if the mean $\mu$ of a Gaussian distribution is $\geq -\epsilon$ or $\leq \epsilon$; if $\mu\in(-\epsilon,\epsilon)$, ...
Regression from Dependent Observations (May 08 2019). The standard linear and logistic regression models assume that the response variables are independent, but share the same linear relationship to their corresponding vectors of covariates. The assumption that the response variables are independent is, ...
Bounding distributional errors via density ratios (May 08 2019, May 14 2019). We present some new and explicit error bounds for the approximation of distributions. The approximation error is quantified by the maximal density ratio of the distribution $Q$ to be approximated and its proxy $P$. This non-symmetric measure is more informative ...
Sliced Latin hypercube designs with arbitrary run sizes (May 07 2019). Latin hypercube designs achieve optimal univariate stratifications and are useful for computer experiments. Sliced Latin hypercube designs are Latin hypercube designs that can be partitioned into smaller Latin hypercube designs. In this work, we give, ...
Ergodic branching diffusions with immigration: properties of invariant occupation measure, identification of particles under high-frequency observation, and estimation of the diffusion coefficient at nonparametric rates (May 07 2019). In branching diffusions with immigration (BDI), particles travel on independent diffusion paths in $\mathbb{R}^d$, branch at position-dependent rates and leave offspring -- randomly scattered around the parent's death position -- according to position-dependent ...
Minimax Hausdorff estimation of density level sets (May 07 2019). Given a random sample of points from some unknown density, we propose a data-driven method for estimating density level sets under the r-convexity assumption. This shape condition generalizes the convexity property. However, the main problem in practice ...
Moderate deviations in a class of stable but nearly unstable processes (May 07 2019). We consider a stable but nearly unstable autoregressive process of any order. The bridge between stability and instability is expressed by a time-varying companion matrix $A_{n}$ with spectral radius $\rho(A_{n}) < 1$ satisfying $\rho(A_{n}) \rightarrow ...
Tail dependence and smoothness (May 07 2019). The risk of catastrophes is related to the possibility of extreme values occurring. Several statistical methodologies have been developed in order to evaluate the propensity of a process for the occurrence of high values and the permanence of these in ...
On the assumption of independent right censoring (May 07 2019). Various assumptions on a right-censoring mechanism to ensure consistency of the Kaplan-Meier and Aalen-Johansen estimators in a competing risks setting are studied. Specifically, eight different assumptions are seen to fall in two categories: a weaker ...
One-class classification with application to forensic analysis (May 07 2019). The analysis of broken glass is forensically important to reconstruct the events of a criminal act. In particular, the comparison between the glass fragments found on a suspect (recovered cases) and those collected at the crime scene (control cases) may ...
Estimating Piecewise Monotone Signals (May 06 2019). We study the problem of estimating piecewise monotone vectors. This problem can be seen as a generalization of the isotonic regression that allows a small number of order-violating changepoints. We focus mainly on the performance of the nearly-isotonic ...
Exact Largest Eigenvalue Distribution for Doubly Singular Beta Ensemble (May 06 2019, May 07 2019). In [1] beta type I and II doubly singular distributions were introduced and their densities and the joint densities of nonzero eigenvalues were derived. We found a simple formula to compute the largest root distribution for the doubly singular beta ensemble in ...
Free Component Analysis: Theory, Algorithms & Applications (May 05 2019). We describe a method for unmixing mixtures of freely independent random variables in a manner analogous to the independent component analysis (ICA) based method for unmixing independent random variables from their additive mixtures. Random matrices play ...
Improved Classification Rates for Localized SVMs (May 04 2019). One of the main characteristics of localized support vector machines that solve SVMs on many spatially defined small chunks is, besides the computational benefit compared to global SVMs, the freedom of choosing arbitrary kernel and regularization parameter ...
De-biased graphical Lasso for high-frequency data (May 04 2019). This paper develops a new statistical inference theory for the precision matrix of high-frequency data in a high-dimensional setting. The focus is not only on point estimation but also on interval estimation and hypothesis testing for entries of the precision ...
Projection Theorems, Estimating Equations, and Power-Law Distributions (May 04 2019). Projection theorems of divergence functionals reduce certain estimation problems on some specific families of probability distributions to a linear problem. Most of these divergences are also popular in the context of robust statistics. In this paper ...
Test for homogeneity with unordered paired observations (May 04 2019). In some applications, an experimental unit is composed of two distinct but related subunits. The response from such a unit is $(X_{1}, X_{2})$ but we observe only $Y_1 = \min\{X_{1},X_{2}\}$ and $Y_2 = \max\{X_{1},X_{2}\}$, i.e., the subunit identities ...
Learning Some Popular Gaussian Graphical Models without Condition Number Bounds (May 03 2019). Gaussian Graphical Models (GGMs) have wide-ranging applications in machine learning and the natural and social sciences. In most of the settings in which they are applied, the number of observed samples is much smaller than the dimension and they are ...
A Uniform Bound of the Operator Norm of Random Element Matrices and Operator Norm Minimizing Estimation (May 03 2019). In this paper, we derive a uniform stochastic bound of the operator norm (or equivalently, the largest singular value) of random matrices whose elements are indexed by parameters. As an application, we propose a new estimator that minimizes the operator ...
High dimensional VAR with low rank transition (May 02 2019). We propose a vector auto-regressive (VAR) model with a low-rank constraint on the transition matrix. This new model is well suited to predict high-dimensional series that are highly correlated, or that are driven by a small number of hidden factors. We ...
Consistent Inversion of Noisy Non-Abelian X-Ray Transforms (May 02 2019). For $M$ a simple surface, the non-linear and non-convex statistical inverse problem of recovering a matrix field $\Phi: M \to \mathfrak{so}(n)$ from discrete, noisy measurements of the $SO(n)$-valued scattering data $C_\Phi$ of a solution of a matrix ...
Sparsity Double Robust Inference of Average Treatment Effects (May 02 2019). Many popular methods for building confidence intervals on causal effects under high-dimensional confounding require strong "ultra-sparsity" assumptions that may be difficult to validate in practice. To alleviate this difficulty, we here study a new method ...
Functional central limit theorems for conditional Poisson sampling (May 02 2019). This paper provides refined versions of some known functional central limit theorems for conditional Poisson sampling which are more suitable for applications. The theorems presented in this paper are generalizations of some results that have recently ...
Total positivity in structured binary distributions (May 01 2019). We study binary distributions that are multivariate totally positive of order 2 (MTP2). Binary distributions can be represented as an exponential family and we show that MTP2 exponential families are convex. Moreover, MTP2 quadratic exponential families, ...
Stochastic ordering results in parallel and series systems with Gumble distributed random variables (May 01 2019). The stochastic comparisons of parallel and series systems are worthy of study. In this paper, we present some stochastic comparisons of parallel and series systems having independent components from the Gumbel distribution with two parameters (one location ...
On the excursion area of perturbed Gaussian fields (May 01 2019). We investigate Lipschitz-Killing curvatures for excursion sets of random fields on $\mathbb R^2$ under small spatially invariant random perturbations. An expansion formula for mean curvatures is derived when the magnitude of the perturbation vanishes, which ...
Asymptotically optimal sequential FDR and pFDR control with (or without) prior information on the number of signals (May 01 2019). We investigate asymptotically optimal multiple testing procedures for streams of sequential data in the context of prior information on the number of false null hypotheses ("signals"). We show that the "gap" and "gap-intersection" procedures, recently ...
First digit law from Laplace transform (Apr 30 2019). The occurrence of digits 1 through 9 as the leftmost nonzero digit of numbers from real-world sources is distributed unevenly according to an empirical law, known as Benford's law or the first digit law. It remains obscure why a variety of data sets generated ...
Estimating Proportion of True Null Hypotheses based on Sum of p-values and application to microarrays (Apr 30 2019). A new estimator of the proportion of true null hypotheses, based on the sum of all p-values, is proposed in this work; it removes the problem of choosing tuning parameters in the existing estimators. Normality of gene expression levels and common t-test ...
Asymptotic Distribution of the Score Test for Detecting Marks in Hawkes Processes (Apr 30 2019). The asymptotic distribution of the score test of the null hypothesis that marks do not impact the intensity of a Hawkes marked self-exciting point process is shown to be chi-squared. For local asymptotic power, the distribution against local alternatives ...
Convergence rates for ordinal embedding (Apr 30 2019). We prove optimal bounds for the convergence rate of ordinal embedding (also known as non-metric multidimensional scaling) in the 1-dimensional case. The examples witnessing optimality of our bounds arise from a result in additive number theory on sets ...
Extreme Nonlinear Correlation for Multiple Random Variables and Stochastic Processes with Applications to Additive Models (Apr 29 2019). The maximum correlation of functions of a pair of random variables is an important measure of stochastic dependence. It is known that this maximum nonlinear correlation is identical to the absolute value of the Pearson correlation for a pair of Gaussian ...
Individualized Treatment Selection: An Optimal Hypothesis Testing Approach In High-dimensional Models (Apr 29 2019). The ability to predict individualized treatment effects (ITEs) based on a given patient's profile is essential for personalized medicine. The prediction of ITEs enables the comparison of the effectiveness of two treatment procedures for a specific individual. ...
Asymptotic regime for improperness tests of complex random vectors (Apr 29 2019, May 02 2019). Improperness testing for complex-valued vectors and signals has been considered lately due to potential applications in complex-valued time series analysis encountered in many applications from communications to oceanography. This paper provides new results ...
Exact Testing of Many Moment Inequalities Against Multiple Violations (Apr 29 2019). This paper considers the problem of testing many moment inequalities, where the number of moment inequalities ($p$) is possibly larger than the sample size ($n$). Chernozhukov et al. (2018) proposed asymptotic tests for this problem using the maximum ...
Properties of discrete Fisher information: Cramer-Rao-type and log-Sobolev-type inequalities (Apr 29 2019). The Fisher information has connections with the standard deviation and the Shannon differential entropy through the Cramer-Rao bound and the log-Sobolev inequality. These inequalities hold for continuous distributions. In this paper, we introduce the ...
Consistency of least squares estimation to the parameter for stochastic differential equations under distribution uncertainty (Apr 29 2019). Under distribution uncertainty, on the basis of discrete data we investigate the consistency of the least squares estimator (LSE) of the parameter for the stochastic differential equation (SDE) where the noise is characterized by $G$-Brownian motion. ...
Minimax semi-supervised confidence sets for multi-class classification (Apr 29 2019). In this work we study the semi-supervised framework of confidence set classification with controlled expected size in minimax settings. We obtain semi-supervised minimax rates of convergence under the margin assumption and a Hölder condition on the ...
A Closed Form Approximation of Moments of New Generalization of Negative Binomial Distribution (Apr 29 2019). In this paper, we propose a closed form approximation to the mean and variance of a new generalization of the negative binomial (NGNB) distribution arising from the Extended COM-Poisson (ECOMP) distribution developed by Chakraborty and Imoto (2016) (see [4]). ...
Schwartz type model selection for ergodic stochastic differential equation models (Apr 28 2019). We study the construction of the theoretical foundation of model comparison for ergodic stochastic differential equation (SDE) models and an extension of the applicable scope of the conventional Bayesian information criterion. Different from previous ...
Nonparametric maximum likelihood estimation under a likelihood ratio order (Apr 28 2019, May 07 2019). Comparison of two univariate distributions based on independent samples from them is a fundamental problem in statistics, with applications in a wide variety of scientific disciplines. In many situations, we might hypothesize that the two distributions ...
Linearized two-layers neural networks in high dimension (Apr 27 2019). We consider the problem of learning an unknown function $f_{\star}$ on the $d$-dimensional sphere with respect to the square loss, given i.i.d. samples $\{(y_i,{\boldsymbol x}_i)\}_{i\le n}$ where ${\boldsymbol x}_i$ is a feature vector uniformly distributed ...
Ready-to-Use Unbiased Estimators for Multivariate Cumulants Including One That Outperforms $\overline{x^3}$ (Apr 27 2019). We present multivariate unbiased estimators for second, third, and fourth order cumulants $C_2(x,y)$, $C_3(x,y,z)$, and $C_4(x,y,z,w)$. Many relevant new estimators are derived for cases where some variables are average-free or pairs of variables have ...
Optimal Bayesian Estimation for Random Dot Product Graphs (Apr 26 2019). We propose a Bayesian approach, called the posterior spectral embedding, for estimating the latent positions in random dot product graphs, and prove its optimality. Unlike the classical spectral-based adjacency/Laplacian spectral embedding, the posterior ...
Sample Amplification: Increasing Dataset Size even when Learning is Impossible (Apr 26 2019). Given data drawn from an unknown distribution, $D$, to what extent is it possible to "amplify" this dataset and output an even larger set of samples that appear to have been drawn from $D$? We formalize this question as follows: an $(n,m)$ $\text{amplification ...
On the Dependence between Functions of Quantile and Dispersion Estimators (Apr 26 2019). In this paper, we derive the joint asymptotic distributions of functions of quantile estimators (the non-parametric sample quantile and the parametric location-scale quantile estimator) with functions of measure of dispersion estimators (the sample variance, ...
Evaluating the boundary and Stieltjes transform of limiting spectral distributions for random matrices with a separable variance profile (Apr 26 2019). We present numerical algorithms for solving two problems encountered in random matrix theory and its applications. First, we compute the boundary of the limiting spectral distribution for random matrices with a separable variance profile. Second, we evaluate ...
Parametric Scenario Optimization under Limited Data: A Distributionally Robust Optimization View (Apr 25 2019). We consider optimization problems with uncertain constraints that need to be satisfied probabilistically. When data are available, a common method to obtain feasible solutions for such problems is to impose sampled constraints, following the so-called ...
Reference Bayesian analysis for hierarchical models (Apr 25 2019). This paper proposes an alternative approach for constructing invariant Jeffreys prior distributions tailored for hierarchical or multilevel models. In particular, our proposal is based on a flexible decomposition of the Fisher information for hierarchical ...