How Hard Is Robust Mean Estimation?Mar 19 2019Robust mean estimation is the problem of estimating the mean $\mu \in \mathbb{R}^d$ of a $d$-dimensional distribution $D$ from a list of independent samples, an $\epsilon$-fraction of which have been arbitrarily corrupted by a malicious adversary. Recent ... More
Relative Efficiency of Higher Normed Estimators Over the Least Squares EstimatorMar 19 2019In this article, we study the performance of the estimator that minimizes the $L_{2k}$-order loss function (for $k \ge 2$) against the estimator that minimizes the $L_2$-order loss function (the least squares estimator). Commonly occurring examples ... More
Signal recovery by Stochastic OptimizationMar 18 2019We discuss an approach to signal recovery in Generalized Linear Models (GLM) in which the signal estimation problem is reduced to the problem of solving a stochastic monotone variational inequality (VI). The solution to the stochastic VI can be found ... More
Bi-log-concavity: some properties and some remarks towards a multi-dimensional extensionMar 18 2019Bi-log-concavity of probability measures is a univariate extension of the notion of log-concavity that has recently been proposed in the statistical literature. Among other things, it has the nice property, from a modelling perspective, of admitting some multimodal ... More
On Generalized q-logistic Distribution and its CharacterizationsMar 18 2019Several generalizations of the logistic distribution, and certain related models, are proposed by many authors for modeling various random phenomena such as those encountered in data engineering, pattern recognition, and reliability assessment studies. ... More
Topp-Leone generated q-exponential distribution and its applicationsMar 17 2019Topp-Leone distribution is a continuous model distribution used for modelling lifetime phenomena. The main purpose of this paper is to introduce a new framework for generating lifetime distributions, called the Topp-Leone generated q-exponential family ... More
Nonparametric estimation for linear SPDEs from local measurementsMar 16 2019We estimate the coefficient function of the leading differential operator in a linear stochastic partial differential equation (SPDE). The estimation is based on continuous time observations which are localised in space. For the asymptotic regime with ... More
Ordering properties of the smallest order statistic from Weibull G random variablesMar 16 2019In this paper we compare the minimums of two heterogeneous samples, each following a Weibull-G distribution, under three scenarios. In the first scenario, the units of the samples are assumed to be independently distributed and the comparisons are carried ... More
Minimax rates for the covariance estimation of multi-dimensional Lévy processes with high-frequency dataMar 15 2019This article studies nonparametric methods to estimate the co-integrated volatility for multi-dimensional Lévy processes with high frequency data. We construct a spectral estimator for the co-integrated volatility and prove minimax rates for an appropriate ... More
A nonasymptotic law of iterated logarithm for robust online estimatorsMar 15 2019In this paper, we provide tight deviation bounds for M-estimators, which are valid with a prescribed probability for every sample size. M-estimators are ubiquitous in machine learning and statistical learning theory. They are used both for defining prediction ... More
Inference Without CompatibilityMar 14 2019We consider hypothesis testing problems for a single covariate in the context of a linear model with Gaussian design when $p>n$. Under minimal sparsity conditions of their type and without any compatibility condition, we construct an asymptotically Gaussian ... More
Discrete Statistical Models with Rational Maximum Likelihood EstimatorMar 14 2019A discrete statistical model is a subset of a probability simplex. Its maximum likelihood estimator (MLE) is a retraction from that simplex onto the model. We characterize all models for which this retraction is a rational function. This is a contribution ... More
High-dimensional nonparametric density estimation via symmetry and shape constraintsMar 14 2019We tackle the problem of high-dimensional nonparametric density estimation by taking the class of log-concave densities on $\mathbb{R}^p$ and incorporating within it symmetry assumptions, which facilitate scalable estimation algorithms and can mitigate ... More
Markov-chain-inspired search for MH370Mar 14 2019Markov-chain models are constructed for the probabilistic description of the drift of marine debris from Malaysian Airlines flight MH370. En route from Kuala Lumpur to Beijing, the MH370 mysteriously disappeared in the southeastern Indian Ocean on 8 March ... More
Bayesian/Graphoid intersection property for factorisation modelsMar 14 2019We remark that the Graphoid intersection property, also called intersection property in Bayesian networks, is a particular case of an intersection property, in the sense of intersection of coverings, for factorisation spaces, also called factorisation ... More
Rejoinder: "Gene Hunting with Hidden Markov Model Knockoffs"Mar 13 2019In this paper we deepen and enlarge the reflection on the possible advantages of a knockoff approach to genome wide association studies (Sesia et al., 2018), starting from the discussions in Bottolo & Richardson (2019); Jewell & Witten (2019); Rosenblatt ... More
Matrix factorization for multivariate time series analysisMar 13 2019Matrix factorization is a powerful data analysis tool. It has been used in multivariate time series analysis, leading to the decomposition of the series in a small set of latent factors. However, little is known on the statistical performances of matrix ... More
The Log-Concave Maximum Likelihood Estimator is Optimal in High DimensionsMar 13 2019We study the problem of learning a $d$-dimensional log-concave distribution from $n$ i.i.d. samples with respect to both the squared Hellinger and the total variation distances. We show that for all $d \ge 4$ the maximum likelihood estimator achieves ... More
Computational Bayes-Predictive Stochastic Programming: Finite Sample BoundMar 12 2019We study stochastic programming models where the stochastic variable is only known up to a parametrized distribution function, which must be estimated from a set of independent and identically distributed (i.i.d.) samples. We take a Bayesian approach, ... More
The All-or-Nothing Phenomenon in Sparse Linear RegressionMar 12 2019We study the problem of recovering a hidden binary $k$-sparse $p$-dimensional vector $\beta$ from $n$ noisy linear observations $Y=X\beta+W$ where $X_{ij}$ are i.i.d. $\mathcal{N}(0,1)$ and $W_i$ are i.i.d. $\mathcal{N}(0,\sigma^2)$. A closely related ... More
ECKO: Ensemble of Clustered Knockoffs for multivariate inference on fMRI dataMar 12 2019Continuous improvement in medical imaging techniques allows the acquisition of higher-resolution images. When these are used in a predictive setting, a greater number of explanatory variables are potentially related to the dependent variable (the response). ... More
Dimension reduction as an optimization problem over a set of generalized functionsMar 12 2019Classical dimension reduction problem can be loosely formulated as a problem of finding a $k$-dimensional affine subspace of ${\mathbb R}^n$ onto which data points ${\mathbf x}_1,\cdots, {\mathbf x}_N$ can be projected without loss of valuable information. ... More
The limits of distribution-free conditional predictive inferenceMar 12 2019We consider the problem of distribution-free predictive inference, with the goal of producing predictive coverage guarantees that hold conditionally rather than marginally. Existing methods such as conformal prediction offer marginal coverage guarantees, ... More
Calibrating dependence between random elementsMar 11 2019Attempts to quantify dependence between random elements X and Y via maximal correlation go back to Gebelein (1941) and Rényi (1959). After summarizing properties (including some new) of the Rényi measure of dependence, a calibrated scale of dependence ... More
Generalized Sparse Additive ModelsMar 11 2019We present a unified framework for estimation and analysis of generalized additive models in high dimensions. The framework defines a large class of penalized regression estimators, encompassing many existing methods. An efficient computational algorithm ... More
Diffusion $K$-means clustering on manifolds: provable exact recovery via semidefinite relaxationsMar 11 2019We introduce the diffusion $K$-means clustering method on Riemannian submanifolds, which maximizes the within-cluster connectedness based on the diffusion distance. The diffusion $K$-means constructs a random walk on the similarity graph with vertices ... More
Consistency of the maximum likelihood and variational estimators in a dynamic stochastic block modelMar 11 2019We consider a dynamic version of the stochastic block model, in which the nodes are partitioned into latent classes and the connection between two nodes is drawn from a Bernoulli distribution depending on the classes of these two nodes. The temporal evolution ... More
Maximum pseudo-likelihood estimation based on estimated residuals in copula semiparametric modelsMar 11 2019This paper deals with a situation when one is interested in the dependence structure of a multidimensional response variable in the presence of a multivariate covariate. It is assumed that the covariate affects only the marginal distributions through ... More
Fitting Tractable Convex Sets to Support Function EvaluationsMar 11 2019The geometric problem of estimating an unknown compact convex set from evaluations of its support function arises in a range of scientific and engineering applications. Traditional approaches typically rely on estimators that minimize the error over all ... More
Extreme events of higher-order Markov chains: hidden tail chains and extremal Yule-Walker equationsMar 10 2019We derive some key extremal features for kth order Markov chains, which can be used to understand how the process moves to and fro between the body of the process and an extreme state. The chains are studied given that there is an exceedance of a threshold, ... More
On the convergence of the maximum likelihood estimator for the transition rate under a 2-state symmetric modelMar 10 2019Maximum likelihood estimators are used extensively to estimate unknown parameters of stochastic trait evolution models on phylogenetic trees. Although the MLE has been proven to converge to the true value in the independent-sample case, we cannot appeal ... More
Quantitative spectral gap estimate and Wasserstein contraction of simple slice samplingMar 09 2019We prove Wasserstein contraction of simple slice sampling for approximate sampling w.r.t. distributions with log-concave and rotational invariant Lebesgue densities. This yields, in particular, an explicit quantitative lower bound of the spectral gap ... More
Consistent Bayesian Sparsity Selection for High-dimensional Gaussian DAG Models with Multiplicative and Beta-mixture PriorsMar 08 2019Estimation of the covariance matrix for high-dimensional multivariate datasets is a challenging and important problem in modern statistics. In this paper, we focus on high-dimensional Gaussian DAG models where sparsity is induced on the Cholesky factor ... More
Local asymptotic normality for shape and periodicity of a signal in the drift of a degenerate diffusion with internal variablesMar 08 2019Taking a multidimensional time-homogeneous dynamical system and adding a randomly perturbed time-dependent deterministic signal to some of its components gives rise to a high-dimensional system of stochastic differential equations which is driven by possibly ... More
Kernel Based Estimation of Spectral Risk MeasuresMar 08 2019Spectral risk measures (SRMs) belong to the family of coherent risk measures. A natural estimator for the class of SRMs has the form of $L$-statistics. In the literature, various authors have studied and derived the asymptotic ... More
On the asymptotic normality of persistent Betti numbersMar 08 2019Persistent Betti numbers are a major tool in persistent homology, a subfield of topological data analysis. Many tools in persistent homology rely on the properties of persistent Betti numbers considered as a two-dimensional stochastic process $ (r,s) ... More
Solutions to Sparse Multilevel Matrix ProblemsMar 07 2019We define and solve classes of sparse matrix problems that arise in multilevel modeling and data analysis. The classes are indexed by the number of nested units, with two-level problems corresponding to the common situation in which data on level 1 units ... More
Rigorous Analysis of Spectral Methods for Random Orthogonal MatricesMar 07 2019Phase retrieval refers to algorithmic methods for recovering a signal from its phaseless measurements. Local search algorithms that work directly on the non-convex formulation of the problem have been very popular recently. Due to the nonconvexity of ... More
Integral Transform Methods in Goodness-of-Fit Testing, II: The Wishart DistributionsMar 06 2019We initiate the study of goodness-of-fit testing when the data consist of positive definite matrices. Motivated by the recent appearance of the cone of positive definite matrices in numerous areas of applied research, including diffusion tensor imaging, ... More
Nonparametric Change Point Detection in RegressionMar 06 2019This paper considers an important problem of change-point detection in regression. The study suggests a novel testing procedure featuring a fully data-driven calibration scheme. The method is essentially a black box, requiring no tuning from practitioner. ... More
Parameter estimation for the Rosenblatt Ornstein-Uhlenbeck process with periodic meanMar 06 2019We study the least squares estimator for the drift parameter of the Langevin stochastic equation driven by the Rosenblatt process. Using the techniques of the Malliavin calculus and the stochastic integration with respect to the Rosenblatt process, we ... More
Generalized $k$-variations and Hurst parameter estimation for the fractional wave equation via Malliavin calculusMar 06 2019We analyze the generalized $k$-variations for the solution to the wave equation driven by an additive Gaussian noise which behaves as a fractional Brownian with Hurst parameter $H>\frac{1}{2}$ in time and which is white in space. The $k$-variations are ... More
Hurst index estimation in stochastic differential equations driven by fractional Brownian motionMar 06 2019We consider the problem of Hurst index estimation for solutions of stochastic differential equations driven by an additive fractional Brownian motion. Using techniques of the Malliavin calculus, we analyze the asymptotic behavior of the quadratic variations ... More
Hoeffding-Type and Bernstein-Type Inequalities for Right Censored DataMar 05 2019We present Hoeffding-type and Bernstein-type inequalities for right-censored data. The inequalities bound the difference between an inverse of the probability of censoring weighting (IPCW) estimator and its expectation. We first discuss the asymptotic ... More
Generative Adversarial Nets for Robust Scatter Estimation: A Proper Scoring Rule PerspectiveMar 05 2019Robust scatter estimation is a fundamental task in statistics. The recent discovery on the connection between robust estimation and generative adversarial nets (GANs) by Gao et al. (2018) suggests that it is possible to compute depth-like robust estimators ... More
Local differential privacy: Elbow effect in optimal density estimation and adaptation over Besov ellipsoidsMar 05 2019We address the problem of non-parametric density estimation under the additional constraint that only privatised data are allowed to be published and available for inference. For this purpose, we adopt a recent generalisation of classical minimax theory ... More
A Prediction Tournament ParadoxMar 05 2019In a prediction tournament, contestants "forecast" by asserting a numerical probability for each of (say) 100 future real-world events. The scoring system is designed so that (regardless of the unknown true probabilities) more accurate forecasters will ... More
Tutorial: Deriving The Efficient Influence Curve for Large ModelsMar 05 2019This paper aims to provide a tutorial for upper level undergraduate and graduate students in statistics and biostatistics on deriving influence functions for non-parametric and semi-parametric models. The author will build on previously known efficiency ... More
Tutorial: Deriving The Efficient Influence Curve for Large ModelsMar 05 2019Mar 09 2019This paper aims to provide a tutorial for upper level undergraduate and graduate students in statistics, biostatistics and epidemiology on deriving influence functions for non-parametric and semi-parametric models. The author will build on previously ... More
Measuring and Controlling Bias for Some Bayesian Inferences and the Relation to Frequentist CriteriaMar 05 2019A common concern with Bayesian methodology in scientific contexts is that inferences can be heavily influenced by subjective biases. As presented here, there are two types of bias for some quantity of interest: bias against and bias in favor. Based upon ... More
Concentration-based confidence intervals for U-statisticsMar 05 2019Concentration inequalities have become increasingly popular in machine learning, probability, and statistical research. Using concentration inequalities, one can construct confidence intervals (CIs) for many quantities of interest. Unfortunately, many ... More
Change Detection with the Kernel Cumulative Sum AlgorithmMar 05 2019Online change detection involves monitoring a stream of data for changes in the statistical properties of incoming observations. A good change detector will detect any changes shortly after they occur, while raising few false alarms. Although there are ... More
Approximations of Shannon Mutual Information for Discrete Variables with Applications to Neural Population CodingMar 04 2019Although Shannon mutual information has been widely used, its effective calculation is often difficult for many practical problems, including those in neural population coding. Asymptotic formulas based on Fisher information sometimes provide accurate ... More
Data Amplification: Instance-Optimal Property EstimationMar 04 2019Mar 05 2019The best-known and most commonly used distribution-property estimation technique uses a plug-in estimator, with empirical frequency replacing the underlying distribution. We present novel linear-time-computable estimators that significantly "amplify" ... More
Nonparametric Confidence Regions for Level Sets: Statistical Properties and GeometryMar 04 2019This paper studies and critically discusses the construction of nonparametric confidence regions for density level sets. Methodologies based on both vertical variation and horizontal variation are considered. The investigations provide theoretical insight ... More
Multivariate extensions of isotonic regression and total variation denoising via entire monotonicity and Hardy-Krause variationMar 04 2019We consider the problem of nonparametric regression when the covariate is $d$-dimensional, where $d \geq 1$. In this paper we introduce and study two nonparametric least squares estimators (LSEs) in this setting---the entirely monotonic LSE and the constrained ... More
Time Series Source Separation using Dynamic Mode DecompositionMar 04 2019The dynamic mode decomposition (DMD) extracted dynamic modes are the non-orthogonal eigenvectors of the matrix that best approximates the one-step temporal evolution of the multivariate samples. In the context of dynamic system analysis, the extracted ... More
Multiscale clustering of nonparametric regression curvesMar 04 2019In a wide range of modern applications, we observe a large number of time series rather than only a single one. It is often natural to suppose that there is some group structure in the observed time series. When each time series is modelled by a nonparametric ... More
Multiscale inference and long-run variance estimation in nonparametric regression with time series errorsMar 04 2019In this paper, we develop new multiscale methods to test qualitative hypotheses about the regression function m in a nonparametric regression model with fixed design points and time series errors. In time series applications, m represents a nonparametric ... More
Spectral Density-Based and Measure-Preserving ABC for partially observed diffusion processes. An illustration on Hamiltonian SDEsMar 04 2019Approximate Bayesian Computation (ABC) has become one of the major tools of likelihood-free statistical inference in complex mathematical models. Simultaneously, stochastic differential equations (SDEs) have developed to an established tool for modelling ... More
Same but Different: distance correlations between topological summariesMar 04 2019Persistent homology allows us to create topological summaries of complex data. In order to analyse these statistically we need to choose a topological summary and a metric space where these topological summaries exist. While different representations ... More
Regression models for compositional data: General log-contrast formulations, proximal optimization, and microbiome data applicationsMar 04 2019Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise microbial compositions ... More
Empirical priors for prediction in sparse high-dimensional linear regressionMar 03 2019Often the primary goal of fitting a regression model is prediction, but the majority of work in recent years focuses on inference tasks, such as estimation and feature selection. In this paper we adopt the familiar sparse, high-dimensional linear regression ... More
Heavy Tailed Horseshoe PriorsMar 03 2019Locally adaptive shrinkage in the Bayesian framework is achieved through the use of local-global prior distributions that model both the global level of sparsity as well as individual shrinkage parameters for mean structure parameters. The most popular ... More
On one-sample Bayesian tests for the meanMar 03 2019This paper deals with a new Bayesian approach to the standard one-sample $z$- and $t$- tests. More specifically, let $x_1,\ldots,x_n$ be an independent random sample from a normal distribution with mean $\mu$ and variance $\sigma^2$. The goal is to test ... More
Goodness-of-Fit Testing for Time Series Models via Distance CovarianceMar 02 2019In many statistical modeling frameworks, goodness-of-fit tests are typically administered to the estimated residuals. In the time series setting, whiteness of the residuals is assessed using the sample autocorrelation function. For many time series models, ... More
Nonparametric adaptive inference of birth and death models in a large population limitMar 02 2019Motivated by improving mortality tables from human demography databases, we investigate statistical inference of a stochastic age-evolving density of a population alimented by time inhomogeneous mortality and fertility. Asymptotics are taken as the size ... More
High-Dimensional Learning under Approximate Sparsity: A Unifying Framework for Nonsmooth Learning and Regularized Neural NetworksMar 02 2019High-dimensional statistical learning (HDSL) has been widely applied in data analysis, operations research, and stochastic optimization. Despite the availability of multiple theoretical frameworks, most HDSL theories stipulate the following two conditions, ... More
Quantitative Robustness of Localized Support Vector MachinesMar 01 2019The huge amount of available data nowadays is a challenge for kernel-based machine learning algorithms like SVMs with respect to runtime and storage capacities. Local approaches might help to relieve these issues and to improve statistical accuracy. It ... More
Improving efficiency in fuzzy regression modeling by Stein-type shrinkageMar 01 2019The fuzzy linear regression (FLR) modeling was first proposed making use of linear programming and then followed by many improvements in a variety of ways. In almost all approaches changing the meters, objective function, and restrictions caused to improve ... More
Are profile likelihoods likelihoods? No, but sometimes they can beMar 01 2019Mar 08 2019We offer our two cents to the ongoing discussion on whether profile likelihoods are "true" likelihood functions, by showing that the profile likelihood function can in fact be identical to a marginal likelihood in the special case of normal models. Thus, ... More
Approximation by finite mixtures of continuous density functions that vanish at infinityMar 01 2019Given sufficiently many components, it is often cited that finite mixture models can approximate any other probability density function (pdf) to an arbitrary degree of accuracy. Unfortunately, the nature of this approximation result is often left unclear. ... More
A robust approach for principal component analyisisFeb 28 2019In this paper we analyze different ways of performing principal component analysis throughout three different approaches: robust covariance and correlation matrix estimation, projection pursuit approach and non-parametric maximum entropy algorithm. The ... More
Reliability Analysis of Systems Subject To Mutually Dependent Competing Failure Processes With Changing Degradation RateFeb 28 2019In this paper, a new reliability model has been developed for a single system degrading stochastically which experiences soft and hard failure. Soft failure occurs when the physical deterioration level of the system is greater than a predefined failure ... More
Construction Methods for GaussoidsFeb 28 2019The number of $n$-gaussoids is shown to be a double exponential function in $n$. The necessary bounds are achieved by studying construction methods for gaussoids that rely on prescribing $3$-minors and encoding the resulting combinatorial constraints ... More
Oracle inequalities for square root analysis estimators with application to total variation penaltiesFeb 28 2019We study the analysis estimator directly, without any step through a synthesis formulation. For the analysis estimator we derive oracle inequalities with fast and slow rates by adapting the arguments involving projections by Dalalyan, Hebiri and Lederer ... More
Learning rates for Gaussian mixtures under group invarianceFeb 28 2019We study the pointwise maximum likelihood estimation rates for a class of Gaussian mixtures that are invariant under the action of some isometry group. This model is also known as multi-reference alignment, where random isometries of a given vector are ... More
Optimal estimation of variance in nonparametric regression with random designFeb 27 2019Consider the heteroscedastic nonparametric regression model with random design $Y_i = f(X_i) + V^{1/2}(X_i)\varepsilon_i$, $i=1,2,\ldots,n$, with $f(\cdot)$ and $V(\cdot)$ $\alpha$- and $\beta$-Hölder smooth, respectively. ... More
Improved Concentration Bounds for Conditional Value-at-Risk and Cumulative Prospect Theory using Wasserstein distanceFeb 27 2019Known finite-sample concentration bounds for the Wasserstein distance between the empirical and true distribution of a random variable are used to derive a two-sided concentration bound for the error between the true conditional value-at-risk (CVaR) of ... More
Quasi-Bayes properties of a recursive procedure for mixturesFeb 27 2019Bayesian methods are attractive and often optimal, yet nowadays pressure for fast computations, especially with streaming data and online learning, brings renewed interest in faster, although possibly sub-optimal, solutions. To what extent these algorithms ... More
Maximum Likelihood Estimation of Sparse Networks with Missing ObservationsFeb 27 2019Estimating the matrix of connections probabilities is one of the key questions when studying sparse networks. In this work, we consider networks generated under the sparse graphon model and the in-homogeneous random graph model with missing observations. ... More
Consistent estimation of the missing mass for feature modelsFeb 27 2019Feature models are popular in machine learning and they have been recently used to solve many unsupervised learning problems. In these models every observation is endowed with a finite set of features, usually selected from an infinite collection $(F_{j})_{j\geq ... More
A Good-Turing estimator for feature allocation modelsFeb 27 2019Feature allocation models generalize species sampling models by allowing every observation to belong to more than one species, now called features. Under the popular Bernoulli product model for feature allocation, given $n$ samples, we study the problem ... More
Adaptation for nonparametric estimators of locally stationary processesFeb 27 2019Two adaptive bandwidth selection methods for nonparametric estimators in locally stationary processes are proposed. We investigate a cross validation approach and a method based on contrast minimization and derive asymptotic properties of both methods. ... More
Clustering through the optimal transport barycenter problemFeb 27 2019The problem of clustering a data set is formulated in terms of the Wasserstein barycenter problem in optimal transport. The objective proposed is the maximization of the variability attributable to class, further characterized as the minimization of the ... More
On the well-posedness of Bayesian inverse problemsFeb 26 2019The subject of this article is the introduction of a weaker concept of well-posedness of Bayesian inverse problems. The conventional concept of ('Lipschitz') well-posedness in [Stuart 2010, Acta Numerica 19, pp. 451-559] is difficult to verify in practice, ... More
Penalized Sieve GEL for Weighted Average Derivatives of Nonparametric Quantile IV RegressionsFeb 26 2019This paper considers estimation and inference for a weighted average derivative (WAD) of a nonparametric quantile instrumental variables regression (NPQIV). NPQIV is a non-separable and nonlinear ill-posed inverse problem, which might be why there is ... More
Fair Capital Risk AllocationFeb 26 2019In this paper we develop a novel methodology for estimation of risk capital allocation. The methodology is rooted in the theory of risk measures. We work within a general, but tractable class of law-invariant coherent risk measures, with a particular ... More
A Family of Exact Goodness-of-Fit Tests for High-Dimensional Discrete DistributionsFeb 26 2019The objective of goodness-of-fit testing is to assess whether a dataset of observations is likely to have been drawn from a candidate probability distribution. This paper presents a rank-based family of goodness-of-fit tests that is specialized to discrete ... More
Effect Inference from Two-Group Data with Sampling BiasFeb 26 2019In many applications, different populations are compared using data that are sampled in a biased manner. Under sampling biases, standard methods that estimate the difference between the population means yield unreliable inferences. Here we develop an ... More
Efficient online learning with kernels for adversarial large scale problemsFeb 26 2019We are interested in a framework of online learning with kernels for low-dimensional but large-scale and potentially adversarial datasets. Considering the Gaussian kernel, we study the computational and theoretical performance of online variations of ... More
Brownian motion tree models are toricFeb 26 2019Felsenstein's classical model for Gaussian distributions on a phylogenetic tree is shown to be a toric variety in the space of concentration matrices. We present an exact semialgebraic characterization of this model, and we demonstrate how the toric structure ... More
Logarithmic Regret for parameter-free Online Logistic RegressionFeb 26 2019We consider online optimization procedures in the context of logistic regression, focusing on the Extended Kalman Filter (EKF). We introduce a second-order algorithm close to the EKF, named Semi-Online Step (SOS), for which we prove a O(log(n)) regret ... More
A Dynamic Model for Double Bounded Time Series With Chaotic Driven Conditional AveragesFeb 25 2019In this work we introduce a class of dynamic models for time series taking values on the unit interval. The proposed model follows a generalized linear model approach where the random component, conditioned on the past information, follows a beta distribution, ... More
A Robust Unscented Transformation for Uncertain MomentsFeb 25 2019This paper proposes a robust version of the unscented transform (UT) for one-dimensional random variables. It is assumed that the moments are not exactly known, but are known to lie in intervals. In this scenario, the moment matching equations are reformulated ... More
Sampling Sup-Normalized Spectral Functions for Brown-Resnick ProcessesFeb 25 2019Sup-normalized spectral functions form building blocks of max-stable and Pareto processes and therefore play an important role in modeling spatial extremes. For one of the most popular examples, the Brown-Resnick process, simulation is not straightforward. ... More
Weak convergence theory for Poisson sampling designsFeb 25 2019This work provides some general theorems about unconditional and conditional weak convergence of Horvitz-Thompson empirical processes in the case of Poisson sampling designs. The theorems presented in this work are more general than previously published ... More
Nonlinear generalization of the single index modelFeb 24 2019The single index model is a powerful yet simple model, widely used in statistics, machine learning, and other scientific fields. It models the regression function as $g(\langle a, x \rangle)$, where $a$ is an unknown index vector and $x$ are the features. This paper deals with ... More
Goodness-of-fit Tests for the Bivariate Poisson DistributionFeb 24 2019The bivariate Poisson distribution is commonly used to model bivariate count data. In this paper we study a goodness-of-fit test for this distribution. We also provide a review of the existing tests for the bivariate Poisson distribution, and its multivariate ... More
De-Biasing The Lasso With Degrees-of-Freedom AdjustmentFeb 24 2019This paper studies schemes to de-bias the Lasso in sparse linear regression where the goal is to estimate and construct confidence intervals for a low-dimensional projection of the unknown coefficient vector in a preconceived direction $a_0$. We assume ... More