Some ordering properties of highest and lowest order statistics with exponentiated Gumble type-II distributed components (Apr 18 2019) In this paper, we have studied the stochastic comparisons of the highest and lowest order statistics of exponentiated Gumble type-II distribution with three parameters. We have compared both the statistics by using three different stochastic orderings.

Statistical witchhunts: Science, justice & the p-value crisis (Apr 11 2019) We provide accessible insight into the current 'replication crisis' in 'statistical science', by revisiting the old metaphor of 'court trial as hypothesis test'. Inter alia, we define and diagnose harmful statistical witch-hunting both in justice and

The Contribution Plot: Decomposition and Graphical Display of the RV Coefficient, with Application to Genetic and Brain Imaging Biomarkers of Alzheimer's Disease (Apr 08 2019) Alzheimer's disease (AD) is a chronic neurodegenerative disease that causes memory loss and decline in cognitive abilities. AD is the sixth leading cause of death in the United States, affecting an estimated 5 million Americans. To assess the association

Analytic Evaluation of the Fractional Moments for the Quasi-Stationary Distribution of the Shiryaev Martingale on an Interval (Apr 05 2019) We consider the quasi-stationary distribution of the classical Shiryaev diffusion restricted to the interval $[0,A]$ with absorption at a fixed $A>0$. We derive analytically a closed-form formula for the distribution's fractional moment of an arbitrary

Statistical testing in a Linear Probability Space (Apr 02 2019) Imagine that you could calculate posttest probabilities, i.e. apply Bayes' theorem, with simple addition. This is possible if we stop thinking of probabilities as ranging from 0 to 1.0. There is a naturally occurring linear probability space when data are

Data-driven discovery of coordinates and governing equations (Mar 29 2019) The discovery of governing equations from scientific data has the potential to transform data-rich fields that lack well-characterized quantitative descriptions. Advances in sparse regression are currently enabling the tractable identification of both

GraSPy: Graph Statistics in Python (Mar 29 2019) We introduce GraSPy, a Python library devoted to statistical inference, machine learning, and visualization of random graphs and graph populations. This package provides flexible and easy-to-use algorithms for analyzing and understanding graphs with a

An innovating Statistical Learning Tool based on Partial Differential Equations, intending livestock Data Assimilation (Mar 29 2019) The realistic modeling intended to quantify precisely some biological mechanisms is a task requiring a lot of a priori knowledge and generally leading to heavy mathematical models. On the other hand, the structure of the classical Machine Learning algorithms,

Deterministic bootstrapping for a class of bootstrap methods (Mar 26 2019, Apr 09 2019) An algorithm is described that enables efficient deterministic approximate computation of the bootstrap distribution for any linear bootstrap method $T_n^*$, alleviating the need for repeated resampling from observations (resp. input-derived data). In

Revising the Wilks Scoring System for pro RAW Powerlifting (Mar 22 2019) Purpose: In powerlifting the total result is highly dependent on the athlete's bodyweight. Powerlifting is divided into equipped and RAW types. Pro RAW powerlifting competitions use the Wilks scoring system to compare and rank powerlifting results across

Three issues impeding communication of statistical methodology for incomplete data (Mar 21 2019) We identify three issues permeating the literature on statistical methodology for incomplete data written for non-specialist statisticians and other investigators. The first is a mathematical defect in the notation Yobs, Ymis used to partition the data

A response-matrix-centred approach to presenting cross-section measurements (Mar 15 2019, Mar 21 2019) The current canonical approach to publishing cross-section data is to unfold the reconstructed distributions. Detector effects like efficiency and smearing are undone mathematically, yielding distributions in true event properties. This is an ill-posed

Effects of Stochastic Parametrization on Extreme Value Statistics (Mar 13 2019) Extreme geophysical events are of crucial relevance to our daily life: they threaten human lives and cause property damage. To assess the risk and reduce losses, we need to model and probabilistically predict these events. Parametrizations are computational

Tutorial: Deriving The Efficient Influence Curve for Large Models (Mar 05 2019) This paper aims to provide a tutorial for upper level undergraduate and graduate students in statistics and biostatistics on deriving influence functions for non-parametric and semi-parametric models. The author will build on previously known efficiency

Comparison of plotting system outputs in beginner analysts (Mar 03 2019) The R programming language is built on an ecosystem of packages, some that allow analysts to accomplish the same tasks. For example, there are at least two clear workflows for creating data visualizations in R: using the base graphics package (referred

Bounds on Bayes Factors for Binomial A/B Testing (Feb 28 2019) Bayes factors, in many cases, have been proven to bridge the classic p-value based significance testing and Bayesian analysis of posterior odds. This paper discusses this phenomenon within the binomial A/B testing setup (applicable for example to conversion

A note on Fibonacci Sequences of Random Variables (Feb 26 2019) The focus of this paper is the random sequences in the form $\{X_{0}, X_{1}, X_{n}=X_{n-2}+X_{n-1}, n=2,3,\ldots\}$, referred to as Fibonacci Random Sequence (FRS). The initial random variables $X_{0}$ and $X_{1}$ are assumed to be absolutely continuous

Modeling the Health Expenditure in Japan, 2011. A Healthy Life Years Lost Methodology (Feb 25 2019) The Healthy Life Years Lost Methodology (HLYL) is introduced to model and estimate the Health Expenditure in Japan in 2011. The HLYL theory and estimation methods are presented in our books in the Springer Series on Demographic Methods and Population

A Combinatorial Approach to Causal Inference (Feb 18 2019) The objective of causal inference is to learn the network of causal relationships holding between a system of variables from the correlations that these variables exhibit; a sub-problem of which is to certify whether or not a given causal hypothesis is

Synthesis of High-Resolution Load Profiles with Minimal Data (Feb 15 2019) For the estimation of a new energy supply system it is important to have a high-resolution energy load profile. Such a profile is in general either not present or very costly to obtain. We will therefore present a method which synthesizes load profiles

Applications of band-limited extrapolation to forecasting of weather and financial time series (Feb 15 2019) This paper describes the practical application of causal extrapolation of sequences for the purpose of forecasting. The methods and proofs have been applied to simulations to measure the range over which data can be accurately extrapolated. Real world data

Optimal BIBD-extended designs (Feb 12 2019) Balanced incomplete block designs (BIBDs) are a class of designs with v treatments and b blocks of size k that are optimal with regard to a wide range of optimality criteria, but it is not clear which designs to choose for combinations of v, b and k

Learning spatially-correlated temporal dictionaries for calcium imaging (Feb 08 2019) Calcium imaging has become a fundamental neural imaging technique, aiming to recover the individual activity of hundreds of neurons in a cortical region. Current methods (mostly matrix factorization) are aimed at detecting neurons in the field-of-view

Characterization of Sine-Skewed von Mises Distribution (Feb 07 2019) The von Mises distribution is one of the most important distributions in statistics to deal with circular data. In this paper we will consider some basic properties and characterizations of the sine skewed von Mises distribution.

CMS Sematrix: A Tool to Aid the Development of Clinical Quality Measures (CQMs) (Feb 05 2019) As part of the effort to improve quality and to reduce national healthcare costs, the Centers for Medicare and Medicaid Services (CMS) are responsible for creating and maintaining an array of clinical quality measures (CQMs) for assessing healthcare structure,

Uncertainty Quantification in Molecular Signals using Polynomial Chaos Expansion (Jan 30 2019) Molecular signals are abundant in engineering and biological contexts, and undergo stochastic propagation in fluid dynamic channels. The received signal is sensitive to a variety of input and channel parameter variations. Currently we do not understand

Shannon's entropy and its Generalizations towards Statistics, Reliability and Information Science during 1948-2018 (Jan 28 2019) Starting from the pioneering works of Shannon and Wiener in 1948, a plethora of works have been reported on entropy in different directions. Entropy-related review work in the direction of statistics, reliability and information science, to the best of

Variability in the interpretation of Dutch probability phrases - a risk for miscommunication (Jan 28 2019) Verbal probability phrases are often used to express estimated risk. In this study, focus was on the numerical interpretation of 29 Dutch probability and frequency phrases, including several complementary phrases to test (a)symmetry in their interpretation.

Organic Fiducial Inference (Jan 23 2019) A substantial generalization is put forward of the theory of subjective fiducial inference as it was outlined in earlier papers. In particular, this theory is extended to deal with cases where the data are discrete or categorical rather than continuous,

Fitting A Mixture Distribution to Data: Tutorial (Jan 20 2019) This paper is a step-by-step tutorial for fitting a mixture distribution to data. It merely assumes the reader has a background in calculus and linear algebra. Other required background is briefly reviewed before explaining the main algorithm. In explaining

Custodes: Auditable Hypothesis Testing (Jan 19 2019) We present Custodes: a new approach to solving the complex issue of preventing "p-hacking" in scientific studies. The novel protocol provides a concrete and publicly auditable method for controlling false-discoveries and eliminates any potential for data

Systemic Risk: Conditional Distortion Risk Measures (Jan 15 2019, Jan 28 2019) In this paper, we introduce the rich classes of conditional distortion (CoD) risk measures and distortion risk contribution ($\Delta$CoD) measures as measures of systemic risk and analyze their properties and representations. The classes include the well-known

Approaching Ethical Guidelines for Data Scientists (Jan 14 2019) The goal of this article is to inspire data scientists to participate in the debate on the impact that their professional work has on society, and to become active in public debates on the digital world as data science professionals. How do ethical principles

Projective Decomposition and Matrix Equivalence up to Scale (Jan 04 2019) A data matrix may be seen simply as a means of organizing observations into rows (e.g., by measured object) and into columns (e.g., by measured variable) so that the observations can be analyzed with mathematical tools. As a mathematical object, a matrix

Practical Considerations for Data Collection and Management in Mobile Health Micro-randomized Trials (Dec 27 2018) There is a growing interest in leveraging the prevalence of mobile technology to improve health by delivering momentary, contextualized interventions to individuals' smartphones. A just-in-time adaptive intervention (JITAI) adjusts to an individual's

Generalized Score Matching for Non-Negative Data (Dec 26 2018) A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally

Pragmatic hypotheses in the evolution of science (Dec 25 2018) This paper introduces pragmatic hypotheses and relates this concept to the spiral of scientific evolution. Previous works determined a characterization of logically consistent statistical hypothesis tests and showed that the modal operators obtained from

Application of Robust Estimators in Shewhart S-Charts (Dec 24 2018) Maintaining the quality of manufactured products at a desired level is known to increase customer satisfaction and profitability. The Shewhart control chart is the most widely used statistical process control (SPC) technique to monitor the quality of products

Low-temperature marginal ferromagnetism explains anomalous scale-free correlations in natural flocks (Dec 18 2018) We introduce a new ferromagnetic model capable of reproducing one of the most intriguing properties of collective behaviour in starling flocks, namely the fact that strong collective order of the system coexists with scale-free correlations of the modulus

On a flexible construction of a negative binomial model (Dec 18 2018) This work presents a construction of stationary Markov models with negative binomial marginal distributions. The proposal is novel in that a simple form of the corresponding transition probabilities is available, thus revealing uninvolved simulation and

Multiple testing with persistent homology (Dec 16 2018) We propose a general null model for persistent homology barcodes from a point cloud, to test for example acyclicity in simplicial complexes generated from point clouds. One advantage of the null model we propose is efficiency in generating a null model

Key Factor Not to Drop Out is to Attend the Lecture (Dec 12 2018) In addition to the learning check testing results performed at each lecture, we have extended the factors to find the key drop-out factors. Among them are the number of successes in the learning check testing, the number of attendances to the follow-up

Rapid Prototyping Model for Healthcare Alternative Payment Models: Replicating the Federally Qualified Health Center Advanced Primary Care Practice Demonstration (Dec 10 2018) Innovation in healthcare payment and service delivery utilizes high cost, high risk pilots paired with traditional program evaluations. Decision-makers are unable to reliably forecast the impacts of pilot interventions in this complex system, complicating

Calculating CVaR and bPOE for Common Probability Distributions With Application to Portfolio Optimization and Density Estimation (Nov 27 2018, Feb 17 2019) Conditional Value-at-Risk (CVaR) and Value-at-Risk (VaR), also called the superquantile and quantile, are frequently used to characterize the tails of probability distributions and are popular measures of risk. Buffered Probability of Exceedance (bPOE)

Castor: Contextual IoT Time Series Data and Model Management at ScaleNov 20 2018Feb

Batch Self Organizing maps for distributional data using adaptive distances (Nov 17 2018, Mar 29 2019) The paper deals with a Batch Self Organizing Map algorithm (DBSOM) for data described by distributional-valued variables. This kind of variable takes as values one-dimensional probability or frequency distributions on a numeric support. ...

A Model-Centric Analysis of Openness, Replication, and Reproducibility (Nov 12 2018) The literature on the reproducibility crisis presents several putative causes for the proliferation of irreproducible results, including HARKing, p-hacking and publication bias. Without a theory of reproducibility, however, it is difficult to determine ...

Surrogate Modeling of Stochastic Functions - Application to computational Electromagnetic Dosimetry (Nov 09 2018) Metamodeling of complex numerical systems has recently attracted the interest of the mathematical programming community. Despite the progress in high performance computing, simulations remain costly, as a matter of fact, the assessment of the exposure ...

Essential Collaboration Skills: The ASCCR Frame for Collaboration (Nov 08 2018) Statistics and data science are collaborative disciplines that typically require practitioners to interact with many different people or groups. Consequently, collaboration skills are part of the personal and professional skills essential for success ...

Using GitHub Classroom To Teach Statistics (Nov 05 2018) Git and GitHub are common tools for keeping track of multiple versions of data analytic content, which allow for more than one person to simultaneously work on a project. GitHub Classroom aims to provide a way for students to work on and submit their ...

Monte Carlo Simulations on robustness of functional location estimator based on several functional depth (Nov 05 2018) Functional data analysis has been a growing field of study in recent decades, and one fundamental task in functional data analysis is estimating the sample location. A notion called statistical depth has been extended from multivariate data to functional ...

The Holy Grail and the Bad Sampling - A test for the homogeneity of missing proportions for evaluating the agreement between peer review and bibliometrics in the Italian research assessment exercises (Oct 29 2018) Two experiments for evaluating the agreement between bibliometrics and informed peer review - depending on two large samples of journal articles - were performed by the Italian governmental agency for research evaluation. They were presented as successful ...

On a generalized form of subjective probability (Oct 25 2018) This paper is motivated by the questions of how to give the concept of probability an adequate real-world meaning, and how to explain a certain type of phenomenon that can be found, for instance, in Ellsberg's paradox. It attempts to answer these questions ...

Complementary Lipschitz continuity results for the distribution of intersections or unions of independent random sets in finite spaces (Oct 24 2018) We prove that intersections and unions of independent random sets in finite spaces achieve a form of Lipschitz continuity. More precisely, given the distribution of a random set $\Xi$, the function mapping any random set distribution to the distribution ...

Helix modelling through the Mardia-Holmes model framework and an extension of the Mardia-Holmes model (Oct 21 2018) For noisy two-dimensional data, which are approximately uniformly distributed near the circumference of an ellipse, Mardia and Holmes (1980) developed a model to fit the ellipse. In this paper we adapt their methodology to the analysis of helix data in ...

I can see clearly now: reinterpreting statistical significance (Oct 15 2018) Null hypothesis significance testing remains popular despite decades of concern about misuse and misinterpretation. We believe that much of the problem is due to language: significance testing has little to do with other meanings of the word "significance". ...

Canadian Crime Rates in the Penalty Box (Oct 11 2018, Oct 26 2018) Over the 1962-2016 period, the Canadian violent crime rate has remained strongly correlated with National Hockey League (NHL) penalties. The Canadian property crime rate was similarly correlated with stolen base attempts in the Major League Baseball (MLB). ...

Benchmarking in cluster analysis: A white paper (Sep 27 2018, Oct 01 2018) To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods ...

The distortion principle for insurance pricing: properties, identification and robustness (Sep 18 2018) Distortion (Denneberg 1990) is a well known premium calculation principle for insurance contracts. In this paper, we study sensitivity properties of distortion functionals w.r.t. the assumptions for risk aversion as well as robustness w.r.t. ambiguity ...

Perspective from the Literature on the Role of Expert Judgment in Scientific and Statistical Research and Practice (Sep 13 2018) This article, produced as a result of the Symposium on Statistical Inference, is an introduction to the literature on the function of expertise, judgment, and choice in the practice of statistics and scientific research. In particular, expert judgment ...

Quantile Regression for Qualifying Match of GEFCom2017 Probabilistic Load Forecasting (Sep 10 2018) We present a simple quantile regression-based forecasting method that was applied in a probabilistic load forecasting framework of the Global Energy Forecasting Competition 2017 (GEFCom2017). The hourly load data is log transformed and split into a long-term ...

Data scraping, ingestation, and modeling: bringing data from cars.com into the intro stats class (Sep 09 2018) New tools have made it much easier for students to develop skills to work with interesting data sets as they begin to extract meaning from data. To fully appreciate the statistical analysis cycle, students benefit from repeated experiences collecting, ...

Non-Gaussian Stochastic Volatility Model with Jumps via Gibbs Sampler (Aug 31 2018, Oct 01 2018) In this work, we propose a model for estimating volatility from financial time series, extending the non-Gaussian family of space-state models with exact marginal likelihood proposed by Gamerman, Santos and Franco (2013). In the literature there are models ...

Stakes are higher, risk is lower: Citation distributions are more equal in high quality journals (Aug 22 2018, Sep 21 2018) Psychology is a discipline standing at the crossroads of hard and social sciences. Therefore it is especially interesting to study bibliometric characteristics of psychology journals. We also take two adjacent disciplines, neurosciences and sociology. ...

The Turtleback Diagram for Conditional Probability (Aug 21 2018) We elaborate on an alternative representation of conditional probability to the usual tree diagram. We term the representation `turtleback diagram' for its resemblance to the pattern on turtle shells. Adopting the set theoretic view of events and the ...

The Function Transformation Omics - Funomics (Aug 17 2018) There are no two identical leaves in the world, so how to find effective markers or features to distinguish them is an important issue. Function transformation, such as f(x,y) and f(x,y,z), can transform two, three, or multiple input/observation variables ...

A Conversation with Jon Wellner (Aug 15 2018) Jon August Wellner was born in Portland, Oregon, in August 1945. He received his Bachelor's degree from the University of Idaho in 1968 and his PhD degree from the University of Washington in 1975. From 1975 until 1983 he was an Assistant Professor and ...

Allocations of Standby Redundancies to Coherent Systems with Dependent Components (Aug 07 2018) In the context of industrial engineering, standby allocation strategy is usually adopted by engineers to improve the lifetimes of coherent systems. This paper investigates the optimal allocation strategies of standby redundancies for coherent systems ...

Sounding Spider: An Efficient Way for Representing Uncertainties in High Dimensions (Aug 03 2018) This article proposes a visualization method for multidimensional data based on: (i) Animated functional Hypothetical Outcome Plots (f-HOPs); (ii) 3-dimensional Kiviat plot; and (iii) data sonification. In an Uncertainty Quantification (UQ) framework, ...

Spatially varying coefficient modeling for large datasets: Eliminating N from spatial regressions (Jul 17 2018) While spatially varying coefficient (SVC) modeling is popular in applied science, its computational burden is substantial. This is especially true if a multiscale property of SVC is considered. Given this background, this study develops a Moran's eigenvector-based ...

Attack and defence in cellular decision-making: lessons from machine learning (Jul 10 2018, Jan 30 2019) Machine learning algorithms are sensitive to meaningless (or "adversarial") perturbations. This is reminiscent of cellular decision-making where ligands (called "antagonists") prevent correct signalling, like in early immune recognition. We draw a formal ...

Discussion on Using Stacking to Average Bayesian Predictive Distributions by Yao et al (Jun 27 2018) I begin by summarizing key ideas of the paper under discussion. Then I will talk about a graphical modeling perspective, posterior contraction rates and alternative methods of aggregation. Moreover, I will also discuss possible applications of the stacking ...

Nonparametric Confidence Regions for Veronese-Whitney Means and Antimeans on Planar Kendall Shape Spaces (Jun 21 2018, Oct 13 2018) In this paper after a brief revision of VW-means, which are extrinsic means on real and complex projective spaces, relative to the Veronese-Whitney embeddings, we give two examples of sample VW means computations on planar Kendall shape spaces. Here we ...

A Constructive Algebraic Proof of Student's Theorem (Jun 21 2018) Student's theorem is an important result in statistics which states that for a normal population, the sample variance is independent of the sample mean and has a chi-square distribution. The existing proofs of this theorem either overly rely on advanced ...

Kernel Methods for Nonlinear Connectivity Detection (Jun 19 2018) In this paper, we show that the presence of nonlinear coupling between time series may be detected employing kernel feature space representations alone dispensing with the need to go back to solve the pre-image problem to gauge model adequacy. As a consequence, ...

Data learning from big data (Jun 08 2018) Technology is generating a huge and growing availability of observations of diverse nature. This big data is placing data learning as a central scientific discipline. It includes collection, storage, preprocessing, visualization and, essentially, statistical ...

Unwinding the model manifold: choosing similarity measures to remove local minima in sloppy dynamical systems (May 30 2018, Feb 22 2019) In this paper, we consider the problem of parameter sensitivity in models of complex dynamical systems through the lens of information geometry. We calculate the sensitivity of model behavior to variations in parameters. In most cases, models are sloppy, ...

To Bayes or Not To Bayes? That's no longer the question! (May 28 2018) This paper seeks to provide a thorough account of the ubiquitous nature of the Bayesian paradigm in modern statistics, data science and artificial intelligence. Once maligned, on the one hand by those who philosophically hated the very idea of subjective ...

Analytic moment and Laplace transform formulae for the quasi-stationary distribution of the Shiryaev diffusion on an interval (May 19 2018) We derive analytic closed-form moment and Laplace transform formulae for the quasi-stationary distribution of the classical Shiryaev diffusion restricted to the interval $[0,A]$ with absorption at a given $A>0$.

Hyperspectral Data Analysis in R: the hsdar Package (May 14 2018) Hyperspectral remote sensing is a promising tool for a variety of applications including ecology, geology, analytical chemistry and medical research. This article presents the new hsdar package for R statistical software, which performs a variety of ...

On estimands and the analysis of adverse events in the presence of varying follow-up times within the benefit assessment of therapies (May 04 2018, Sep 21 2018) The analysis of adverse events (AEs) is a key component in the assessment of a drug's safety profile. Inappropriate analysis methods may result in misleading conclusions about a therapy's safety and consequently its benefit-risk ratio. The statistical ...

Conjectures on Optimal Nested Generalized Group Testing Algorithm (May 03 2018, Sep 12 2018) Consider a finite population of $N$ items, where item $i$ has a probability $p_i$ to be defective. The goal is to identify all items by means of group testing. This is the generalized group testing problem (GGTP hereafter). In the case of ...

BNSP: an R Package for Fitting Bayesian Semiparametric Regression Models and Variable Selection (Apr 29 2018, Oct 08 2018) The R package BNSP provides a unified framework for semiparametric location-scale regression and stochastic search variable selection. The statistical methodology that the package is built upon utilizes basis function expansions to represent semiparametric ...

Viewing Simpson's Paradox (Apr 21 2018) The well-known Simpson's paradox is puzzling and surprising for many, especially for empirical researchers and users of statistics. However there is no surprise as far as mathematical details are concerned. A lot more is written about the paradox but ...

Resolving the Lord's Paradox (Apr 21 2018) An explanation of Lord's paradox using ordinary least squares regression models is given. It is not a paradox at all if the regression parameters are interpreted as predictive, or as causal under stricter conditions, and if one is aware of the laws of averages. We ...

Multiple factor analysis of distributional data (Apr 19 2018) In the framework of Symbolic Data Analysis (SDA), distribution-variables are a particular case of multi-valued variables: each unit is represented by a set of distributions (e.g. histograms, density functions or quantile functions), one for each variable. ...

Bayesian model-data synthesis with an application to global Glacio-Isostatic Adjustment (Apr 17 2018, Apr 30 2018) We introduce a framework for updating large scale geospatial processes using a model-data synthesis method based on Bayesian hierarchical modelling. Two major challenges come from updating large-scale Gaussian processes and modelling non-stationarity. To ...

Contest models highlight inherent inefficiencies of scientific funding competitions (Apr 10 2018, Jan 02 2019) Scientific research funding is allocated largely through a system of soliciting and ranking competitive grant proposals. In these competitions, the proposals themselves are not the deliverables that the funder seeks, but instead are used by the funder ...