Machine Learning Based Analysis and Quantification of Potential Power Gain from Passive Device InstallationJun 13 2019Passive device installation on wind turbine generators (WTGs) can potentially improve the power generation of WTGs. Yet, how much impact the installation will make is unclear because conducting controlled experiments is impossible due to ever-changing ... More

Interpretable ICD Code Embeddings with Self- and Mutual-Attention MechanismsJun 13 2019We propose a novel and interpretable embedding method to represent the international statistical classification codes of diseases and related health problems (i.e., ICD codes). This method considers a self-attention mechanism within the disease domain ... More

Reinforcement Learning of Spatio-Temporal Point ProcessesJun 13 2019Spatio-temporal event data is ubiquitous in various applications, such as social media, crime events, and electronic health records. Spatio-temporal point processes offer a versatile framework for modeling such event data, as they can jointly capture spatial ... More

Dynamic Time Scan ForecastingJun 12 2019The dynamic time scan forecasting method relies on the premise that the most important pattern in a time series precedes the forecasting window, i.e., the last observed values. Thus, a scan procedure is applied to identify similar patterns, or best matches, ... More

Identifying and Predicting Parkinson's Disease Subtypes through Trajectory Clustering via Bipartite NetworksJun 12 2019Parkinson's disease (PD) is a common neurodegenerative disease with a high degree of heterogeneity in its clinical features, rate of progression, and change of variables over time. In this work, we present a novel data-driven, network-based Trajectory ... More

A Bayesian Hierarchical Model for Evaluating Forensic Footwear EvidenceJun 12 2019When a latent shoeprint is discovered at a crime scene, forensic analysts inspect it for distinctive patterns of wear such as scratches and holes (known as accidentals) on the source shoe's sole. If its accidentals correspond to those of a suspect's shoe, ... More

Functional Singular Spectrum AnalysisJun 12 2019In this paper, we introduce a new extension of the Singular Spectrum Analysis (SSA) called functional SSA to analyze functional time series. The new methodology is developed by integrating ideas from functional data analysis and univariate SSA. We explore ... More

Global optimization using Sobol indicesJun 12 2019We propose and assess a new global (derivative-free) optimization algorithm, inspired by the LIPO algorithm, which uses variance-based sensitivity analysis (Sobol indices) to reduce the number of calls to the objective function. This method should be ... More

Exploring Bayesian approaches to eQTL mapping through probabilistic programmingJun 12 2019The discovery of genomic polymorphisms influencing gene expression (also known as expression quantitative trait loci or eQTLs) can be formulated as a sparse Bayesian multivariate/multiple regression problem. An important aspect in the development of such ... More

Applying economic measures to lapse risk management with machine learning approachesJun 12 2019Modeling policyholders' lapse behaviors is important to a life insurer since lapses affect pricing, reserving, profitability, liquidity, risk management, as well as the solvency of the insurer. Lapse risk is indeed the most significant life underwriting ... More

Understanding artificial intelligence ethics and safetyJun 11 2019A remarkable time of human promise has been ushered in by the convergence of the ever-expanding availability of big data, the soaring speed and stretch of cloud computing platforms, and the advancement of increasingly sophisticated machine learning algorithms. ... More

Distribution-Free Multisample Test Based on Optimal Matching with Applications to Single Cell GenomicsJun 11 2019In this paper we propose a nonparametric graphical test based on optimal matching, for assessing the equality of multiple unknown multivariate probability distributions. Our procedure pools the data from the different classes to create a graph based on ... More

Estimating the Number of Fatal Victims of the Peruvian Internal Armed Conflict, 1980-2000: an application of modern multi-list Capture-Recapture techniquesJun 11 2019We estimate the number of fatal victims of the Peruvian internal armed conflict between 1980-2000 using stratified seven-list Capture-Recapture methods based on Dirichlet process mixtures, which we extend to accommodate incomplete stratification information. ... More

ProPublica's COMPAS Data RevisitedJun 11 2019In this paper I re-examine the COMPAS recidivism score and criminal history data collected by ProPublica in 2016, which has fueled intense debate and research in the nascent field of 'algorithmic fairness' or 'fair machine learning' over the past three ... More

Causal Inference in Higher Education: Building Better CurriculumsJun 11 2019Higher educational institutions constantly look for ways to meet students' needs and support them through graduation. Recent work in the field of learning analytics has developed methods for grade prediction and course recommendations. Although these ... More

Characterization and valuation of uncertainty of calibrated parameters in stochastic decision modelsJun 11 2019We evaluated the implications of different approaches to characterize uncertainty of calibrated parameters of stochastic decision models (DMs) in the quantified value of such uncertainty in decision making. We used a microsimulation DM of colorectal cancer ... More

Regional economic convergence and spatial quantile regressionJun 11 2019The presence of $\beta$-convergence in European regions is an important issue to be analyzed. In this paper, we adopt a quantile regression approach in analyzing economic convergence. While previous work has performed quantile regression at the national ... More

Statistical Species IdentificationJun 11 2019Identification of taxa can be significantly assisted by statistical classification in two major ways. First, one may use a statistical model to determine taxon of subjects based on various characteristics or traits. Secondly, when faced with a collection ... More

Assessing the effects of exposure to sulfuric acid aerosol on respiratory function in adultsJun 10 2019Background: Sulfuric acid aerosol is suspected to be a major contributor to mortality and morbidity associated with air pollution. Objective: To determine if exposure of human participants to anticipated levels of sulfuric acid aerosol ... More

Technical Preprint: Rationale and Design of a Planned Observational Study to Evaluate the Impact of Hydrocodone Rescheduling on Opioid Prescribing After SurgeryJun 10 2019In October 2014, the US Drug Enforcement Agency (DEA) reclassified hydrocodone from Schedule III to Schedule II of the Controlled Substances Act, resulting in a prohibition on refills in the initial prescription. While this schedule change was associated ... More

The Regression Discontinuity DesignJun 10 2019This handbook chapter gives an introduction to the sharp regression discontinuity design, covering identification, estimation, inference, and falsification methods.

Intertemporal Community Detection in Bikeshare NetworksJun 10 2019We investigate the changes in the patterns of usage in the Divvy bikeshare system in Chicago from 2016 to 2018. We devise a community detection method that finds clusters of nodes that are increasing, decreasing, or stable in connectivity across ... More

Confidence intervals for class prevalences under prior probability shiftJun 10 2019Point estimation of class prevalences in the presence of data set shift has been a popular research topic for more than two decades. Less attention has been paid to the construction of confidence and prediction intervals for estimates of class prevalences. ... More

Big Variates: Visualizing and identifying key variables in a multivariate worldJun 10 2019Big Data involves both a large number of events and many variables. This paper will concentrate on the challenge presented by the large number of variables in a Big Dataset. It will start with a brief review of exploratory data visualisation for ... More

Pitfalls and Protocols in Practice of Manufacturing Data ScienceJun 10 2019The practical application of machine learning and data science (ML/DS) techniques presents a range of procedural issues to be examined and resolved, including those relating to data issues, methodologies, assumptions, and applicable conditions. Each ... More

A Comprehensive Hidden Markov Model for Hourly Rainfall Time SeriesJun 10 2019For hydrological applications, such as urban flood modelling, it is often important to be able to simulate sub-daily rainfall time series from stochastic models. However, the literature is currently lacking owing to several challenges with modelling rainfall ... More

Efficient Bayesian estimation for GARCH-type models via Sequential Monte CarloJun 10 2019This paper exploits the advantages of sequential Monte Carlo (SMC) to develop parameter estimation and model selection methods for GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) style models. This approach provides an alternative method ... More

Incorporating Open Data into Introductory Courses in StatisticsJun 10 2019The 2016 Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report emphasized six recommendations to teach introductory courses in statistics. Among them: use of real data with context and purpose. Many educators have created ... More

RobustTrend: A Huber Loss with a Combined First and Second Order Difference Regularization for Time Series Trend FilteringJun 10 2019Extracting the underlying trend signal is a crucial step to facilitate time series analysis like forecasting and anomaly detection. Besides noise signal, time series can contain not only outliers but also abrupt trend changes in real-world scenarios. ... More

Modeling Excess Deaths After a Natural Disaster with Application to Hurricane MariaJun 09 2019Estimation of excess deaths due to a natural disaster is an important public health problem. The CDC provides guidelines to fill death certificates to help determine the death toll of such events. But, even when followed by medical examiners, the guidelines ... More

Graph Independence TestingJun 09 2019Identifying statistically significant dependency between variables is a key step in scientific discoveries. Many recent methods, such as distance and kernel tests, have been proposed for valid and consistent independence testing and can be applied to ... More

On Copula-based Collective Risk ModelsJun 09 2019Several collective risk models have recently been proposed by relaxing the widely used but controversial assumption of independence between claim frequency and severity. Approaches include the bivariate copula model, random effect model, and two-part ... More

On statistical Calderón problemsJun 08 2019For $D$ a bounded domain in $\mathbb R^d, d \ge 3,$ with smooth boundary $\partial D$, the non-linear inverse problem of recovering the unknown conductivity $\gamma$ determining solutions $u=u_{\gamma, f}$ of the partial differential equation \begin{equation*} ... More

A Naive Bayes Approach for NFL Passing Evaluation using Tracking Data Extracted from ImagesJun 07 2019The NFL collects detailed tracking data capturing the location of all players and the ball during each play. Although the raw form of this data is not publicly available, the NFL releases a set of aggregated statistics via their Next Gen Stats (NGS) platform. ... More

Modelling the spatial extent and severity of extreme European windstormsJun 07 2019Windstorms are a primary natural hazard affecting Europe that are commonly linked to substantial property and infrastructural damage and are responsible for the largest spatially aggregated financial losses. Such extreme winds are typically generated ... More

Structured Variational Inference in Continuous Cox Process ModelsJun 07 2019We propose a scalable framework for inference in an inhomogeneous Poisson process modeled by a continuous sigmoidal Cox process that assumes the corresponding intensity function is given by a Gaussian process (GP) prior transformed with a scaled logistic ... More

Approximate Identification of the Optimal Epidemic Source in Complex NetworksJun 07 2019Jun 10 2019We consider the problem of identifying the source of an epidemic, spreading through a network, from a complete observation of the infected nodes in a snapshot of the network. Previous work on the problem has often employed geometric, spectral or heuristic ... More

Early detection of sepsis utilizing deep learning on electronic health record event sequencesJun 07 2019The timeliness of detection of a sepsis event in progress is a crucial factor in the outcome for the patient. Machine learning models built from data in electronic health records can be used as an effective tool for improving this timeliness, but so far ... More

Association Between Intelligence and Cortical Thickness in Adolescents: Evidence from the ABCD StudyJun 07 2019The relationship between intelligence and brain morphology is of great interest in the cognitive field. General intelligence can be defined as the weighted sum of fluid and crystallized intelligence. Fluid abilities depend on genes and gene expression ... More

A Bayesian approach for the analysis of error rate studies in forensic scienceJun 06 2019Over the past decade, the field of forensic science has received recommendations from the National Research Council of the U.S. National Academy of Sciences, the U.S. National Institute of Standards and Technology, and the U.S. President's Council of ... More

An Inverse Optimization Approach to Measuring Clinical Pathway ConcordanceJun 06 2019Clinical pathways outline standardized processes in the delivery of care for a specific disease. Patient journeys through the healthcare system, though, can deviate substantially from recommended or reference pathways. Given the positive benefits of clinical ... More

Fast Multi-resolution Segmentation for Nonstationary Hawkes Process Using CumulantsJun 06 2019Stationarity is assumed in the vanilla Hawkes process, which reduces the model complexity but introduces a strong assumption. In this paper, we propose a fast multi-resolution segmentation algorithm to capture the time-varying characteristics of nonstationary ... More

Selecting Biomarkers for building optimal treatment selection rules using Kernel MachinesJun 06 2019Optimal biomarker combinations for treatment-selection can be derived by minimizing total burden to the population caused by the targeted disease and its treatment. However, when multiple biomarkers are present, including all in the model can be expensive ... More

DOT: Gene-set analysis by combining decorrelated association statisticsJun 05 2019Historically, the majority of statistical association methods have been designed assuming availability of SNP-level information. However, modern genetic and sequencing data present new challenges to access and sharing of genotype-phenotype datasets, including ... More

Probabilistic Structure Learning for EEG/MEG Source Imaging with Hierarchical Graph PriorJun 05 2019Brain source imaging is an important method for noninvasively characterizing brain activity using Electroencephalogram (EEG) or Magnetoencephalography (MEG) recordings. Traditional EEG/MEG Source Imaging (ESI) methods usually assume that either source ... More

A Model-free Approach to Linear Least Squares Regression with Exact Probabilities and Applications to Covariate SelectionJun 05 2019The classical model for linear regression is $\boldsymbol{Y}=\boldsymbol{x}\boldsymbol{\beta}+\sigma\boldsymbol{\varepsilon}$ with i.i.d. standard Gaussian errors. Much of the resulting statistical inference is based on Fisher's $F$-distribution. In this ... More

The Stanford Acuity Test: A Probabilistic Approach for Precise Visual Acuity TestingJun 05 2019Chart-based visual acuity measurements are used by billions of people to diagnose and guide treatment of vision impairment. However, the ubiquitous eye exam has no mechanism for reasoning about uncertainty and as such, suffers from a well-documented reproducibility ... More

Going Deep: Models for Continuous-Time Within-Play Valuation of Game Outcomes in American Football with Tracking DataJun 05 2019Continuous-time assessments of game outcomes in sports have become increasingly common in the last decade. In American football, only discrete-time estimates of play value were possible, since the most advanced public football datasets were recorded at ... More

High-resolution estimates of the foreign-born population and international migration for the United StatesJun 04 2019Detailed estimates of migration stocks and flows provide evidence for understanding population dynamics, and the impact of economic and political changes that influence migration. Using data from the 2000 decennial census and 2001-2016 American Community ... More

Visual Fixations Duration as an Indicator of Skill Level in eSportsJun 04 2019Using highly interactive systems like computer games requires a lot of visual activity and eye movements. Eye movements are best characterized by visual fixation - periods of time when the eyes stay relatively still over an object. We analyzed the distributions ... More

Partial and semi-partial measures of spatial associations for multivariate lattice dataJun 04 2019This paper concerns the development of partial and semi-partial measures of spatial associations in the context of multivariate spatial lattice data which describe global or local associations among spatially aggregated measurements for pairs of different ... More

Revision of ISO 19229 to support the certification of calibration gases for purityJun 04 2019The second edition of ISO 19229 expands the guidance in its predecessor in two ways. Firstly, it provides more support and examples describing possible experimental approaches for purity analysis. A novelty is that it describes how the beta distribution, ... More

Model Trees for PersonalizationJun 04 2019As more commerce and media consumption are being conducted online, a wealth of new opportunities are emerging for personalized advertising. We propose a general methodology, Model Trees for Personalization (MTP), for tackling a broad class of personalized ... More

Hybrid Machine Learning Forecasts for the FIFA Women's World Cup 2019Jun 03 2019In this work, we combine two different ranking methods together with several other predictors in a joint random forest approach for the scores of soccer matches. The first ranking method is based on the bookmaker consensus, the second ranking method estimates ... More

A Dyadic IRT ModelJun 03 2019We propose a dyadic Item Response Theory (dIRT) model for measuring interactions of pairs of individuals when the responses to items represent the actions (or behaviors, perceptions, etc.) of each individual (actor) made within the context of a dyad formed ... More

The Computational Structure of Unintentional MeaningJun 03 2019Speech-acts can have literal meaning as well as pragmatic meaning, but these both involve consequences typically intended by a speaker. Speech-acts can also have unintentional meaning, in which what is conveyed goes above and beyond what was intended. ... More

Stress Testing Network Reconstruction via Graphical Causal ModelJun 03 2019An optimal evaluation of the resilience in financial portfolios implies having initial hypotheses about the causal influence between the macroeconomic variables and the risk parameters. In this paper, we propose a graphical model to infer the causal ... More

Anchored Causal Inference in the Presence of Measurement ErrorJun 03 2019We consider the problem of learning a causal graph in the presence of measurement error. This setting is for example common in genomics, where gene expression is corrupted through the measurement process. We develop a provably consistent procedure for ... More

Generalised linear models for prognosis and intervention: Theory, practice, and implications for machine learningJun 03 2019In health research, machine learning (ML) is often hailed as the new frontier of data analytics which, combined with big data, will purportedly revolutionise delivery of healthcare and ultimately lead to more informed public health policy and clinical ... More

Unconstrained representation of orthogonal matrices with application to common principal componentsJun 03 2019Many statistical problems involve the estimation of a $\left(d\times d\right)$ orthogonal matrix $\mathbf{Q}$. Such an estimation is often challenging due to the orthonormality constraints on $\mathbf{Q}$. To cope with this problem, we propose a very ... More

Conditional inference on the asset with maximum Sharpe ratioJun 03 2019Jun 09 2019We apply the procedure of Lee et al. to the problem of performing inference on the signal noise ratio of the asset which displays maximum sample Sharpe ratio over a set of possibly correlated assets. We find a multivariate analogue of the commonly used ... More

Copula-based functional Bayes classification with principal components and partial least squaresJun 03 2019We present a new functional Bayes classifier that uses principal component (PC) or partial least squares (PLS) scores from the common covariance function, that is, the covariance function marginalized over groups. When the groups have different covariance ... More

Joint spatial modeling of significant wave height and wave period using the SPDE approachJun 01 2019The ocean wave distribution in a specific region of space and time is described by its sea state. Knowledge about the sea states a ship encounters on a journey can be used to assess various parameters of risk and wear associated with the journey. Two ... More

Statistical analysis of the water level of Huang He river (Yellow river) in ChinaJun 01 2019Very high water levels in large rivers are extremely dangerous events that can lead to large floods, loss of property, and the loss of thousands or even tens of thousands of human lives. The information from the systematic monitoring of the water levels allows ... More

Influences in Forecast Errors for Wind and Photovoltaic Power: A Study on Machine Learning ModelsMay 31 2019Despite the increasing importance of forecasts of renewable energy, current planning studies only address a general estimate of the forecast quality to be expected and selected forecast horizons. However, these estimates allow only a limited and highly ... More

Improving the resolution of CryoEM single particle analysisMay 31 2019In this paper, we present a new 3D refinement method for CryoEM single particle analysis that can improve the resolution of the final map compared with traditional methods. Our approach leverages the prior information about the electron density map of macromolecules ... More

A novel hybrid model based on multi-objective Harris hawks optimization algorithm for daily PM2.5 and PM10 forecastingMay 30 2019High levels of air pollution may seriously affect people's living environment and even endanger their lives. In order to reduce air pollution concentrations, and warn the public before the occurrence of hazardous air pollutants, it is urgent to design ... More

Separating an Outlier from a ChangeMay 30 2019We study the quickest change detection problem with an unknown post-change distribution. In this scenario, the unknown change in the distribution of observations may occur in many ways without much structure, while, before change, an outlier (a false ... More

Negative binomial-reciprocal inverse Gaussian distribution: Statistical properties with applicationsMay 30 2019In this article, we propose a new three-parameter distribution, called the negative binomial-reciprocal inverse Gaussian distribution, obtained by compounding the negative binomial with the reciprocal inverse Gaussian model. This model is tractable with some important properties ... More

A Block Diagonal Markov Model for Indoor Software-Defined Power Line CommunicationMay 30 2019A Semi-Hidden Markov Model (SHMM) for bursty error channels is defined by a state transition probability matrix $A$, a prior probability vector $\Pi$, and the state dependent output symbol error probability matrix $B$. Several processes are utilized for ... More

Mean-dependent nonstationary spatial modelsMay 29 2019Nonstationarity is a major challenge in analyzing spatial data. For example, daily precipitation measurements may have increased variability and decreased spatial smoothness in areas with high mean rainfall. Common nonstationary covariance models introduce ... More

NeoGuard: a public, online learning platform for neonatal seizuresMay 29 2019Seizures occur in the neonatal period more frequently than other periods of life and usually denote the presence of serious brain dysfunction. The gold standard for detecting seizures is based on visual inspection of continuous electroencephalogram (cEEG) ... More

Calculating the Expected Value of Sample Information in Practice: Considerations from Three Case StudiesMay 28 2019Investing efficiently in future research to improve policy decisions is an important goal. Expected Value of Sample Information (EVSI) can be used to select the specific design and sample size of a proposed study by assessing the benefit of a range of ... More

Can we disregard the whole model? Omnibus non-inferiority testing for $R^{2}$ in multivariable linear regression and $\hat{\eta}^{2}$ in ANOVAMay 28 2019Determining a lack of association between an outcome variable and a number of different explanatory variables is frequently necessary in order to disregard a proposed model (i.e., to confirm the lack of an association between an outcome and predictors). ... More

Attacker Behaviour Profiling using Stochastic Ensemble of Hidden Markov ModelsMay 28 2019Cyber threat intelligence is one of the emerging areas of focus in information security. Much of the recent work has focused on rule-based methods and detection of network attacks using Intrusion Detection algorithms. In this paper we propose a framework ... More

Evaluation of mineralogy per geological layers by Approximate Bayesian ComputationMay 28 2019We propose a new methodology to perform mineralogic inversion from wellbore logs based on a Bayesian linear regression model. Our method essentially relies on three steps. The first step makes use of Approximate Bayesian Computation (ABC) and selects ... More

Rare Failure Prediction via Event Matching for Aerospace ApplicationsMay 28 2019In this paper, we consider the problem of failure prediction in the context of predictive maintenance applications. We present a new approach for rare failure prediction, based on a general methodology, which takes into account peculiar properties of technical ... More

Marginalized Frailty-Based Illness-Death Model: Application to the UK-Biobank Survival DataMay 27 2019The UK Biobank is a large-scale health resource comprising genetic, environmental and medical information on approximately 500,000 volunteer participants in the UK, recruited at ages 40--69 during the years 2006--2010. The project monitors the health ... More

Private Learning and Regularized Optimal TransportMay 27 2019Private data are valuable either by remaining private (for instance if they are sensitive) or, on the other hand, by being used publicly to increase some utility. These two objectives are antagonistic and leaking data might be more rewarding than concealing ... More

Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III ErrorsMay 27 2019Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise because of the selection of topics. According to recent surveys on ... More

Adaptive probabilistic principal component analysisMay 27 2019Using the linear Gaussian latent variable model as a starting point we relax some of the constraints it imposes by deriving a nonparametric latent feature Gaussian variable model. This model introduces additional discrete latent variables to the original ... More

Robust probabilistic modeling of photoplethysmography signals with application to the classification of premature beatsMay 26 2019In this paper we propose a robust approach to model photoplethysmography (PPG) signals. After decomposing the signal into two components, we focus the analysis on the pulsatile part, related to cardiac information. The goal is to enable a deeper understanding ... More

ABCD Neurocognitive Prediction Challenge 2019: Predicting individual residual fluid intelligence scores from cortical grey matter morphologyMay 26 2019We predicted residual fluid intelligence scores from T1-weighted MRI data available as part of the ABCD NP Challenge 2019, using morphological similarity of grey-matter regions across the cortex. Individual structural covariance networks (SCN) were abstracted ... More

ABCD Neurocognitive Prediction Challenge 2019: Predicting individual fluid intelligence scores from structural MRI using probabilistic segmentation and kernel ridge regressionMay 26 2019We applied several regression and deep learning methods to predict fluid intelligence scores from T1-weighted MRI scans as part of the ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge) 2019. We used voxel intensities and probabilistic tissue-type ... More

A Test for Differential Ascertainment in Case-Control Studies with Application to Child MaltreatmentMay 26 2019We propose a method to test for the presence of differential ascertainment in case-control studies, when data are collected by multiple sources. We show that, when differential ascertainment is present, the use of only the observed cases leads to severe ... More

Usage of multiple RTL features for Earthquake predictionMay 26 2019We construct a classification model that predicts whether an earthquake with magnitude above a threshold will take place at a given location within a time range of 30-180 days from a given moment in time. A common approach is to use expert forecasts based on ... More

Sensitivity analysis using bias functions for studies extending inferences from a randomized trial to a target populationMay 25 2019Extending (generalizing or transporting) causal inferences from a randomized trial to a target population requires "generalizability" or "transportability" assumptions, which state that randomized and non-randomized individuals are exchangeable conditional ... More

Ensemble of 3D CNN regressors with data fusion for fluid intelligence predictionMay 25 2019In this work, we aim at predicting children's fluid intelligence scores based on structural T1-weighted MR images from the largest long-term study of brain development and child health. The target variable was regressed on a data collection site, socio-demographic ... More

Safely and Quickly Deploying New Features with a Staged Rollout Framework Using Sequential Test and Adaptive Experimental DesignMay 25 2019During the rapid development cycle for Internet products (websites and mobile apps), new features are developed and rolled out to users constantly. Features with code defects or design flaws can cause outages and significant degradation of user experience. ... More

The experiment is just as important as the likelihood in understanding the prior: A cautionary note on robust cognitive modellingMay 24 2019Cognitive modelling shares many features with statistical modelling, making it seem trivial to borrow from the practices of robust Bayesian statistics to protect the practice of robust cognitive modelling. We take one aspect of statistical workflow-prior ... More

Machine Learning Estimation of Heterogeneous Treatment Effects with InstrumentsMay 24 2019We consider the estimation of heterogeneous treatment effects with arbitrary machine learning methods in the presence of unobserved confounders with the aid of a valid instrument. Such settings arise in A/B tests with an intent-to-treat structure, where ... More