Online Sampling from Log-Concave Distributions (Feb 21 2019). Given a sequence of convex functions $f_0, f_1, \ldots, f_T$, we study the problem of sampling from the Gibbs distribution $\pi_t \propto e^{-\sum_{k=0}^t f_k}$ for each epoch $t$ in an online manner. This problem occurs in applications to machine learning, ...
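The target $\pi_t \propto e^{-\sum_k f_k}$ above is log-concave when the $f_k$ are convex, and a standard generic way to sample such distributions is Langevin dynamics. Below is a minimal sketch of the unadjusted Langevin algorithm on a 1-D standard normal target ($f(x) = x^2/2$, so $\nabla f(x) = x$); this illustrates log-concave sampling in general, not the paper's online method.

```python
import math
import random

def ula_sample(grad_f, x0, step=0.01, n_steps=2000, seed=0):
    """Unadjusted Langevin algorithm:
    x <- x - step * grad_f(x) + sqrt(2 * step) * N(0, 1)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = x - step * grad_f(x) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
    return x

# Target pi ∝ exp(-x^2 / 2), i.e. the standard normal, so grad f(x) = x.
samples = [ula_sample(lambda x: x, 0.0, seed=s) for s in range(200)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
```

With enough steps per chain, the empirical mean should be near 0 and the variance near 1; the small step size trades a little bias for stability.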

Learned Step Size Quantization (Feb 21 2019). We present Learned Step Size Quantization, a method for training deep networks so that they can run at inference time using low-precision integer matrix multipliers, which offer power and space advantages over high-precision alternatives. The essence ...
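The forward pass of a learned-step quantizer is ordinary uniform quantization; what the paper learns is the step size itself during training. A sketch of just the forward quantizer (the learned-step training loop and the straight-through gradient used to train it are omitted here):

```python
def quantize(x, step, n_bits=8, signed=True):
    """Uniform quantization of a scalar with step size `step`
    (fixed here; learned by backpropagation in LSQ)."""
    if signed:
        qn, qp = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    else:
        qn, qp = 0, 2 ** n_bits - 1
    v = round(x / step)           # map to an integer level
    v = max(qn, min(qp, v))       # clamp to the representable range
    return v * step               # map back to the real line

small = quantize(0.37, 0.1)      # rounds to level 4 -> 0.4
big = quantize(100.0, 0.1)       # clamps to level 127 -> 12.7
```

The clamping is what ties accuracy to the step size: too small a step saturates large activations, too large a step wastes resolution, which is why learning it per layer helps.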

Domain Partitioning Network (Feb 21 2019). Standard adversarial training involves two agents, a generator and a discriminator, playing a mini-max game. However, even if the players converge to an equilibrium, the generator may recover only part of the target data distribution, a situation ...

Statistics and Samples in Distributional Reinforcement Learning (Feb 21 2019). We present a unifying framework for designing and analysing distributional reinforcement learning (DRL) algorithms in terms of recursively estimating statistics of the return distribution. Our key insight is that DRL algorithms can be decomposed as the ...

Deep Learning Multidimensional Projections (Feb 21 2019). Dimensionality reduction methods, also known as projections, are frequently used for exploring multidimensional data in machine learning, data science, and information visualization. Among these, t-SNE and its variants have become very popular for their ...

An information criterion for auxiliary variable selection in incomplete data analysis (Feb 21 2019). Statistical inference is considered for variables of interest, called primary variables, when auxiliary variables are observed along with the primary variables. We consider the setting of incomplete data analysis, where some primary variables are not ...

LOSSGRAD: automatic learning rate in gradient descent (Feb 20 2019). In this paper, we propose LOSSGRAD (locally optimal step-size in gradient descent), a simple, fast, and easy-to-implement algorithm that automatically adjusts the step size in gradient descent during neural network training. Given a function $f$, a ...
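The snippet describes step-size adaptation during descent but does not give LOSSGRAD's actual update rule, so the sketch below uses a generic grow-on-success / shrink-on-failure heuristic purely to illustrate the idea of adapting the step from observed loss values:

```python
def adaptive_gd(f, grad, x, step=1.0, iters=100, up=2.0, down=0.5):
    """Gradient descent with a heuristic adaptive step size:
    grow the step after a successful (loss-decreasing) step,
    shrink it when a trial step would increase the loss.
    This is a generic scheme, not LOSSGRAD's rule."""
    fx = f(x)
    for _ in range(iters):
        cand = x - step * grad(x)
        fc = f(cand)
        if fc < fx:
            x, fx = cand, fc
            step *= up
        else:
            step *= down
    return x

# Minimize f(x) = (x - 3)^2; the scheme recovers the minimizer x = 3.
xmin = adaptive_gd(lambda x: (x - 3) ** 2, lambda x: 2 * (x - 3), x=0.0)
```

The appeal of such schemes, and of LOSSGRAD, is that no learning-rate schedule has to be hand-tuned per problem.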

Noisy multi-label semi-supervised dimensionality reduction (Feb 20 2019). Noisy labeled data are a rich source of information that is often easily accessible and cheap to obtain, but label noise can also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively ...

A Note on Bounding Regret of the C$^2$UCB Contextual Combinatorial Bandit (Feb 20 2019). We revisit the proof by Qin et al. (2014) of bounded regret for the C$^2$UCB contextual combinatorial bandit. We demonstrate an error in the proof of volumetric expansion of the moment matrix, used in upper-bounding a function of context vector norms. ...

A Random Subspace Technique That Is Resistant to a Limited Number of Features Corrupted by an Adversary (Feb 19 2019). In this paper, we consider batch supervised learning where an adversary is allowed to corrupt instances with arbitrarily large noise. The adversary can corrupt any $l$ features in each instance and can change their values in any ...

Fast Neural Network Verification via Shadow Prices (Feb 19 2019). To use neural networks in safety-critical settings, it is paramount to provide assurances about their runtime operation. Recent work on ReLU networks has sought to verify whether inputs belonging to a bounded box can ever yield some undesirable output. Input-splitting ...

Simplifying Graph Convolutional Networks (Feb 19 2019). Graph Convolutional Networks (GCNs) and their variants have received significant attention and have become the de facto methods for learning graph representations. GCNs derive inspiration primarily from recent deep learning approaches, and as a result, ...
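This paper (SGC) is known for showing that a GCN can often be reduced to repeated feature propagation with the normalized adjacency matrix followed by a plain linear classifier. A dependency-free sketch of the propagation step on a 3-node path graph; the linear classifier on the smoothed features, and the claim that this matches full GCNs, are the paper's, while the tiny example graph is ours:

```python
def matmul(A, B):
    """Naive list-of-lists matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def normalize_adj(A):
    """Symmetrically normalized adjacency with self-loops:
    S = D^{-1/2} (A + I) D^{-1/2}."""
    n = len(A)
    Ahat = [[A[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in Ahat]
    return [[Ahat[i][j] / (deg[i] ** 0.5 * deg[j] ** 0.5) for j in range(n)]
            for i in range(n)]

# 3-node path graph 0-1-2 with a single feature concentrated on node 0.
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0], [0.0], [0.0]]
S = normalize_adj(A)
for _ in range(2):          # K = 2 propagation steps: X <- S X
    X = matmul(S, X)
```

After propagation the feature mass has diffused along the path (node 0 still largest, node 2 smallest), and a linear model is trained on the smoothed `X`.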

Global Convergence of Adaptive Gradient Methods for An Over-parameterized Neural Network (Feb 19 2019). Adaptive gradient methods such as AdaGrad are widely used in optimizing neural networks. Yet existing convergence guarantees for adaptive gradient methods require either convexity or smoothness, and, in the smooth setting, only guarantee convergence to ...

Adaptive Cross-Modal Few-Shot Learning (Feb 19 2019). Metric-based meta-learning techniques have been applied successfully to few-shot classification problems. However, leveraging cross-modal information in a few-shot setting has yet to be explored. When the support from visual information is limited in ...

Investigating Generalisation in Continuous Deep Reinforcement Learning (Feb 19 2019, updated Feb 20 2019). Deep Reinforcement Learning has shown great success in a variety of control tasks. However, it is unclear how close we are to the vision of putting Deep RL into practice to solve real-world problems. In particular, common practice in the field is to train ...

Evaluating model calibration in classification (Feb 19 2019). Probabilistic classifiers output a probability distribution over target classes rather than just a class prediction. Besides providing a clear separation of prediction and decision making, the main advantage of probabilistic models is their ability to represent ...

DEDPUL: Method for Mixture Proportion Estimation and Positive-Unlabeled Classification based on Density Estimation (Feb 19 2019). This paper studies Positive-Unlabeled classification, the problem of semi-supervised binary classification when the Negative (N) class in the training set is contaminated with instances of the Positive (P) class. We develop a novel method (DEDPUL) ...

On the Convergence of EM for truncated mixtures of two Gaussians (Feb 19 2019). Motivated by a recent result of Daskalakis et al. \cite{DGTZ18}, we analyze the population version of the Expectation-Maximization (EM) algorithm for \textit{truncated} mixtures of two Gaussians. Truncated samples from a $d$-dimensional mixture ...

Proper-Composite Loss Functions in Arbitrary Dimensions (Feb 19 2019). The study of a machine learning problem is in many ways difficult to separate from the study of the loss function being used. One avenue of inquiry has been to look at these loss functions in terms of their properties as scoring rules via the proper-composite ...

Hyperbolic Discounting and Learning over Multiple Horizons (Feb 19 2019, updated Feb 20 2019). Reinforcement learning (RL) typically defines a discount factor as part of the Markov Decision Process. The discount factor values future rewards by an exponential scheme that leads to theoretical convergence guarantees for the Bellman equation. However, ...

Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent (Feb 18 2019). A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for ...

A parallel Fortran framework for neural networks and deep learning (Feb 18 2019). This paper describes neural-fortran, a parallel Fortran framework for neural networks and deep learning. It features a simple interface for constructing feed-forward neural networks of arbitrary structure and size, several activation functions, and stochastic ...

Incremental Cluster Validity Indices for Hard Partitions: Extensions and Comparative Study (Feb 18 2019). Validation is one of the most important aspects of clustering, but most approaches have been batch methods. Recently, interest has grown in providing incremental alternatives. This paper extends the incremental cluster validity index (iCVI) family to ...

On Evaluating Adversarial Robustness (Feb 18 2019). Correctly evaluating defenses against adversarial examples has proven to be extremely difficult. Despite the significant amount of recent work attempting to design defenses that withstand adaptive attacks, few have succeeded; most papers that propose ...

RACE: Sub-Linear Memory Sketches for Approximate Near-Neighbor Search on Streaming Data (Feb 18 2019). We demonstrate the first possibility of a sub-linear memory sketch for solving the approximate near-neighbor search problem. In particular, we develop an online sketching algorithm that can compress $N$ vectors into a tiny sketch consisting of small arrays ...
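Going by the abstract, the sketch is a set of small counter arrays indexed by locality-sensitive hashes: inserting a vector increments one counter per array, and the mean counter at a query's buckets estimates how many inserted vectors collide with (i.e., are near) it. A toy reconstruction using sign-random-projection hashing; the number of repetitions, bucket sizes, hash family, and estimator details below are all our assumptions, not the paper's exact construction:

```python
import random

class RaceSketch:
    """Toy RACE-style sketch: `reps` arrays of 2**bits counters, each
    indexed by a signed-random-projection (SRP) hash of the vector."""
    def __init__(self, dim, reps=10, bits=4, seed=0):
        rng = random.Random(seed)
        # One set of `bits` random hyperplanes per repetition.
        self.planes = [[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(bits)]
                       for _ in range(reps)]
        self.counters = [[0] * (2 ** bits) for _ in range(reps)]

    def _hash(self, r, v):
        """Concatenate the sign bits of the projections into a bucket index."""
        h = 0
        for plane in self.planes[r]:
            h = (h << 1) | (sum(p * x for p, x in zip(plane, v)) >= 0)
        return h

    def add(self, v):
        for r in range(len(self.counters)):
            self.counters[r][self._hash(r, v)] += 1

    def query(self, v):
        """Mean counter at the query's buckets: a collision-count estimate."""
        return sum(self.counters[r][self._hash(r, v)]
                   for r in range(len(self.counters))) / len(self.counters)

sk = RaceSketch(dim=2)
for _ in range(20):
    sk.add([1.0, 0.0])
sk.add([-1.0, 0.0])
```

Memory is `reps * 2**bits` counters regardless of how many vectors are inserted, which is the sub-linear property the abstract advertises.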

Is a single unique Bayesian network enough to accurately represent your data? (Feb 18 2019). Bayesian network (BN) modelling is extensively used in systems epidemiology. Usually it consists of selecting and reporting the best-fitting structure conditional on the data. A major practical concern is avoiding overfitting, on account of its extreme ...

Going deep in clustering high-dimensional data: deep mixtures of unigrams for uncovering topics in textual data (Feb 18 2019). Mixtures of Unigrams (Nigam et al., 2000) are one of the simplest and most efficient tools for clustering textual data, as they assume that documents related to the same topic have similar distributions of terms, naturally described by Multinomials. When ...

Fast Efficient Hyperparameter Tuning for Policy Gradients (Feb 18 2019). The performance of policy gradient methods is sensitive to hyperparameter settings that must be tuned for each new application. Widely used grid-search methods for tuning hyperparameters are sample-inefficient and computationally expensive. More advanced ...

STCN: Stochastic Temporal Convolutional Networks (Feb 18 2019). Convolutional architectures have recently been shown to be competitive on many sequence modelling tasks when compared to the de facto standard of recurrent neural networks (RNNs), while providing computational and modeling advantages due to their inherent parallelism. ...

The Kalai-Smorodinski solution for many-objective Bayesian optimization (Feb 18 2019). An ongoing aim of research in multiobjective Bayesian optimization is to extend its applicability to a large number of objectives. While coping with a limited budget of evaluations, recovering the set of optimal compromise solutions generally requires ...

Intra- and Inter-epoch Temporal Context Network (IITNet) for Automatic Sleep Stage Scoring (Feb 18 2019). This study proposes a novel deep learning model, IITNet, which learns intra- and inter-epoch temporal contexts from a raw single-channel electroencephalogram (EEG) for automatic sleep stage scoring. When sleep experts identify the sleep stage of a ...

Optimizing Stochastic Gradient Descent in Text Classification Based on Fine-Tuning Hyper-Parameters Approach. A Case Study on Automatic Classification of Global Terrorist Attacks (Feb 18 2019). The objective of this research is to enhance the performance of the Stochastic Gradient Descent (SGD) algorithm in text classification. We propose using SGD learning with a grid-search approach to fine-tune hyper-parameters in order to enhance ...

Prediction of Porosity and Permeability Alteration based on Machine Learning Algorithms (Feb 18 2019). The objective of this work is to study the applicability of various machine learning algorithms for predicting rock properties that geoscientists usually determine through special lab analysis. We demonstrate that these special properties can be ...

Designing recurrent neural networks by unfolding an l1-l1 minimization algorithm (Feb 18 2019). We propose a new deep recurrent neural network (RNN) architecture for sequential signal reconstruction. Our network is designed by unfolding the iterations of the proximal gradient method that solves the l1-l1 minimization problem. As such, our network ...

Grids versus Graphs: Partitioning Space for Improved Taxi Demand-Supply Forecasts (Feb 18 2019). Accurate taxi demand-supply forecasting is a challenging application of ITS (Intelligent Transportation Systems) due to complex spatial and temporal patterns. We investigate the impact of different spatial partitioning techniques on the prediction ...

Differentially Private Continual Learning (Feb 18 2019). Catastrophic forgetting can be a significant problem for institutions that must delete historic data for privacy reasons. For example, hospitals might not be able to retain patient data permanently. But neural networks trained on recent data alone will ...

A Unifying Bayesian View of Continual Learning (Feb 18 2019). Some machine learning applications require continual learning, where data come in a sequence of datasets, each used for training and then permanently discarded. From a Bayesian perspective, continual learning seems straightforward: given the model ...

Optimized data exploration applied to the simulation of a chemical process (Feb 18 2019). In complex simulation environments, certain parameter space regions may result in non-convergent or unphysical outcomes. All parameters can therefore be labeled with a binary class describing whether or not they lead to valid results. In general, it can ...

Detecting and Diagnosing Incipient Building Faults Using Uncertainty Information from Deep Neural Networks (Feb 18 2019). Early detection of incipient faults is of vital importance for reducing maintenance costs, saving energy, and enhancing occupant comfort in buildings. Popular supervised learning models such as deep neural networks are considered promising due to their ...

A One-Class Support Vector Machine Calibration Method for Time Series Change Point Detection (Feb 18 2019). It is important to identify the change point of a system's health status, which usually signifies an incipient fault under development. The One-Class Support Vector Machine (OC-SVM) is a popular machine learning model for anomaly detection and hence could ...

Quantized Frank-Wolfe: Communication-Efficient Distributed Optimization (Feb 17 2019). How can we efficiently mitigate the overhead of gradient communication in distributed optimization? This problem is at the heart of training scalable machine learning models and has mainly been studied in the unconstrained setting. In this paper, we ...

A semi-supervised deep residual network for mode detection in Wi-Fi signals (Feb 17 2019). Due to their ubiquitous and pervasive nature, Wi-Fi networks have the potential to collect large-scale, low-cost, and disaggregate data on multimodal transportation. In this study, we develop a semi-supervised deep residual network (ResNet) framework ...

ODIN: ODE-Informed Regression for Parameter and State Inference in Time-Continuous Dynamical Systems (Feb 17 2019). Parameter inference in ordinary differential equations is an important problem in many applied sciences and in engineering, especially in data-scarce settings. In this work, we introduce a novel generative modeling approach based on constrained Gaussian ...

Separating common (global and local) and distinct variation in multiple mixed types data sets (Feb 17 2019). Multiple sets of measurements on the same objects obtained from different platforms may reflect partially complementary information about the studied system. The integrative analysis of such data sets not only provides us with the opportunity for a deeper ...

Multiple Document Representations from News Alerts for Automated Bio-surveillance Event Detection (Feb 17 2019). Due to globalization, geographic boundaries no longer serve as effective shields against the spread of infectious diseases. In order to aid bio-surveillance analysts in disease tracking, recent research has been devoted to developing information retrieval ...

Context-Based Dynamic Pricing with Online Clustering (Feb 17 2019). We consider a context-based dynamic pricing problem for online products with low sales. Sales data from Alibaba, a major global online retailer, illustrate the prevalence of low-sale products. For these products, existing single-product dynamic pricing ...

WiSE-VAE: Wide Sample Estimator VAE (Feb 16 2019). Variational Auto-encoders (VAEs) have been very successful as methods for forming compressed latent representations of complex, often high-dimensional, data. In this paper, we derive an alternative to the variational lower bound common in VAEs, ...

Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization (Feb 16 2019). The proximal gradient method has played an important role in solving many machine learning tasks, especially nonsmooth problems. However, in some machine learning problems such as the bandit model and the black-box learning problem, proximal gradient ...

A Little Is Enough: Circumventing Defenses For Distributed Learning (Feb 16 2019). Distributed learning is central to large-scale training of deep-learning models. However, it is exposed to a security threat in which Byzantine participants can interrupt or control the learning process. Previous attack models and their corresponding ...

Deep Convolutional Sum-Product Networks for Probabilistic Image Representations (Feb 16 2019). Sum-Product Networks (SPNs) are hierarchical probabilistic graphical models capable of fast and exact inference. Applications of SPNs to real-world data such as large image datasets have been fairly limited in the previous literature. We introduce Convolutional ...

Making Convex Loss Functions Robust to Outliers using $e$-Exponentiated Transformation (Feb 16 2019). In this paper, we propose a novel $e$-exponentiated transformation, $0.5 < e < 1$, for loss functions. When the transformation is applied to a convex loss function, the transformed loss function enjoys the following desirable property: for a one-layer network, ...
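The transformation itself is fully specified by the abstract: raise the loss to a power $e \in (0.5, 1)$. A small sketch showing how it damps large (outlier-driven) losses relative to the untransformed squared loss; the choice $e = 0.7$ and the squared base loss are just for illustration:

```python
def e_transform(loss_value, e=0.7):
    """The e-exponentiated transformation l -> l**e with 0.5 < e < 1:
    large losses grow sublinearly in the original loss."""
    return loss_value ** e

def squared(y, yhat):
    return (y - yhat) ** 2

# For a small residual the transform changes little; for a large
# (outlier) residual it shrinks the loss substantially.
ratio_small = e_transform(squared(0.0, 1.0)) / squared(0.0, 1.0)   # 1**0.7 / 1
ratio_big = e_transform(squared(0.0, 10.0)) / squared(0.0, 10.0)   # 100**0.7 / 100
```

Since $\ell^e$ is concave in $\ell$ for $e < 1$, the gradient contribution of extreme points is down-weighted, which is the robustness mechanism the abstract alludes to.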

Screening Rules for Lasso with Non-Convex Sparse Regularizers (Feb 16 2019). Leveraging the convexity of the Lasso problem, screening rules help accelerate solvers by discarding irrelevant variables during the optimization process. However, because they provide better theoretical guarantees in identifying relevant variables, ...

RES-SE-NET: Boosting Performance of Resnets by Enhancing Bridge-connections (Feb 16 2019). One way to train deep neural networks effectively is to use residual connections. Residual connections can be classified as either identity connections or bridge-connections with a reshaping convolution. Empirical observations on CIFAR-10 ...

TopicEq: A Joint Topic and Mathematical Equation Model for Scientific Texts (Feb 16 2019). Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical ...

Towards Explainable AI: Significance Tests for Neural Networks (Feb 16 2019). Neural networks underpin many of the best-performing AI systems. Their success is largely due to their strong approximation properties, superior predictive performance, and scalability. However, a major caveat is explainability: neural networks are often ...

Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit (Feb 16 2019). We consider learning two-layer neural networks using stochastic gradient descent. The mean-field description of this learning dynamics approximates the evolution of the network weights by an evolution in the space of probability distributions on $R^D$ ...

ProLoNets: Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning (Feb 15 2019). Deep reinforcement learning has seen great success across a breadth of tasks, such as game playing and robotic manipulation. However, the modern practice of attempting to learn tabula rasa disregards the logical structure of many domains and the wealth ...

Asymptotic Finite Sample Information Losses in Neural Classifiers (Feb 15 2019). This paper considers the subject of information losses arising from the finite datasets used in training neural classifiers. It proves a relationship between such losses and the product of the expected total variation of the estimated neural model ...

On resampling vs. adjusting probabilistic graphical models in estimation of distribution algorithms (Feb 15 2019). The Bayesian Optimisation Algorithm (BOA) is an Estimation of Distribution Algorithm (EDA) that uses a Bayesian network as its probabilistic graphical model (PGM). Determining the optimal Bayesian network structure for a given solution sample is an NP-hard problem. ...

Robustness of Neural Networks: A Probabilistic and Practical Approach (Feb 15 2019). Neural networks are becoming increasingly prevalent in software, and it is therefore important to be able to verify their behavior. Because verifying the correctness of neural networks is extremely challenging, it is common to focus on the verification ...

Adaptive Sequence Submodularity (Feb 15 2019). In many machine learning applications, one needs to interactively select a sequence of items (e.g., recommending movies based on a user's feedback) or make sequential decisions in certain orders (e.g., guiding an agent through a series of states). Not ...

Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization (Feb 15 2019). Deep neural networks are typically highly over-parameterized, and pruning techniques can remove a significant fraction of network parameters with little loss in accuracy. Recently, techniques based on dynamic re-allocation of non-zero parameters have ...

The excluded area of two-dimensional hard particles (Feb 15 2019). The excluded area between a pair of two-dimensional hard particles with a given relative orientation is the region in which one particle cannot be located due to the presence of the other particle. The magnitude of the excluded area as a function of the ...

Translation Insensitivity for Deep Convolutional Gaussian Processes (Feb 15 2019). Deep learning has been at the foundation of large improvements in image classification. To improve the robustness of predictions, Bayesian approximations have been used to learn parameters in deep neural networks. We follow an alternative approach, by ...

The Fairness of Risk Scores Beyond Classification: Bipartite Ranking and the xAUC Metric (Feb 15 2019). Where machine-learned predictive risk scores inform high-stakes decisions, such as bail and sentencing in criminal justice, fairness has been a serious concern. Recent work has characterized the disparate impact that such risk scores can have when used ...

Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations (Feb 15 2019). T-distributed stochastic neighbour embedding (t-SNE) is a widely used data visualisation technique. It differs from its predecessor SNE in the low-dimensional similarity kernel: the Gaussian kernel was replaced by the heavy-tailed Cauchy kernel, solving ...
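The two kernels named in the snippet are easy to compare directly: at large embedding distances the Cauchy kernel retains orders of magnitude more similarity mass than the Gaussian, which is the "heavy tail" at issue. A minimal sketch (the paper's even heavier-tailed kernel family is not shown):

```python
import math

def gaussian_sim(d2, sigma2=1.0):
    """SNE's low-dimensional similarity: exp(-d^2 / (2 sigma^2))."""
    return math.exp(-d2 / (2 * sigma2))

def cauchy_sim(d2):
    """t-SNE's low-dimensional similarity: the Student-t (1 dof) kernel."""
    return 1.0 / (1.0 + d2)

# At squared distance 25 the Gaussian similarity is ~3.7e-6 while the
# Cauchy similarity is ~0.038: the heavy tail keeps distant points
# weakly attracted, which is what lets clusters spread apart.
far = 25.0
g, c = gaussian_sim(far), cauchy_sim(far)
```

Making the tail heavier still (as the paper explores) pushes this contrast further and, per the title, reveals finer cluster structure.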

Robust Reinforcement Learning in POMDPs with Incomplete and Noisy Observations (Feb 15 2019). In real-world scenarios, the observation data for reinforcement learning with continuous control are commonly noisy, and parts may be dynamically missing over time, which violates the assumptions of many current methods developed for this setting. We address ...

Fast Task-Aware Architecture Inference (Feb 15 2019). Neural architecture search has been shown to hold great promise for the automation of deep learning. However, in spite of its potential, neural architecture search remains quite costly. To this point, we propose a novel gradient-based framework for ...

Asymptotically exact data augmentation: models, properties and algorithms (Feb 15 2019). Data augmentation, by the introduction of auxiliary variables, has become a ubiquitous technique for improving mixing/convergence properties, simplifying implementation, or reducing the computational time of inference methods such as Markov chain Monte Carlo. ...

SVM-based Deep Stacking Networks (Feb 15 2019). The deep network model, with the majority built on neural networks, has proven to be a powerful framework for representing complex data for high-performance machine learning. In recent years, more and more studies have turned to non-neural-network approaches ...

Efficient Deep Learning of GMMs (Feb 15 2019). We show that a collection of Gaussian mixture models (GMMs) in $R^{n}$ can be optimally classified using $O(n)$ neurons in a neural network with two hidden layers (deep neural network), whereas in contrast, a neural network with a single hidden layer ...

Learning to Adaptively Scale Recurrent Neural Networks (Feb 15 2019). Recent advances in recurrent neural network (RNN) research have demonstrated the superiority of utilizing multiscale structures in learning temporal representations of time series. Currently, most multiscale RNNs use fixed scales, which do not ...

Learning Topological Representation for Networks via Hierarchical Sampling (Feb 15 2019). Topological information is essential for studying the relationships between nodes in a network. Recently, Network Representation Learning (NRL), which projects a network into a low-dimensional vector space, has shown its advantages in analyzing ...

AutoQB: AutoML for Network Quantization and Binarization on Mobile Devices (Feb 15 2019). In this paper, we propose a hierarchical deep reinforcement learning (DRL)-based AutoML framework, AutoQB, to automatically explore the design space of channel-level network quantization and binarization for hardware-friendly deep learning on mobile devices. ...

Lipschitz Generative Adversarial Nets (Feb 15 2019). In this paper we study the convergence of generative adversarial networks (GANs) from the perspective of the informativeness of the gradient of the optimal discriminative function. We show that GANs without restrictions on the discriminative function space ...

ProxSARAH: An Efficient Algorithmic Framework for Stochastic Composite Nonconvex Optimization (Feb 15 2019). In this paper, we propose a new stochastic algorithmic framework for solving stochastic composite nonconvex optimization problems that covers both the finite-sum and expectation settings. Our algorithms rely on the SARAH estimator introduced in (Nguyen et al., ...

KINN: Incorporating Expert Knowledge in Neural Networks (Feb 15 2019). The promise of ANNs to automatically discover and extract useful features/patterns from data without relying on domain expertise seems highly appealing, but it comes at the cost of heavy reliance on large amounts of accurately labeled data, which ...

Reinforcement Learning Without Backpropagation or a Clock (Feb 15 2019, updated Feb 18 2019). In this paper we introduce a reinforcement learning (RL) approach for training policies, including artificial neural network policies, that is both backpropagation-free and clock-free. It is backpropagation-free in that it does not propagate any information ...

Classification with unknown class conditional label noise on non-compact feature spaces (Feb 14 2019). We investigate the problem of classification in the presence of unknown class-conditional label noise, in which the labels observed by the learner have been corrupted with some unknown class-dependent probability. In order to obtain finite sample rates, ...

WaveletFCNN: A Deep Time Series Classification Model for Wind Turbine Blade Icing Detection (Feb 14 2019). Wind power, as an alternative to burning fossil fuels, is plentiful and renewable. Data-driven approaches are increasingly popular for inspecting wind turbine failures. In this paper, we propose a novel classification-based anomaly detection system ...

CrossNorm: Normalization for Off-Policy TD Reinforcement Learning (Feb 14 2019). Off-policy Temporal Difference (TD) learning methods, when combined with function approximators, suffer from the risk of divergence, a phenomenon known as the deadly triad. It has long been noted that some feature representations work better than others. ...

A Probabilistic framework for Quantum Clustering (Feb 14 2019). Quantum Clustering is a powerful method for detecting clusters in data with mixed density. However, it is very sensitive to a length parameter that is inherent to the Schr\"odinger equation. In addition, linking data points into clusters requires local estimates ...

Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity (Feb 14 2019). Contemporary sensorimotor learning approaches typically start with an existing complex agent (e.g., a robotic arm), which they learn to control. In contrast, this paper investigates a modular co-evolution strategy: a collection of primitive agents learns ...

Unsupervised Visuomotor Control through Distributional Planning Networks (Feb 14 2019). While reinforcement learning (RL) has the potential to enable robots to autonomously acquire a wide range of skills, in practice RL usually requires manual, per-task engineering of reward functions, especially in real-world settings where aspects of ...

Classifying Treatment Responders Under Causal Effect Monotonicity (Feb 14 2019). In the context of individual-level causal inference, we study the problem of predicting whether someone will respond to a treatment based on their features and past examples of features, a treatment indicator (e.g., drug/no drug), and a binary outcome ...

A Broad Class of Discrete-Time Hypercomplex-Valued Hopfield Neural Networks (Feb 14 2019). In this paper, we address the stability of a broad class of discrete-time hypercomplex-valued Hopfield-type neural networks. To ensure that the neural networks in this class always settle down to a stationary state, we introduce novel hypercomplex ...

Convergence analysis of Tikhonov regularization for non-linear statistical inverse learning problems (Feb 14 2019). We study a non-linear statistical inverse learning problem, where we observe the noisy image of a quantity through a non-linear operator at some random design points. We consider the widely used Tikhonov regularization (or method of regularization, MOR) ...

Sinkhorn Divergence of Topological Signature Estimates for Time Series Classification (Feb 14 2019). Distinguishing between classes of time series sampled from dynamic systems is a common challenge in systems and control engineering, for example in the context of health monitoring, fault detection, and quality control. The challenge is increased when ...

Generalisation in fully-connected neural networks for time series forecasting (Feb 14 2019). In this paper we study the generalisation capabilities of fully-connected neural networks trained in the context of time series forecasting. Time series do not satisfy the typical assumption in statistical learning theory that the data are i.i.d. samples ...

On Reinforcement Learning Using Monte Carlo Tree Search with Supervised Learning: Non-Asymptotic Analysis (Feb 14 2019). Inspired by the success of AlphaGo Zero (AGZ), which utilizes Monte Carlo Tree Search (MCTS) with supervised learning via a neural network to learn the optimal policy and value function, in this work we focus on establishing formally that such an approach ...

On Many-to-Many Mapping Between Concordance Correlation Coefficient and Mean Square Error (Feb 14 2019). The concordance correlation coefficient (CCC), introduced by Lin in 1989, is one of the most widely used reproducibility indices. In addition to its extensive use in assay validation, CCC serves various purposes in other multivariate population-related ...
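Lin's CCC has a well-known closed form, $\mathrm{CCC} = 2\,\mathrm{cov}(x,y) / \big(\sigma_x^2 + \sigma_y^2 + (\mu_x - \mu_y)^2\big)$, which makes the relationship to mean square error concrete since $\mathrm{MSE} = \sigma_x^2 + \sigma_y^2 - 2\,\mathrm{cov}(x,y) + (\mu_x - \mu_y)^2$. A direct implementation:

```python
def ccc(x, y):
    """Lin's concordance correlation coefficient for two equal-length
    sequences (population formulas, i.e. dividing by n)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)

x = [1.0, 2.0, 3.0, 4.0]
perfect = ccc(x, x)                    # exact agreement -> 1
shifted = ccc(x, [2.0, 3.0, 4.0, 5.0])  # same shape, offset by 1 -> < 1
```

Unlike plain Pearson correlation, the mean-difference term in the denominator penalizes the constant offset, which is why the shifted copy scores below 1.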

Probabilistic Neural Architecture Search (Feb 13 2019). In neural architecture search (NAS), the space of neural network architectures is automatically explored to maximize predictive accuracy for a given task. Despite the success of recent approaches, most existing methods cannot be directly applied to large ...

A Study on Graph-Structured Recurrent Neural Networks and Sparsification with Application to Epidemic Forecasting (Feb 13 2019). We study epidemic forecasting on real-world health data using a graph-structured recurrent neural network (GSRNN). We achieve state-of-the-art forecasting accuracy on the benchmark CDC dataset. To improve model efficiency, we sparsify the network weights ...

Anytime Tail Averaging (Feb 13 2019). Tail averaging consists of averaging the last examples in a stream. Common techniques either have a memory requirement that grows with the number of samples to average, are not available at every timestep, or do not accommodate growing windows. We propose ...
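The "memory requirement that grows with the number of samples to average" refers to the obvious baseline: keep a buffer of the last $k$ values. A sketch of that baseline (the paper's contribution is precisely to avoid this buffer):

```python
from collections import deque

class TailAverager:
    """Running average of the last k observations.
    O(k) memory, O(1) updates -- the memory cost the paper avoids."""
    def __init__(self, k):
        self.buf = deque(maxlen=k)
        self.total = 0.0

    def update(self, x):
        if len(self.buf) == self.buf.maxlen:
            self.total -= self.buf[0]   # value about to be evicted
        self.buf.append(x)
        self.total += x

    def mean(self):
        return self.total / len(self.buf)

ta = TailAverager(3)
for v in [1.0, 2.0, 3.0, 4.0, 5.0]:
    ta.update(v)
# mean over the last 3 values: (3 + 4 + 5) / 3
```

Note the other two limitations from the abstract also apply to simple alternatives: resetting averages are not available at every timestep, and fixed-size buffers do not accommodate growing windows.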

How do infinite width bounded norm networks look in function space? (Feb 13 2019). We consider the question of what functions can be captured by ReLU networks with an unbounded number of units (infinite width), but where the overall network Euclidean norm (sum of squares of all weights in the system, except for an unregularized bias ...

An Optimized Recurrent Unit for Ultra-Low-Power Keyword Spotting (Feb 13 2019). There is growing interest in being able to run neural networks on sensors, wearables, and internet-of-things (IoT) devices. However, the computational demands of neural networks make them difficult to deploy on resource-constrained edge devices. To meet ...

Differentially Private Learning of Geometric Concepts (Feb 13 2019). We present differentially private efficient algorithms for learning unions of polygons in the plane (which are not necessarily convex). Our algorithms achieve $(\alpha,\beta)$-PAC learning and $(\epsilon,\delta)$-differential privacy using a sample of ...

ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning (Feb 13 2019). To relieve the pain of manually selecting machine learning algorithms and tuning hyperparameters, automated machine learning (AutoML) methods have been developed to automatically search for good models. Due to the huge model search space, it is impossible ...