Results for "Aapo Hyvärinen"

Total: 48 results (took 0.08s)
A Unified Probabilistic Model for Learning Latent Factors and Their Connectivities from High-Dimensional Data (May 24 2018). Connectivity estimation is challenging in the context of high-dimensional data. A useful preprocessing step is to group variables into clusters; however, it is not always clear how to do so from the perspective of connectivity estimation. Another practical ...
Clustering via Mode Seeking by Direct Estimation of the Gradient of a Log-Density (Apr 20 2014). Mean shift clustering finds the modes of the data probability density by identifying the zero points of the density gradient. Since it does not require fixing the number of clusters in advance, mean shift has been a popular clustering algorithm in ...
Variational Autoencoders and Nonlinear ICA: A Unifying Framework (Jul 10 2019). The framework of variational autoencoders allows us to efficiently learn deep latent-variable models, such that the model's marginal distribution over observed variables fits the data. Often, we're interested in going a step further, and want to approximate ...
Mode-Seeking Clustering and Density Ridge Estimation via Direct Estimation of Density-Derivative-Ratios (Jul 06 2017; revised Mar 30 2018). Modes and ridges of the probability density function behind observed data are useful geometric features. Mode-seeking clustering assigns cluster labels by associating data samples with the nearest modes, and estimation of density ridges enables us to ...
Density Estimation in Infinite Dimensional Exponential Families (Dec 12 2013; revised Nov 02 2014). In this paper, we consider an infinite dimensional exponential family $\mathcal{P}$ of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space $H$, and show it to be quite rich in the sense that a broad class ...
Simultaneous Estimation of Non-Gaussian Components and their Correlation Structure (Jun 18 2015). The statistical dependencies which independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would thus be very useful to estimate the dependency structure from data. While such models ...
Deep Energy Estimator Networks (May 21 2018). Density estimation is a fundamental problem in statistical learning. This problem is especially challenging for complex high-dimensional data due to the curse of dimensionality. A promising solution to this problem is given here in an inference-free hierarchical ...
Density Estimation in Infinite Dimensional Exponential Families (Dec 12 2013; revised May 26 2017). In this paper, we consider an infinite dimensional exponential family $\mathcal{P}$ of probability densities, which are parametrized by functions in a reproducing kernel Hilbert space $H$, and show it to be quite rich in the sense that a broad class ...
Neural-Kernelized Conditional Density Estimation (Jun 05 2018). Conditional density estimation is a general framework for solving various problems in machine learning. Among existing methods, non-parametric and/or kernel-based methods are often difficult to use on large datasets, while methods based on neural networks ...
Simultaneous Estimation of Non-Gaussian Components and their Correlation Structure (Jun 18 2015; revised Jul 27 2017). The statistical dependencies which independent component analysis (ICA) cannot remove often provide rich information beyond the linear independent components. It would thus be very useful to estimate the dependency structure from data. While such models ...
Analysis of the intermediate-state contributions to neutrinoless double beta-minus decays (Apr 06 2016). A comprehensive analysis of the structure of the nuclear matrix elements (NMEs) of neutrinoless double beta-minus decays to the $0^+$ ground and first excited states is performed in terms of the contributing multipole states in the intermediate nuclei of ...
On strong measure zero subsets of ${}^\kappa 2$ (Oct 15 1997). This paper answers three questions posed by the first author. In Theorem 2.6 we show that the family of strong measure zero subsets of ${}^{\omega_1}2$ is $2^{\aleph_1}$-additive under GMA and CH. In Theorem 3.1 we prove that the generalized Borel conjecture ...
GraphChi-DB: Simple Design for a Scalable Graph Database System -- on Just a PC (Mar 04 2014). We propose a new data structure, Parallel Adjacency Lists (PAL), for efficiently managing graphs with billions of edges on disk. The PAL structure is based on the graph storage model of GraphChi (Kyrola et al., OSDI 2012), but we extend it to enable ...
Diffusion-emission theory of photon enhanced thermionic emission solar energy harvesters (May 30 2012; revised Aug 24 2012). Numerical and semi-analytical models are presented for photon-enhanced-thermionic-emission (PETE) devices. The models take diffusion of electrons, inhomogeneous photogeneration, and bulk and surface recombination into account. The efficiencies of PETE ...
Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA (May 20 2016). Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new intuitive principle of unsupervised deep learning from ...
Neural Empirical Bayes (Mar 06 2019). We formulate a novel framework that unifies kernel density estimation and empirical Bayes, where we address a broad set of problems in unsupervised learning with a geometric interpretation rooted in the concentration of measure phenomenon. We start by ...
On the Identifiability of the Post-Nonlinear Causal Model (May 09 2012). By taking into account the nonlinear effect of the cause, the inner noise effect, and the measurement distortion effect in the observed variables, the post-nonlinear (PNL) causal model has demonstrated its excellent performance in distinguishing the cause ...
Korn's inequality and John domains (Mar 03 2016). It is quite well known that Korn's inequality is true on all John domains. We are interested in the converse implication under the assumption of the so-called separation condition of the domain. Our result implies that in a simply connected planar domain the ...
Boundary blow up under Sobolev mappings (Apr 15 2013). We prove that for mappings in $W^{1,n}(B^n, \mathbb{R}^n)$, continuous up to the boundary, with modulus of continuity satisfying a certain divergence condition, the image of the boundary of the unit ball has zero $n$-Hausdorff measure. For Hölder continuous mappings ...
Korn inequality on irregular domains (Jul 04 2013). In this paper, we study the weighted Korn inequality on some irregular domains, e.g., $s$-John domains and domains satisfying quasi-hyperbolic boundary conditions. Examples regarding sharpness of the Korn inequality on these domains are presented. Moreover, ...
Source Separation and Higher-Order Causal Analysis of MEG and EEG (Mar 15 2012). Separation of the sources and analysis of their connectivity have been an important topic in EEG/MEG analysis. To solve this problem in an automatic manner, we propose a two-layer model, in which the sources are conditionally uncorrelated from each other, ...
Korn's inequality and John domains (Mar 03 2016; revised Jun 20 2017). It is quite well known that Korn's inequality is true on all John domains. We are interested in the converse implication under the assumption of the so-called separation condition of the domain. Our result implies that in a simply connected planar domain the ...
Solvability of the divergence equation implies John via Poincaré inequality (Jul 04 2013). Let $\Omega \subset \mathbb{R}^2$ be a bounded simply connected domain. We show that, for a fixed (every) $p\in (1,\infty)$, the divergence equation $\mathrm{div}\,\mathbf{v}=f$ is solvable in $W^{1,p}_0(\Omega)^2$ for every $f\in L^p_0(\Omega)$, if and only if ...
Nonlinear ICA Using Auxiliary Variables and Generalized Contrastive Learning (May 22 2018; revised Feb 04 2019). Nonlinear ICA is a fundamental problem for unsupervised representation learning, emphasizing the capacity to recover the underlying latent variables generating the data (i.e., identifiability). Recently, the very first identifiability proofs for nonlinear ...
Mappings of finite distortion: compactness of the branch set (Sep 25 2017; revised Jun 26 2018). We show that an entire branched cover of finite distortion cannot have a compact branch set if its distortion satisfies a certain asymptotic growth condition. We furthermore show that this bound is strict by constructing an entire, continuous, open and ...
On proper branched coverings and a question of Vuorinen (Apr 29 2019). We study global injectivity of proper branched coverings defined on the Euclidean $n$-ball in the case when the branch set is compact. In particular we show that such mappings are homeomorphisms when $n=3$ or when the branch set is empty. This proves ...
Weak regularity of the inverse under minimal assumptions (Apr 10 2018; revised May 22 2019). Let $\Omega\subset\mathbb{R}^3$ be a domain and let $f\in BV_{\operatorname{loc}}(\Omega,\mathbb{R}^3)$ be a homeomorphism such that its distributional adjugate is a finite Radon measure. We show that its inverse has bounded variation $f^{-1}\in BV_{\operatorname{loc}}$. ...
Weak regularity of the inverse under minimal assumptions (Apr 10 2018). Let $\Omega\subset\mathbb{R}^3$ be a domain and let $f\in BV_{\operatorname{loc}}(\Omega,\mathbb{R}^3)$ be a homeomorphism such that its distributional adjoint is a finite Radon measure. We show that its inverse has bounded variation $f^{-1}\in BV_{\operatorname{loc}}$. ...
Information criteria for non-normalized models (May 15 2019). Many statistical models are given in the form of non-normalized densities with an intractable normalization constant. Since maximum likelihood estimation is computationally intensive for these models, several estimation methods have been developed which ...
A Family of Computationally Efficient and Simple Estimators for Unnormalized Statistical Models (Mar 15 2012). We introduce a new family of estimators for unnormalized statistical models. Our family of estimators is parameterized by two nonlinear functions and uses a single sample from an auxiliary distribution, generalizing Maximum Likelihood Monte Carlo estimation ...
Expectation Propagation for Neural Networks with Sparsity-promoting Priors (Mar 27 2013). We propose a novel approach for nonlinear regression using a two-layer neural network (NN) model structure with sparsity-favoring hierarchical priors on the network weights. We present an expectation propagation (EP) approach for approximate integration ...
On distributional adjugate and derivative of the inverse (Apr 09 2019). Let $\Omega\subset\mathbb{R}^3$ be a domain and let $f\in BV_{\operatorname{loc}}(\Omega,\mathbb{R}^3)$ be a homeomorphism such that its distributional adjugate $\operatorname{Adj} Df$ is a finite Radon measure. Very recently in [HKL] it was shown that its inverse has bounded variation $f^{-1}\in ...
A direct method for estimating a causal ordering in a linear non-Gaussian acyclic model (Aug 09 2014). Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, it ...
Causal Discovery with General Non-Linear Relationships Using Non-Linear ICA (Apr 19 2019). We consider the problem of inferring causal relationships between two or more passively observed variables. While the problem of such causal discovery has been extensively studied especially in the bivariate setting, the majority of current methods assume ...
GraphLab: A Distributed Framework for Machine Learning in the Cloud (Jul 05 2011). Machine Learning (ML) techniques are indispensable in a wide range of fields. Unfortunately, the exponential increase in dataset sizes is rapidly extending the runtime of sequential algorithms and threatening to slow future progress in ML. With the promise ...
Discovery of non-gaussian linear causal models using ICA (Jul 04 2012). In recent years, several methods have been proposed for the discovery of causal structure from non-experimental data (Spirtes et al. 2000; Pearl 2000). Such methods make various assumptions on the data generating process to facilitate its identification ...
Estimation of causal orders in a linear non-Gaussian acyclic model: a method robust against latent confounders (Apr 09 2012). We consider learning a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But the ...
Finding Exogenous Variables in Data with Many More Variables than Observations (Apr 06 2009; revised Apr 07 2011). Many statistical methods have been proposed to estimate causal models in classical situations with fewer variables than observations (p<n, p: the number of variables and n: the number of observations). However, modern datasets including gene expression ...
Bridging Information Criteria and Parameter Shrinkage for Model Selection (Jul 08 2013). Model selection based on classical information criteria, such as BIC, is generally computationally demanding, but its properties are well studied. On the other hand, model selection based on parameter shrinkage by $\ell_1$-type penalties is computationally ...
Parallel Coordinate Descent for L1-Regularized Loss Minimization (May 26 2011). We propose Shotgun, a parallel coordinate descent algorithm for minimizing L1-regularized losses. Though coordinate descent seems inherently sequential, we prove convergence bounds for Shotgun which predict linear speedups, up to a problem-dependent limit. ...
ParceLiNGAM: A causal ordering method robust against latent confounders (Mar 29 2013; revised Jul 29 2013). We consider learning a causal ordering of variables in a linear non-Gaussian acyclic model called LiNGAM. Several existing methods have been shown to consistently estimate a causal ordering assuming that all the model assumptions are correct. But the ...
Causal discovery of linear acyclic models with arbitrary distributions (Jun 13 2012). An important task in data analysis is the discovery of causal relationships between observed variables. For continuous-valued data, linear acyclic causal models are commonly used to model the data-generating process, and the inference of such models is ...
GraphLab: A New Framework For Parallel Machine Learning (Aug 09 2014). Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML ...
Distributed GraphLab: A Framework for Machine Learning in the Cloud (Apr 26 2012). While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead ...
GraphLab: A New Framework for Parallel Machine Learning (Jun 25 2010). Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and Pthreads leave ML ...
Thermoelectric bolometers based on ultra-thin heavily doped single-crystal silicon membranes (Apr 08 2017). We present ultra-thin silicon membrane thermocouple bolometers suitable for fast and sensitive detection of low levels of thermal power and infrared radiation at room temperature. The devices are based on 40 nm-thick strain tuned single crystalline silicon ...
DirectLiNGAM: A direct method for learning a linear non-Gaussian structural equation model (Jan 13 2011; revised Apr 07 2011). Structural equation models and Bayesian networks have been widely used to analyze causal relations between continuous variables. In such frameworks, linear acyclic models are typically used to model the data-generating process of variables. Recently, ...
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (Jun 08 2017; revised Apr 30 2018). Deep learning thrives with large neural networks and large datasets. However, larger networks and larger datasets result in longer training times that impede research and development progress. Distributed synchronous SGD offers a potential solution to ...