
Online Adaptive Statistical Compressed Sensing of Gaussian Mixture Models (Dec 26 2011)
A framework of online adaptive statistical compressed sensing is introduced for signals following a mixture model. The scheme first uses non-adaptive measurements, from which an online decoding scheme estimates the model selection. As soon as a candidate ...

Monomial Gamma Monte Carlo Sampling (Feb 25 2016; revised Feb 29 2016)
We unify slice sampling and Hamiltonian Monte Carlo (HMC) sampling by demonstrating their connection under the canonical transformation from Hamiltonian mechanics. This insight enables us to extend HMC and slice sampling to a broader family of samplers, ...

Tensor-Dictionary Learning with Deep Kruskal-Factor Analysis (Dec 08 2016; revised Mar 05 2017)
A multi-way factor analysis model is introduced for tensor-variate data of any order. Each data item is represented as a (sparse) sum of Kruskal decompositions, a Kruskal-factor analysis (KFA). KFA is nonparametric and can infer both the tensor-rank of ...

A Deep Generative Deconvolutional Image Model (Dec 23 2015)
A deep generative model is developed for representation and analysis of images, based on a hierarchical convolutional dictionary-learning framework. Stochastic unpooling is employed to link consecutive layers in the model, yielding top-down image ...

Task-Driven Adaptive Statistical Compressive Sensing of Gaussian Mixture Models (Jan 25 2012)
A framework for adaptive and non-adaptive statistical compressive sensing is developed, where a statistical model replaces the standard sparsity model of classical compressive sensing. We propose within this framework optimal task-specific sensing protocols ...

Negative Binomial Process Count and Mixture Modeling (Sep 15 2012; revised Oct 13 2013)
The seemingly disjoint problems of count and mixture modeling are united under the negative binomial (NB) process. A gamma process is employed to model the rate measure of a Poisson process, whose normalization provides a random probability measure for ...

Adaptive Temporal Compressive Sensing for Video (Feb 14 2013; revised Oct 15 2013)
This paper introduces the concept of adaptive temporal compressive sensing (CS) for video. We propose a CS algorithm to adapt the compression ratio based on the scene's temporal complexity, computed from the compressed data, without compromising the quality ...

Lévy Measure Decompositions for the Beta and Gamma Processes (Jun 18 2012)
We develop new representations for the Lévy measures of the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions. Further, we demonstrate how these infinite sums ...

Augment-and-Conquer Negative Binomial Processes (Sep 05 2012; revised Feb 15 2013)
By developing data augmentation methods unique to the negative binomial (NB) distribution, we unite seemingly disjoint count and mixture models under the NB process framework. We develop fundamental properties of the models and derive efficient Gibbs ...

Variational Autoencoder for Deep Learning of Images, Labels and Captions (Sep 28 2016)
A novel variational autoencoder is developed to model images, as well as associated labels or captions. The Deep Generative Deconvolutional Network (DGDN) is used as a decoder of the latent image features, and a deep Convolutional Neural Network (CNN) ...

Coded aperture compressive temporal imaging (Feb 04 2013)
We use mechanical translation of a coded aperture for code division multiple access compression of video. We present experimental results for reconstruction at 148 frames per coded snapshot.

StoryGAN: A Sequential Conditional GAN for Story Visualization (Dec 06 2018; revised Apr 18 2019)
We propose a new task, called Story Visualization. Given a multi-sentence paragraph, the story is visualized by generating a sequence of images, one for each sentence. In contrast to video generation, story visualization focuses less on the continuity ...

Factored Temporal Sigmoid Belief Networks for Sequence Learning (May 22 2016)
Deep conditional generative models are developed to simultaneously learn the temporal dependencies of multiple sequences. The model is designed by introducing a three-way weight tensor to capture the multiplicative interactions between side information ...

A Generative Model for Deep Convolutional Learning (Apr 15 2015)
A generative model is developed for deep (multi-layered) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up (pretraining) and top-down (refinement) probabilistic learning. ...

Revisiting the Softmax Bellman Operator: New Benefits and New Perspective (Dec 02 2018; revised May 19 2019)
The impact of softmax on the value function itself in reinforcement learning (RL) is often viewed as problematic because it leads to sub-optimal value (or Q) functions and interferes with the contraction properties of the Bellman operator. Surprisingly, ...

Compressive Sensing via Convolutional Factor Analysis (Jan 11 2017)
We solve the compressive sensing problem via convolutional factor analysis, where the convolutional dictionaries are learned in situ from the compressed measurements. An alternating direction method of multipliers (ADMM) paradigm for compressive ...

Adversarial Self-Paced Learning for Mixture Models of Hawkes Processes (Jun 20 2019)
We propose a novel adversarial learning strategy for mixture models of Hawkes processes, leveraging data augmentation techniques of Hawkes processes in the framework of self-paced learning. Instead of learning a mixture model directly from a set of event ...

Cross-Domain Multitask Learning with Latent Probit Models (Jun 27 2012)
Learning multiple tasks across heterogeneous domains is a challenging problem since the feature space may not be the same for different tasks. We assume the data in multiple tasks are generated from a latent common domain via sparse domain transforms ...

Stochastic Blockmodels meet Graph Neural Networks (May 14 2019)
Stochastic blockmodels (SBM) and their variants, e.g., mixed-membership and overlapping stochastic blockmodels, are latent variable based generative models for graphs. They have proven to be successful for various tasks, such as discovering the community ...

On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators (Oct 21 2016)
Recent advances in Bayesian learning with large-scale data have witnessed the emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic ...

Generalized Bregman Divergence and Gradient of Mutual Information for Vector Poisson Channels (Jan 28 2013; revised May 09 2013)
We investigate connections between information-theoretic and estimation-theoretic quantities in vector Poisson channel models. In particular, we generalize the gradient of mutual information with respect to key system parameters from the scalar to the ...

Superposition-Assisted Stochastic Optimization for Hawkes Processes (Feb 13 2018; revised Feb 14 2018)
We consider the learning of multi-agent Hawkes processes, a model containing multiple Hawkes processes with shared endogenous impact functions and different exogenous intensities. In the framework of stochastic maximum likelihood estimation, we explore ...

Learning Registered Point Processes from Idiosyncratic Observations (Oct 03 2017; revised Feb 13 2018)
A parametric point process model is developed, with modeling based on the assumption that sequential observations often share latent phenomena, while also possessing idiosyncratic effects. An alternating optimization method is proposed to learn a "registered" ...

Deep Generative Models for Relational Data with Side Information (Jun 16 2017)
We present a probabilistic framework for overlapping community discovery and link prediction for relational data, given as a graph. The proposed framework has: (1) a deep architecture which enables us to infer multiple layers of latent features/communities ...

A Probabilistic Framework for Nonlinearities in Stochastic Neural Networks (Sep 18 2017)
We present a probabilistic framework for nonlinearities, based on doubly truncated Gaussian distributions. By setting the truncation points appropriately, we are able to generate various types of nonlinearities within a unified framework, including sigmoid, ...

Zero-Truncated Poisson Tensor Factorization for Massive Binary Tensors (Aug 18 2015)
We present a scalable Bayesian model for low-rank factorization of massive tensors with binary observations. The proposed model has the following key properties: (1) in contrast to the models based on the logistic or probit likelihood, using a zero-truncated ...

Generative Deep Deconvolutional Learning (Dec 18 2014; revised Feb 22 2015)
A generative Bayesian model is developed for deep (multi-layer) convolutional dictionary learning. A novel probabilistic pooling operation is integrated into the deep model, yielding efficient bottom-up and top-down probabilistic learning. After learning ...

Revisiting the Softmax Bellman Operator: Theoretical Properties and Practical Benefits (Dec 02 2018)
The softmax function has been primarily employed in reinforcement learning (RL) to improve exploration and provide a differentiable approximation to the max function, as also observed in the mellowmax paper by Asadi and Littman. This paper instead focuses ...

Nonlocal Low-Rank Tensor Factor Analysis for Image Restoration (Mar 19 2018)
Low-rank signal modeling has been widely leveraged to capture non-local correlation in image processing applications. We propose a new method that employs low-rank tensor factor analysis for tensors generated by grouped image patches. The low-rank tensors ...

Scalable Gromov-Wasserstein Learning for Graph Partitioning and Matching (May 18 2019; revised May 22 2019)
We propose a scalable Gromov-Wasserstein learning (S-GWL) method and establish a novel and theoretically-supported paradigm for large-scale graph analysis. The proposed method is based on the fact that Gromov-Wasserstein discrepancy is a pseudometric ...

Interpretable ICD Code Embeddings with Self- and Mutual-Attention Mechanisms (Jun 13 2019)
We propose a novel and interpretable embedding method to represent the international statistical classification codes of diseases and related health problems (i.e., ICD codes). This method considers a self-attention mechanism within the disease domain ...

Finite sample posterior concentration in high-dimensional regression (Jul 20 2012; revised Jan 03 2014)
We study the behavior of the posterior distribution in high-dimensional Bayesian Gaussian linear regression models having $p\gg n$, with $p$ the number of predictors and $n$ the sample size. Our focus is on obtaining quantitative finite sample bounds ...

On Connecting Stochastic Gradient MCMC and Differential Privacy (Dec 25 2017)
Significant success has been realized recently on applying machine learning to real-world applications. There have also been corresponding concerns on the privacy of training data, which relates to data security and confidentiality issues. Differential ...

Tree-Structure Bayesian Compressive Sensing for Video (Oct 12 2014)
A Bayesian compressive sensing framework is developed for video reconstruction based on the color coded aperture compressive temporal imaging (CACTI) system. By exploiting the three-dimensional (3D) tree structure of the wavelet and Discrete Cosine Transformation ...

GO Gradient for Expectation-Based Objectives (Jan 17 2019)
Within many machine learning algorithms, a fundamental problem concerns efficient calculation of an unbiased gradient with respect to parameters $\boldsymbol{\gamma}$ for expectation-based objectives $\mathbb{E}_{q_{\boldsymbol{\gamma}}(\boldsymbol{y})}[f(\boldsymbol{y})]$. Most existing methods either (i) suffer ...

Adaptive Feature Abstraction for Translating Video to Text (Nov 23 2016; revised Nov 17 2017)
Previous models for video captioning often use the output from a specific layer of a Convolutional Neural Network (CNN) as video features. However, the variable context-dependent semantics in the video may make it more appropriate to adaptively select ...

An inner-loop free solution to inverse problems using deep neural networks (Sep 06 2017; revised Nov 14 2017)
We propose a new method that uses deep learning techniques to accelerate the popular alternating direction method of multipliers (ADMM) solution for inverse problems. The ADMM updates consist of a proximity operator, a least squares regression that includes ...

Lognormal and Gamma Mixed Negative Binomial Regression (Jun 27 2012)
In regression analysis of counts, a lack of simple and efficient algorithms for posterior computation has made Bayesian approaches appear unattractive and thus underdeveloped. We propose a lognormal and gamma mixed negative binomial (NB) regression model ...

Inferring Latent Structure From Mixed Real and Categorical Relational Data (Jun 27 2012)
We consider analysis of relational data (a matrix), in which the rows correspond to subjects (e.g., people) and the columns correspond to attributes. The elements of the matrix may be a mix of real and categorical values. Each subject and attribute is characterized ...

Syntax-Infused Variational Autoencoder for Text Generation (Jun 05 2019)
We present a syntax-infused variational autoencoder (SIVAE) that integrates sentences with their syntactic trees to improve the grammar of generated sentences. Distinct from existing VAE-based text generative models, SIVAE contains two separate latent ...

On Norm-Agnostic Robustness of Adversarial Training (May 15 2019)
Adversarial examples are carefully perturbed inputs for fooling machine learning models. A well-acknowledged defense method against such examples is adversarial training, where adversarial examples are injected into training data to increase robustness. ...

Stochastic Gradient MCMC with Stale Gradients (Oct 21 2016)
Stochastic gradient MCMC (SG-MCMC) has played an important role in large-scale Bayesian learning, with well-developed theoretical convergence properties. In such applications of SG-MCMC, it is becoming increasingly popular to employ distributed systems, ...

Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data (Aug 18 2015)
We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well ...

Variational Gaussian Copula Inference (Jun 19 2015; revised May 18 2016)
We utilize copulas to constitute a unified framework for constructing and optimizing variational proposals in hierarchical Bayesian models. For models with continuous and non-Gaussian hidden variables, we propose a semiparametric and automated variational ...

Reconstruction of Signals Drawn from a Gaussian Mixture from Noisy Compressive Measurements (Jul 02 2013; revised Mar 17 2014)
This paper determines to within a single measurement the minimum number of measurements required to successfully reconstruct a signal drawn from a Gaussian mixture model in the low-noise regime. The method is to develop upper and lower bounds that are ...

Learning Context-Sensitive Convolutional Filters for Text Processing (Sep 25 2017; revised Aug 30 2018)
Convolutional neural networks (CNNs) have recently emerged as a popular building block for natural language processing (NLP). Despite their success, most existing CNN models employed in NLP share the same learned (and static) set of filters for all input ...

Stochastic Gradient Monomial Gamma Sampler (Jun 05 2017; revised Jan 10 2018)
Recent advances in stochastic gradient techniques have made it possible to estimate posterior distributions from large datasets via Markov Chain Monte Carlo (MCMC). However, when the target posterior is multimodal, mixing performance is often poor. This ...

Adaptive Feature Abstraction for Translating Video to Language (Nov 23 2016)
A new model for video captioning is developed, using a deep three-dimensional Convolutional Neural Network (C3D) as an encoder for videos and a Recurrent Neural Network (RNN) as a decoder for captions. We consider both "hard" and "soft" attention mechanisms, ...

Multiscale Shrinkage and Lévy Processes (Jan 11 2014)
A new shrinkage-based construction is developed for a compressible vector $\boldsymbol{x}\in\mathbb{R}^n$, for cases in which the components of $\boldsymbol{x}$ are naturally associated with a tree structure. Important examples are when $\boldsymbol{x}$ corresponds to the ...

Beta-Negative Binomial Process and Poisson Factor Analysis (Dec 15 2011; revised Feb 04 2012)
A beta-negative binomial (BNB) process is proposed, leading to a beta-gamma-Poisson process, which may be viewed as a "multi-scoop" generalization of the beta-Bernoulli process. The BNB process is augmented into a beta-gamma-gamma-Poisson hierarchical ...

Deep Temporal Sigmoid Belief Networks for Sequence Modeling (Sep 23 2015)
Deep dynamic generative models are developed to learn sequential dependencies in time-series data. The multi-layered model is designed by constructing a hierarchy of temporal sigmoid belief networks (TSBNs), defined as a sequential stack of sigmoid belief ...

High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models (Dec 23 2015)
Learning in deep models using Bayesian methods has generated significant attention recently. This is largely because of the feasibility of modern Bayesian methods to yield scalable learning and inference, while maintaining a measure of uncertainty in ...

Gromov-Wasserstein Learning for Graph Matching and Node Embedding (Jan 17 2019; revised May 07 2019)
A novel Gromov-Wasserstein learning framework is proposed to jointly match (align) graphs and learn embedding vectors for the associated graph nodes. Using Gromov-Wasserstein discrepancy, we measure the dissimilarity between two graphs and find their ...

Nested Dictionary Learning for Hierarchical Organization of Imagery and Text (Oct 16 2012)
A tree-based dictionary learning model is developed for joint analysis of imagery and associated text. The dictionary learning may be applied directly to the imagery from patches, or to general feature vectors extracted from patches or superpixels (using ...

Earliness-Aware Deep Convolutional Networks for Early Time Series Classification (Nov 14 2016)
We present Earliness-Aware Deep Convolutional Networks (EA-ConvNets), an end-to-end deep learning framework, for early classification of time series data. Unlike most existing methods for early classification of time series data, which are designed to ...

Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks (Dec 23 2015)
Effective training of deep neural networks suffers from two main issues. The first is that the parameter spaces of these models exhibit pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient ...

Unsupervised Learning with Truncated Gaussian Graphical Models (Nov 15 2016; revised Nov 20 2016)
Gaussian graphical models (GGMs) are widely used for statistical modeling, because of ease of inference and the ubiquitous use of the normal distribution in practical approximations. However, they are also known for their limited modeling abilities, due ...

Multi-Label Learning from Medical Plain Text with Convolutional Residual Models (Jan 15 2018; revised Aug 08 2018)
Predicting diagnoses from Electronic Health Records (EHRs) is an important medical application of multi-label learning. We propose a convolutional residual model for multi-label classification from doctor notes in EHR data. A given patient may have multiple ...

Bridging the Gap between Stochastic Gradient MCMC and Stochastic Optimization (Dec 25 2015; revised Aug 05 2016)
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) methods are Bayesian analogs to popular stochastic optimization methods; however, this connection is not well studied. We explore this relationship by applying simulated annealing to an SG-MCMC algorithm. ...

Certified Adversarial Robustness with Additive Gaussian Noise (Sep 10 2018; revised May 29 2019)
The existence of adversarial data examples has drawn significant attention in the deep-learning community; such data are seemingly minimally perturbed relative to the original data, but lead to very different outputs from a deep-learning algorithm. Although ...

Diffusion Maps for Textual Network Embedding (May 24 2018; revised Jan 14 2019)
Textual network embedding leverages rich text information associated with the network to learn low-dimensional vectorial representations of vertices. Rather than using typical natural language processing (NLP) approaches, recent research exploits the ...

Generative Adversarial Network Training is a Continual Learning Problem (Nov 27 2018)
Generative Adversarial Networks (GANs) have proven to be a powerful framework for learning to draw samples from complex distributions. However, GANs are also notoriously difficult to train, with mode collapse and oscillations a common problem. We hypothesize ...

Policy Optimization as Wasserstein Gradient Flows (Aug 09 2018)
Policy optimization is a core component of reinforcement learning (RL), and most existing RL methods directly optimize parameters of a policy based on maximizing the expected total reward, or its surrogate. Though often achieving encouraging empirical ...

Second-Order Adversarial Attack and Certifiable Robustness (Sep 10 2018)
We propose a powerful second-order attack method that outperforms existing attack methods on reducing the accuracy of state-of-the-art defense models based on adversarial training. The effectiveness of our attack method motivates an investigation of provable ...

Communications Inspired Linear Discriminant Analysis (Jun 27 2012)
We study the problem of supervised linear dimensionality reduction, taking an information-theoretic viewpoint. The linear projection matrix is designed by maximizing the mutual information between the projected signal and the class label (based on a Shannon ...

Learning Structural Weight Uncertainty for Sequential Decision-Making (Dec 30 2017; revised Apr 02 2018)
Learning probability distributions on the weights of neural networks (NNs) has recently proven beneficial in many applications. Bayesian methods, such as Stein variational gradient descent (SVGD), offer an elegant framework to reason about NN model uncertainty. ...

Nonlinear Statistical Learning with Truncated Gaussian Graphical Models (Jun 02 2016; revised Nov 20 2016)
We introduce the truncated Gaussian graphical model (TGGM) as a novel framework for designing statistical models for nonlinear learning. A TGGM is a Gaussian graphical model (GGM) with a subset of variables truncated to be nonnegative. The truncated variables ...

Survival Function Matching for Calibrated Time-to-Event Predictions (May 21 2019)
Models for predicting the time of a future event are crucial for risk assessment, across a diverse range of applications. Existing time-to-event (survival) models have focused primarily on preserving pairwise ordering of estimated event times, or relative ...

Deconvolutional Latent-Variable Model for Text Sequence Matching (Sep 21 2017; revised Nov 22 2017)
A latent-variable model is introduced for text matching, inferring sentence representations by jointly optimizing generative and discriminative objectives. To alleviate typical optimization challenges in latent-variable models for text, we employ deconvolutional ...

A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC (Sep 04 2017)
Stochastic gradient Markov Chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size on the algorithm's convergence ...

Scalable Thompson Sampling via Optimal Transport (Feb 19 2019)
Thompson sampling (TS) is a class of algorithms for sequential decision-making, which requires maintaining a posterior distribution over a model. However, calculating exact posterior distributions is intractable for all but the simplest models. Consequently, ...

Distilled Wasserstein Learning for Word Embedding and Topic Modeling (Sep 12 2018)
We propose a novel Wasserstein method with a distillation mechanism, yielding joint learning of word embeddings and topics. The proposed method is based on the fact that the Euclidean distance between word embeddings may be employed as the underlying ...

Improved Semantic-Aware Network Embedding with Fine-Grained Word Alignment (Aug 29 2018)
Network embeddings, which learn low-dimensional representations for each vertex in a large-scale network, have received considerable attention in recent years. For a wide range of applications, vertices in a network are typically accompanied by rich textual ...

Benefits from Superposed Hawkes Processes (Oct 14 2017; revised Feb 14 2018)
The superposition of temporal point processes has been studied for many years, although the usefulness of such models for practical applications has not been fully developed. We investigate superposed Hawkes processes as an important class of such models, ...

A Bayesian Nonparametric Approach to Image Super-resolution (Sep 22 2012)
Super-resolution methods form high-resolution images from low-resolution images. In this paper, we develop a new Bayesian nonparametric model for super-resolution. Our method uses a beta-Bernoulli process to learn a set of recurring visual patterns, called ...

Towards Amortized Ranking-Critical Training for Collaborative Filtering (Jun 10 2019)
Collaborative filtering is widely used in modern recommender systems. Recent research shows that variational autoencoders (VAEs) yield state-of-the-art performance by integrating flexible representations from deep neural networks into latent variable ...

Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture (May 28 2013; revised Nov 01 2013)
This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a low-variance asymptotic analysis ...

Towards Unifying Hamiltonian Monte Carlo and Slice Sampling (Feb 25 2016; revised Jan 10 2018)
We unify slice sampling and Hamiltonian Monte Carlo (HMC) sampling, demonstrating their connection via the Hamilton-Jacobi equation from Hamiltonian mechanics. This insight enables extension of HMC and slice sampling to a broader family of samplers, ...

Learning a Hybrid Architecture for Sequence Regression and Annotation (Dec 16 2015)
When learning a hidden Markov model (HMM), sequential observations can often be complemented by real-valued summary response variables generated from the path of hidden states. Such settings arise in numerous domains, including many applications ...

VAE Learning via Stein Variational Gradient Descent (Apr 18 2017; revised Nov 17 2017)
A new method for learning variational autoencoders (VAEs) is developed, based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance ...

LMVP: Video Predictor with Leaked Motion Information (Jun 24 2019)
We propose a Leaked Motion Video Predictor (LMVP) to predict future frames by capturing the spatial and temporal dependencies from given inputs. The motion is modeled by a newly proposed component, the motion guider, which plays the role of both learner and ...

Topic-Guided Variational Autoencoders for Text Generation (Mar 17 2019)
We propose a topic-guided variational autoencoder (TGVAE) model for text generation. Distinct from existing variational autoencoder (VAE) based approaches, which assume a simple Gaussian prior for the latent code, our model specifies the prior as a Gaussian ...

Predicting Smoking Events with a Time-Varying Semi-Parametric Hawkes Process Model (Sep 05 2018)
Health risks from cigarette smoking -- the leading cause of preventable death in the United States -- can be substantially reduced by quitting. Although most smokers are motivated to quit, the majority of quit attempts fail. A number of studies have explored ...

Deconvolutional Paragraph Representation Learning (Aug 16 2017; revised Sep 22 2017)
Learning latent representations from long text sequences is an important first step in many natural language processing applications. Recurrent Neural Networks (RNNs) have become a cornerstone for this challenging task. However, the quality of sentences ...

Adversarial Symmetric Variational Autoencoder (Nov 14 2017; revised Nov 18 2017)
A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: (i) from observed data fed through the encoder to yield codes, and (ii) from latent codes drawn from ...

Continuous-Time Flows for Efficient Inference and Density Estimation (Sep 04 2017; revised Aug 01 2018)
Two fundamental problems in unsupervised learning are efficient inference for latent-variable models and robust density estimation based on large amounts of unlabeled data. Algorithms for the two tasks, such as normalizing flows and generative adversarial ...

Sequence Generation with Guider Network (Nov 02 2018)
Sequence generation with reinforcement learning (RL) has received significant attention recently. However, a challenge with such methods is the sparse-reward problem in the RL training process, in which a scalar guiding signal is often only available ...

Low-Cost Compressive Sensing for Color Video and Depth (Feb 27 2014)
A simple and inexpensive (low-power and low-bandwidth) modification is made to a conventional off-the-shelf color video camera, from which we recover multiple color frames for each of the original measured frames, and each of the recovered frames can ...

Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing (Mar 25 2019; revised Jun 10 2019)
Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter ...

Semantic Compositional Networks for Visual Captioning (Nov 23 2016)
A Semantic Compositional Network (SCN) is developed for image captioning, in which semantic concepts (i.e., tags) are detected from the image, and the probability of each tag is used to compose the parameters in a long short-term memory (LSTM) network. ...

Adversarial Time-to-Event Modeling (Apr 09 2018; revised Jun 07 2018)
Modern health data science applications leverage abundant molecular and electronic health data, providing opportunities for machine learning to build statistical models to support clinical practice. Time-to-event analysis, also called survival analysis, ...

Learning Generic Sentence Representations Using Convolutional Neural Networks (Nov 23 2016; revised Jul 26 2017)
We propose a new encoder-decoder approach to learn distributed sentence representations that are applicable to multiple purposes. The model is learned by using a convolutional neural network as an encoder to map an input sentence into a continuous vector, ...

Video Generation From Text (Oct 01 2017)
Generating videos from text has proven to be a significant challenge for existing generative models. We tackle this problem by training a conditional generative model to extract both static and dynamic information from text. This is manifested in a hybrid ...

Understanding and Accelerating Particle-Based Variational Inference (Jul 04 2018; revised Jul 16 2019)
Particle-based variational inference methods (ParVIs) have gained attention in the Bayesian inference literature, for their capacity to yield flexible and accurate approximations. We explore ParVIs from the perspective of Wasserstein gradient flows, and ...