Results for "Quoc V. Le"

Total results: 91024
Neural Architecture Search with Reinforcement Learning (Nov 05 2016). Neural networks are powerful and flexible models that work well for many difficult learning tasks in image, speech and natural language understanding. Despite their success, neural networks are still hard to design. In this paper, we use a recurrent network ...
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (May 28 2019). Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing ...
Distributed Representations of Sentences and Documents (May 16 2014, May 22 2014). Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major ...
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (May 28 2019, Jun 10 2019). Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource budget, and then scaled up for better accuracy if more resources are available. In this paper, we systematically study model scaling and identify that carefully balancing ...
Exploiting Similarities among Languages for Machine Translation (Sep 17 2013). Dictionaries and phrase tables are the basis of modern statistical machine translation systems. This paper develops a method that can automate the process of generating and extending dictionaries and phrase tables. Our method can translate missing word ...
HyperNetworks (Sep 27 2016, Oct 28 2016). This work explores hypernetworks: an approach of using a one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between ...
Neural Programmer: Inducing Latent Programs with Gradient Descent (Nov 16 2015, Aug 04 2016). Deep neural networks have achieved impressive supervised classification performance in many tasks including image recognition, speech recognition, and sequence to sequence learning. However, this success has not been translated to applications like question ...
Sequence to Sequence Learning with Neural Networks (Sep 10 2014, Dec 14 2014). Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this ...
HyperNetworks (Sep 27 2016). This work explores hypernetworks: an approach of using a small network, also known as a hypernetwork, to generate the weights for a larger network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between ...
A Bayesian Perspective on Generalization and Stochastic Gradient Descent (Oct 17 2017, Feb 14 2018). We consider two questions at the heart of machine learning; how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work responds to Zhang et al. (2016), who showed ...
Do Better ImageNet Models Transfer Better? (May 23 2018, Jun 17 2019). Transfer learning is a cornerstone of computer vision, yet little work has been done to evaluate the relationship between architecture and transfer. An implicit hypothesis in modern computer vision research is that models that perform better on ImageNet ...
Semi-supervised Sequence Learning (Nov 04 2015). We present two approaches that use unlabeled data to improve sequence learning with recurrent networks. The first approach is to predict what comes next in a sequence, which is a conventional language model in natural language processing. The second approach ...
HyperNetworks (Sep 27 2016, Dec 01 2016). This work explores hypernetworks: an approach of using a one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between ...
The Evolved Transformer (Jan 30 2019, Feb 06 2019). Recent works have highlighted the strengths of the Transformer architecture for dealing with sequence tasks. At the same time, neural architecture search has advanced to the point where it can outperform human-designed models. The goal of this work is ...
Learning to Skim Text (Apr 23 2017, Apr 29 2017). Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering. Despite their promise, many recurrent models have to read ...
Listen, Attend and Spell (Aug 05 2015, Aug 20 2015). We present Listen, Attend and Spell (LAS), a neural network that learns to transcribe speech utterances to characters. Unlike traditional DNN-HMM models, this model learns all the components of a speech recognizer jointly. Our system has two components: ...
Selfie: Self-supervised Pretraining for Image Embedding (Jun 07 2019). We introduce a pretraining technique called Selfie, which stands for SELF-supervised Image Embedding. Selfie generalizes the concept of masked language modeling to continuous data, such as images. Given masked-out patches in an input image, our method ...
Unsupervised Pretraining for Sequence to Sequence Learning (Nov 08 2016). Sequence to sequence models are successful tools for supervised sequence learning tasks, such as machine translation. Despite their success, these models still require much labeled data and it is unclear how to improve them using unlabeled data, which ...
Regularized Evolution for Image Classifier Architecture Search (Feb 05 2018, Feb 06 2018). The effort devoted to hand-crafting image classifiers has motivated the use of architecture search to discover them automatically. Reinforcement learning and evolution have both shown promise for this purpose. This study employs a regularized version ...
Learning Transferable Architectures for Scalable Image Recognition (Jul 21 2017, Apr 11 2018). Developing neural network image classification models often requires significant architecture engineering. In this paper, we study a method to learn the model architectures directly on the dataset of interest. As this approach is expensive when the dataset ...
A Simple Way to Initialize Recurrent Networks of Rectified Linear Units (Apr 03 2015, Apr 07 2015). Learning long term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose ...
Document Embedding with Paragraph Vectors (Jul 29 2015). Paragraph Vectors has been recently proposed as an unsupervised method for learning distributed representations for pieces of texts. In their work, the authors showed that the method can learn an embedding of movie review texts which can be leveraged ...
DropBlock: A regularization method for convolutional networks (Oct 30 2018). Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout. Although dropout is widely used as a regularization technique for fully connected layers, ...
Neural Program Synthesis with Priority Queue Training (Jan 10 2018). We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of K best ...
Soft Conditional Computation (Apr 10 2019). Conditional computation aims to increase the size and accuracy of a network, at a small increase in inference cost. Previous hard-routing models explicitly route the input to a subset of experts. We propose soft conditional computation, which, in contrast, ...
NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection (Apr 16 2019). Current state-of-the-art convolutional architectures for object detection are manually designed. Here we aim to learn a better architecture of feature pyramid network for object detection. We adopt Neural Architecture Search and discover a new feature ...
Neural Optimizer Search with Reinforcement Learning (Sep 21 2017, Sep 22 2017). We present an approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. We train a Recurrent Neural Network controller to generate a string in a domain specific language that describes a mathematical ...
Regularized Evolution for Image Classifier Architecture Search (Feb 05 2018, Feb 16 2019). The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers ...
Numerical solutions of a boundary value problem on the sphere using radial basis functions (Feb 14 2014, Oct 20 2016). Boundary value problems on the unit sphere arise naturally in geophysics and oceanography when scientists model a physical quantity on large scales. Robust numerical methods play an important role in solving these problems. In this article, we construct ...
Direct Optimization of Ranking Measures (Apr 25 2007). Web page ranking and collaborative filtering require the optimization of sophisticated performance measures. Current Support Vector approaches are unable to optimize them directly and focus on pairwise comparisons instead. We present a new approach which ...
A Neural Conversational Model (Jun 19 2015, Jul 22 2015). Conversational modeling is an important task in natural language understanding and machine intelligence. Although previous approaches exist, they are often restricted to specific domains (e.g., booking an airline ticket) and require hand-crafted rules. ...
A Neural Transducer (Nov 16 2015, Aug 04 2016). Sequence-to-sequence models have achieved impressive results on various tasks. However, they are unsuitable for tasks that require incremental predictions to be made as more data arrives or tasks that have long input sequences and output sequences. This ...
Intriguing Properties of Adversarial Examples (Nov 08 2017). It is becoming increasingly clear that many machine learning classifiers are vulnerable to adversarial examples. In attempting to explain the origin of adversarial examples, previous studies have typically focused on the fact that neural networks operate ...
Attention Augmented Convolutional Networks (Apr 22 2019). Convolutional networks have been the paradigm of choice in many computer vision applications. The convolution operation however has a significant weakness in that it only operates on a local neighborhood, thus missing global information. Self-attention, ...
Neural Program Synthesis with Priority Queue Training (Jan 10 2018, Mar 23 2018). We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of K best ...
Neural Input Search for Large Scale Recommendation Models (Jul 10 2019). Recommendation problems with large numbers of discrete items, such as products, webpages, or videos, are ubiquitous in the technology industry. Deep neural networks are being increasingly used for these recommendation problems. These models use embeddings ...
Learning a Natural Language Interface with Neural Programmer (Nov 28 2016). Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide ...
Neural Combinatorial Optimization with Reinforcement Learning (Nov 29 2016). This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts ...
Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games? (Nov 07 2017, Jun 29 2018). Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual ...
MnasNet: Platform-Aware Neural Architecture Search for Mobile (Jul 31 2018). Designing convolutional neural networks (CNN) models for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant effort has been dedicated to design and improve mobile models on all three ...
Learning a Natural Language Interface with Neural Programmer (Nov 28 2016, Mar 02 2017). Learning a natural language interface for database tables is a challenging task that involves deep language understanding and multi-step reasoning. The task is often approached by mapping natural language queries to logical forms or programs that provide ...
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Jan 09 2019, Jan 18 2019). Transformer networks have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. As a solution, we propose a novel neural architecture, Transformer-XL, that enables Transformer to ...
Domain Adaptive Transfer Learning with Specialist Models (Nov 16 2018, Dec 11 2018). Transfer learning is a widely used method to build high performing computer vision models. In this paper, we study the efficacy of transfer learning by examining how the choice of data impacts performance. We find that more pre-training data does not ...
Don't Decay the Learning Rate, Increase the Batch Size (Nov 01 2017, Feb 24 2018). It is common practice to decay the learning rate. Here we show one can usually obtain the same learning curve on both training and test sets by instead increasing the batch size during training. This procedure is successful for stochastic gradient descent ...
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (Jan 09 2019, Jun 02 2019). Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length ...
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Nov 16 2018, Dec 12 2018). GPipe is a scalable pipeline parallelism library that enables learning of giant deep neural networks. It partitions network layers across accelerators and pipelines execution to achieve high hardware utilization. It leverages recomputation to minimize ...
BAM! Born-Again Multi-Task Networks for Natural Language Understanding (Jul 10 2019). It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training ...
Learning Longer-term Dependencies in RNNs with Auxiliary Losses (Mar 01 2018, Jun 13 2018). Despite recent advances in training recurrent neural networks (RNNs), capturing long-term dependencies in sequences remains a fundamental challenge. Most approaches use backpropagation through time (BPTT), which is difficult to scale to very long sequences. ...
Addressing the Rare Word Problem in Neural Machine Translation (Oct 30 2014, May 30 2015). Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very ...
Multi-task Sequence to Sequence Learning (Nov 19 2015, Mar 01 2016). Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications focused on only one task and not much work explored this framework for multiple tasks. This paper examines three multi-task ...
Efficient Neural Architecture Search via Parameter Sharing (Feb 09 2018, Feb 12 2018). We propose Efficient Neural Architecture Search (ENAS), a fast and inexpensive approach for automatic model design. In ENAS, a controller learns to discover neural network architectures by searching for an optimal subgraph within a large computational ...
Unsupervised Data Augmentation (Apr 29 2019). Despite its success, deep learning still needs large labeled datasets to succeed. Data augmentation has shown much promise in alleviating the need for more labeled data, but it so far has mostly been applied in supervised settings and achieved limited ...
The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study (May 09 2019). We investigate how the final parameters found by stochastic gradient descent are influenced by over-parameterization. We generate families of models by increasing the number of channels in a base network, and then perform a large hyper-parameter search ...
AutoAugment: Learning Augmentation Policies from Data (May 24 2018, Apr 11 2019). Data augmentation is an effective technique for improving the accuracy of modern image classifiers. However, current data augmentation implementations are manually designed. In this paper, we describe a simple procedure called AutoAugment to automatically ...
Neural Combinatorial Optimization with Reinforcement Learning (Nov 29 2016, Jan 12 2017). This paper presents a framework to tackle combinatorial optimization problems using neural networks and reinforcement learning. We focus on the traveling salesman problem (TSP) and train a recurrent network that, given a set of city coordinates, predicts ...
Stochastic natural gradient descent draws posterior samples in function space (Jun 25 2018, Nov 28 2018). Recent work has argued that stochastic gradient descent can approximate the Bayesian uncertainty in model parameters near local minima. In this work we develop a similar correspondence for minibatch natural gradient descent (NGD). We prove that for sufficiently ...
Learning Graph Matching (Jun 17 2008). As a fundamental problem in pattern recognition, graph matching has applications in a variety of fields, from computer vision to computational biology. In graph matching, patterns are modeled as graphs and pattern recognition amounts to finding a correspondence ...
MnasNet: Platform-Aware Neural Architecture Search for Mobile (Jul 31 2018, May 29 2019). Designing convolutional neural networks (CNN) for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant efforts have been dedicated to design and improve mobile CNNs on all dimensions, ...
Learning Data Augmentation Strategies for Object Detection (Jun 26 2019). Data augmentation is a critical component of training deep learning models. Although data augmentation has been shown to significantly improve image classification, its potential has not been thoroughly investigated for object detection. Given the additional ...
XLNet: Generalized Autoregressive Pretraining for Language Understanding (Jun 19 2019). With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with ...
QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension (Apr 23 2018). Current end-to-end machine reading and question answering (Q&A) models are primarily based on recurrent neural networks (RNNs) with attention. Despite their success, these models are often slow for both training and inference due to the sequential nature ...
A splitting proximal point method for Nash-Cournot equilibrium models involving nonconvex cost functions (May 13 2011). Unlike convex case, a local equilibrium point of a nonconvex Nash-Cournot oligopolistic equilibrium problem may not be a global one. Finding such a local equilibrium point or even a stationary point of this problem is not an easy task. This paper deals ...
On Two Results of Mixed Multiplicities (Jan 08 2009). This paper shows that the main result of Trung-Verma in 2007 [TV] only is an immediate consequence of an improvement version of [Theorem 3.4, Vi1] in 2000.
Building high-level features using large scale unsupervised learning (Dec 29 2011, Jul 12 2012). We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse ...
The universal Vassiliev-Kontsevich invariant for framed oriented links (Jan 06 1994). We give a generalization of the Reshetikhin-Turaev functor for tangles to get a combinatorial formula for the universal Vassiliev-Kontsevich invariant of framed oriented links which is coincident with the Kontsevich integral. The universal Vassiliev-Kontsevich ...
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition (Apr 18 2019). We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking ...
Searching for MobileNetV3 (May 06 2019, Jun 12 2019). We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware-aware network architecture search ...
A Case Where Interference Does Not Affect The Channel Dispersion (Apr 01 2014). In 1975, Carleial presented a special case of an interference channel in which the interference does not reduce the capacity of the constituent point-to-point Gaussian channels. In this work, we show that if the inequalities in the conditions that Carleial ...
Latent Sequence Decompositions (Oct 10 2016, Nov 05 2016). We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes sequences with variable lengthed output units as a function of both the input sequence and the output sequence. We present a training algorithm which samples valid extensions ...
Random dynamical systems generated by stochastic Navier-Stokes equation on the rotating sphere (Mar 26 2014). In this paper we first prove the existence and uniqueness of the solution to the stochastic Navier-Stokes equations on the rotating 2-dimensional sphere. Then we show the existence of an asymptotically compact random dynamical system associated with ...
Searching for MobileNetV3 (May 06 2019, May 14 2019). We present the next generation of MobileNets based on a combination of complementary search techniques as well as a novel architecture design. MobileNetV3 is tuned to mobile phone CPUs through a combination of hardware aware network architecture search ...
Device Placement Optimization with Reinforcement Learning (Jun 13 2017, Jun 25 2017). The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture ...
Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation (Nov 14 2016, Aug 21 2017). We propose a simple solution to use a single Neural Machine Translation (NMT) model to translate between multiple languages. Our solution requires no change in the model architecture from our base system but instead introduces an artificial token at the ...
Second-Order Coding Rates for Conditional Rate-Distortion (Oct 10 2014). This paper characterizes the second-order coding rates for lossy source coding with side information available at both the encoder and the decoder. We first provide non-asymptotic bounds for this problem and then specialize the non-asymptotic bounds for ...
Latent Sequence Decompositions (Oct 10 2016, Feb 07 2017). We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes sequences with variable lengthed output units as a function of both the input sequence and the output sequence. We present a training algorithm which samples valid extensions ...
A note on joint reductions and mixed multiplicities (Oct 28 2011, Jul 31 2012). Let $(A, \frak m)$ be a noetherian local ring with maximal ideal $\frak{m}$ and infinite residue field $k = A/\frak{m}.$ Let $J$ be an $\frak m$-primary ideal, $I_1,...,I_s$ ideals of $A$, and $M$ a finitely generated $A$-module. In this paper, we interpret ...
Higher order Quasi-Monte Carlo integration for holomorphic, parametric operator equations (Sep 08 2014, Jun 24 2015). We analyze the convergence of higher order Quasi-Monte Carlo (QMC) quadratures of solution-functionals to countably-parametric, nonlinear operator equations with distributed uncertain parameters taking values in a separable Banach space $X$ admitting ...
Multi-level higher order QMC Galerkin discretization for affine parametric operator equations (Jun 17 2014, Aug 09 2015). We develop a convergence analysis of a multi-level algorithm combining higher order quasi-Monte Carlo (QMC) quadratures with general Petrov-Galerkin discretizations of countably affine parametric operator equations of elliptic and parabolic type, extending ...
Latent Sequence Decompositions (Oct 10 2016, Oct 17 2016). We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes sequences with variable lengthed output units as a function of both the input sequence and the output sequence. We present a training algorithm which samples valid extensions ...
Massive Exploration of Neural Machine Translation Architectures (Mar 11 2017, Mar 21 2017). Neural Machine Translation (NMT) has shown remarkable progress over the past few years with production systems now being deployed to end-users. One major drawback of current architectures is that they are expensive to train, typically requiring days to ...
Using Videos to Evaluate Image Model Robustness (Apr 22 2019, Apr 24 2019). Human visual systems are robust to a wide range of image transformations that are challenging for artificial networks. We present the first study of image model robustness to the minute transformations found across video frames, which we term "natural ...
Random attractors for the stochastic Navier-Stokes equations on the 2D unit sphere (Feb 13 2014, Jun 02 2015). In this paper we prove the existence of random attractors for the Navier-Stokes equations on 2 dimensional sphere under random forcing irregular in space and time. We also deduce the existence of an invariant measure.
Higher Order Quasi Monte-Carlo Integration in Uncertainty Quantification (Sep 29 2014). We review recent results on dimension-robust higher order convergence rates of Quasi-Monte Carlo Petrov-Galerkin approximations for response functionals of infinite-dimensional, parametric operator equations which arise in computational uncertainty quantification. ...
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision (Oct 31 2016, Apr 23 2017). Harnessing the statistical power of neural networks to perform language understanding and symbolic reasoning is difficult, when it requires executing efficient discrete operations against a large knowledge-base. In this work, we introduce a Neural Symbolic ...
Algebraic Representation of Many-Particle Coulomb Green Function and Application in Atomic Calculations (Sep 26 2004, Sep 28 2004). Basing on the relation between the Coulomb Green function and the Green function of harmonic oscillator, the algebraic representation of the many-particle Coulomb Green function in the form of annihilation and creation operators is established. These ...
On the applicability of distributed ledger architectures to peer-to-peer energy trading framework (Oct 11 2018). As more and more distributed renewable energy resources are integrated to the grid, the traditional consumers have become the prosumers who can sell back their surplus energy to the others who are in energy shortage. This peer-to-peer (P2P) energy transaction ...
On the Nanocommunications at THz Band in Graphene-Enabled Wireless Network-on-Chip (Aug 01 2017). One of the main challenges towards the growing computation-intensive applications with scalable bandwidth requirement is the deployment of a dense number of on-chip cores within a chip package. To this end, this paper investigates the Wireless Network- ...
TensorSCONE: A Secure TensorFlow Framework using Intel SGX (Feb 12 2019). Machine learning has become a critical component of modern data-driven online services. Typically, the training phase of machine learning techniques requires to process large-scale datasets which may contain private and sensitive information of customers. ...
A Data-Scalable Randomized Misfit Approach for Solving Large-Scale PDE-Constrained Inverse Problems (Mar 04 2016, Apr 17 2017). A randomized misfit approach is presented for the efficient solution of large-scale PDE-constrained inverse problems with high-dimensional data. The purpose of this paper is to offer a theory-based framework for random projections in this inverse problem ...
Implementation of Web-Based Respondent-Driven Sampling among Men who Have Sex with Men in Vietnam (Jun 08 2012). Objective: Lack of representative data about hidden groups, like men who have sex with men (MSM), hinders an evidence-based response to the HIV epidemics. Respondent-driven sampling (RDS) was developed to overcome sampling challenges in studies of populations ...
Approximate Stream Analytics in Apache Flink and Apache Spark Streaming (Sep 09 2017). Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. ...
Multilevel higher order Quasi-Monte Carlo Bayesian Estimation (Nov 24 2016). We propose and analyze deterministic multilevel approximations for Bayesian inversion of operator equations with uncertain distributed parameters, subject to additive Gaussian measurement data. The algorithms use a multilevel (ML) approach based on deterministic, ...
A broad-band laser-driven double Fano system - photoelectron spectra (Feb 06 2013). Fano profiles, with their asymmetric character, have many potential applications in technology. The design of Fano profiles into optical systems may create new nonlinear and switchable metamaterials, high-quality optical waveguides, ultrasensitive media ...
A Survey of Multi-Access Edge Computing in 5G and Beyond: Fundamentals, Technology Integration, and State-of-the-Art (Jun 20 2019). Driven by the emergence of new compute-intensive applications and the vision of IoT, the 5G network will face an unprecedented increase in traffic volume and computation demands. However, end users mostly have limited storage capacities and finite processing ...
Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision (Short Version) (Dec 04 2016). Extending the success of deep neural networks to natural language understanding and symbolic reasoning requires complex operations and external memory. Recent neural program induction approaches have attempted to address this problem, but are typically ...
Backprop Evolution (Aug 08 2018). The back-propagation algorithm is the cornerstone of deep learning. Despite its importance, few variations of the algorithm have been attempted. This work presents an approach to discover new variations of the back-propagation equation. We use a domain ...
Using Web Co-occurrence Statistics for Improving Image Categorization (Dec 19 2013, Dec 20 2013). Object recognition and localization are important tasks in computer vision. The focus of this work is the incorporation of contextual information in order to improve object recognition and localization. For instance, it is natural to expect not to see ...
Electroweak phase transition in the economical 3-3-1 model (Sep 02 2014, Jul 24 2015). We consider the EWPT in the economical 3-3-1 (E331) model. Our analysis shows that the EWPT in the model is a sequence of two first-order phase transitions, $SU(3) \rightarrow SU(2)$ at the TeV scale and $SU(2) \rightarrow U(1)$ at the $100$ GeV scale. ...
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer (Jan 23 2017). The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model ...
Approximate Edge Analytics for the IoT Ecosystem (May 15 2018). IoT-enabled devices continue to generate a massive amount of data. Transforming this continuously arriving raw data into timely insights is critical for many modern online services. For such settings, the traditional form of data analytics over the entire ...