Deep Temporal Linear Encoding Networks (Nov 21 2016). The CNN-encoding of features from entire videos for the representation of human actions has rarely been addressed. Instead, CNN work has focused on approaches to fuse spatial and temporal networks, but these were typically limited to processing shorter ...
Direction matters: hand pose estimation from local surface normals (Apr 10 2016). We present a hierarchical regression framework for estimating hand joint positions from single depth images based on local surface normals. The hierarchical regression follows the tree structured topology of hand from wrist to finger tips. We propose ...
Progressive Structure from Motion (Mar 20 2018; Jul 10 2018). Structure from Motion, or the sparse 3D reconstruction from individual photos, is a long-studied topic in computer vision. Yet none of the existing reconstruction pipelines fully addresses a progressive scenario where images are only becoming available ...
Manifold-valued Image Generation with Wasserstein Generative Adversarial Nets (Dec 05 2017; Jan 03 2019). Generative modeling over natural images is one of the most fundamental machine learning problems. However, few modern generative models, including Wasserstein Generative Adversarial Nets (WGANs), are studied on manifold-valued images that are frequently ...
Building Deep Networks on Grassmann Manifolds (Nov 17 2016; Jan 29 2018). Learning representations on Grassmann manifolds is popular in quite a few visual recognition tasks. In order to enable deep learning on Grassmann manifolds, this paper proposes a deep network architecture by generalizing the Euclidean network paradigm ...
Learning Accurate, Comfortable and Human-like Driving (Mar 26 2019). Autonomous vehicles are more likely to be accepted if they drive accurately, comfortably, but also similar to how human drivers would. This is especially true when autonomous and human-driven vehicles need to share the same road. The main research focus ...
Dilemma First Search for Effortless Optimization of NP-Hard Problems (Sep 12 2016). To tackle the exponentiality associated with NP-hard problems, two paradigms have been proposed. First, Branch & Bound, like Dynamic Programming, achieves efficient exact inference but requires extensive information and analysis about the problem at hand. ...
DynamoNet: Dynamic Action and Motion Network (Apr 25 2019). In this paper, we are interested in self-supervised learning of the motion cues in videos using dynamic motion filters, for a better motion representation that ultimately boosts human action recognition in particular. Thus far, the vision community has focused ...
A Novel BiLevel Paradigm for Image-to-Image Translation (Apr 18 2019). Image-to-image (I2I) translation is a pixel-level mapping that requires a large number of paired training data and often suffers from the problems of high diversity and strong category bias in image scenes. In order to tackle these problems, we propose ...
On the Relation between Color Image Denoising and Classification (Apr 05 2017). A large amount of the image denoising literature focuses on single-channel images and often experimentally validates the proposed methods on tens of images at most. In this paper, we investigate the interaction between denoising and classification on large ...
Low-Cost Scene Modeling using a Density Function Improves Segmentation Performance (May 26 2016). We propose a low cost and effective way to combine a free simulation software and free CAD models for modeling human-object interaction in order to improve human & object segmentation. It is intended for research scenarios related to safe human-robot ...
RayNet: Learning Volumetric 3D Reconstruction with Ray Potentials (Jan 06 2019). In this paper, we consider the problem of reconstructing a dense 3D model using images captured from different views. Recent methods based on convolutional neural networks (CNN) allow learning the entire task from data. However, they do not incorporate ...
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition (Aug 02 2016). Deep convolutional networks have achieved great success for visual recognition in still images. However, for action recognition in videos, the advantage over traditional methods is not so evident. This paper aims to discover the principles to design effective ...
An Analysis of Human-centered Geolocation (Jul 10 2017; Jan 31 2018). Online social networks contain a constantly increasing amount of images - most of them focusing on people. Due to cultural and climate factors, fashion trends and physical appearance of individuals differ from city to city. In this paper we investigate ...
Spatio-Temporal Channel Correlation Networks for Action Classification (Jun 19 2018; Feb 07 2019). The work in this paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most of the traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations ...
Face Translation between Images and Videos using Identity-aware CycleGAN (Dec 04 2017). This paper presents a new problem of unpaired face translation between images and videos, which can be applied to facial video prediction and enhancement. In this problem there exist two major technical challenges: 1) designing a robust translation model ...
Sliced Wasserstein Generative Models (Apr 10 2019; Apr 13 2019). In generative modeling, the Wasserstein distance (WD) has emerged as a useful metric to measure the discrepancy between generated and real data distributions. Unfortunately, it is challenging to approximate the WD of high-dimensional distributions. In ...
Error Correction for Dense Semantic Image Labeling (Dec 11 2017). Pixelwise semantic image labeling is an important, yet challenging, task with many applications. Typical approaches to tackle this problem involve either the training of deep networks on vast amounts of images to directly infer the labels or the use of ...
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification (Nov 22 2017). The work in this paper is driven by the question of how to exploit the temporal cues available in videos for their accurate classification, and for human action recognition in particular. Thus far, the vision community has focused on spatio-temporal approaches ...
An open mapping theorem for finitely copresented Esakia spaces (Oct 04 2017; Mar 12 2018). We prove an open mapping theorem for the topological spaces dual to finitely presented Heyting algebras. This yields in particular a short, self-contained semantic proof of the uniform interpolation theorem for intuitionistic propositional logic, first ...
A Riemannian Network for SPD Matrix Learning (Aug 15 2016). Symmetric Positive Definite (SPD) matrix learning methods have become popular in many image and video processing tasks, thanks to their ability to learn appropriate statistical representations while respecting the Riemannian geometry of the underlying ...
Unsupervised High-level Feature Learning by Ensemble Projection for Semi-supervised Image Classification and Image Clustering (Feb 02 2016; Feb 04 2016). This paper investigates the problem of image classification with limited or no annotations, but abundant unlabeled data. The setting exists in many tasks such as semi-supervised image classification, image clustering, and image retrieval. Unlike previous ...
Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime (Oct 05 2018). This work addresses the problem of semantic image segmentation of nighttime scenes. Although considerable progress has been made in semantic image segmentation, it is mainly related to daytime scenarios. This paper proposes a novel method to progressive ...
Does V-NIR based Image Enhancement Come with Better Features? (Aug 23 2016; Aug 24 2016). Image enhancement using the visible (V) and near-infrared (NIR) usually enhances useful image details. The enhanced images are evaluated by observers' perception, instead of quantitative feature evaluation. Thus, can we say that these enhanced images using ...
Image-level Classification in Hyperspectral Images using Feature Descriptors, with Application to Face Recognition (May 11 2016). In this paper, we propose a novel pipeline for image-level classification in hyperspectral images. By doing this, we show that the discriminative spectral information in image-level features leads to significantly improved performance in a face recognition ...
Semantic Nighttime Image Segmentation with Synthetic Stylized Data, Gradual Adaptation and Uncertainty-Aware Evaluation (Jan 17 2019). This work addresses the problem of semantic segmentation of nighttime images. The main direction of recent progress in semantic segmentation pertains to daytime scenes with favorable illumination conditions. We focus on improving the performance of state-of-the-art ...
Failure Detection for Facial Landmark Detectors (Aug 23 2016). Most face applications depend heavily on the accuracy of the face and facial landmarks detectors employed. Prediction of attributes such as gender, age, and identity usually completely fail when the faces are badly aligned due to inaccurate facial landmark ...
k^2-means for fast and accurate large scale clustering (May 30 2016). We propose k^2-means, a new clustering method which efficiently copes with large numbers of clusters and achieves low energy solutions. k^2-means builds upon the standard k-means (Lloyd's algorithm) and combines a new strategy to accelerate the convergence ...
Random Binary Mappings for Kernel Learning and Efficient SVM (Jul 19 2013; Mar 28 2014). Support Vector Machines (SVMs) are powerful learners that have led to state-of-the-art results in various computer vision problems. SVMs suffer from various drawbacks in terms of selecting the right kernel, which depends on the image descriptors, as well ...
Some like it hot - visual guidance for preference prediction (Oct 27 2015; Mar 10 2016). For people, first impressions of someone are of determining importance. They are hard to alter through further information. This raises the question of whether a computer can reach the same judgement. Earlier research has already pointed out that age, gender, and ...
Semi-Supervised Learning by Augmented Distribution Alignment (May 20 2019). In this work, we propose a simple yet effective semi-supervised learning approach called Augmented Distribution Alignment. We reveal that an essential sampling bias exists in semi-supervised learning due to the limited amount of labeled samples, which ...
Comment on "Ensemble Projection for Semi-supervised Image Classification" (Aug 29 2014). In a series of papers by Dai and colleagues [1,2], a feature map (or kernel) was introduced for semi- and unsupervised learning. This feature map is built from the output of an ensemble of classifiers trained without using the ground-truth class labels. ...
Real-time 3D Traffic Cone Detection for Autonomous Driving (Feb 06 2019). Considerable progress has been made in semantic scene understanding of road scenes with monocular cameras. It is, however, mainly related to certain classes such as cars and pedestrians. This work investigates traffic cones, an object class crucial for ...
Multi-bin Trainable Linear Unit for Fast Image Restoration Networks (Jul 30 2018). Tremendous advances in image restoration tasks such as denoising and super-resolution have been achieved using neural networks. Such approaches generally employ very deep architectures, large numbers of parameters, large receptive fields and high nonlinear ...
Generic 3D Convolutional Fusion for image restoration (Jul 26 2016). Also recently, exciting strides forward have been made in the area of image restoration, particularly for image denoising and single image super-resolution. Deep learning techniques contributed to this significantly. The top methods differ in their formulations ...
Semantic Foggy Scene Understanding with Synthetic Data (Aug 25 2017; Sep 05 2017). This work addresses the problem of semantic foggy scene understanding (SFSU). Although extensive research has been performed on image dehazing and on semantic scene understanding with weather-clear images, little attention has been paid to SFSU. Due to ...
Failure Prediction for Autonomous Driving (May 04 2018). The primary focus of autonomous driving research is to improve driving accuracy. While great progress has been made, state-of-the-art algorithms still fail at times. Such failures may have catastrophic consequences. It therefore is important that automated ...
End-to-End Learning of Driving Models with Surround-View Cameras and Route Planners (Mar 27 2018; Aug 06 2018). For human drivers, having rear and side-view mirrors is vital for safe driving. They deliver a more complete view of what is happening around the car. Human drivers also heavily exploit their mental map for navigation. Nonetheless, several methods have ...
Seven ways to improve example-based single image super resolution (Nov 06 2015). In this paper we present seven techniques that everybody should know to improve example-based single image super resolution (SR): 1) augmentation of data, 2) use of large dictionaries with efficient search structures, 3) cascading, 4) image self-similarities, ...
Observing the fine structure of loops through high resolution spectroscopic observations of coronal rain with the CRISP instrument at the Swedish Solar Telescope (Dec 03 2011). We present here one of the first high resolution spectroscopic observations of coronal rain, performed with the CRISP instrument at the Swedish Solar Telescope. This work constitutes the first attempt to assess the importance of coronal rain in the understanding ...
Penumbral micro-jets at high spatial and temporal resolution (May 08 2019; May 20 2019). Sunspot observations in chromospheric spectral lines have revealed the existence of short-lived linear bright transients, commonly referred to as penumbral micro-jets (PMJs). Details on the origin and physical nature of PMJs are to a large extent still ...
An introduction to quantum filtering (Jan 30 2006). This paper provides an introduction to quantum filtering theory. An introduction to quantum probability theory is given, focusing on the spectral theorem and the conditional expectation as a least squares estimate, and culminating in the construction ...
Efficient Two-Stream Motion and Appearance 3D CNNs for Video Classification (Aug 31 2016; Sep 02 2016). Video and action classification have evolved greatly through deep neural networks, especially two-stream CNNs that use RGB and optical flow as inputs and show outstanding performance in video analysis. One of the shortcomings of these ...
Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection (Apr 25 2016). We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due ...
Curriculum Model Adaptation with Synthetic and Real Data for Semantic Foggy Scene Understanding (Jan 05 2019). This work addresses the problem of semantic scene understanding under fog. Although marked progress has been made in semantic scene understanding, it is mainly concentrated on clear-weather scenes. Extending semantic segmentation methods to adverse weather ...
A Three-Player GAN: Generating Hard Samples To Improve Classification Networks (Mar 08 2019). We propose a Three-Player Generative Adversarial Network to improve classification networks. In addition to the game played between the discriminator and generator, a competition is introduced between the generator and the classifier. The generator's ...
Ensemble Manifold Segmentation for Model Distillation and Semi-supervised Learning (Apr 06 2018). Manifold theory has been the central concept of many learning methods. However, learning modern CNNs with manifold structures has not raised due attention, mainly because of the inconvenience of imposing manifold structures onto the architecture of the ...
Efficient Volumetric Fusion of Airborne and Street-Side Data for Urban Reconstruction (Sep 05 2016). Airborne acquisition and on-road mobile mapping provide complementary 3D information of an urban landscape: the former acquires roof structures, ground, and vegetation at a large scale, but lacks the facade and street-side details, while the latter is ...
Semantically-Guided Video Object Segmentation (Apr 06 2017; Jul 17 2018). This paper tackles the problem of semi-supervised video object segmentation, that is, segmenting an object in a sequence given its mask in the first frame. One of the main challenges in this scenario is the change of appearance of the objects of interest. ...
Gated CRF Loss for Weakly Supervised Semantic Image Segmentation (Jun 11 2019). State-of-the-art approaches for semantic segmentation rely on deep convolutional neural networks trained on fully annotated datasets, that have been shown to be notoriously expensive to collect, both in terms of time and money. To remedy this situation, ...
SMIT: Stochastic Multi-Label Image-to-Image Translation (Dec 10 2018). Cross-domain mapping has been a very active topic in recent years. Given one image, its main purpose is to translate it to the desired target domain, or multiple domains in the case of multiple labels. This problem is highly challenging due to three main ...
ComboGAN: Unrestrained Scalability for Image Domain Translation (Dec 19 2017). This year alone has seen unprecedented leaps in the area of learning-based image translation, namely CycleGAN, by Zhu et al. But experiments so far have been tailored to merely two domains at a time, and scaling them to more would require a quadratic ...
Towards High Resolution Video Generation with Progressive Growing of Sliced Wasserstein GANs (Oct 04 2018; Dec 06 2018). The extension of image generation to video generation turns out to be a very difficult task, since the temporal dimension of videos introduces an extra challenge during the generation process. Besides, due to the limitation of memory and training stability, ...
Fast Optical Flow using Dense Inverse Search (Mar 11 2016). Most recent works in optical flow extraction focus on the accuracy and neglect the time complexity. However, in real-life visual applications, such as tracking, activity detection and recognition, the time complexity is critical. We propose a solution ...
Actionness Estimation Using Hybrid Fully Convolutional Networks (Apr 25 2016). Actionness was introduced to quantify the likelihood of containing a generic action instance at a specific location. Accurate and efficient estimation of actionness is important in video analysis and may benefit other relevant tasks such as action recognition ...
DLOW: Domain Flow for Adaptation and Generalization (Dec 13 2018). In this work, we propose a domain flow generation (DLOW) approach to model the domain shift between two domains by generating a continuous sequence of intermediate domains flowing from one domain to the other. The benefits of our DLOW model are two-fold. ...
Oracle MCG: A first peek into COCO Detection Challenges (Aug 14 2015). The recently presented COCO detection challenge will most probably be the reference benchmark in object detection in the next years. COCO is two orders of magnitude larger than Pascal and has four times the number of categories; so in all likelihood researchers ...
Object Referring in Videos with Language and Human Gaze (Jan 04 2018; Apr 04 2018). We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene that comes with a language description. Humans perceive the world more as continued video snippets than as static images, and describe objects not only ...
Object Referring in Visual Scene with Spoken Language (Nov 10 2017; Dec 05 2017). Object referring has important applications, especially for human-machine interaction. While having received great attention, the task is mainly attacked with written language (text) as input rather than spoken language (speech), which is more natural. ...
Model-free Consensus Maximization for Non-Rigid Shapes (Jul 05 2018; Aug 13 2018). Many computer vision methods use consensus maximization to relate measurements containing outliers with the correct transformation model. In the context of rigid shapes, this is typically done using Random Sampling and Consensus (RANSAC) by estimating ...
Covariance Pooling For Facial Expression Recognition (May 13 2018). Classifying facial expressions into different categories requires capturing regional distortions of facial landmarks. We believe that second-order statistics such as covariance are better able to capture such distortions in regional facial features. ...
Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning (Apr 09 2018). This paper tackles the problem of video object segmentation, given some user annotation which indicates the object of interest. The problem is formulated as pixel-wise retrieval in a learned embedding space: we embed pixels of the same object instance ...
Deep Learning on Lie Groups for Skeleton-based Action Recognition (Dec 18 2016; Apr 11 2017). In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with an additional dynamic time ...
PathTrack: Fast Trajectory Annotation with Path Supervision (Mar 07 2017; Mar 22 2017). Progress in Multiple Object Tracking (MOT) has been historically limited by the size of the available datasets. We present an efficient framework to annotate trajectories and use it to produce a MOT dataset of unprecedented size. In our novel path supervision ...
UntrimmedNets for Weakly Supervised Action Recognition and Detection (Mar 09 2017; May 22 2017). Current action recognition methods heavily rely on trimmed videos for model training. However, it is expensive and time-consuming to acquire a large-scale trimmed video dataset. This paper presents a new weakly supervised architecture, called UntrimmedNet, ...
DLOW: Domain Flow for Adaptation and Generalization (Dec 13 2018; May 13 2019). In this work, we present a domain flow generation (DLOW) model to bridge two different domains by generating a continuous sequence of intermediate domains flowing from one domain to the other. The benefits of our DLOW model are two-fold. First, it is able ...
Energy-Efficient ConvNets Through Approximate Computing (Mar 22 2016). Recently ConvNets or convolutional neural networks (CNN) have come up as state-of-the-art classification and detection algorithms, achieving near-human performance in visual detection. However, ConvNet algorithms are typically very computation and memory ...
Thin-Slicing Network: A Deep Structured Model for Pose Estimation in Videos (Mar 31 2017). Deep ConvNets have been shown to be effective for the task of human pose estimation from single images. However, several challenging issues arise in the video-based case such as self-occlusion, motion blur, and uncommon poses with few or no examples in ...
Is Image Super-resolution Helpful for Other Vision Tasks? (Sep 23 2015; Jan 28 2016). Despite the great advances made in the field of image super-resolution (ISR) during the last years, the performance has merely been evaluated perceptually. Thus, it is still unclear whether ISR is helpful for other vision tasks. In this paper, we present ...
Deep Domain Adaptation by Geodesic Distance Minimization (Jul 13 2017; Oct 10 2017). In this paper, we propose a new approach called Deep LogCORAL for unsupervised visual domain adaptation. Our work builds on the recently proposed Deep CORAL method, which proposed to train a convolutional neural network and simultaneously minimize the ...
Transferring Object-Scene Convolutional Neural Networks for Event Recognition in Still Images (Sep 01 2016). Event recognition in still images is an intriguing problem and has potential for real applications. This paper addresses the problem of event recognition by proposing a convolutional neural network that exploits knowledge of objects and scenes for event ...
Branched Multi-Task Networks: Deciding What Layers To Share (Apr 05 2019). In the context of deep learning, neural networks with multiple branches have been used that each solve different tasks. Such ramified networks typically start with a number of shared layers, after which different tasks branch out into their own sequence ...
PIRM Challenge on Perceptual Image Enhancement on Smartphones: Report (Oct 03 2018). This paper reviews the first challenge on efficient perceptual image enhancement with the focus on deploying deep learning models on smartphones. The challenge consisted of two tracks. In the first one, participants were solving the classical image super-resolution ...
Query-adaptive Video Summarization via Quality-aware Relevance Estimation (May 01 2017; Sep 28 2017). Although the problem of automatic video summarization has recently received a lot of attention, the problem of creating a video summary that also highlights elements relevant to a search query has been less studied. We address this problem by posing query-relevant ...
Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates (Nov 18 2017; Nov 21 2017). Spatial understanding is a fundamental problem with wide-reaching real-world applications. The representation of spatial knowledge is often modeled with spatial templates, i.e., regions of acceptability of two objects under an explicit spatial relationship ...
Incremental Non-Rigid Structure-from-Motion with Unknown Focal Length (Aug 13 2018). The perspective camera and the isometric surface prior have recently gathered increased attention for Non-Rigid Structure-from-Motion (NRSfM). Despite the recent progress, several challenges remain, particularly the computational complexity and the unknown ...
Model Adaptation with Synthetic and Real Data for Semantic Dense Foggy Scene Understanding (Aug 03 2018). This work addresses the problem of semantic scene understanding under dense fog. Although considerable progress has been made in semantic scene understanding, it is mainly related to clear-weather scenes. Extending recognition methods to adverse weather ...
Dynamic Filter Networks (May 31 2016; Jun 06 2016). In a traditional convolutional layer, the learned filters stay fixed after training. In contrast, we introduce a new framework, the Dynamic Filter Network, where filters are generated dynamically conditioned on an input. We show that this architecture ...
DeepCAMP: Deep Convolutional Action & Attribute Mid-Level Patterns (Aug 10 2016). The recognition of human actions and the determination of human attributes are two tasks that call for fine-grained classification. Indeed, often rather small and inconspicuous objects and features have to be detected to tell their classes apart. In order ...
Crossing Nets: Combining GANs and VAEs with a Shared Latent Space for Hand Pose Estimation (Feb 11 2017; Jul 18 2017). State-of-the-art methods for 3D hand pose estimation from depth images require large amounts of annotated training data. We propose to model the statistical relationships of 3D hand poses and corresponding depth images using two deep generative models ...
Dense 3D Regression for Hand Pose Estimation (Nov 24 2017). We present a simple and effective method for 3D hand pose estimation from a single depth frame. As opposed to previous state-of-the-art methods based on holistic 3D regression, our method works on dense pixel-wise estimation. This is achieved by careful ...
Learning Discriminative Model Prediction for Tracking (Apr 15 2019). The current strive towards end-to-end trainable computer vision systems imposes major challenges for the task of visual tracking. In contrast to most other vision problems, tracking requires the learning of a robust target-specific appearance model online, ...
Weakly Supervised Object Discovery by Generative Adversarial & Ranking Networks (Nov 22 2017; Apr 17 2018). The deep generative adversarial networks (GAN) recently have been shown to be promising for different computer vision applications, like image editing, synthesizing high resolution images, generating videos, etc. These networks and the corresponding ...
On the continuum intensity distribution of the solar photosphere (May 05 2009; Aug 18 2009). We present a detailed comparison between simulations and seeing-free observations that takes into account the crucial influence of instrumental image degradation. We use images of quiet Sun granulation taken in the blue, green and red continuum bands ...
A discrete invitation to quantum filtering and feedback control (Jun 05 2006; Dec 05 2006). The engineering and control of devices at the quantum-mechanical level, such as those consisting of small numbers of atoms and photons, is a delicate business. The fundamental uncertainty that is inherently present at this scale manifests itself in the ...
Lagrangian submanifolds with prescribed second fundamental form (Sep 17 2013). We classify Lagrangian submanifolds of complex space forms, whose second fundamental form can be written in a certain way, depending on a real parameter. For some special values of this parameter, the resulting submanifolds are ideal in the sense that ...
SEEDS: Superpixels Extracted via Energy-Driven Sampling (Sep 16 2013). Superpixel algorithms aim to over-segment the image by grouping pixels that belong to the same object. Many state-of-the-art superpixel algorithms rely on minimizing objective functions to enforce color homogeneity. The optimization is accomplished ...
End-to-end Lane Detection through Differentiable Least-Squares Fitting (Feb 01 2019). Lane detection is typically tackled with a two-step pipeline in which a segmentation mask of the lane markings is predicted first, and a lane line model (like a parabola or spline) is fitted to the post-processed mask next. The problem with such a two-step ...
Sparse and noisy LiDAR completion with RGB guidance and uncertainty (Feb 14 2019). This work proposes a new method to accurately complete sparse LiDAR maps guided by RGB images. For autonomous vehicles and robotics the use of LiDAR is indispensable in order to achieve precise depth predictions. A multitude of applications depend on ...
Observation and analysis of chromospheric magnetic fields (Apr 05 2010). The solar chromosphere is a vigorously dynamic region of the sun, where waves and magnetic fields play an important role. To improve chromospheric diagnostics, we present new observations in Ca II 8542 carried out with the SST/CRISP on La Palma, working ...
One-Shot Video Object Segmentation (Nov 16 2016). This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame. We present One-Shot Video Object Segmentation (OSVOS), based on a fully-convolutional ...
Night-to-Day Image Translation for Retrieval-based Localization (Sep 26 2018). Visual localization is a key step in many robotics pipelines, allowing the robot to approximately determine its position and orientation in the world. An efficient and scalable approach to visual localization is to use image retrieval techniques. These ...
Speech-Based Visual Question Answering (May 01 2017; Sep 16 2017). This paper introduces speech-based visual question answering (VQA), the task of generating an answer given an image and a spoken question. Two methods are studied: an end-to-end, deep neural network that directly uses audio waveforms as input versus a ...
Fast video object segmentation with Spatio-Temporal GANs (Mar 28 2019). Learning descriptive spatio-temporal object models from data is paramount for the task of semi-supervised video object segmentation. Most existing approaches mainly rely on models that estimate the segmentation mask based on a reference mask at the first ...
Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations (Apr 03 2017; Jun 08 2017). We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and entropy, which we anneal to their discrete counterparts ...
3D Appearance Super-Resolution with Deep Learning (Jun 03 2019; Jun 04 2019). We tackle the problem of retrieving high-resolution (HR) texture maps of objects that are captured from multiple view points. In the multi-view case, model-based super-resolution (SR) methods have been recently proved to recover high quality texture maps. ...
Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency (May 28 2018; Oct 13 2018). Image-to-image translation has recently received significant attention due to advances in deep learning. Most works focus on learning either a one-to-one mapping in an unsupervised way or a many-to-many mapping in a supervised way. However, a more practical ...
