Results for "Alan Yuille"

total 4344took 0.13s
A Meta-Theory of Boundary Detection BenchmarksFeb 25 2013Human labeled datasets, along with their corresponding evaluation algorithms, play an important role in boundary detection. We here present a psychophysical experiment that addresses the reliability of such benchmarks. To find better remedies to evaluate ... More
Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image SegmentationFeb 09 2015Oct 05 2015Deep convolutional neural networks (DCNNs) trained on a large number of images with strong pixel-level annotations have recently significantly pushed the state-of-art in semantic image segmentation. We study the more challenging problem of learning DCNNs ... More
Robustness of Object Recognition under Extreme Occlusion in Humans and Computational ModelsMay 11 2019Jun 04 2019Most objects in the visual world are partially occluded, but humans can recognize them without difficulty. However, it remains unknown whether object recognition models like convolutional neural networks (CNNs) can handle real-world occlusion. It is also ... More
Parsing Occluded People by Flexible CompositionsDec 04 2014Nov 24 2015This paper presents an approach to parsing humans when there is significant occlusion. We model humans using a graphical model which has a tree structure building on recent work [32, 6] and exploit the connectivity prior that, even in presence of occlusion, ... More
Intriguing properties of adversarial trainingJun 10 2019Adversarial training is one of the main defenses against adversarial attacks. In this paper, we provide the first rigorous study on diagnosing elements of adversarial training, which reveals two intriguing properties. First, we study the role of normalization. ... More
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFsDec 22 2014Jun 07 2016Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models ... More
Training Multi-organ Segmentation Networks with Sample Selection by Relaxed Upper Confident BoundApr 07 2018Deep convolutional neural networks (CNNs), especially fully convolutional networks, have been widely applied to automatic medical image segmentation problems, e.g., multi-organ segmentation. Existing CNN-based segmentation methods mainly focus on looking ... More
Efficient variational inference in large-scale Bayesian compressed sensingJul 22 2011Sep 05 2011We study linear models under heavy-tailed priors from a probabilistic viewpoint. Instead of computing a single sparse most probable (MAP) solution as in standard deterministic approaches, the focus in the Bayesian compressed sensing framework shifts towards ... More
DOC: Deep OCclusion Estimation From a Single ImageNov 20 2015Jul 24 2016Recovering the occlusion relationships between objects is a fundamental human visual ability which yields important information about the 3D world. In this paper we propose a deep network architecture, called DOC, which acts on a single image, detects ... More
Symmetric Non-Rigid Structure from Motion for Category-Specific Object Structure EstimationSep 22 2016Many objects, especially these made by humans, are symmetric, e.g. cars and aeroplanes. This paper addresses the estimation of 3D structures of symmetric objects from multiple images of the same object category, e.g. different cars, seen from various ... More
Genetic CNNMar 04 2017The deep Convolutional Neural Network (CNN) is the state-of-the-art solution for large-scale visual recognition. Following basic principles such as increasing the depth and constructing highway connections, researchers have manually designed a lot of ... More
Thickened 2D Networks for 3D Medical Image SegmentationApr 02 2019There has been a debate in medical image segmentation on whether to use 2D or 3D networks, where both pipelines have advantages and disadvantages. This paper presents a novel approach which thickens the input of a 2D network, so that the model is expected ... More
Visual Concepts and Compositional VotingNov 13 2017It is very attractive to formulate vision in terms of pattern theory \cite{Mumford2010pattern}, where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real world images is currently less ... More
A Fixed-Point Model for Pancreas Segmentation in Abdominal CT ScansDec 25 2016Jun 21 2017Deep neural networks have been widely adopted for automatic organ segmentation from abdominal CT scans. However, the segmentation accuracy of some small organs (e.g., the pancreas) is sometimes below satisfaction, arguably because deep networks are easily ... More
Semi-Supervised Multi-Organ Segmentation via Deep Multi-Planar Co-TrainingApr 07 2018Nov 19 2018In multi-organ segmentation of abdominal CT scans, most existing fully supervised deep learning algorithms require lots of voxel-wise annotations, which are usually difficult, expensive, and slow to obtain. In comparison, massive unlabeled 3D CT volumes ... More
Semantic Part Segmentation using Compositional Model combining Shape and AppearanceDec 18 2014In this paper, we study the problem of semantic part segmentation for animals. This is more challenging than standard object detection, object segmentation and pose estimation tasks because semantic parts of animals often have similar appearance and highly ... More
Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise RelationsJul 12 2014Nov 04 2014We present a method for estimating articulated human pose from a single static image based on a graphical model with novel pairwise relations that make adaptive use of local image measurements. More precisely, we specify a graphical model for human pose ... More
UnrealCV: Connecting Computer Vision to Unreal EngineSep 05 2016Computer graphics can not only generate synthetic images and ground truth but it also offers the possibility of constructing virtual worlds in which: (i) an agent can perceive, navigate, and take actions guided by AI algorithms, (ii) properties of the ... More
Prior-aware Neural Network for Partially-Supervised Multi-Organ SegmentationApr 12 2019Accurate multi-organ abdominal CT segmentation is essential to many clinical applications such as computer-aided intervention. As data annotation requires massive human labor from experienced radiologists, it is common that training data are partially ... More
Generation and Comprehension of Unambiguous Object DescriptionsNov 07 2015Apr 11 2016We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described. We show ... More
PCL: Proposal Cluster Learning for Weakly Supervised Object DetectionJul 09 2018Oct 13 2018Weakly Supervised Object Detection (WSOD), using only image-level annotations to train object detectors, is of growing importance in object recognition. In this paper, we propose a novel deep network for WSOD. Unlike previous networks that transfer the ... More
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFsJun 02 2016May 12 2017In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous ... More
Semantic Image Segmentation with Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain TransformNov 10 2015Jun 02 2016Deep convolutional neural networks (CNNs) are the backbone of state-of-art semantic image segmentation systems. Recent work has shown that complementing CNNs with fully-connected conditional random fields (CRFs) can significantly enhance their object ... More
Transfer of View-manifold Learning to Similarity Perception of Novel ObjectsMar 31 2017We develop a model of perceptual similarity judgment based on re-training a deep convolution neural network (DCNN) that learns to associate different views of each 3D object to capture the notion of object persistence and continuity in our visual experience. ... More
Exploiting Symmetry and/or Manhattan Properties for 3D Object Structure Estimation from Single and Multiple ImagesJul 25 2016Many man-made objects have intrinsic symmetries and Manhattan structure. By assuming an orthographic projection model, this paper addresses the estimation of 3D structures and camera projection using symmetry and/or Manhattan structure cues, for the two ... More
Complexity of Representation and Inference in Compositional Models with Part SharingJan 16 2013This paper describes serial and parallel compositional models of multiple objects with part sharing. Objects are built by part-subpart compositions and expressed in terms of a hierarchical dictionary of object parts. These parts are represented on lattices ... More
Exploiting Symmetry and/or Manhattan Properties for 3D Object Structure Estimation from Single and Multiple ImagesJul 25 2016Nov 19 2016Many man-made objects have intrinsic symmetries and Manhattan structure. By assuming an orthographic projection model, this paper addresses the estimation of 3D structures and camera projection using symmetry and/or Manhattan structure cues, for the two ... More
Deep Nets: What have they ever done for Vision?May 10 2018Jan 11 2019This is an opinion paper about the strengths and weaknesses of Deep Nets for vision. They are at the center of recent progress on artificial intelligence and are of growing importance in cognitive science and neuroscience. They have enormous successes ... More
Generating Multiple Diverse Hypotheses for Human 3D Pose Consistent with 2D Joint DetectionsFeb 08 2017Aug 20 2017We propose a method to generate multiple diverse and valid human pose hypotheses in 3D all consistent with the 2D detection of joints in a monocular RGB image. We use a novel generative model uniform (unbiased) in the space of anatomically plausible 3D ... More
Adversarial Examples for Edge Detection: They Exist, and They TransferJun 02 2019Convolutional neural networks have recently advanced the state of the art in many tasks including edge and object boundary detection. However, in this paper, we demonstrate that these edge detectors inherit a troubling property of neural networks: they ... More
Object Recognition with and without ObjectsNov 20 2016While recent deep neural network models have given promising performance on object recognition, they rely implicitly on the visual contents of the whole image. In this paper, we train deep neural networks on the foreground (object) and background (context) ... More
Object Recognition with and without ObjectsNov 20 2016Nov 22 2016While recent deep neural network models have given promising performance on object recognition, they rely implicitly on the visual contents of the whole image. In this paper, we train deep neural networks on the foreground (object) and background (context) ... More
PASCAL Boundaries: A Class-Agnostic Semantic Boundary DatasetNov 25 2015In this paper, we address the boundary detection task motivated by the ambiguities in current definition of edge detection. To this end, we generate a large database consisting of more than 10k images (which is 20x bigger than existing edge detection ... More
DeePM: A Deep Part-Based Model for Object Detection and Semantic Part LocalizationNov 23 2015Jan 26 2016In this paper, we propose a deep part-based model (DeePM) for symbiotic object detection and semantic part localization. For this purpose, we annotate semantic parts for all 20 object categories on the PASCAL VOC 2012 dataset, which provides information ... More
Robustness of Object Recognition under Extreme Occlusion in Humans and Computational ModelsMay 11 2019Most objects in the visual world are partially occluded, but humans can recognize them without difficulty. However, it remains unknown whether object recognition models like convolutional neural networks (CNNs) can handle real-world occlusion. It is also ... More
Semi-Supervised Sparse Representation Based Classification for Face Recognition with Insufficient Labeled SamplesSep 12 2016Mar 24 2017This paper addresses the problem of face recognition when there is only few, or even only a single, labeled examples of the face that we wish to recognize. Moreover, these examples are typically corrupted by nuisance variables, both linear (i.e., additive ... More
OriNet: A Fully Convolutional Network for 3D Human Pose EstimationNov 12 2018In this paper, we propose a fully convolutional network for 3D human pose estimation from monocular images. We use limb orientations as a new way to represent 3D poses and bind the orientation together with the bounding box of each limb region to better ... More
Progressive Neural Architecture SearchDec 02 2017Jul 26 2018We propose a new method for learning the structure of convolutional neural networks (CNNs) that is more efficient than recent state-of-the-art methods based on reinforcement learning and evolutionary algorithms. Our approach uses a sequential model-based ... More
Semi-Supervised Sparse Representation Based Classification for Face Recognition with Insufficient Labeled SamplesSep 12 2016This paper addresses the problem of face recognition when there is only few, or even only a single, labeled examples of the face that we wish to recognize. Moreover, these examples are typically corrupted by nuisance variables, both linear (i.e. additive ... More
Parsing Semantic Parts of Cars Using Graphical Models and Segment Appearance ConsistencyJun 09 2014Jun 11 2014This paper addresses the problem of semantic part parsing (segmentation) of cars, i.e.assigning every pixel within the car to one of the parts (e.g.body, window, lights, license plates and wheels). We formulate this as a landmark identification problem, ... More
Object Recognition with and without ObjectsNov 20 2016May 25 2017While recent deep neural networks have achieved a promising performance on object recognition, they rely implicitly on the visual contents of the whole image. In this paper, we train deep neural net- works on the foreground (object) and background (context) ... More
Patch-based 3D Human Pose RefinementMay 20 2019State-of-the-art 3D human pose estimation approaches typically estimate pose from the entire RGB image in a single forward run. In this paper, we develop a post-processing step to refine 3D human pose estimation from body part patches. Using local patches ... More
Attention Correctness in Neural Image CaptioningMay 31 2016Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision. But despite their popularity, the "correctness" of the implicitly-learned attention maps has only been assessed qualitatively ... More
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring ExpressionsJan 03 2019Referring object detection and referring image segmentation are important tasks that require joint understanding of visual information and natural language. Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art ... More
Attention Correctness in Neural Image CaptioningMay 31 2016Nov 23 2016Attention mechanisms have recently been introduced in deep learning for various tasks in natural language processing and computer vision. But despite their popularity, the "correctness" of the implicitly-learned attention maps has only been assessed qualitatively ... More
Zoom Better to See Clearer: Human and Object Parsing with Hierarchical Auto-Zoom NetNov 21 2015Mar 28 2016Parsing articulated objects, e.g. humans and animals, into semantic parts (e.g. body, head and arms, etc.) from natural images is a challenging and fundamental problem for computer vision. A big difficulty is the large variability of scale and location ... More
Weight StandardizationMar 25 2019In this paper, we propose Weight Standardization (WS) to accelerate deep network training. WS is targeted at the micro-batch training setting where each GPU typically has only 1-2 images for training. The micro-batch training setting is hard because small ... More
Few-shot Learning by Exploiting Visual Concepts within CNNsNov 22 2017Feb 13 2018Convolutional neural networks (CNNs) are one of the driving forces for the advancement of computer vision. Despite their promising performances on many tasks, CNNs still face major obstacles on the road to achieving ideal machine intelligence. One is ... More
Joint Multi-Person Pose Estimation and Semantic Part SegmentationAug 10 2017Human pose estimation and semantic part segmentation are two complementary tasks in computer vision. In this paper, we propose to solve the two tasks jointly for natural multi-person images, in which the estimated pose provides object-level shape prior ... More
Label Distribution Learning ForestsFeb 20 2017Oct 16 2017Label distribution learning (LDL) is a general learning framework, which assigns to an instance a distribution over a set of labels rather than a single label or multiple labels. Current LDL methods have either restricted assumptions on the expression ... More
Probabilistic Motion Estimation Based on Temporal CoherenceJan 05 2012We develop a theory for the temporal integration of visual motion motivated by psychophysical experiments. The theory proposes that input data are temporally grouped and used to predict and estimate the motion flows in the image sequence. This temporal ... More
Pose-Guided Human Parsing with Deep Learned FeaturesAug 17 2015Nov 25 2015Parsing human body into semantic regions is crucial to human-centric analysis. In this paper, we propose a segment-based parsing pipeline that explores human pose information, i.e. the joint location of a human model, which improves the part proposal, ... More
V-NAS: Neural Architecture Search for Volumetric Medical Image SegmentationJun 06 2019Deep learning algorithms, in particular 2D and 3D fully convolutional neural networks (FCNs), have rapidly become the mainstream methodology for volumetric medical image segmentation. However, 2D convolutions cannot fully leverage the rich spatial information ... More
Semantic-Aware Knowledge Preservation for Zero-Shot Sketch-Based Image RetrievalApr 05 2019Sketch-based image retrieval (SBIR) is widely recognized as an important vision problem which implies a wide range of real-world applications. Recently, research interests arise in solving this problem under the more realistic and challenging setting ... More
Mitigating Adversarial Effects Through RandomizationNov 06 2017Feb 28 2018Convolutional neural networks have demonstrated high accuracy on various tasks in recent years. However, they are extremely vulnerable to adversarial examples. For example, imperceptible perturbations added to clean images can cause convolutional neural ... More
NormFace: L2 Hypersphere Embedding for Face VerificationApr 21 2017Jul 26 2017Thanks to the recent developments of Convolutional Neural Networks, the performance of face verification methods has increased rapidly. In a typical face verification method, feature normalization is a critical step for boosting performance. This motivates ... More
Few-Shot Image Recognition by Predicting Parameters from ActivationsJun 12 2017Nov 25 2017In this paper, we are interested in the few-shot learning problem. In particular, we focus on a challenging scenario where the number of categories is large and the number of examples per novel category is very limited, e.g. 1, 2, or 3. Motivated by the ... More
ELASTIC: Improving CNNs with Dynamic Scaling PoliciesDec 13 2018Apr 08 2019Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have a similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). ... More
Snapshot Distillation: Teacher-Student Optimization in One GenerationDec 01 2018Optimizing a deep neural network is a fundamental task in computer vision, yet direct training methods often suffer from over-fitting. Teacher-student optimization aims at providing complementary cues from a model trained previously, but these approaches ... More
Knowledge Distillation in Generations: More Tolerant Teachers Educate Better StudentsMay 15 2018Sep 07 2018We focus on the problem of training a deep neural network in generations. The flowchart is that, in order to optimize the target network (student), another network (teacher) with the same architecture is first trained, and used to provide part of supervision ... More
Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene UnderstandingJun 16 2014Recent trends in image understanding have pushed for holistic scene understanding models that jointly reason about various tasks such as object detection, scene recognition, shape analysis, contextual reasoning, and local appearance based classifiers. ... More
Rethinking Monocular Depth Estimation with Adversarial TrainingAug 22 2018Sep 24 2018Monocular depth estimation is an extensively studied computer vision problem with a vast variety of applications. Deep learning-based methods have demonstrated promise for both supervised and unsupervised depth estimation from monocular images. Most existing ... More
UnrealStereo: Controlling Hazardous Factors to Analyze Stereo VisionDec 14 2016Sep 06 2018A reliable stereo algorithm is critical for many robotics applications. But textureless and specular regions can easily cause failure by making feature matching difficult. Understanding whether an algorithm is robust to these hazardous regions is important. ... More
Compositional Convolutional Networks For Robust Object Classification under OcclusionMay 28 2019May 29 2019Deep convolutional neural networks (DCNNs) are powerful models that yield impressive results at object classification. However, recent work has shown that they do not generalize well to partially occluded objects and to mask attacks. In contrast to DCNNs, ... More
Deep Co-Training for Semi-Supervised Image RecognitionMar 15 2018In this paper, we study the problem of semi-supervised image recognition, which is to learn classifiers using both labeled and unlabeled images. We present Deep Co-Training, a deep learning based method inspired by the Co-Training framework. The original ... More
CLEVR-Ref+: Diagnosing Visual Reasoning with Referring ExpressionsJan 03 2019Apr 06 2019Referring object detection and referring image segmentation are important tasks that require joint understanding of visual information and natural language. Yet there has been evidence that current benchmark datasets suffer from bias, and current state-of-the-art ... More
Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary DetectionMar 24 2017Jul 31 2017In the field of connectomics, neuroscientists seek to identify cortical connectivity comprehensively. Neuronal boundary detection from the Electron Microscopy (EM) images is often done to assist the automatic reconstruction of neuronal circuit. But the ... More
Multi-Instance Visual-Semantic EmbeddingDec 22 2015Visual-semantic embedding models have been recently proposed and shown to be effective for image classification and zero-shot learning, by mapping images into a continuous semantic label space. Although several approaches have been proposed for single-label ... More
Representing Data by a Mixture of Activated SimplicesDec 12 2014We present a new model which represents data as a mixture of simplices. Simplices are geometric structures that generalize triangles. We give a simple geometric understanding that allows us to learn a simplicial structure efficiently. Our method requires ... More
Discovering Internal Representations from Object-CNNs Using Population EncodingNov 21 2015Jan 07 2016In this paper, we provide a method for understanding the internal representations of Convolutional Neural Networks (CNNs) trained on objects. We hypothesize that the information is distributed across multiple neuronal responses and propose a simple clustering ... More
The DLR Hierarchy of Approximate InferenceJul 04 2012We propose a hierarchy for approximate inference based on the Dobrushin, Lanford, Ruelle (DLR) equations. This hierarchy includes existing algorithms, such as belief propagation, and also motivates novel algorithms such as factorized neighbors (FN) algorithms ... More
Training and Evaluating Multimodal Word Embeddings with Large-scale Web Annotated ImagesNov 24 2016In this paper, we focus on training and evaluating effective word embeddings with both text and visual information. More specifically, we introduce a large-scale dataset with 300 million sentences describing over 40 million images crawled and downloaded ... More
InterActive: Inter-Layer Activeness PropagationApr 30 2016An increasing number of computer vision tasks can be tackled with deep features, which are the intermediate outputs of a pre-trained Convolutional Neural Network. Despite the astonishing performance, deep features extracted from low-level neurons are ... More
Geometric Neural Phrase Pooling: Modeling the Spatial Co-occurrence of NeuronsJul 21 2016Deep Convolutional Neural Networks (CNNs) are playing important roles in state-of-the-art visual recognition. This paper focuses on modeling the spatial co-occurrence of neuron responses, which is less studied in the previous work. For this, we consider ... More
SampleAhead: Online Classifier-Sampler Communication for Learning from Synthesized DataApr 01 2018Jul 28 2018State-of-the-art techniques of artificial intelligence, in particular deep learning, are mostly data-driven. However, collecting and manually labeling a large scale dataset is both difficult and expensive. A promising alternative is to introduce synthesized ... More
An Alarm System For Segmentation Algorithm Based On Shape ModelMar 26 2019Mar 27 2019It is usually hard for a learning system to predict correctly on rare events that never occur in the training data, and there is no exception for segmentation algorithms. Meanwhile, manual inspection of each case to locate the failures becomes infeasible ... More
Deep Collaborative Learning for Visual RecognitionMar 03 2017Deep neural networks are playing an important role in state-of-the-art visual recognition. To represent high-level visual concepts, modern networks are equipped with large convolutional layers, which use a large number of filters and contribute significantly ... More
MAT: A Multimodal Attentive Translator for Image CaptioningFeb 18 2017Aug 10 2017In this work we formulate the problem of image captioning as a multimodal translation task. Analogous to machine translation, we present a sequence-to-sequence recurrent neural networks (RNN) model for image caption generation. Different from most existing ... More
Neural Rejuvenation: Improving Deep Network Training by Enhancing Computational Resource UtilizationDec 02 2018In this paper, we study the problem of improving computational resource utilization of neural networks. Deep neural networks are usually over-parameterized for their tasks in order to achieve good performances, thus are likely to have underutilized computational ... More
ELASTIC: Improving CNNs with Instance Specific Scaling PoliciesDec 13 2018Scale variation has been a challenge from traditional to modern approaches in computer vision. Most solutions to scale issues have similar theme: a set of intuitive and manually designed policies that are generic and fixed (e.g. SIFT or feature pyramid). ... More
Spatial Transformer Introspective Neural NetworkMay 16 2018Natural images contain many variations such as illumination differences, affine transformations, and shape distortions. Correctly classifying these variations poses a long standing problem. The most commonly adopted solution is to build large-scale datasets ... More
Unsupervised learning of object semantic parts from internal states of CNNs by population encodingNov 21 2015Nov 12 2016We address the key question of how object part representations can be found from the internal states of CNNs that are trained for high-level tasks, such as object classification. This work provides a new unsupervised method to learn semantic parts and ... More
Deep Supervision for Pancreatic Cyst Segmentation in Abdominal CT ScansJun 22 2017Automatic segmentation of an organ and its cystic region is a prerequisite of computer-aided diagnosis. In this paper, we focus on pancreatic cyst segmentation in abdominal CT scan. This task is important and very useful in clinical practice yet challenging ... More
Gradually Updated Neural Networks for Large-Scale Image RecognitionNov 25 2017Feb 10 2018Depth is one of the keys that make neural networks succeed in the task of large-scale image recognition. The state-of-the-art network architectures usually increase the depths by cascading convolutional layers or building blocks. In this paper, we present ... More
Towards Resisting Large Data Variations via Introspective LearningMay 16 2018Apr 03 2019Learning deep networks which can resist large variations between training and testing data are essential to build accurate and robust image classifiers. Towards this end, a typical strategy is to apply data augmentation to enlarge the training set. However, ... More
ScaleNet: Guiding Object Proposal Generation in Supermarkets and BeyondApr 22 2017Motivated by product detection in supermarkets, this paper studies the problem of object proposal generation in supermarket images and other natural images. We argue that estimation of object scales in images is helpful for generating object proposals, ... More
Rethinking Monocular Depth Estimation with Adversarial TrainingAug 22 2018Jun 15 2019Monocular depth estimation is an extensively studied computer vision problem with a vast variety of applications. Deep learning-based methods have demonstrated promise for both supervised and unsupervised depth estimation from monocular images. Most existing ... More
Scene Graph Parsing as Dependency ParsingMar 25 2018In this paper, we study the problem of parsing structured knowledge graphs from textual descriptions. In particular, we consider the scene graph representation that considers objects together with their attributes and relations: this representation has ... More
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFsJun 02 2016In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous ... More
Detect What You Can: Detecting and Representing Objects using Holistic Models and Body PartsJun 08 2014Detecting objects becomes difficult when we need to deal with large shape deformation, occlusion and low resolution. We propose a novel approach to i) handle large deformations and partial occlusions in animals (as examples of highly deformable objects), ... More
Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic UnderstandingOct 14 2018Learning to estimate 3D geometry in a single frame and optical flow from consecutive frames by watching unlabeled videos via deep convolutional network has made significant process recently. Current state-of-the-art (SOTA) methods treat the tasks independently. ... More
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)Dec 20 2014Jun 11 2015In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated by ... More
Every Pixel Counts ++: Joint Learning of Geometry and Motion with 3D Holistic UnderstandingOct 14 2018Jul 11 2019Learning to estimate 3D geometry in a single frame and optical flow from consecutive frames by watching unlabeled videos via deep convolutional network has made significant progress recently. Current state-of-the-art (SoTA) methods treat the two tasks ... More
DeepVoting: A Robust and Explainable Deep Network for Semantic Part Detection under Partial OcclusionSep 14 2017Mar 29 2018In this paper, we study the task of detecting semantic parts of an object, e.g., a wheel of a car, under partial occlusion. We propose that all models should be trained without seeing occlusions while being able to transfer the learned knowledge to deal ... More
Recurrent Multimodal Interaction for Referring Image SegmentationMar 23 2017Aug 04 2017In this paper we are interested in the problem of image segmentation given natural language descriptions, i.e. referring expressions. Existing works tackle this problem by first modeling images and sentences independently and then segment images by combining ... More
Recurrent Saliency Transformation Network: Incorporating Multi-Stage Visual Cues for Small Organ SegmentationSep 13 2017Apr 08 2018We aim at segmenting small organs (e.g., the pancreas) from abdominal CT scans. As the target often occupies a relatively small region in the input image, deep neural networks can be easily confused by the complex and variable background. To alleviate ... More
Feature Denoising for Improving Adversarial RobustnessDec 09 2018Adversarial attacks to image classification systems present challenges to convolutional networks and opportunities for understanding them. This study suggests that adversarial perturbations on images lead to noise in the features constructed by these ... More
Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image SegmentationJan 10 2019Recently, Neural Architecture Search (NAS) has successfully identified neural network architectures that exceed human designed ones on large-scale image classification problems. In this paper, we study NAS for semantic image segmentation, an important ... More
Progressive Recurrent Learning for Visual RecognitionNov 29 2018Computer vision is difficult, partly because the mathematical function connecting input and output data is often complex, fuzzy and thus hard to learn. A currently popular solution is to design a deep neural network and optimize it on a large-scale dataset. ... More
Towards Accurate Task Accomplishment with Low-Cost Robotic ArmsDec 03 2018Training a robotic arm to accomplish real-world tasks has been attracting increasing attention in both academia and industry. This work discusses the role of computer vision algorithms in this field. We focus on low-cost arms on which no sensors are equipped ... More