Latest in cs.mm

Speech2Face: Learning the Face Behind a Voice (May 23 2019). How much can we infer about a person's looks from the way they speak? In this paper, we study the task of reconstructing a facial image of a person from a short audio recording of that person speaking. We design and train a deep neural network to perform ...
Multiple reconstruction compression framework based on PNG image (May 22 2019). Neural networks (NNs) have been shown to achieve excellent performance in image compression and reconstruction. However, there are still many shortcomings in practical applications, which eventually lead to the loss of neural network image processing ...
Evaluation of 4D Light Field Compression Methods (May 17 2019). Light field data records the amount of light at multiple points in space, captured e.g. by an array of cameras or by a light-field camera that uses microlenses. Since the storage and transmission requirements for such data are tremendous, compression ...
Reactive Video Caching via long-short-term fusion approach (May 16 2019). Video caching has been a basic network functionality in today's network architectures. Although an abundance of cache replacement algorithms have been proposed recently, these methods all suffer from a key limitation: due to their immature rules, inaccurate ...
EVSO: Environment-aware Video Streaming Optimization of Power Consumption (May 16 2019). Streaming services increasingly support high-quality videos for a better user experience. However, streaming high-quality video on mobile devices consumes a considerable amount of energy. This paper presents the design and prototype of EVSO, which achieves ...
Food Recommendation: Framework, Existing Solutions and Challenges (May 15 2019). A growing proportion of the global population is becoming overweight or obese, leading to various diseases (e.g., diabetes, ischemic heart disease and even cancer) due to unhealthy eating patterns, such as increased intake of food with high energy and ...
Statistical Learning Based Congestion Control for Real-time Video Communication (May 15 2019; also listed May 16 2019). With the increasing demands on interactive video applications, how to adapt video bit rate to avoid network congestion has become critical, since congestion results in self-inflicted delay and packet loss which deteriorate the quality of real-time video ...
SmartBullets: A Cloud-Assisted Bullet Screen Filter based on Deep Learning (May 15 2019). Bullet-screen is a technique that enables website users to send real-time comments ('bullets') across the screen. Compared with the traditional review of a video, bullet-screen provides new features of feeling expression to video watching and more iterations ...
Learning to Groove with Inverse Sequence Transformations (May 14 2019). We explore models for translating abstract musical ideas (scores, rhythms) into expressive performances using Seq2Seq and recurrent Variational Information Bottleneck (VIB) models. Though Seq2Seq models usually require painstakingly aligned corpora, we ...
High Capacity Lossless Data Hiding for JPEG images by NLCM Relationship Construction (May 14 2019). In this paper, we propose a high capacity lossless data hiding (LDH) scheme that achieves high embedding capacity and keeps the image quality unchanged. In the JPEG bitstream, Huffman coding is adopted to encode image data. In fact, some Huffman codes are ...
Expression Conditional GAN for Facial Expression-to-Expression Translation (May 14 2019). In this paper, we focus on the facial expression translation task and propose a novel Expression Conditional GAN (ECGAN) which can learn the mapping from one image domain to another one based on an additional expression attribute. The proposed ECGAN is ...
Reversible data hiding based on reducing invalid shifting of pixels in histogram shifting (May 14 2019). In recent years, reversible data hiding (RDH), a new research hotspot in the field of information security, has received increasing attention from researchers. Most of the existing RDH schemes do not fully take into account that a natural image's texture ...
FPGA-based Binocular Image Feature Extraction and Matching System (May 13 2019; also listed May 14 2019). Image feature extraction and matching is a fundamental but computationally intensive task in machine vision. This paper proposes a novel FPGA-based embedded system to accelerate feature extraction and matching. It implements SURF feature point detection and ...
Group Re-identification via Transferred Single and Couple Representation Learning (May 13 2019). Group re-identification (G-ReID) is an important yet less-studied task. Its challenges not only lie in appearance changes of individuals which have been well-investigated in general person re-identification (ReID), but also derive from group layout and ...
Deep Vocoder: Low Bit Rate Compression of Speech with Deep Autoencoder (May 12 2019; also listed May 14 2019). Inspired by the success of deep neural networks (DNNs) in speech processing, this paper presents Deep Vocoder, a direct end-to-end low bit rate speech compression method with deep autoencoder (DAE). In Deep Vocoder, DAE is used for extracting the latent ...
Compressing Weight-updates for Image Artifacts Removal Neural Networks (May 10 2019). In this paper, we present a novel approach for fine-tuning a decoder-side neural network in the context of image compression, such that the weight-updates are more compressible. At the encoder side, we fine-tune a pre-trained artifact removal network on ...
DEMC: A Deep Dual-Encoder Network for Denoising Monte Carlo Rendering (May 10 2019). In this paper, we present DEMC, a deep Dual-Encoder network to remove Monte Carlo noise efficiently while preserving details. Denoising Monte Carlo rendering is different from natural image denoising since inexpensive by-products (feature buffers) can ...
A Taxonomy and Dataset for 360° Videos (May 09 2019). In this paper, we propose a taxonomy for 360° videos that categorizes videos based on moving objects and camera motion. We gathered and produced 28 videos based on the taxonomy, and recorded viewport traces from 60 participants watching the videos. ...
Reversible Data Hiding in JPEG Images with Multi-objective Optimization (May 09 2019). Among the various methods of reversible data hiding (RDH) in JPEG images, design has considered only image quality, yet image quality and file size expansion are equally important for JPEG images. Based on this observation, we propose ...
Methodology for accurately assessing the quality perceived by users on 360VR contents (May 09 2019). To properly evaluate the performance of 360VR-specific encoding and transmission schemes, and particularly of the solutions based on viewport adaptation, it is necessary to consider not only the bandwidth saved, but also the quality of the portion of ...
Somewhat Reversible Data Hiding by Image to Image Translation (May 08 2019). The traditional reversible data hiding technique is based on image modification, which is more easily analyzed and attacked. With the appearance of generative adversarial networks (GANs), a novel type of image steganography methods without image modification ...
Learning Cascaded Siamese Networks for High Performance Visual Tracking (May 08 2019). Visual tracking is one of the most challenging computer vision problems. In order to achieve high performance visual tracking in various negative scenarios, a novel cascaded Siamese network is proposed and developed based on two different deep learning ...
Convolutional Neural Networks Considering Local and Global features for Image Enhancement (May 07 2019). In this paper, we propose a novel convolutional neural network (CNN) architecture considering both local and global features for image enhancement. Most conventional image enhancement methods, including Retinex-based methods, cannot restore lost pixel ...
Compressed Image Quality Assessment Based on Saak Features (May 06 2019; also listed May 16 2019). Compressed image quality assessment plays an important role in image services, especially in image compression applications, which can be utilized as a guidance to optimize image processing algorithms. In this paper, we propose an objective image quality ...
A multimodal lossless coding method for skeletons in videos (May 06 2019; also listed May 11 2019). Nowadays, skeleton information in videos plays an important role in human-centric video analysis, but effective coding of such massive skeleton information has never been addressed in previous work. In this paper, we make the first attempt to solve this problem ...
Few-Shot Unsupervised Image-to-Image Translation (May 05 2019). Unsupervised image-to-image translation methods learn to map images in a given class to an analogous image in a different class, drawing on unstructured (non-registered) datasets of images. While remarkably successful, current methods require access to ...
Game-theoretic Analysis to Content-adaptive Reversible Watermarking (May 05 2019). While many games were designed for steganography and robust watermarking, few focused on reversible watermarking. We present a two-encoder game related to the rate-distortion optimization of content-adaptive reversible watermarking. In the game, Alice ...
Learning Spatio-Temporal Features with Two-Stream Deep 3D CNNs for Lipreading (May 04 2019). We focus on the word-level visual lipreading, which requires recognizing the word being spoken, given only the video but not the audio. State-of-the-art methods explore the use of end-to-end neural networks, including a shallow (up to three layers) 3D ...
Time-sync Video Tag Extraction Using Semantic Association Graph (May 03 2019). Time-sync comments reveal a new way of extracting online video tags. However, such time-sync comments contain a lot of noise due to users' diverse comments, introducing great challenges for accurate and fast video tag extraction. In this paper, we propose ...
Efficient Discrete Supervised Hashing for Large-scale Cross-modal Retrieval (May 03 2019). Supervised cross-modal hashing has gained increasing research interest in large-scale retrieval tasks owing to its satisfactory performance and efficiency. However, it still has some challenging issues to be further studied: 1) most of them fail to well ...
Herding Effect based Attention for Personalized Time-Sync Video Recommendation (May 02 2019). Time-sync comment (TSC) is a new form of user-interaction review associated with real-time video contents, which contains a user's preferences for videos and is therefore well suited as a data source for video recommendations. However, existing review-based ...
Fully Automatic Brain Tumor Segmentation using a Normalized Gaussian Bayesian Classifier and 3D Fluid Vector Flow (May 01 2019). Brain tumor segmentation from Magnetic Resonance Images (MRIs) is an important task to measure tumor responses to treatments. However, automatic segmentation is very challenging. This paper presents an automatic brain tumor segmentation method based on ...
Learned Image Compression with Soft Bit-based Rate-Distortion Optimization (May 01 2019). This paper introduces the notion of soft bits to address the rate-distortion optimization for learning-based image compression. Recent methods for such compression train an autoencoder end-to-end with an objective to strike a balance between distortion ...
State-of-the-art in 360° Video/Image Processing: Perception, Assessment and Compression (May 01 2019). Nowadays, 360° video/image has become increasingly popular and has drawn great attention. The spherical viewing range of 360° video/image results in a huge amount of data, which poses challenges to 360° video/image processing in solving the bottleneck ...
Effective and Efficient Indexing in Cross-Modal Hashing-Based Datasets (Apr 30 2019). To overcome the barrier of storage and computation, the hashing technique has been widely used for nearest neighbor search in multimedia retrieval applications recently. Particularly, cross-modal retrieval that searches across different modalities becomes ...
Deep Learning-Based Video Coding: A Review and A Case Study (Apr 29 2019). The past decade has witnessed great success of deep learning technology in many disciplines, especially in computer vision and image processing. However, deep learning-based video coding remains in its infancy. This paper reviews the representative works ...
3D Dynamic Point Cloud Denoising via Spatio-temporal Graph Modeling (Apr 28 2019). The prevalence of accessible depth sensing and 3D laser scanning techniques has enabled the convenient acquisition of 3D dynamic point clouds, which provide efficient representation of arbitrarily-shaped objects in motion. Nevertheless, dynamic point ...
Supervised Online Hashing via Hadamard Codebook Learning (Apr 28 2019). In recent years, binary code learning, a.k.a. hashing, has received extensive attention in large-scale multimedia retrieval. It aims to encode high-dimensional data points to binary codes, hence the original high-dimensional metric space can be efficiently ...
Video coding technique with parametric modeling of noise (Apr 26 2019). This paper presents a video encoding method in which noise is encoded using a novel parametric model representing spectral envelope and spatial distribution of energy. The proposed method has been experimentally assessed using video test sequences in ...
A Noise-aware Enhancement Method for Underexposed Images (Apr 24 2019). A novel method of contrast enhancement is proposed for underexposed images, in which heavy noise is hidden. Under low light conditions, images taken by digital cameras have low contrast in dark or bright regions. This is due to a limited dynamic range ...
Video Indexing and Retrieval System Integrating a Gestural System for People with Disabilities [Système d'indexation et de recherche de vidéo intégrant un système gestuel pour les personnes handicapées] (Apr 24 2019). The amount of audio-visual information has increased dramatically with the advent of High Speed Internet. Furthermore, technological advances in recent years in the field of information technology have simplified the use of video data in various fields ...
Siamese Attentional Keypoint Network for High Performance Visual Tracking (Apr 23 2019). In this paper, we investigate impacts of three main aspects of visual tracking, i.e., the backbone network, the attentional mechanism and the detection component, and propose a Siamese Attentional Keypoint Network, dubbed SATIN, to achieve efficient tracking ...
A Novel QoE-Aware SDN-enabled, NFV-based Management Architecture for Future Multimedia Applications on 5G Systems (Apr 22 2019). This paper proposes a novel QoE-aware SDN enabled NFV architecture for controlling and managing Future Multimedia Applications on 5G systems. The aim is to improve the QoE of the delivered multimedia services through the fulfilment of personalized QoE ...
StegoAppDB: a Steganography Apps Forensics Image Database (Apr 19 2019). In this paper, we present a new reference dataset simulating digital evidence for image steganography. Steganography detection is a digital image forensic topic that is relatively unknown in practical forensics, although stego app use in the wild is on ...
Listen to the Image (Apr 19 2019). Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating the visual information into a sound pattern. To improve the translation quality, the task performances of the blind are usually employed ...
AnonymousNet: Natural Face De-Identification with Measurable Privacy (Apr 19 2019). With billions of personal images being generated from social media and cameras of all sorts on a daily basis, security and privacy are unprecedentedly challenged. Although extensive attempts have been made, existing face image de-identification techniques ...
On Acoustic Modeling for Broadband Beamforming (Apr 18 2019). In this work, we describe limitations of the free-field propagation model for designing broadband beamformers for microphone arrays on a rigid surface. Towards this goal, we describe a general framework for quantifying the microphone array performance ...
Exquisitor: Interactive Learning at Large (Apr 18 2019; also listed May 04 2019). Increasing scale is a dominant trend in today's multimedia collections, which especially impacts interactive applications. To facilitate interactive exploration of large multimedia collections, new approaches are needed that are capable of learning on ...
Computational Attention System for Children, Adults and Elderly (Apr 18 2019). The existing computational visual attention systems have focused on the objective to basically simulate and understand the concept of visual attention system in adults. Consequently, the impact of observer's age in scene viewing behavior has rarely been ...
Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds (Apr 18 2019). Point cloud is a fundamental 3D representation which is widely used in real world applications such as autonomous driving. As a newly-developed media format which is characterized by complexity and irregularity, point cloud creates a need for compression ...
An efficient multi-language Video Search Engine to facilitate the HADJ and the UMRA (Apr 17 2019). Video clips have become the most important and prominent multimedia documents for illustrating the ritual process of Hajj and Umrah. Therefore, it is necessary to develop a system to facilitate access to information related to the duties, the pillars, the stages ...
SCE: A manifold regularized set-covering method for data partitioning (Apr 17 2019). Cluster analysis plays a very important role in data analysis. In recent years, cluster ensemble, as a cluster analysis tool, has drawn much attention for its robustness, stability, and accuracy. Many efforts have been made to combine different initial ...
Adversarial Cross-Modal Retrieval via Learning and Transferring Single-Modal Similarities (Apr 17 2019). Cross-modal retrieval aims to retrieve relevant data across different modalities (e.g., texts vs. images). The common strategy is to apply element-wise constraints between manually labeled pair-wise items to guide the generators to learn the semantic ...
Co-Separating Sounds of Visual Objects (Apr 16 2019). Learning how objects sound from video is challenging, since they often heavily overlap in a single audio channel. Current methods for visually-guided audio source separation sidestep the issue by training with artificially mixed video clips, but this ...
Steganographer Identification (Apr 16 2019). Conventional steganalysis detects the presence of steganography within single objects. In the real world, we may face a complex scenario in which one or more of multiple users, called actors, are guilty of using steganography, which is typically defined as ...
Saliency Prediction on Omnidirectional Images with Generative Adversarial Imitation Learning (Apr 15 2019). When watching omnidirectional images (ODIs), subjects can access different viewports by moving their heads. Therefore, it is necessary to predict subjects' head fixations on ODIs. Inspired by generative adversarial imitation learning (GAIL), this paper ...
A Personalized Preference Learning Framework for Caching in Mobile Networks (Apr 15 2019). This paper comprehensively studies a content-centric mobile network based on a preference learning framework, where each mobile user is equipped with a finite-size cache. We consider a practical scenario where each user requests a content file according ...
Proximal binaural sound can induce subjective frisson (Apr 15 2019). Sound frisson is a subjective experience wherein people tend to perceive the feeling of chills in addition to a physiological response, such as goosebumps. Multiple examples of frisson inducing sounds have been reported in the large online community, ...
Multi-Channel Attention Selection GAN with Cascaded Semantic Guidance for Cross-View Image Translation (Apr 15 2019; also listed Apr 16 2019). Cross-view image translation is challenging because it involves images with drastically different views and severe deformation. In this paper, we propose a novel approach named Multi-Channel Attention SelectionGAN (SelectionGAN) that makes it possible ...
dipIQ: Blind Image Quality Assessment by Learning-to-Rank Discriminable Image Pairs (Apr 13 2019). Objective assessment of image quality is fundamentally important in many image processing tasks. In this work, we focus on learning blind image quality assessment (BIQA) models which predict the quality of a digital image with no access to its original ...
YouTube UGC Dataset for Video Compression Research (Apr 13 2019). Non-professional video, commonly known as User Generated Content (UGC), has become very popular in today's video sharing applications. However, there are few public UGC datasets available for video compression and quality assessment research. This paper ...
Black-box Adversarial Attacks on Video Recognition Models (Apr 10 2019). Deep neural networks (DNNs) are known for their vulnerability to adversarial examples. These are examples that have undergone a small, carefully crafted perturbation, and which can easily fool a DNN into making misclassifications at test time. Thus far, ...
A Framework for Multi-f0 Modeling in SATB Choir Recordings (Apr 10 2019). Fundamental frequency (f0) modeling is an important but relatively unexplored aspect of choir singing. Performance evaluation as well as auditory analysis of singing, whether individually or in a choir, often depend on extracting f0 contours for the singing ...
Neuralogram: A Deep Neural Network Based Representation for Audio Signals (Apr 10 2019). We propose the Neuralogram, a deep neural network based representation for understanding audio signals which, as the name suggests, transforms an audio signal to a dense, compact representation based upon embeddings learned via a neural architecture. ...
Affordance Analysis of Virtual and Augmented Reality Mediated Communication (Apr 09 2019). Virtual and augmented reality communication platforms are seen as promising modalities for next-generation remote face-to-face interactions. Our study attempts to explore non-verbal communication features in relation to their conversation context for ...
Exploring Uncertainty Measures for Image-Caption Embedding-and-Retrieval Task (Apr 09 2019). With the wide development of black-box machine learning algorithms, particularly deep neural networks (DNNs), the practical demand for reliability assessment is rapidly rising. On the basis of the concept that 'Bayesian deep learning knows what it does ...
Weakly Supervised Video Moment Retrieval From Text Queries (Apr 05 2019). There have been a few recent methods proposed in text to video moment retrieval using natural language queries, but requiring full supervision during training. However, acquiring a large number of training videos with temporal boundary annotations for ...
MMED: A Multi-domain and Multi-modality Event Dataset (Apr 04 2019; also listed Apr 09 2019). In this work, we construct and release a multi-domain and multi-modality event dataset (MMED), containing 25,165 textual news articles collected from hundreds of news media sites (e.g., Yahoo News, Google News, CNN News) and 76,516 image posts shared ...
Orthogonal Voronoi Diagram and Treemap (Apr 04 2019). In this paper, we propose a novel space partitioning strategy for implicit hierarchy visualization such that the new plot not only has a tidy layout similar to the treemap, but also is flexible to data changes similar to the Voronoi treemap. To achieve ...
A Comparative Study on Hierarchical Navigable Small World Graphs (Apr 03 2019; also listed Apr 06 2019 and Apr 12 2019). Hierarchical navigable small world (HNSW) graphs have become increasingly popular for large-scale nearest neighbor search tasks since their source code was released two years ago. The attractiveness of this approach lies in its superior performance over most ...
SADIH: Semantic-Aware DIscrete Hashing (Apr 03 2019; also listed Apr 16 2019). Due to its low storage cost and fast query speed, hashing has been recognized to accomplish similarity search in large-scale multimedia retrieval applications. Particularly supervised hashing has recently received considerable research attention by leveraging ...
Source Camera Attribution of Multi-Format Devices (Apr 02 2019). Photo Response Non-Uniformity (PRNU) based source camera attribution is an effective method to determine the origin camera of visual media (an image or a video). However, given that modern devices, especially smartphones, capture images and videos at ...
The bilateral solver for quality estimation based multi-focus image fusion (Apr 01 2019). In this work, a fast Bilateral Solver for Quality Estimation Based multi-focus Image Fusion method (BS-QEBIF) is proposed. The all-in-focus image is generated by pixel-wise summing up the multi-focus source images with their focus-levels maps as weights. ...
Constructing Hierarchical Q&A Datasets for Video Story Understanding (Apr 01 2019). Video understanding is emerging as a new paradigm for studying human-like AI. Question-and-Answering (Q&A) is used as a general benchmark to measure the level of intelligence for video understanding. While several previous studies have suggested datasets ...
Layered Image Compression using Scalable Auto-encoder (Apr 01 2019). This paper presents a novel convolutional neural network (CNN) based image compression framework via scalable auto-encoder (SAE). Specifically, our SAE based deep image codec consists of hierarchical coding layers, each of which is an end-to-end optimized ...
BlackMarks: Blackbox Multibit Watermarking for Deep Neural Networks (Mar 31 2019). Deep Neural Networks have created a paradigm shift in our ability to comprehend raw data in various important fields ranging from computer vision and natural language processing to intelligence warfare and healthcare. While DNNs are increasingly deployed ...
Learning Affective Correspondence between Music and Image (Mar 30 2019; also listed Apr 17 2019). We introduce the problem of learning affective correspondence between audio (music) and visual data (images). For this task, a music clip and an image are considered similar (having true correspondence) if they have similar emotion content. In order to ...
A Study on the Characteristics of Douyin Short Videos and Implications for Edge Caching (Mar 29 2019). Douyin, internationally known as TikTok, has become one of the most successful short-video platforms. To maintain its popularity, Douyin has to provide better Quality of Experience (QoE) to its growing user base. Understanding the characteristics of Douyin ...
Quality Assessment of Free-viewpoint Videos by Quantifying the Elastic Changes of Multi-Scale Motion Trajectories (Mar 28 2019). Virtual viewpoint synthesis is an essential process for many immersive applications including Free-viewpoint TV (FTV). A widely used technique for viewpoint synthesis is the Depth-Image-Based-Rendering (DIBR) technique. However, such techniques may introduce ...
GANs-NQM: A Generative Adversarial Networks based No Reference Quality Assessment Metric for RGB-D Synthesized Views (Mar 28 2019). In this paper, we propose a no-reference (NR) quality metric for RGB plus image-depth (RGB-D) synthesis images based on Generative Adversarial Networks (GANs), namely GANs-NQM. Due to the failure of the inpainting on dis-occluded regions in RGB-D synthesis ...
Universal chosen-ciphertext attack for a family of image encryption schemes (Mar 28 2019). During the past decades, employing nonlinear dynamics and permutation-substitution architecture for image encryption has become very popular. There are three primary procedures in such encryption schemes: the key schedule module for producing encryption ...
SRDGAN: learning the noise prior for Super Resolution with Dual Generative Adversarial Networks (Mar 28 2019). Single Image Super Resolution (SISR) is the task of producing a high resolution (HR) image from a given low-resolution (LR) image. It is a well researched problem with extensive commercial applications such as digital camera, video compression, medical ...
Resource Allocation Mechanism for Media Handling Services in Cloud Multimedia Conferencing (Mar 27 2019). Multimedia conferencing is the conversational exchange of multimedia content between multiple parties. It has a wide range of applications (e.g., Massively Multiplayer Online Games (MMOGs) and distance learning). Media handling services (e.g., video mixing, ...
Cross-modal subspace learning with Kernel correlation maximization and Discriminative structure preserving (Mar 26 2019). The measure between heterogeneous data is still an open problem. Many research works have been developed to learn a common subspace where the similarity between different modalities can be calculated. However, most of existing works focus on learning ...
Unsupervised Concatenation Hashing with Sparse Constraint for Cross-Modal Retrieval (Mar 26 2019). With the advantage of low storage cost and high efficiency, hashing learning has received much attention in retrieval field. As multiple modal data representing a common object semantically are complementary, many works focus on learning unified binary ...
Mask-ShadowGAN: Learning to Remove Shadows from Unpaired Data (Mar 26 2019). This paper presents a new method for shadow removal using unpaired data, enabling us to avoid tedious annotations and obtain more diverse training samples. However, directly employing adversarial learning and cycle-consistency constraints is insufficient ...
Wav2Pix: Speech-conditioned Face Generation using Generative Adversarial Networks (Mar 25 2019). Speech is a rich biometric signal that contains information about the identity, gender and emotional state of the speaker. In this work, we explore its potential to generate face images of a speaker by conditioning a Generative Adversarial Network (GAN) ...
Analysis of Rolling Shutter Effect on ENF based Video Forensics (Mar 23 2019). ENF is a time-varying signal of the frequency of mains electricity in a power grid. It continuously fluctuates around a nominal value (50/60 Hz) due to changes in supply and demand of power over time. Depending on these ENF variations, the luminous intensity ...