Latest in cs.mm

Effectiveness of Crypto-Transcoding for H.264/AVC and HEVC Video Bit-streamsFeb 19 2019To avoid delays arising from a need to decrypt a video prior to transcoding and then re-encrypt it afterwards, this paper assesses a selective encryption (SE) content protection scheme. The scheme is suited to both recent standardized codecs, namely H.264/Advanced ... More
Multimodal music information processing and retrieval: survey and future challengesFeb 14 2019Towards improving the performance in various music information processing tasks, recent studies exploit different modalities able to capture diverse aspects of music. Such modalities include audio recordings, symbolic music scores, mid-level representations, ... More
Multi-task learning with compressible features for Collaborative IntelligenceFeb 14 2019A promising way to deploy Artificial Intelligence (AI)-based services on mobile devices is to run a part of the AI model (a deep neural network) on the mobile itself, and the rest in the cloud. This is sometimes referred to as collaborative intelligence. ... More
Development of Video Frame Enhancement Technique Using Pixel Intensity AnalysisFeb 13 2019This paper developed a brightness enhancement technique for video frame pixel intensity improvement. Frames extracted from the six sample video data used in this work were stored in the form of images in a buffer. Noise was added to the extracted image ... More
Super-Resolution of Brain MRI Images using Overcomplete Dictionaries and Nonlocal SimilarityFeb 13 2019Recently, the Magnetic Resonance Imaging (MRI) images have limited and unsatisfactory resolutions due to various constraints such as physical, technological and economic considerations. Super-resolution techniques can obtain high-resolution MRI images. ... More
Cross-Modal Music Retrieval and Applications: An Overview of Key MethodologiesFeb 12 2019There has been a rapid growth of digitally available music data, including audio recordings, digitized images of sheet music, album covers and liner notes, and video clips. This huge amount of data calls for retrieval strategies that allow users to explore ... More
A block-based inter-band predictor using multilayer propagation neural network for hyperspectral image compressionFeb 12 2019In this paper, a block-based inter-band predictor (BIP) with multilayer propagation neural network model (MLPNN) is presented by a completely new framework. This predictor can combine with diversity entropy coding methods. Hyperspectral (HS) images are ... More
Occupancy-map-based rate distortion optimization for video-based point cloud compressionFeb 11 2019The state-of-the-art video-based point cloud compression scheme projects the 3D point cloud to 2D patch by patch and organizes the patches into frames to compress them using the efficient video compression scheme. Such a scheme shows a good trade-off ... More
Towards an All-Purpose Content-Based Multimedia Information Retrieval SystemFeb 11 2019The growth of multimedia collections - in terms of size, heterogeneity, and variety of media types - necessitates systems that are able to conjointly deal with several forms of media, especially when it comes to searching for particular objects. However, ... More
Multi-tier Caching Analysis in CDN-based Over-the-top Video Streaming SystemsFeb 11 2019Internet video traffic has been rapidly increasing and is further expected to increase with the emerging 5G applications such as higher definition videos, IoT and augmented/virtual reality applications. As end-users consume video in massive amounts ... More
Generative Moment Matching Network-based Random Modulation Post-filter for DNN-based Singing Voice Synthesis and Neural Double-trackingFeb 09 2019This paper proposes a generative moment matching network (GMMN)-based post-filter that provides inter-utterance pitch variation for deep neural network (DNN)-based singing voice synthesis. The natural pitch variation of a human singing voice leads to ... More
A Fast Iterative Method for Removing Sparse Noise from Sparse SignalsFeb 08 2019In this paper, we propose a new method to reconstruct a signal corrupted by noise where both signal and noise are sparse but in different domains. The problem investigated in this paper arises in different applications such as impulsive noise cancellation ... More
License Plate Recognition with Compressive Sensing Based Feature ExtractionFeb 07 2019License plate recognition is the key component to many automatic traffic control systems. It enables the automatic identification of vehicles in many applications. Such systems must be able to identify vehicles from images taken in various conditions ... More
Iris Image Processing in Compressive Sensing ScenarioFeb 07 2019This paper observes the application of the Compressive Sensing in reconstruction of the under-sampled iris images. Iris recognition represents form of biometric identification whose usage in real applications is growing. Compressive Sensing represents ... More
Vignette: Perceptual Compression for Video Storage and Processing SystemsFeb 04 2019Compressed videos constitute 70% of Internet traffic, and video upload growth rates far outpace compute and storage improvement trends. Past work in leveraging perceptual cues like saliency, i.e., regions where viewers focus their perceptual attention, ... More
Data Driven Analysis of Tiny Touchscreen Performance with MicroJamFeb 02 2019The widespread adoption of mobile devices, such as smartphones and tablets, has made touchscreens a common interface for musical performance. New mobile musical instruments have been designed that embrace collaborative creation and that explore the affordances ... More
Benefiting from Duplicates of Compressed Data: Shift-Based Holographic Compression of ImagesJan 30 2019Feb 07 2019Storage systems often rely on multiple copies of the same compressed data, enabling recovery in case of binary data errors, of course, at the expense of a higher storage cost. In this paper we show that a wiser method of duplication entails great potential ... More
A study for Image compression using Re-Pair algorithmJan 30 2019Feb 07 2019Compression is an important topic in computer science that allows us to store more data on our storage devices. There are several techniques to compress any file. This manuscript describes the most important algorithm to compress ... More
On basis images for the digital image representationJan 23 2019Digital array orthogonal transformations that can be presented as a decomposition over basis items or basis images are considered. The orthogonal transform provides digital data scattering, a process of pixel energy redistributing, that is illustrated ... More
Efficient Image Splicing Localization via Contrastive Feature ExtractionJan 22 2019In this work, we propose a new data visualization and clustering technique for discovering discriminative structures in high-dimensional data. This technique, referred to as cPCA++, utilizes the fact that the interesting features of a "target" dataset ... More
SteganoGAN: High Capacity Image Steganography with GANsJan 12 2019Jan 30 2019Image steganography is a procedure for hiding messages inside pictures. While other techniques such as cryptography aim to prevent adversaries from reading the secret message, steganography aims to hide the presence of the message itself. In this paper, ... More
Somatic Practices for Understanding Real, Imagined, and Virtual RealitiesJan 11 2019In most VR experiences, the visual sense dominates other modes of sensory input, encouraging non-visual senses to respond as if the visual were real. The simulated visual world thus becomes a sort of felt actuality, where the 'actual' physical body and ... More
Handcrafted vs Deep Learning Classification for Scalable Video QoE ModelingJan 10 2019Mobile video traffic is dominant in cellular and enterprise wireless networks. With the advent of diverse applications, network administrators face the challenge to provide high QoE in the face of diverse wireless conditions and application contents. ... More
A Spatial-temporal 3D Human Pose Reconstruction FrameworkJan 08 2019Jan 10 20193D human pose reconstruction from a single-view camera is a difficult and challenging topic. Many approaches have been proposed, but almost all process each frame independently, even though the frames in a pose sequence are highly correlated. In contrast, ... More
Visual Distortions in 360-degree VideosJan 07 2019Omnidirectional (or 360-degree) images and videos are emergent signals in many areas such as robotics and virtual/augmented reality. In particular, for virtual reality, they allow an immersive experience in which the user is provided with a 360-degree ... More
AVA-ActiveSpeaker: An Audio-Visual Dataset for Active Speaker DetectionJan 05 2019Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings, speech enhancement, and human-robot interaction. The absence of a large, carefully labeled audio-visual ... More
Introduction to Voice Presentation Attack Detection and Recent AdvancesJan 04 2019Over the past few years significant progress has been made in the field of presentation attack detection (PAD) for automatic speaker recognition (ASV). This includes the development of new speech corpora, standard evaluation protocols and advancements ... More
End-to-End Model for Speech Enhancement by Consistent Spectrogram MaskingJan 02 2019Recently, phase processing is attracting increasing interest in speech enhancement community. Some researchers integrate phase estimations module into speech enhancement models by using complex-valued short-time Fourier transform (STFT) spectrogram based ... More
Security analysis of a self-embedding fragile image watermark schemeDec 31 2018Recently, a self-embedding fragile watermark scheme based on reference-bits interleaving and adaptive selection of embedding mode is proposed. Reference bits are derived from the scrambled MSB bits of a cover image, and then are combined with authentication ... More
TextNet: Irregular Text Reading from Images with an End-to-End Trainable NetworkDec 24 2018Reading text from images remains challenging due to multi-orientation, perspective distortion and especially the curved nature of irregular text. Most of existing approaches attempt to solve the problem in two or multiple stages, which is considered to ... More
The Prefetch Aggressiveness Tradeoff in 360° Video StreamingDec 18 2018With 360° video, only a limited fraction of the full view is displayed at each point in time. This has prompted the design of streaming delivery techniques that allow alternative playback qualities to be delivered for each candidate viewing direction. ... More
BandNet: A Neural Network-based, Multi-Instrument Beatles-Style MIDI Music Composition MachineDec 18 2018In this paper, we propose a recurrent neural network (RNN)-based MIDI music composition machine that is able to learn musical knowledge from existing Beatles' songs and generate music in the style of the Beatles with little human intervention. In the ... More
Robust Graph Learning from Noisy DataDec 17 2018Learning graphs from data automatically has shown encouraging performance on clustering and semisupervised learning tasks. However, real data are often corrupted, which may cause the learned graph to be inexact or unreliable. In this paper, we propose ... More
VideoMem: Constructing, Analyzing, Predicting Short-term and Long-term Video MemorabilityDec 05 2018Humans share a strong tendency to memorize/forget some of the visual information they encounter. This paper focuses on providing computational models for the prediction of the intrinsic memorability of visual content. To address this new challenge, we ... More
Improving Semantic Segmentation via Video Propagation and Label RelaxationDec 04 2018Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy ... More
Sewer Rats in Teaching Action: An explorative field study on students' perception of a game-based learning app in graduate engineering educationNov 24 2018Game-based technologies and mobile learning aids open up many opportunities for learners; however, evidence-based decisions on their appropriate use are necessary. This explorative study (N = 100) examines the role of game elements in university education ... More
Tiyuntsong: A Self-Play Reinforcement Learning Approach for ABR Video StreamingNov 15 2018Existing reinforcement learning (RL)-based adaptive bitrate (ABR) approaches outperform the previous fixed control rules based methods by improving the Quality of Experience (QoE) score, while the QoE metric can hardly provide clear guidance for optimization, ... More
Referenceless Performance Evaluation of Audio Source Separation using Deep Neural NetworksNov 01 2018Current performance evaluation for audio source separation depends on comparing the processed or separated signals with reference signals. Therefore, common performance evaluation toolkits are not applicable to real-world situations where the ground truth ... More
Modeling Melodic Feature Dependency with Modularized Variational Auto-EncoderOct 31 2018Automatic melody generation has been a long-time aspiration for both AI researchers and musicians. However, learning to generate euphonious melodies has turned out to be highly challenging. This paper introduces 1) a new variant of variational autoencoder ... More
Domain Adaptation for Semantic Segmentation via Class-Balanced Self-TrainingOct 18 2018Oct 25 2018Recent deep networks achieved state of the art performance on a variety of semantic segmentation tasks. Despite such progress, these models often face challenges in real world `wild tasks' where large difference between labeled training/source data and ... More
UnrealROX: An eXtremely Photorealistic Virtual Reality Environment for Robotics Simulations and Synthetic Data GenerationOct 16 2018Data-driven algorithms have surpassed traditional techniques in almost every aspect in robotic vision problems. Such algorithms need vast amounts of quality data to be able to work properly after their training process. Gathering and annotating that sheer ... More
An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist ContinuationOct 02 2018The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length with some additional meta-data, the task ... More
Older Adults and Crowdsourcing: Android TV App for Evaluating TEDx Subtitle QualitySep 29 2018In this paper we describe the insights from an exploratory qualitative pilot study testing the feasibility of a solution that would encourage older adults to participate in online crowdsourcing tasks in a non-computer scenario. Therefore, we developed ... More
Multi-View Frame Reconstruction with Conditional GANSep 27 2018Multi-view frame reconstruction is an important problem particularly when multiple frames are missing and past and future frames within the camera are far apart from the missing ones. Realistic coherent frames can still be reconstructed using corresponding ... More
Adversarial Training Towards Robust Multimedia Recommender SystemSep 19 2018Jan 07 2019With the prevalence of multimedia content on the Web, developing recommender solutions that can effectively leverage the rich signal in multimedia data is in urgent need. Owing to the success of deep neural networks in representation learning, recent ... More
Intermediate Deep Feature Compression: the Next Battlefield of Intelligent SensingSep 17 2018The recent advances of hardware technology have made the intelligent analysis equipped at the front-end with deep learning more prevailing and practical. To better enable the intelligent sensing at the front-end, instead of compressing and transmitting ... More
Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised DetectionSep 16 2018Multi-label image classification is a fundamental but challenging task towards general visual understanding. Existing methods found the region-level cues (e.g., features from RoIs) can facilitate multi-label classification. Nevertheless, such methods ... More
Deep Learning of Human Perception in Audio Event ClassificationSep 03 2018In this paper, we introduce our recent studies on human perception in audio event classification by different deep learning models. In particular, the pre-trained model VGGish is used as feature extractor to process audio data, and DenseNet is trained ... More
Activity Recognition on a Large Scale in Short Videos - Moments in Time DatasetSep 01 2018Sep 13 2018Moments capture a huge part of our lives. Accurate recognition of these moments is challenging due to the diverse and complex interpretation of the moments. Action recognition refers to the act of classifying the desired action/activity present in a given ... More
Large-Scale Cover Song Detection in Digital Music Libraries Using Metadata, Lyrics and Audio FeaturesAug 30 2018Cover song detection is a very relevant task in Music Information Retrieval (MIR) studies and has been mainly addressed using audio-based systems. Despite its potential impact in industrial contexts, low performances and lack of scalability have prevented ... More
Representation Learning for Image-based Music RecommendationAug 28 2018Aug 29 2018Image perception is one of the most direct ways to provide contextual information about a user concerning his/her surrounding environment; hence images are a suitable proxy for contextual recommendation. We propose a novel representation learning framework ... More
Patch-based Contour Prior Image Denoising for Salt and Pepper NoiseAug 26 2018Salt and pepper noise poses a significant challenge to image denoising technology, i.e., how to remove the noise clearly while retaining the details effectively. In this paper, we propose a patch-based contour prior denoising approach for salt and pepper ... More
Towards Machine Learning-Based Optimal HASAug 24 2018Mobile video consumption is increasing and sophisticated video quality adaptation strategies are required to deal with mobile throughput fluctuations. These adaptation strategies have to keep the switching frequency low, the average quality high and prevent ... More
IceBreaker: Solving Cold Start Problem for Video Recommendation EnginesAug 16 2018Internet has brought about a tremendous increase in content of all forms and, in that, video content constitutes the major backbone of the total content being published as well as watched. Thus it becomes imperative for video recommendation engines such ... More
Question-Guided Hybrid Convolution for Visual Question AnsweringAug 08 2018In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial ... More
Simultaneous Edge Alignment and LearningAug 06 2018Oct 26 2018Edge detection is among the most fundamental vision problems for its role in perceptual grouping and its wide applications. Recent advances in representation learning have led to considerable improvements in this area. Many state of the art edge detection ... More
X-GANs: Image Reconstruction Made Easy for Extreme CasesAug 06 2018Image reconstruction including image restoration and denoising is a challenging problem in the field of image computing. We present a new method, called X-GANs, for reconstruction of arbitrary corrupted resource based on a variant of conditional generative ... More
The Importance of Context When Recommending TV Content: Dataset and AlgorithmsJul 30 2018Home entertainment systems feature in a variety of usage scenarios with one or more simultaneous users, for whom the complexity of choosing media to consume has increased rapidly over the last decade. Users' decision processes are complex and highly influenced ... More
Semi-supervised Deep Generative Modelling of Incomplete Multi-Modality Emotional DataJul 27 2018There are threefold challenges in emotion recognition. First, it is difficult to recognize human's emotional states only considering a single modality. Second, it is expensive to manually annotate the emotional data. Third, emotional data often suffers ... More
Who is the director of this movie? Automatic style recognition based on shot featuresJul 25 2018We show how low-level formal features, such as shot duration, meant as length of camera takes, and shot scale, i.e. the distance between the camera and the subject, are distinctive of a director's style in art movies. So far such features were thought ... More
Invisible Steganography via Generative Adversarial NetworksJul 23 2018Oct 10 2018Nowadays, there are plenty of works introducing convolutional neural networks (CNNs) to the steganalysis and exceeding conventional steganalysis algorithms. These works have shown the improving potential of deep learning in information hiding domain. ... More
A Convolutional Neural Networks Denoising Approach for Salt and Pepper NoiseJul 21 2018The salt and pepper noise, especially the one with extremely high percentage of impulses, brings a significant challenge to image denoising. In this paper, we propose a non-local switching filter convolutional neural network denoising algorithm, named ... More
SoniControl - A Mobile Ultrasonic FirewallJul 19 2018The exchange of data between mobile devices in the near-ultrasonic frequency band is a new promising technology for near field communication (NFC) but also raises a number of privacy concerns. We present the first ultrasonic firewall that reliably detects ... More
Audio-to-Score Alignment using Transposition-invariant FeaturesJul 19 2018Audio-to-score alignment is an important pre-processing step for in-depth analysis of classical music. In this paper, we apply novel transposition-invariant audio features to this task. These low-dimensional features represent local pitch intervals and ... More
Photo-unrealistic Image Enhancement for Subject Placement in Outdoor PhotographyJul 17 2018Camera display reflections are an issue in bright light situations, as they may prevent users from correctly positioning the subject in the picture. We propose a software solution to this problem, which consists in modifying the image in the viewer, in ... More
Competitive Analysis System for Theatrical Movie Releases Based on Movie Trailer Deep Video RepresentationJul 12 2018Audience discovery is an important activity at major movie studios. Deep models that use convolutional networks to extract frame-by-frame features of a movie trailer and represent it in a form that is suitable for prediction are now possible thanks to ... More
Tracking Emerges by Colorizing VideosJun 25 2018Jul 27 2018We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision. We leverage the natural temporal coherency of color to create a model that learns to colorize gray-scale videos by copying colors from a reference ... More
Confidence Interval Estimators for MOS ValuesJun 04 2018For the quantification of QoE, subjects often provide individual rating scores on certain rating scales which are then aggregated into Mean Opinion Scores (MOS). From the observed sample data, the expected value is to be estimated. While the sample average ... More
Patch-Based Image Hallucination for Super Resolution with Detail Reconstruction from Similar Sample ImagesJun 03 2018Image hallucination and super-resolution have been studied for decades, and many approaches have been proposed to upsample low-resolution images using information from the images themselves, multiple example images, or large image databases. However, ... More
Deep Segment Hash Learning for Music GenerationMay 30 2018Music generation research has grown in popularity over the past decade, thanks to the deep learning revolution that has redefined the landscape of artificial intelligence. In this paper, we propose a novel approach to music generation inspired by musical ... More
Surface Light Field Compression using a Point Cloud CodecMay 29 2018Light field (LF) representations aim to provide photo-realistic, free-viewpoint viewing experiences. However, the most popular LF representations are images from multiple views. Multi-view image-based representations generally need to restrict the range ... More
Ask No More: Deciding when to guess in referential visual dialogueMay 17 2018Jun 12 2018Our goal is to explore how the abilities brought in by a dialogue manager can be included in end-to-end visually grounded conversational agents. We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a ... More
Convolutional Neural Network Architecture for Recovering Watermark SynchronizationMay 16 2018Since real-time contents can be captured and downloaded very easily, copyright infringement has become a serious problem. In order to reduce the loss caused by copyright infringement, copyright owners insert a watermark in the content to protect the copyright ... More
Robust curvelet domain watermarking technique that preserves cleanness of high quality imagesMay 16 2018Watermarking inserts invisible data into content to protect copyright. The embedded information provides proof of authorship and facilitates tracking illegal distribution, etc. Current robust watermarking techniques have been proposed to preserve inserted ... More
Low Rank Tensor Completion for Multiway Visual DataMay 08 2018Tensor completion recovers missing entries of multiway data. The missing of entries could often be caused during the data acquisition and transformation. In this paper, we provide an overview of recent development in low rank tensor completion for estimating ... More
QARC: Video Quality Aware Rate Control for Real-Time Video Streaming via Deep Reinforcement LearningMay 07 2018Oct 27 2018Due to the fluctuation of throughput under various network conditions, how to choose a proper bitrate adaptively for real-time video streaming has become an upcoming and interesting issue. Recent work focuses on providing high video bitrates instead of ... More
Fine-Grained Facial Expression Analysis Using Dimensional Emotion ModelMay 02 2018Automated facial expression analysis has a variety of applications in human-computer interaction. Traditional methods mainly analyze prototypical facial expressions of no more than eight discrete emotions as a classification task. However, in practice, ... More
Delay-Constrained Rate Control for Real-Time Video Streaming with Bounded Neural NetworkMay 02 2018Rate control is widely adopted during video streaming to provide both high video quality and low latency under various network conditions. However, although many methods have been proposed, they fail to tackle one major problem: previous methods determine ... More
Dynamic Adaptive Point Cloud StreamingApr 29 2018High-quality point clouds have recently gained interest as an emerging form of representing immersive 3D graphics. Unfortunately, these 3D media are bulky and severely bandwidth intensive, which makes it difficult for streaming to resource-limited and ... More
Off the Beaten Track: Using Deep Learning to Interpolate Between Music GenresApr 25 2018May 02 2018We describe a system based on deep learning that generates drum patterns in the electronic dance music domain. Experimental results reveal that generated patterns can be employed to produce musically sound and creative transitions between different genres, ... More
Cross-Modal Retrieval with Implicit Concept AssociationApr 12 2018Apr 25 2018Traditional cross-modal retrieval assumes explicit association of concepts across modalities, where there is no ambiguity in how the concepts are linked to each other, e.g., when we do the image search with a query "dogs", we expect to see dog images. ... More
Adaptive Spatial Steganography Based on Probability-Controlled Adversarial ExamplesApr 08 2018Apr 10 2018Deep learning models are vulnerable to adversarial attacks, which generate special input samples that cause the model to misclassify them. Besides deep learning models, adversarial attacks are also effective against feature-based ... More
Weakening the Detecting Capability of CNN-based SteganalysisMar 29 2018Recently, the application of deep learning in steganalysis has drawn many researchers' attention. Most of the proposed steganalytic deep learning models are derived from neural networks applied in computer vision. These kinds of neural networks have distinguished ... More
Joint Rate Allocation with Both Look-ahead And Feedback Model For High Efficiency Video CodingMar 15 2018The objective of joint rate allocation among multiple coded video streams is to share the bandwidth to meet the demands of minimum average distortion (minAVE) or minimum distortion variance (minVAR). In previous works on minVAR problems, bits are directly ... More
Raw Multi-Channel Audio Source Separation using Multi-Resolution Convolutional Auto-EncodersMar 02 2018Supervised multi-channel audio source separation requires extracting useful spectral, temporal, and spatial features from the mixed signals. The success of many existing systems is therefore largely dependent on the choice of features used for training. ... More
Perceptual Quality Assessment of Immersive Images Considering Peripheral Vision ImpactFeb 25 2018Conventional images/videos are often rendered within the central vision area of the human visual system (HVS) with uniform quality. Recent virtual reality (VR) device with head mounted display (HMD) extends the field of view (FoV) significantly to include ... More
Viewport Adaptation-Based Immersive Video Streaming: Perceptual Modeling and ApplicationsFeb 16 2018Immersive video offers the freedom to navigate inside virtualized environment. Instead of streaming the bulky immersive videos entirely, a viewport (also referred to as field of view, FoV) adaptive streaming is preferred. We often stream the high-quality ... More
Similarity measures for vocal-based drum sample retrieval using deep convolutional auto-encodersFeb 14 2018The expressive nature of the voice provides a powerful medium for communicating sonic ideas, motivating recent research on methods for query by vocalisation. Meanwhile, deep learning methods have demonstrated state-of-the-art results for matching vocal ... More
MemeSequencer: Sparse Matching for Embedding Image MacrosFeb 14 2018The analysis of the creation, mutation, and propagation of social media content on the Internet is an essential problem in computational social science, affecting areas ranging from marketing to political mobilization. A first step towards understanding ... More
Learning to score and summarize figure skating sport videosFeb 08 2018This paper focuses on fully understanding figure skating sport videos. In particular, we present a large-scale figure skating sport video dataset, which includes 500 figure skating videos. On average, the length of each video is 2 minutes and 50 seconds. ... More
Fine-Grained Land Use Classification at the City Scale Using Ground-Level ImagesFeb 07 2018We perform fine-grained land use mapping at the city scale using ground-level images. Mapping land use is considerably more difficult than mapping land cover and is generally not possible using overhead imagery as it requires close-up views and seeing ... More
Computer-Aided Annotation for Video Tampering Dataset of Forensic ResearchFeb 07 2018Annotating a video tampering dataset is a tedious task that takes a lot of manpower and financial resources. At present, there is no published work that improves the annotation efficiency of forged videos. We present a computer-aided ... More
The New Modality: Emoji Challenges in Prediction, Anticipation, and RetrievalJan 30 2018Feb 02 2018Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages. We propose to treat these ideograms as a new modality in their own right, distinct in their semantic ... More
Food recognition and recipe analysis: integrating visual content, context and external knowledgeJan 22 2018The central role of food in our individual and social life, combined with recent technological advances, has motivated a growing interest in applications that help to better monitor dietary habits as well as the exploration and retrieval of food-related ... More
How to augment a small learning set for improving the performances of a CNN-based steganalyzer?Jan 12 2018Feb 03 2018Deep learning and convolutional neural networks (CNN) have been intensively used in many image processing topics during last years. As far as steganalysis is concerned, the use of CNN allows reaching the state-of-the-art results. The performances of such ... More
Fake Colorized Image DetectionJan 09 2018Jan 14 2018Image forensics aims to detect the manipulation of digital images. Currently, splicing detection, copy-move detection and image retouching detection are drawing much attention from researchers. However, image editing techniques develop with time goes ... More
Text Extraction and Retrieval from Smartphone Screenshots: Building a Repository for Life in MediaJan 04 2018Daily engagement in life experiences is increasingly interwoven with mobile device use. Screen capture at the scale of seconds is being used in behavioral studies and to implement "just-in-time" health interventions. The increasing psychological breadth ... More
Field Studies with Multimedia Big Data: Opportunities and Challenges (Extended Version)Dec 28 2017Social multimedia users are increasingly sharing all kinds of data about the world. They do this for their own reasons, not to provide data for field studies-but the trend presents a great opportunity for scientists. The Yahoo Flickr Creative Commons ... More
Towards Structured Analysis of Broadcast Badminton VideosDec 23 2017Sports video data is recorded for nearly every major tournament but remains archived and inaccessible to large scale data mining and analytics. It can only be viewed sequentially or manually tagged with higher-level labels which is time consuming and ... More
Probabilistic Semantic Retrieval for Surveillance Videos with Activity GraphsDec 17 2017Aug 22 2018We present a novel framework for finding complex activities matching user-described queries in cluttered surveillance videos. The wide diversity of queries coupled with unavailability of annotated activity data limits our ability to train activity models. ... More