The AtLarge Vision on the Design of Distributed Systems and Ecosystems
Feb 14 2019
High-quality designs of distributed systems and services are essential for our digital economy and society. Threatening to slow down the stream of working designs, we identify the mounting pressure of scale and complexity of (eco-)systems, of ill-defined ...

GPU Accelerated Keccak (SHA3) Algorithm
Feb 14 2019
Hash functions like SHA-1 or MD5 are among the most important cryptographic primitives, especially in the field of information integrity. Considering the fact that a growing number of methods have been proposed to break these hash algorithms, a competition for ...

GPU Accelerated AES Algorithm
Feb 14 2019
It has been widely accepted that Graphics Processing Units (GPUs) are one of the promising schemes for encryption acceleration; in particular, the support of complex mathematical calculations such as integer and logical operations makes the implementation easier; ...

Two-Dimensional Batch Linear Programming on the GPU
Feb 13 2019
This paper presents a novel, high-performance, graphical processing unit-based algorithm for efficiently solving two-dimensional linear programs in batches. The domain of two-dimensional linear programs is particularly useful due to the prevalence of ...

An Empirical Study of Blockchain-based Decentralized Applications
Feb 13 2019
A decentralized application (dapp for short) refers to an application that is executed by multiple users over a decentralized network. In recent years, the number of dapps has kept growing fast, mainly due to the popularity of blockchain technology. Despite ...

Arbitrary Pattern Formation by Asynchronous Opaque Robots
Feb 13 2019
The Arbitrary Pattern Formation problem asks for a distributed algorithm that moves a set of autonomous mobile robots to form any arbitrary pattern given as input. The robots are assumed to be autonomous, anonymous and identical. They operate in Look-Compute-Move ...

Local approximation of the Maximum Cut in regular graphs
Feb 13 2019
This paper is devoted to the distributed complexity of finding an approximation of the maximum cut in graphs. A classical algorithm consists in letting each vertex choose its side of the cut uniformly at random. This does not require any communication ...

Petascale Cloud Supercomputing for Terapixel Visualization of a Digital Twin
Feb 13 2019
Background: Photo-realistic terapixel visualization is computationally intensive and to date there have been no such visualizations of urban digital twins; the few terapixel visualizations that exist have looked towards space rather than earth. Objective: ...

Task-based Augmented Contour Trees with Fibonacci Heaps
Feb 13 2019
This paper presents a new algorithm for the fast, shared memory, multi-core computation of augmented contour trees on triangulations. In contrast to most existing parallel algorithms our technique computes augmented trees, enabling the full extent of ...

Concurrent Computing with Shared Replicated Memory
Feb 13 2019
The behavioural theory of concurrent systems states that any concurrent system can be captured by a behaviourally equivalent concurrent Abstract State Machine (cASM). While the theory in general assumes shared locations, it remains valid, if different ...

Distributed Online Linear Regression
Feb 13 2019
We study online linear regression problems in a distributed setting, where the data is spread over a network. In each round, each network node proposes a linear predictor, with the objective of fitting the network-wide data. It then updates its ...

Salus: Fine-Grained GPU Sharing Primitives for Deep Learning Applications
Feb 12 2019
GPU computing is becoming increasingly popular with the proliferation of deep learning (DL) applications. However, unlike traditional resources such as CPU or the network, modern GPUs do not natively support fine-grained sharing primitives. Consequently, ...

Performance of All-Pairs Shortest-Paths Solvers with Apache Spark
Feb 12 2019
Algorithms for computing All-Pairs Shortest-Paths (APSP) are critical building blocks underlying many practical applications. The standard sequential algorithms, such as Floyd-Warshall and Johnson, quickly become infeasible for large input graphs, necessitating ...

TensorSCONE: A Secure TensorFlow Framework using Intel SGX
Feb 12 2019
Machine learning has become a critical component of modern data-driven online services. Typically, the training phase of machine learning techniques requires processing large-scale datasets which may contain private and sensitive information of customers. ...

Asymptotic Performance Analysis of Blockchain Protocols
Feb 12 2019
In the light of the recent fame of Blockchain technologies, numerous proposals and projects aiming at better practical viability have emerged. However, formally assessing their particularities and benefits has proven to be a difficult task. The aim of ...

Distributed and Application-aware Task Scheduling in Edge-clouds
Feb 12 2019
Edge computing is an emerging technology which places computing at the edge of the network to provide ultra-low latency. Computation offloading, a paradigm that migrates computing from mobile devices to remote servers, can now use the power of edge ...

Efficient Randomized Test-And-Set Implementations
Feb 11 2019
We study randomized test-and-set (TAS) implementations from registers in the asynchronous shared memory model with n processes. We introduce the problem of group election, a natural variant of leader election, and propose a framework for the implementation ...

Energy-recycling Blockchain with Proof-of-Deep-Learning
Feb 11 2019
An enormous amount of energy is wasted in Proof-of-Work (PoW) mechanisms adopted by popular blockchain applications (e.g., PoW-based cryptocurrencies), because miners must conduct a large amount of computation. Owing to this, one serious rising concern ...

Mind the Mining
Feb 11 2019, Feb 12 2019
In this paper we revisit the mining strategies in proof of work based cryptocurrencies and propose two strategies, we call smart and smarter mining, that in many cases strictly dominate honest mining. In contrast to other known attacks, like selfish mining, ...

Verifiable Smart Contract Portability
Feb 11 2019
With the advent of blockchain technologies, the idea of decentralized applications has gained traction. Smart contracts permit the implementation of application logic to foster distributed systems that are capable of removing intermediaries. Hereby, lock ...

A Distributed and Approximated Nearest Neighbors Algorithm for an Efficient Large Scale Mean Shift Clustering
Feb 11 2019
In this paper we target the class of modal clustering methods where clusters are defined in terms of the local modes of the probability density function which generates the data. The most well-known modal clustering method is the k-means clustering. Mean ...

Accelerating Partial Evaluation in Distributed SPARQL Query Evaluation
Feb 11 2019
Partial evaluation has recently been used for processing SPARQL queries over a large resource description framework (RDF) graph in a distributed environment. However, the previous approach is inefficient when dealing with complex queries. In this study, ...

Cloud Futurology
Feb 10 2019
The Cloud has become integral to most Internet-based applications and user gadgets. This article provides a brief history of the Cloud and presents a researcher's view of the prospects for innovating at the infrastructure, middleware, and application ...

Semantically Enhanced Time Series Databases in IoT-Edge-Cloud Infrastructure
Feb 10 2019
Many IoT systems are data intensive and are for the purpose of monitoring for fault detection and diagnosis of critical systems. A large volume of data steadily comes out of a large number of sensors in the monitoring system. Thus, we need to consider ...

Multi-Dimensional Balanced Graph Partitioning via Projected Gradient Descent
Feb 10 2019
Motivated by performance optimization of large-scale graph processing systems that distribute the graph across multiple machines, we consider the balanced graph partitioning problem. Compared to the previous work, we study the multi-dimensional variant ...

Performance Modeling of Microservice Platforms Considering the Dynamics of the underlying Cloud Infrastructure
Feb 09 2019
Microservice architecture has transformed the way developers are building and deploying applications in today's cloud computing centers. This new approach provides increased scalability, flexibility, manageability and performance while reducing the complexity ...

PASTA: A Parallel Sparse Tensor Algorithm Benchmark Suite
Feb 08 2019
Tensor methods have gained increasing attention from various applications, including machine learning, quantum chemistry, healthcare analytics, social network analysis, data mining, and signal processing, to name a few. Sparse tensors and their algorithms ...

Consistency models in distributed systems: A survey on definitions, disciplines, challenges and applications
Feb 08 2019
The replication mechanism resolves some challenges with big data such as data durability, data access, and fault tolerance. Yet, replication itself gives birth to another challenge known as the consistency in distributed systems. Scalability and availability ...

Optimum Bi-level Hierarchical Clustering for Wireless Mobile Tracking Systems
Feb 08 2019
A novel technique is proposed to optimize energy efficiency for wireless networks based on hierarchical mobile clustering. The new bi-level clustering technique minimizes mutual interference and energy consumption in large-scale tracking systems used ...

Iterative Clustering for Energy-Efficient Large-Scale Tracking Systems
Feb 08 2019
A new technique is presented to design energy-efficient large-scale tracking systems based on mobile clustering. The new technique optimizes the formation of mobile clusters to minimize energy consumption in large-scale tracking systems. This technique ...

A Framework for Autonomous Robot Deployment with Perfect Demand Satisfaction using Virtual Forces
Feb 08 2019
In many applications, autonomous robot deployment is preferable and sometimes it is the only affordable solution. To address this issue, virtual force (VF) is one of the prominent approaches to performing multirobot deployment autonomously. However, ...

Random Gossip Processes in Smartphone Peer-to-Peer Networks
Feb 07 2019
In this paper, we study random gossip processes in communication models that describe the peer-to-peer networking functionality included in standard smartphone operating systems. Random gossip processes spread information through the basic mechanism of ...

PAI Data, Summary of the Project PAI Data Protocol
Feb 07 2019
The Project PAI Data Protocol ("PAI Data") is a specification that extends the Project PAI Blockchain Protocol to include a method of securing and provisioning access to arbitrary data. In the context of PAI Coin Development Proposal (PDP) 2, this paper ...

Prospective Hybrid Consensus for Project PAI
Feb 07 2019
PAI Coin's Proof-of-Work (PoW) consensus mechanism utilizes the double SHA-256 hashing protocol, the same mechanism used by Bitcoin Core. This compatibility with classic Bitcoin-style mining provides a low barrier to entry for PAI Coin mining, consequently ...

Storm: a fast transactional dataplane for remote data structures
Feb 06 2019
RDMA is an exciting technology that enables a host to access the memory of a remote host without involving the remote CPU. Prior work shows how to use RDMA to improve the performance of distributed in-memory storage systems. However, RDMA is widely believed ...

Exploration of Performance and Energy Trade-offs for Heterogeneous Multicore Architectures
Feb 06 2019
Energy-efficiency has become a major challenge in modern computer systems. To address this challenge, candidate systems increasingly integrate heterogeneous cores in order to satisfy diverse computation requirements by selecting cores with suitable features. ...

Blockchain Storage Load Balancing Among DHT Clustered Nodes
Feb 06 2019
In Bitcoin, to independently verify whether new transactions are correct or not, a type of node called "Full Node" has to hold the whole of the historical transactions. The transactions are stored in a ledger called "Blockchain." Blockchain is an append-only ...

Fast Strassen-based $A^t A$ Parallel Multiplication
Feb 06 2019
Matrix multiplication $A^t A$ appears as an intermediate operation during the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm for the $A^t A$ multiplication. Our algorithm, A$\scriptstyle \mathsf{T}$A, calls ...

Scheduling and Trade-off Analysis for Multi-Source Multi-Processor Systems with Divisible Loads
Feb 06 2019
The main goal of parallel processing is to provide users with performance that is much better than that of single processor systems. The execution of jobs is scheduled, which requires certain resources in order to meet certain criteria. Divisible load ...

CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning
Feb 06 2019
We focus on the commonly used synchronous Gradient Descent paradigm for large-scale distributed learning, for which there has been a growing interest to develop efficient and robust gradient aggregation strategies that overcome two key bottlenecks: communication ...

Reinforcement Learning for Optimal Load Distribution Sequencing in Resource-Sharing System
Feb 05 2019
Divisible Load Theory (DLT) is a powerful tool for modeling divisible load problems in data-intensive systems. This paper studied an optimal divisible load distribution sequencing problem using a machine learning framework. The problem is to decide the ...

Optimal Divisible Load Scheduling for Resource-Sharing Network
Feb 05 2019
Scheduling is an important task allowing parallel systems to perform efficiently and reliably. For modern computation systems, divisible load is a special type of data which can be divided into arbitrary sizes and independently processed in parallel. ...

A Generalized Framework for Population Based Training
Feb 05 2019
Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters by periodically copying weights of the best performers and mutating hyperparameters during training. Previous PBT implementations have ...

Etude de la Distribution de Calculs Creux sur une Grappe Multi-coeurs
Feb 05 2019
Nowadays, high performance computing is becoming more and more important in different fields of research and industry, such as medical imaging and diagnostics, mathematics as well as oil exploration. It refers to intensive computing in some applications ...

Fatal Brain Damage
Feb 05 2019
The loss of a few neurons in a brain often does not result in a visible loss of function. We propose to advance the understanding of neural networks through their remarkable ability to sustain individual neuron failures, i.e. their fault tolerance. Before ...

Expressive Power of Oblivious Consensus Protocols
Feb 05 2019
Population protocols are a formal model of computation by identical, anonymous mobile agents interacting in pairs. It has been shown that their computational power is rather limited: They can only compute the predicates expressible in Presburger arithmetic. ...

A Framework for Allocating Server Time to Spot and On-demand Services in Cloud Computing
Feb 04 2019
Cloud computing delivers value to users by facilitating their access to computing capacity in periods when their need arises. An approach is to provide both on-demand and spot services on shared servers. The former allows users to access servers on demand ...

Towards Federated Learning at Scale: System Design
Feb 04 2019
Federated Learning is a distributed machine learning approach which enables model training on a large corpus of decentralized data. We have built a scalable production system for Federated Learning in the domain of mobile devices, based on TensorFlow. ...

A Billion Updates per Second Using 30,000 Hierarchical In-Memory D4M Databases
Feb 03 2019
Analyzing large scale networks requires high performance streaming updates of graph representations of these data. Associative arrays are mathematical objects combining properties of spreadsheets, databases, matrices, and graphs, and are well-suited for ...

Phoenix: An Epidemic Approach to Time Reconstruction
Feb 02 2019
Harsh deployment environments and uncertain run-time conditions create numerous challenges for postmortem time reconstruction methods. For example, motes often reboot and thus lose their clock state, considering that the majority of mote platforms lack ...

Learning-based Dynamic Cache Management in a Cloud
Feb 02 2019
Caches are an important component of modern computing systems given their significant impact on performance. In particular, caches play a key role in the cloud due to the nature of large-scale, data-intensive processing. One of the key challenges for ...

Clubmark: a Parallel Isolation Framework for Benchmarking and Profiling Clustering Algorithms on NUMA Architectures
Feb 01 2019
There is a great diversity of clustering and community detection algorithms, which are key components of many data analysis and exploration systems. To the best of our knowledge, however, there does not exist yet any uniform benchmarking framework, which ...

Decentralized Stochastic Optimization and Gossip Algorithms with Compressed Communication
Feb 01 2019
We consider decentralized stochastic optimization with the objective function (e.g. data samples for machine learning task) being distributed over $n$ machines that can only communicate to their neighbors on a fixed communication graph. To reduce the ...

Tight bounds on the convergence rate of generalized ratio consensus algorithms
Jan 31 2019
The problems discussed in this paper are motivated by the ratio consensus problem formulated and solved in the context of the push-sum algorithm proposed in Kempe et al. (2003), and extended in Bénézit et al. (2010) under the name weighted gossip ...

Formal methods and software engineering for DL. Security, safety and productivity for DL systems development
Jan 31 2019
Deep Learning (DL) techniques are now widespread and being integrated into many important systems. Their classification and recognition abilities ensure their relevance for multiple application domains. As machine-learning that relies on training instead ...

Distributed Correlation-Based Feature Selection in Spark
Jan 31 2019
CFS (Correlation-Based Feature Selection) is an FS algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and distributed version of the ...

Design and Evaluation of Smart-Contract-based System Operations for Permissioned Blockchain-based Systems
Jan 31 2019
Recently, enterprises have paid attention to permissioned blockchain (BC), where business transactions among inter-authorized organizations (forming a consortium) can automatically be executed on the basis of a distributed consensus protocol, and applications ...

Accuracy vs. Efficiency: Achieving Both through FPGA-Implementation Aware Neural Architecture Search
Jan 31 2019
A fundamental question lies in almost every application of deep neural networks: what is the optimal neural architecture given a specific dataset? Recently, several Neural Architecture Search (NAS) frameworks have been developed that use reinforcement ...

High Performance Algorithms for Counting Collisions and Pairwise Interactions
Jan 31 2019
The problem of counting collisions or interactions is common in areas such as computer graphics and scientific simulations. Since it is a major bottleneck in applications of these areas, a lot of research has been done on this subject, mainly focused on techniques ...

MediChainTM: A Secure Decentralized Medical Data Asset Management System
Jan 30 2019
The set of distributed ledger architectures known as blockchain is best known for cryptocurrency applications such as Bitcoin and Ethereum. These permissionless block chains are showing the potential to be disruptive to the financial services industry. ...

A Pseudo-Deterministic RNC Algorithm for General Graph Perfect Matching
Jan 29 2019, Jan 31 2019
We give an NC reduction from search to decision for the problem of finding a minimum weight perfect matching, provided edge weights are polynomially bounded. As a consequence, for settling the long-standing open problem of obtaining an NC perfect matching ...

A Modular Benchmarking Infrastructure for High-Performance and Reproducible Deep Learning
Jan 29 2019
We introduce Deep500: the first customizable benchmarking infrastructure that enables fair comparison of the plethora of deep learning frameworks, algorithms, libraries, and techniques. The key idea behind Deep500 is its modular design, where deep learning ...

A Parallel Projection Method for Metric Constrained Optimization
Jan 29 2019
Many clustering applications in machine learning and data mining rely on solving metric-constrained optimization problems. These problems are characterized by $O(n^3)$ constraints that enforce triangle inequalities on distance variables associated with ...

The OoO VLIW JIT Compiler for GPU Inference
Jan 28 2019, Jan 31 2019
Current trends in Machine Learning (ML) inference on hardware accelerated devices (e.g., GPUs, TPUs) point to alarmingly low utilization. As ML inference is increasingly time-bounded by tight latency SLOs, increasing data parallelism is not an option. ...

Asynchronous Accelerated Proximal Stochastic Gradient for Strongly Convex Distributed Finite Sums
Jan 28 2019
In this work, we study the problem of minimizing the sum of strongly convex functions split over a network of $n$ nodes. We propose the decentralized and asynchronous algorithm ADFS to tackle the case when local functions are themselves finite sums with ...

A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion
Jan 27 2019
In this work we present a randomized gossip algorithm for solving the average consensus problem while at the same time protecting the information about the initial private values stored at the nodes. We give iteration complexity bounds for the method ...

Heterogeneity-aware Gradient Coding for Straggler Tolerance
Jan 27 2019
Gradient descent algorithms are widely used in machine learning. In order to deal with huge volume of data, we consider the implementation of gradient descent algorithms in a distributed computing setting where multiple workers compute the gradient over ...

Distributed Convolutional Dictionary Learning (DiCoDiLe): Pattern Discovery in Large Images and Signals
Jan 26 2019
Convolutional dictionary learning (CDL) estimates shift invariant basis adapted to multidimensional data. CDL has proven useful for image denoising or inpainting, as well as for pattern discovery on multivariate signals. As estimated patterns can be positioned ...

Distributed Matrix-Vector Multiplication: A Convolutional Coding Approach
Jan 25 2019
Distributed computing systems are well-known to suffer from the problem of slow or failed nodes; these are referred to as stragglers. Straggler mitigation (for distributed matrix computations) has recently been investigated from the standpoint of erasure ...

Ambitious Data Science Can Be Painless
Jan 25 2019
Modern data science research can involve massive computational experimentation; an ambitious PhD in computational fields may do experiments consuming several million CPU hours. Traditional computing practices, in which researchers use laptops or shared ...

HRDBMS: Combining the Best of Modern and Traditional Relational Databases
Jan 24 2019
HRDBMS is a novel distributed relational database that uses a hybrid model combining the best of traditional distributed relational databases and Big Data analytics platforms such as Hive. This allows HRDBMS to leverage years' worth of research regarding ...

Communication-Efficient and Decentralized Multi-Task Boosting while Learning the Collaboration Graph
Jan 24 2019, Jan 25 2019
We study the decentralized machine learning scenario where many users collaborate to learn personalized models based on (i) their local datasets and (ii) a similarity graph over the users' learning tasks. Our approach trains nonlinear classifiers in a ...

Asynchronous Decentralized Optimization in Directed Networks
Jan 24 2019
A popular asynchronous protocol for decentralized optimization is randomized gossip where a pair of neighbors concurrently update via pairwise averaging. In practice, this creates deadlocks and is vulnerable to information delays. It can also be problematic ...

Fundamental Limits of Approximate Gradient Coding
Jan 23 2019
It has been established that when the gradient coding problem is distributed among $n$ servers, the computation load (number of stored data partitions) of each worker is at least $s+1$ in order to resist $s$ stragglers. This scheme incurs a large overhead ...

Elastic Multi-resource Network Slicing: Can Protection Lead to Improved Performance?
Jan 22 2019
In order to meet the performance/privacy requirements of future data-intensive mobile applications, e.g., self-driving cars, mobile data analytics, and AR/VR, service providers are expected to draw on shared storage/computation/connectivity resources ...

Adapting The Secretary Hiring Problem for Optimal Hot-Cold Tier Placement under Top-$K$ Workloads
Jan 22 2019
Top-K queries are an established heuristic in information retrieval. This paper presents an approach for optimal tiered storage allocation under stream processing workloads using this heuristic: those requiring the analysis of only the top-$K$ ranked ...

SVE-enabling Lattice QCD Codes
Jan 22 2019
Optimization of applications for supercomputers of the highest performance class requires parallelization at multiple levels using different techniques. In this contribution we focus on parallelization of particle physics simulations through vector instructions. ...

A Time-driven Data Placement Strategy for a Scientific Workflow Combining Edge Computing and Cloud Computing
Jan 22 2019, Jan 24 2019
Compared to traditional distributed computing environments such as grids, cloud computing provides a more cost-effective way to deploy scientific workflows. Each task of a scientific workflow requires several large datasets that are located in different ...

Using Big Data Technologies for HEP Analysis
Jan 22 2019
The HEP community is approaching an era where the excellent performance of the particle accelerators in delivering collisions at a high rate will force the experiments to record a large amount of information. The growing size of the datasets could potentially ...

The Physics of Eccentric Binary Black Hole Mergers. A Numerical Relativity Perspective
Jan 21 2019
Gravitational wave observations of eccentric binary black hole mergers will provide unequivocal evidence for the formation of these systems through dynamical assembly in dense stellar environments. The study of these astrophysically motivated sources ...

Distributed Nesterov gradient methods over arbitrary graphs
Jan 21 2019
In this letter, we introduce a distributed Nesterov method, termed as $\mathcal{ABN}$, that does not require doubly-stochastic weight matrices. Instead, the implementation is based on a simultaneous application of both row- and column-stochastic weights ...

Turning Privacy Constraints into Syslog Analysis Advantage
Jan 21 2019
The mean time between failures (MTBF) of HPC systems is rapidly decreasing, and current failure recovery mechanisms, e.g., checkpoint-restart, will no longer be able to recover the systems from failures. Early failure detection is a new class of failure ...

No DNN Left Behind: Improving Inference in the Cloud with Multi-Tenancy
Jan 21 2019, Jan 23 2019
With the rise of machine learning, inference on deep neural networks (DNNs) has become a core building block on the critical path for many cloud applications. Applications today rely on isolated ad-hoc deployments that force users to compromise on consistent ...

On the Radius of Nonsplit Graphs and Information Dissemination in Dynamic Networks
Jan 21 2019
A nonsplit graph is a directed graph where each pair of nodes has a common incoming neighbor. We show that the radius of such graphs is in $O(\log \log n)$, where $n$ is the number of nodes. We then generalize the result to products of nonsplit graphs. ...

Fitting ReLUs via SGD and Quantized SGD
Jan 19 2019
In this paper we focus on the problem of finding the optimal weights of the shallowest of neural networks consisting of a single Rectified Linear Unit (ReLU). These functions are of the form $\mathbf{x}\rightarrow \max(0,\langle\mathbf{w},\mathbf{x}\rangle)$ ...

On the Necessary Memory to Compute the Plurality in Multi-Agent Systems
Jan 19 2019
We consider the Relative-Majority Problem (also known as Plurality), in which, given a multi-agent system where each agent is initially provided an input value out of a set of $k$ possible ones, each agent is required to eventually compute the input value ...

An Optimal Vector Clock Algorithm for Multithreaded Systems
Jan 19 2019
Tracking causality (or happened-before relation) between events is useful for many applications such as debugging and recovery from failures. Consider a concurrent system with $n$ threads and $m$ objects. For such systems, either a vector clock of size ...

Tunable Approximations to Control Time-to-Solution in an HPC Molecular Docking Mini-App
Jan 18 2019
The drug discovery process involves several tasks to be performed in vivo, in vitro and in silico. Molecular docking is a task typically performed in silico. It aims at finding the three-dimensional pose of a given molecule when it interacts with the ...

Cloud Resource Optimization for Processing Multiple Streams of Visual Data
Jan 18 2019
Hundreds of millions of network cameras have been installed throughout the world. Each is capable of providing a vast amount of real-time data. Analyzing the massive data generated by these cameras requires significant computational resources and the ...

Are Smart Contracts and Blockchains Suitable for Decentralized Railway Control?
Jan 18 2019
Conventional railway operations employ specialized software and hardware to ensure safe and secure train operations. Track occupation and signaling are governed by central control offices, while trains (and their drivers) receive instructions. To make ...

Exploiting OpenMP & OpenACC to Accelerate a Molecular Docking Mini-App in Heterogeneous HPC Nodes
Jan 18 2019
In drug discovery, molecular docking is the task in charge of estimating the position of a molecule when interacting with the docking site. This task is usually used to perform screening of a large library of molecules, in the early phase of the process. ...

An Efficient Monte Carlo-based Probabilistic Time-Dependent Routing Calculation Targeting a Server-Side Car Navigation System
Jan 18 2019
Incorporating speed probability distribution to the computation of the route planning in car navigation systems guarantees more accurate and precise responses. In this paper, we propose a novel approach for dynamically selecting the number of samples ...

High-Performance Ultrasonic Levitation with FPGA-based Phased Arrays
Jan 18 2019, Jan 24 2019
We present a flexible and self-contained platform for acoustic levitation research based on the Xilinx Zynq SoC using an array of ultrasonic emitters. The platform employs an inexpensive ZedBoard and provides fast movement of the levitated objects as ...

The ANTAREX Domain Specific Language for High Performance Computing
Jan 18 2019
The ANTAREX project relies on a Domain Specific Language (DSL) based on Aspect Oriented Programming (AOP) concepts to allow applications to enforce extra functional properties such as energy-efficiency and performance and to optimize Quality of Service ...

Accelerated Training for CNN Distributed Deep Learning through Automatic Resource-Aware Layer Placement
Jan 17 2019
The Convolutional Neural Network (CNN) model, often used for image classification, requires significant training time to obtain high accuracy. To this end, distributed training is performed with the parameter server (PS) architecture using multiple servers. ...

How to Place Your Apps in the Fog - State of the Art and Open Challenges
Jan 17 2019
Fog computing aims at extending the Cloud towards the IoT so as to achieve improved QoS and to empower latency-sensitive and bandwidth-hungry applications. The Fog calls for novel models and algorithms to distribute multi-component applications in such a ...

Trends in Demand, Growth, and Breadth in Scientific Computing Training Delivered by a High-Performance Computing Center
Jan 16 2019
We analyze the changes in the training and educational efforts of the SciNet HPC Consortium, a Canadian academic High Performance Computing center, in the areas of Scientific Computing and High-Performance Computing, over the last six years. Initially, ...

AI Pipeline - bringing AI to you. End-to-end integration of data, algorithms and deployment tools
Jan 15 2019
The next generation of embedded Information and Communication Technology (ICT) systems are interconnected collaborative intelligent systems able to perform autonomous tasks. Training and deployment of such systems on Edge devices, however, require a fine-grained ...

Self-Stabilization Through the Lens of Game Theory
Jan 15 2019
In 1974 E.W. Dijkstra introduced the seminal concept of self-stabilization that turned out to be one of the main approaches to fault-tolerant computing. We show here how his three solutions can be formalized and reasoned about using the concepts of game ...