Latest in cs.pf

A JIT Compiler for Neural Network Inference (Jun 13 2019). This paper describes a C++ library that compiles neural network models at runtime into machine code that performs inference. This approach in general promises to achieve the best performance possible since it is able to integrate statically known properties ...
Optimizing Redundancy Levels in Master-Worker Compute Clusters for Straggler Mitigation (Jun 12 2019). Runtime variability in computing systems causes some tasks to straggle and take much longer than expected to complete. These straggler tasks are known to significantly slow down distributed computation. Job execution with speculative execution of redundant ...
Markovian model for Broadcast in Wireless Body Area Networks (Jun 12 2019). Wireless body area networks have recently become a vast field of investigation. A large amount of research in this field is dedicated to the evaluation of various communication protocols, e.g., broadcast or convergecast, against human body mobility. Most of ...
Power Gradient Descent (Jun 11 2019). The development of machine learning is promoting the search for fast and stable minimization algorithms. To this end, we suggest a change in the current gradient descent methods that should speed up the motion in flat regions and slow it down in steep ...
ROOT I/O compression algorithms and their performance impact within Run 3 (Jun 11 2019). The LHC's Run 3 will push the envelope on data-intensive workflows and, since at the lowest level this data is managed using the ROOT software framework, preparations for managing this data are starting already. At the beginning of LHC Run 1, all ROOT data ...
Performance Analysis and Characterization of Training Deep Learning Models on NVIDIA TX2 (Jun 10 2019). Training deep learning models on mobile devices has recently become possible because of increasing computation power on mobile hardware and the advantages of enabling high user experiences. Most of the existing work on machine learning on mobile devices ...
Analysis of parallel I/O use on the UK national supercomputing service, ARCHER using Cray LASSi and EPCC SAFE (Jun 10 2019). In this paper, we describe how we have used a combination of the LASSi tool (developed by Cray) and the SAFE software (developed by EPCC) to collect and analyse Lustre I/O performance data for all jobs running on the UK national supercomputing service, ...
LASSi: Metric based I/O analytics for HPC (Jun 10 2019). LASSi is a tool aimed at analyzing application usage and contention caused by use of shared resources (filesystem or network) in an HPC system. LASSi was initially developed to support the ARCHER system where there are large variations in application requirements ...
Architectural Middleware that Supports Building High-performance, Scalable, Ubiquitous, Intelligent Personal Assistants (Jun 05 2019). Intelligent Personal Assistants (IPAs) are software agents that can perform tasks on behalf of individuals and assist them in many of their daily activities. IPAs' capabilities are expanding rapidly due to the recent advances in areas such as natural language ...
Adroitness: An Android-based Middleware for Fast Development of High-performance Apps (Jun 05 2019). As smartphones become increasingly powerful, a new generation of highly interactive user-centric mobile apps is emerging to make users' lives simpler and more productive. Mobile phone applications have to sustain limited resource availability on mobile ...
pCAMP: Performance Comparison of Machine Learning Packages on the Edges (Jun 05 2019; Jun 06 2019). Machine learning has changed the computing paradigm. Products today are built with machine intelligence as a central attribute, and consumers are beginning to expect near-human interaction with the appliances they use. However, much of the deep learning ...
Performance Modelling of Deep Learning on Intel Many Integrated Core Architectures (Jun 04 2019). Many complex problems, such as natural language processing or visual object detection, are solved using deep learning. However, efficient training of complex deep convolutional neural networks for large data sets is computationally demanding and requires ...
Assessing Performance Implications of Deep Copy Operations via Microbenchmarking (Jun 03 2019; Jun 11 2019). As scientific frameworks become sophisticated, so do their data structures. Current data structures are no longer simple in design and they have been progressively complicated. The typical trend in designing data structures in scientific applications ...
Robust stability of moving horizon estimation for nonlinear systems with bounded disturbances using adaptive arrival cost (Jun 03 2019). In this paper, the robust stability and convergence to the true state of moving horizon estimator based on an adaptive arrival cost are established for nonlinear detectable systems. Robust global asymptotic stability is shown for the case of non-vanishing ...
A Technique for Finding Optimal Program Launch Parameters Targeting Manycore Accelerators (Jun 01 2019). In this paper, we present a new technique to dynamically determine the values of program parameters in order to optimize the performance of a multithreaded program P. To be precise, we describe a novel technique to statically build another program, say, ...
Fast Online "Next Best Offers" using Deep Learning (May 31 2019). In this paper, we present iPrescribe, a scalable low-latency architecture for recommending 'next-best-offers' in an online setting. The paper presents the design of iPrescribe and compares its performance for implementations using different real-time ...
Using Metrics Suites to Improve the Measurement of Privacy in Graphs (May 30 2019). Social graphs are widely used in research (e.g., epidemiology) and business (e.g., recommender systems). However, sharing these graphs poses privacy risks because they contain sensitive information about individuals. Graph anonymization techniques aim ...
Visualizing a Moving Target: A Design Study on Task Parallel Programs in the Presence of Evolving Data and Concerns (May 30 2019). Common pitfalls in visualization projects include lack of data availability and the domain users' needs and focus changing too rapidly for the design process to complete. While it is often prudent to avoid such projects, we argue it can be beneficial ...
Designing and Implementing Data Warehouse for Agricultural Big Data (May 29 2019). In recent years, precision agriculture that uses modern information and communication technologies has become very popular. Raw and semi-processed agricultural data are usually collected through various sources, such as the Internet of Things (IoT), sensors, ...
Categorization of Program Regions for Agile Compilation using Machine Learning and Hardware Support (May 29 2019). A compiler processes the code written in a high level language and produces machine executable code. The compiler writers often face the challenge of keeping the compilation times reasonable. That is because aggressive optimization passes which potentially ...
Function-as-a-Service Benchmarking Framework (May 28 2019). Cloud Service Providers deliver their products in the form of 'as-a-Service', which are typically categorized by the level of abstraction. This approach hides the implementation details and shows only functionality to the user. However, the problem is that ...
The Impact of GPU DVFS on the Energy and Performance of Deep Learning: an Empirical Study (May 27 2019). Over the past years, great progress has been made in improving the computing power of general-purpose graphics processing units (GPGPUs), which facilitates the prosperity of deep neural networks (DNNs) in multiple fields like computer vision and natural ...
Scaling Video Analytics on Constrained Edge Nodes (May 24 2019). As video camera deployments continue to grow, the need to process large volumes of real-time data strains wide area network infrastructure. When per-camera bandwidth is limited, it is infeasible for applications such as traffic monitoring and pedestrian ...
Semi-Quantitative Abstraction and Analysis of Chemical Reaction Networks (May 23 2019). Analysis of large continuous-time stochastic systems is a computationally intensive task. In this work we focus on population models arising from chemical reaction networks (CRNs), which play a fundamental role in analysis and design of biochemical systems. ...
The Supermarket Model with Known and Predicted Service Times (May 23 2019). The supermarket model typically refers to a system with a large number of queues, where arriving customers choose $d$ queues at random and join the queue with fewest customers. The supermarket model demonstrates the power of even small amounts of choice, ...
In-DRAM Bulk Bitwise Execution Engine (May 23 2019; Jun 04 2019). Many applications heavily use bitwise operations on large bitvectors as part of their computation. In existing systems, performing such bulk bitwise operations requires the processor to transfer a large amount of data on the memory channel, thereby consuming ...
Online Collection and Forecasting of Resource Utilization in Large-Scale Distributed Systems (May 22 2019). Large-scale distributed computing systems often contain thousands of distributed nodes (machines). Monitoring the conditions of these nodes is important for system management purposes, which, however, can be extremely resource demanding as this requires ...
NTP: A Neural Network Topology Profiler (May 22 2019; May 25 2019). Performance of end-to-end neural networks on a given hardware platform is a function of its compute and memory signature, which, in turn, is governed by a wide range of parameters such as topology size, primitives used, framework used, batching strategy, ...
Instructions' Latencies Characterization for NVIDIA GPGPUs (May 21 2019). The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Nowadays, Graphics Processing Units (GPUs) are in a variety of systems from supercomputers to mobile phones and tablets. They are not ...
Performance Analysis of Deep Learning Workloads on Leading-edge Systems (May 21 2019). This work examines the performance of leading-edge systems designed for machine learning computing, including the NVIDIA DGX-2, Amazon Web Services (AWS) P3, IBM Power System Accelerated Compute Server AC922, and a consumer-grade Exxact TensorEX TS4 GPU ...
The Stabilized Explicit Variable-Load Solver with Machine Learning Acceleration for the Rapid Solution of Stiff Chemical Kinetics (May 21 2019; May 24 2019). Numerical solutions to differential equations are at the core of computational fluid dynamics calculations. As the size and complexity of the simulations grow, so does the need for computational power and time. Solving the equations in parallel can dramatically ...
Evaluation of Docker Containers for Scientific Workloads in the Cloud (May 21 2019). The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance as they have significantly better performance ...
Exploring the Fairness and Resource Distribution in an Apache Mesos Environment (May 21 2019). Apache Mesos, a cluster-wide resource manager, is widely deployed at massive scale at several Clouds and Data Centers. Mesos aims to provide high cluster utilization via fine grained resource co-scheduling and resource fairness among multiple users through ...
Tromino: Demand and DRF Aware Multi-Tenant Queue Manager for Apache Mesos Cluster (May 21 2019). Apache Mesos, a two-level resource scheduler, provides resource sharing across multiple users in a multi-tenant cluster environment. Computational resources (i.e., CPU, memory, disk, etc.) are distributed according to the Dominant Resource Fairness (DRF) ...
Scylla: A Mesos Framework for Container Based MPI Jobs (May 20 2019). Open source cloud technologies provide a wide range of support for creating customized compute node clusters to schedule tasks and manage resources. In cloud infrastructures such as Jetstream and Chameleon, which are used for scientific research, users ...
Exploiting Parallelism on Shared Memory in the QED Particle-in-Cell Code PICADOR with Greedy Load Balancing (May 20 2019). State-of-the-art numerical simulations of laser plasma by means of the Particle-in-Cell method are often extremely computationally intensive. Therefore there is a growing need for development of approaches for efficient utilization of resources of modern ...
Online Research Report: rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks (May 20 2019; May 22 2019). Scientific applications often contain large and computationally intensive parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a balanced load execution of such applications on high performance computing (HPC) systems. Large HPC systems ...
A caching system with object sharing (May 18 2019). We consider a public content caching system that is shared by a number of proxies. The cache could be located in an edge-cloud datacenter and the proxies could each serve a large population of mobile end-users. The proxies operate their own LRU-list ...
EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices (May 17 2019). In recent years, advances in deep learning have resulted in unprecedented leaps in diverse tasks spanning from speech and object recognition to context awareness and health monitoring. As a result, an increasing number of AI-enabled applications are being ...
Blockchain Goes Green? An Analysis of Blockchain on Low-Power Nodes (May 16 2019). Motivated by the massive energy usage of blockchain, on the one hand, and by significant performance improvements in low-power, wimpy systems, on the other hand, we perform an in-depth time-energy analysis of blockchain systems on low-power nodes in comparison ...
Performance Analysis of SPAD-based OFDM (May 15 2019). In this paper, an analytical approach for the nonlinear distorted bit error rate performance of optical orthogonal frequency division multiplexing (O-OFDM) with single photon avalanche diode (SPAD) receivers is presented. Major distortion effects of passive ...
Significance of parallel computing on the performance of Digital Image Correlation algorithms in MATLAB (May 15 2019). Digital Image Correlation (DIC) is a powerful tool used to evaluate displacements and deformations in a non-intrusive manner. By comparing two images, one of the undeformed reference state of a specimen and another of the deformed target state, the relative ...
Performance Analysis of Non-DC-Biased OFDM (May 14 2019). The performance analysis of a novel optical modulation scheme is presented in this paper. The basic concept is to transmit signs of modulated optical orthogonal frequency division multiplexing (O-OFDM) symbols and absolute values of the symbols separately ...
Coded Distributed Tracking (May 14 2019). We consider the problem of tracking the state of a process that evolves over time in a distributed setting, with multiple observers each observing parts of the state, which is a fundamental information processing problem with a wide range of applications. ...
Optimizing the Linear Fascicle Evaluation Algorithm for Multi-Core and Many-Core Systems (May 14 2019). Sparse matrix-vector multiplication (SpMV) operations are commonly used in various scientific applications. The performance of the SpMV operation often depends on exploiting regularity patterns in the matrix. Various representations have been proposed ...
QPS-r: A Cost-Effective Crossbar Scheduling Algorithm and Its Stability and Delay Analysis (May 14 2019). Parallel iterative maximal matching algorithms (adapted for switching) have long been recognized as a cost-effective family for crossbar scheduling. On one hand, they provide the following Quality of Service (QoS) guarantees: Using maximal matchings as ...
K-Athena: a performance portable structured grid finite volume magnetohydrodynamics code (May 10 2019). Large scale simulations are a key pillar of modern research and require ever increasing computational resources. Different novel manycore architectures have emerged in recent years on the way towards the exascale era. Performance portability is required ...
Inferring Catchment in Internet Routing (May 10 2019). BGP is the de-facto Internet routing protocol for exchanging prefix reachability information between Autonomous Systems (AS). It is a dynamic, distributed, path-vector protocol that enables rich expressions of network policies (typically treated as secrets). ...
On the Distribution of AoI for the GI/GI/1/1 and GI/GI/1/2* Systems: Exact Expressions and Bounds (May 10 2019). Since Age of Information (AoI) has been proposed as a metric that quantifies the freshness of information updates in a communication system, there has been a constant effort in understanding and optimizing different statistics of the AoI process for classical ...
Multiplicação de matrizes: uma comparação entre as abordagens sequencial (CPU) e paralela (GPU) [Matrix multiplication: a comparison between the sequential (CPU) and parallel (GPU) approaches] (May 09 2019). Designing problems using matrices is very important in Computer Science. Fields like graph computer, graph theory, and machine learning use matrices very often to solve their own problems. The most frequent matrix operation is multiplication. It may ...
Enhanced Performance and Privacy for TLS over TCP Fast Open (May 09 2019). Small TCP flows make up the majority of web flows. For them, the TCP three-way handshake represents a significant delay overhead. The TCP Fast Open (TFO) protocol provides zero round-trip time (0-RTT) handshakes for subsequent TCP connections to the same ...
Load Balancing Guardrails: Keeping Your Heavy Traffic on the Road to Low Response Times (May 09 2019). Load balancing systems, comprising a central dispatcher and a scheduling policy at each server, are widely used in practice, and their response time has been extensively studied in the theoretical literature. While much is known about the scenario where ...
Performance Engineering for a Tall & Skinny Matrix Multiplication Kernel on GPUs (May 08 2019). General matrix-matrix multiplications (GEMM) in vendor-supplied BLAS libraries are best optimized for square matrices but often show bad performance for tall & skinny matrices, which are much taller than wide. Nvidia's current CUBLAS implementation delivers ...
Efficient Similarity-aware Compression to Reduce Bit-writes in Non-Volatile Main Memory for Image-based Applications (May 07 2019). Image bitmaps have been widely used in in-memory applications, which consume lots of storage space and energy. Compared with legacy DRAM, non-volatile memories (NVMs) are suitable for bitmap storage due to the salient features in capacity and power savings. ...
Internet Speed Measurement: Current Challenges and Future Recommendations (May 07 2019). Government organizations, regulators, consumers, Internet service providers, and application providers alike all have an interest in measuring user Internet "speed". A decade ago, speed measurement was more straightforward. Today, as access speeds have ...
Selection Combining Scheme over Non-identically Distributed Fisher-Snedecor $\mathcal{F}$ Fading Channels (May 06 2019). In this paper, the performance of the selection combining (SC) scheme over Fisher-Snedecor $\mathcal{F}$ fading channels with independent and non-identically distributed (i.n.i.d.) branches is analysed. The probability density function (PDF) and the moment ...
Evolutionary Optimisation of Real-Time Systems and Networks (May 06 2019; May 20 2019). The design space of networked embedded systems is very large, posing challenges to the optimisation of such platforms when it comes to support applications with real-time guarantees. Recent research has shown that a number of inter-related optimisation ...
Machine Learning Based Routing Congestion Prediction in FPGA High-Level Synthesis (May 06 2019). High-level synthesis (HLS) shortens the development time of hardware designs and enables faster design space exploration at a higher abstraction level. Optimization of complex applications in HLS is challenging due to the effects of implementation issues ...
An Improved Accurate Solver for the Time-Dependent RTE in Underwater Optical Wireless Communications (May 04 2019). In this paper, an improved numerical solver to evaluate the time-dependent radiative transfer equation (RTE) for underwater optical wireless communications (UOWC) is investigated. The RTE evaluates the optical path-loss of light wave in an underwater ...
On the Impact of Memory Allocation on High-Performance Query Processing (May 03 2019). Somewhat surprisingly, the behavior of analytical query engines is crucially affected by the dynamic memory allocator used. Memory allocators highly influence performance, scalability, memory efficiency and memory fairness to other processes. In this ...
On Linear Learning with Manycore Processors (May 02 2019; May 03 2019). A new generation of manycore processors is on the rise that offers dozens and more cores on a chip and, in a sense, fuses host processor and accelerator. In this paper we target the efficient training of generalized linear models on these machines. We ...
Empirically Analyzing Ethereum's Gas Mechanism (May 02 2019). Ethereum's Gas mechanism attempts to set transaction fees in accordance with the computational cost of transaction execution: a cost borne by default by every node on the network to ensure correct smart contract execution. Gas encourages users to author ...
Computational Petri Nets: Adjunctions Considered Harmful (Apr 29 2019; May 08 2019). We review some of the endeavors in trying to connect Petri nets with free symmetric monoidal categories. We give a list of requirements such connections should respect if they are meant to be useful for practical/implementation purposes. We show how previous ...
SPH-EXA: Enhancing the Scalability of SPH codes Via an Exascale-Ready SPH Mini-App (Apr 29 2019). Numerical simulations of fluids in astrophysics and computational fluid dynamics (CFD) are among the most computationally-demanding calculations, in terms of sustained floating-point operations per second, or FLOP/s. It is expected that these numerical ...
Reinforcement Learning Based Orchestration for Elastic Services (Apr 26 2019). Due to the highly variable execution context in which edge services run, adapting their behavior to the execution context is crucial to comply with their requirements. However, adapting service behavior is a challenging task because it is hard to anticipate ...
Tracking Performance Limitations of MIMO Networked Control Systems with Multiple Communication Constraints (Apr 25 2019). In this paper, the tracking performance limitation of networked control systems (NCSs) is studied. The NCSs are considered as continuous-time linear multi-input multi-output (MIMO) systems with random reference noises. The controlled plants include unstable ...
DTLS Performance - How Expensive is Security? (Apr 25 2019). Secure communication is an integral feature of many Internet services. The widely deployed TLS protects reliable transport protocols. DTLS extends TLS security services to protocols relying on plain UDP packet transport, such as VoIP or IoT applications. ...
Performance of a Quantum Annealer for Ising Ground State Computations on Chimera Graphs (Apr 25 2019). Quantum annealing is getting increasing attention in combinatorial optimization. The quantum processing unit by D-Wave is constructed to approximately solve Ising models on so-called Chimera graphs. Ising models are equivalent to quadratic unconstrained ...
A mechanism for balancing accuracy and scope in cross-machine black-box GPU performance modeling (Apr 21 2019). The ability to model, analyze, and predict execution time of computations is an important building block supporting numerous efforts, such as load balancing, performance optimization, and automated performance tuning for high performance, parallel applications. ...
Memory and Parallelism Analysis Using a Platform-Independent Approach (Apr 18 2019). Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task. In this ongoing work, ...
Inversion formula with hypergeometric polynomials and its application to an integral equation (Apr 16 2019). For any complex parameters $x$ and $\nu$, we provide a new class of linear inversion formulas $T = A(x,\nu) \cdot S \Leftrightarrow S = B(x,\nu) \cdot T$ between sequences $S = (S_n)_{n \in \mathbb{N}^*}$ and $T = (T_n)_{n \in \mathbb{N}^*}$, where the ...
Low-Power Computer Vision: Status, Challenges, Opportunities (Apr 15 2019). Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions ...
Performance Models for Data Transfers: A Case Study with Molecular Chemistry Kernels (Apr 15 2019). With the increasing complexity of hardware, systems with different memory nodes are ubiquitous in High Performance Computing (HPC). It is paramount to develop strategies to overlap the data transfers between memory nodes with computations in order to exploit ...
Dynamic scheduling in a partially fluid, partially lossy queueing system (Apr 13 2019). We consider a single server queueing system with two classes of jobs: eager jobs with small sizes that require service to begin almost immediately upon arrival, and tolerant jobs with larger sizes that can wait for service. While blocking probability ...
Energy Saving Strategy Based on Profiling (Apr 12 2019). Constraints imposed by power consumption and the related costs are one of the key roadblocks to the design and development of next generation exascale systems. To mitigate these issues, strategies that reduce the power consumption of the processor are ...
Defence Efficiency (Apr 11 2019). In order to automate actions, such as defences against network attacks, one needs to quantify their efficiency. This can subsequently be used in post-evaluation, learning, etc. In order to quantify the defence efficiency as a function of the impact of ...
The distribution of age-of-information performance measures for message processing systems (Apr 11 2019). The idea behind the recently introduced "age of information" performance measure of a networked message processing system is that it indicates our knowledge regarding the "freshness" of the most recent piece of information that can be used as a criterion ...
Sound, Fine-Grained Traversal Fusion for Heterogeneous Trees - Extended Version (Apr 11 2019). Applications in many domains are based on a series of traversals of tree structures, and fusing these traversals together to reduce the total number of passes over the tree is a common, important optimization technique. In applications such as compilers ...
Reducing Communication in Algebraic Multigrid with Multi-step Node Aware Communication (Apr 11 2019; Apr 24 2019). Algebraic multigrid (AMG) is often viewed as a scalable $\mathcal{O}(n)$ solver for sparse linear systems. Yet, parallel AMG lacks scalability due to increasingly large costs associated with communication, both in the initial construction of a multigrid ...
Higher aggregation of gNodeBs in Cloud-RAN architectures via parallel computing (Apr 11 2019). In this paper, we address the virtualization and the centralization of real-time network functions, notably in the framework of Cloud RAN (C-RAN). We thoroughly analyze the required fronthaul capacity for the deployment of the proposed C-RAN architecture. ...
On the sojourn of an arbitrary customer in an $M/M/1$ Processor Sharing Queue (Apr 11 2019). In this paper, we consider the number of both arrivals and departures seen by a tagged customer while in service in a classical $M/M/1$ processor sharing queue. By exploiting the underlying orthogonal structure of this queuing system revealed in an earlier ...
A Processor-Sharing model for the Performance of Virtualized Network Functions (Apr 11 2019). The parallel execution of requests in a Cloud Computing platform, as for Virtualized Network Functions, is modeled by an $M^{[X]}/M/1$ Processor-Sharing (PS) system, where each request is seen as a batch of unit jobs. The performance of such paralleled ...
R-Storm: Resource-Aware Scheduling in Storm (Apr 10 2019). The era of big data has led to the emergence of new systems for real-time distributed stream processing, e.g., Apache Storm is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems ...
Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels (Apr 10 2019). Over recent years heterogeneous systems have become more prevalent across HPC systems, with over 100 supercomputers in the TOP500 incorporating GPUs or other accelerators. These hardware platforms have different performance characteristics and optimization ...
Optimisation of stochastic networks with blocking: a functional-form approach (Apr 10 2019). Many stochastic networks encountered in practice exhibit some kind of blocking behaviour, where traffic is lost due to congestion. Examples include call dropping in cellular networks, difficulties with task migration in mobile cloud computing, and depleted ...
Modeling Corruption in Eventually-Consistent Graph Databases (Apr 09 2019). We present a model and analysis of an eventually consistent graph database where loosely cooperating servers accept concurrent updates to a partitioned, distributed graph. The model is high-fidelity and preserves design choices from contemporary graph ...
A High-Performance Energy Management System based on Evolving Graph (Apr 08 2019). With the fast growth and large-scale integration of distributed generation, renewable energy resources, energy storage systems and load response, modern power system operation is becoming much more complicated, with increasing uncertainties and frequent changes. ...
Accelerated Neural Networks on OpenCL Devices Using SYCL-DNN (Apr 08 2019). Over the past few years machine learning has seen a renewed explosion of interest, following a number of studies showing the effectiveness of neural networks in a range of tasks which had previously been considered incredibly hard. Neural networks' effectiveness ...
PerfVis: Pervasive Visualization in Immersive Augmented Reality for Performance Awareness (Apr 05 2019). Developers are usually unaware of the impact of code changes on the performance of software systems. Although developers can analyze the performance of a system by executing, for instance, a performance test to compare the performance of two consecutive ...
Model Slicing for Supporting Complex Analytics with Elastic Inference Cost and Resource Constraints (Apr 03 2019). Deep learning models have been used to support analytics beyond simple aggregation, where deeper and wider models have been shown to yield great results. These models consume a huge amount of memory and computational operations. However, most of the large-scale ...
Parallel algorithms development for programmable logic devices (Apr 01 2019). Programmable Logic Devices (PLDs) continue to grow in size and currently contain several millions of gates. At the same time, research effort is going into higher-level hardware synthesis methodologies for reconfigurable computing that can exploit PLD ...