Latest in cs.pf

Do Energy-oriented Changes Hinder Maintainability?Aug 22 2019Energy efficiency is a crucial quality requirement for mobile applications. However, improving energy efficiency is far from trivial as developers lack the knowledge and tools to aid in this activity. In this paper we study the impact of changes to improve ... More
Computing System Congestion Management Using Exponential Smoothing ForecastingAug 21 2019An overloaded computer must finish what it starts and not start what will fail or hang. A congestion management algorithm the author developed, and Siemens Corporation patented for telecom products, effectively manages traffic overload with its unique ... More
An Autonomous Performance Testing Framework using Self-Adaptive Fuzzy Reinforcement LearningAug 19 2019Test automation can result in reduction in cost and human effort. If the optimal policy, the course of actions taken, for the intended objective in a testing process could be learnt by the testing system (e.g., a smart tester agent), then it could be ... More
Across-Stack Profiling and Characterization of Machine Learning Models on GPUsAug 19 2019The world sees a proliferation of machine learning/deep learning (ML) models and their wide adoption in different application domains recently. This has made the profiling and characterization of ML models an increasingly pressing task for both hardware ... More
Workload-Aware Opportunistic Energy Efficiency in Multi-FPGA PlatformsAug 18 2019The continuous growth of big data applications with high computational and scalability demands has resulted in increasing popularity of cloud computing. Optimizing the performance and power consumption of cloud resources is therefore crucial to relieve ... More
New Results on Parameter Estimation via Dynamic Regressor Extension and Mixing: Continuous and Discrete-time CasesAug 14 2019We present some new results on the dynamic regressor extension and mixing parameter estimators for linear regression models recently proposed in the literature. This technique has proven instrumental in the solution of several open problems in system ... More
Micro-architectural Analysis of OLAP: Limitations and OpportunitiesAug 13 2019Understanding micro-architectural behavior is profound in efficiently using hardware resources. Recent work has shown that, despite being aggressively optimized for modern hardware, in-memory online transaction processing (OLTP) systems severely underutilize ... More
Exploiting Parallelism Opportunities with Deep Learning FrameworksAug 13 2019State-of-the-art machine learning frameworks support a wide variety of design features to enable a flexible machine learning programming interface and to ease the programmability burden on machine learning developers. Identifying and using a performance-optimal ... More
Enhanced Performance and Privacy via Resolver-Less DNSAug 13 2019The domain name resolution into IP addresses can significantly delay connection establishments on the web. Moreover, the common use of recursive DNS resolvers presents a privacy risk as they can closely monitor the user's browsing activities. In this ... More
Type-Directed Program Synthesis and Constraint Generation for Accelerator Library PortabilityAug 13 2019Fast numerical libraries have been a cornerstone of scientific computing for decades, but this comes at a price. Programs may be tied to vendor specific software ecosystems resulting in polluted, non-portable code. As we enter an era of heterogeneous ... More
Type-Directed Program Synthesis and Constraint Generation for Library PortabilityAug 13 2019Aug 14 2019Fast numerical libraries have been a cornerstone of scientific computing for decades, but this comes at a price. Programs may be tied to vendor specific software ecosystems resulting in polluted, non-portable code. As we enter an era of heterogeneous ... More
uPredict: A User-Level Profiler-Based Predictive Framework for Single VM Applications in Multi-Tenant CloudsAug 13 2019Most existing studies on performance prediction for virtual machines (VMs) in multi-tenant clouds are at system level and generally require access to performance counters in Hypervisors. In this work, we propose uPredict, a user-level profiler-based performance ... More
MLP Aware Scheduling Techniques in Multithreaded ProcessorsAug 12 2019Major chip manufacturers have all introduced Multithreaded processors. These processors are used for running a variety of workloads. Efficient resource utilization is an important design aspect in such processors. Particularly, it is important to take ... More
Performance of Devito on HPC-Optimised ARM ProcessorsAug 09 2019Aug 19 2019We evaluate the performance of Devito, a domain specific language (DSL) for finite differences on Arm ThunderX2 processors. Experiments with two common seismic computational kernels demonstrate that Arm processors can deliver competitive performance compared ... More
An Empirical Guide to the Behavior and Use of Scalable Persistent MemoryAug 09 2019After nearly a decade of anticipation, scalable nonvolatile memory DIMMs are finally commercially available with the release of Intel's 3D XPoint DIMM. This new nonvolatile DIMM supports byte-granularity accesses with access times on the order of DRAM, ... More
Performance Comparison for Neuroscience Application BenchmarksAug 07 2019Researchers within the Human Brain Project and related projects have in the last couple of years expanded their needs for high-performance computing infrastructures. The needs arise from a diverse set of science challenges that range from large-scale ... More
Near-Memory Computing: Past, Present, and FutureAug 07 2019The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D integration ... More
Redundancy Scheduling in Systems with Bi-Modal Job Service Time DistributionAug 07 2019Queuing systems with redundant requests have drawn great attention because of their promise to reduce the job completion time and its variability. Despite a large body of work on this topic, we are still far from fully understanding the benefits of redundancy ... More
Analytical Performance Models for NoCs with Multiple Priority Traffic ClassesAug 07 2019Networks-on-chip (NoCs) have become the standard for interconnect solutions in industrial designs ranging from client CPUs to many-core chip-multiprocessors. Since NoCs play a vital role in system performance and power consumption, pre-silicon evaluation ... More
Edge AIBench: Towards Comprehensive End-to-end Edge Computing BenchmarkingAug 06 2019In edge computing scenarios, the distribution of data and collaboration of workloads on different layers are serious concerns for performance, privacy, and security issues. So for edge computing benchmarking, we must take an end-to-end view, considering ... More
A Repairable System Supported by Two Spare Units and Serviced by Two Types of RepairersAug 04 2019We study a one-unit repairable system, supported by two identical spare units on cold standby, and serviced by two types of repairers. The model applies, for instance, to ANSI (American National Standard Institute) centrifugal pumps in a chemical plant. ... More
Testing performance with and without Block Low Rank Compression in MUMPS and the new PaStiX 6.0 for JOREK nonlinear MHD simulationsJul 31 2019The interface to the MUMPS solver was updated in the JOREK MHD code to support Block Low Rank (BLR) compression and an interface to the new PaStiX solver version 6 has been implemented supporting BLR as well. First tests were carried out with JOREK, which ... More
A performance comparison of Dask and Apache Spark for data-intensive neuroimaging pipelinesJul 30 2019Jul 31 2019In the past few years, neuroimaging has entered the Big Data era due to the joint increase in image resolution, data sharing, and study sizes. However, no particular Big Data engines have emerged in this field, and several alternatives remain available. ... More
Beyond Safety Drivers: Staffing a Teleoperations System for Autonomous VehiclesJul 30 2019Driverless vehicles promise a host of societal benefits including dramatically improved safety, increased accessibility, greater productivity, and higher quality of life. As this new technology approaches widespread deployment, both industry and government ... More
Modeling Shared Cache Performance of OpenMP Programs using Reuse DistanceJul 29 2019Performance modeling of parallel applications on multicore computers remains a challenge in computational co-design due to the complex design of multicore processors including private and shared memory hierarchies. We present a Scalable Analytical Shared ... More
ICE: An Interactive Configuration Explorer for High Dimensional Categorical Parameter SpacesJul 29 2019There are many applications where users seek to explore the impact of the settings of several categorical variables with respect to one dependent numerical variable. For example, a computer systems analyst might want to study how the type of file system ... More
The Preliminary Evaluation of a Hypervisor-based Virtualization Mechanism for Intel Optane DC Persistent Memory ModuleJul 28 2019Non-volatile memory (NVM) technologies, being accessible in the same manner as DRAM, are considered indispensable for expanding main memory capacities. Intel Optane DCPMM is a long-awaited product that drastically increases main memory capacities. However, ... More
Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore SystemsJul 27 2019Innovations in Next-Generation Sequencing are enabling generation of DNA sequence data at ever faster rates and at very low cost. Large sequencing centers typically employ hundreds of such systems. Such high-throughput and low-cost generation of data ... More
HPC AI500: A Benchmark Suite for HPC AI SystemsJul 27 2019Aug 13 2019In recent years, with the trend of applying deep learning (DL) in high performance scientific computing, the unique characteristics of emerging DL workloads in HPC raise great challenges in designing and implementing HPC AI systems. The community needs a ... More
Anonymity Mixes as (Partial) Assembly Queues: Modeling and AnalysisJul 26 2019Anonymity platforms route the traffic over a network of special routers that are known as mixes and implement various traffic disruption techniques to hide the communicating users' identities. Batch mixes in particular anonymize communicating peers by ... More
MDS coding is better than replication for job completion timesJul 25 2019In a multi-server system, how can one get better performance than random assignment of jobs to servers if queue-states cannot be queried by the dispatcher? A replication strategy has recently been proposed where $d$ copies of each arriving job are sent ... More
Simple Near-Optimal Scheduling for the M/G/1Jul 25 2019Aug 08 2019We consider the problem of preemptively scheduling jobs to minimize mean response time of an M/G/1 queue. When the scheduler knows each job's size, the shortest remaining processing time (SRPT) policy is optimal. Unfortunately, in many settings we do ... More
Benchmarking TPU, GPU, and CPU Platforms for Deep LearningJul 24 2019Aug 06 2019Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for ... More
Benchmarking TPU, GPU, and CPU Platforms for Deep LearningJul 24 2019Jul 31 2019Training deep learning models is compute-intensive and there is an industry-wide trend towards hardware specialization to improve performance. To systematically benchmark deep learning platforms, we introduce ParaDnn, a parameterized benchmark suite for ... More
Multiple Server SRPT with speed scaling is competitiveJul 21 2019Can the popular shortest remaining processing time (SRPT) algorithm achieve a constant competitive ratio on multiple servers when server speeds are adjustable (speed scaling) with respect to the flow time plus energy consumption metric? This question ... More
A Hermite-like basis for faster matrix-free evaluation of interior penalty discontinuous Galerkin operatorsJul 19 2019This work proposes a basis for improved throughput of matrix-free evaluation of discontinuous Galerkin symmetric interior penalty discretizations on hexahedral elements. The basis relies on ideas of Hermite polynomials. It is used in a fully discontinuous ... More
Quantitative Impact Evaluation of an Abstraction Layer for Data Stream Processing SystemsJul 18 2019With the demand to process ever-growing data volumes, a variety of new data stream processing frameworks have been developed. Moving an implementation from one such system to another, e.g., for performance reasons, requires adapting existing applications ... More
Approximate Solution Approach and Performability Evaluation of Large Scale Beowulf ClustersJul 18 2019Beowulf clusters are very popular and deployed worldwide in support of scientific computing, because of the high computational power and performance. However, they also pose several challenges, and yet they need to provide high availability. The practical ... More
A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector MultiplicationJul 15 2019The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores ... More
Simulating Nonlinear Neutrino Oscillations on Next-Generation Many-Core ArchitecturesJul 12 2019In this work an astrophysical simulation code, XFLAT, is developed to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both ... More
Profiling based Out-of-core Hybrid Method for Large Neural NetworksJul 11 2019GPUs are widely used to accelerate deep learning with neural networks (NNs). On the other hand, since GPU memory capacity is limited, it is difficult to implement efficient programs that compute large NNs on GPU. To compute NNs exceeding GPU memory capacity, data-swapping ... More
A Unified Analysis Approach for Hardware and Software ImplementationsJul 10 2019Smart gadgets are being embedded almost in every aspect of our lives. From smart cities to smart watches, modern industries are increasingly supporting the Internet-of-Things (IoT). SysMART aims at making supermarkets smart, productive, and with a touch ... More
Scheduling With Inexact Job Sizes: The Merits of Shortest Processing Time FirstJul 10 2019It is well known that size-based scheduling policies, which take into account job size (i.e., the time it takes to run them), can perform very desirably in terms of both response time and fairness. Unfortunately, the requirement of knowing a priori the ... More
Bi-objective Optimisation of Data-parallel Applications on Heterogeneous Platforms for Performance and Energy via Workload DistributionJul 09 2019Performance and energy are the two most important objectives for optimisation on modern parallel platforms. Latest research demonstrated the importance of workload distribution as a decision variable in the bi-objective optimisation for performance and ... More
Methodologies of Link-Level Simulator and System-Level Simulator for C-V2X CommunicationJul 09 2019Development, standardization, and further improvement are vital to modern cellular systems such as next-generation wireless communication (5G). Simulations are essential to test and optimize algorithms and procedures prior to ... More
Barriers towards no-reference metrics application to compressed video quality analysis: on the example of no-reference metric NIQEJul 08 2019This paper analyses the application of no-reference metric NIQE to the task of video-codec comparison. A number of issues in the metric behaviour on videos was detected and described. The metric has outlying scores on black and solid-coloured frames. ... More
CHOP: Bypassing Runtime Bounds Checking Through Convex Hull OPtimizationJul 08 2019Unsafe memory accesses in programs written using popular programming languages like C/C++ have been among the leading causes for software vulnerability. Prior memory safety checkers such as SoftBound enforce memory spatial safety by checking if every ... More
Metamorphic IOTAJul 08 2019IOTA recently opened a new line of research in the area of distributed ledgers by targeting algorithms that ensure a high throughput for the transactions generated in IoT systems. Transactions are continuously appended to an acyclic structure called the tangle and ... More
Guidelines for benchmarking of optimization approaches for fitting mathematical modelsJul 08 2019Insufficient performance of optimization approaches for fitting of mathematical models is still a major bottleneck in systems biology. In this manuscript, the reasons and methodological challenges are summarized as well as their impact in benchmark studies. ... More
Etalumis: Bringing Probabilistic Programming to Scientific Simulators at ScaleJul 08 2019Probabilistic programming languages (PPLs) are receiving widespread attention for performing Bayesian inference in complex generative models. However, applications to science remain limited because of the impracticability of rewriting complex scientific ... More
Optimizing Xeon Phi for Interactive Data AnalysisJul 06 2019The Intel Xeon Phi manycore processor is designed to provide high performance matrix computations of the type often performed in data analysis. Common data analysis environments include Matlab, GNU Octave, Julia, Python, and R. Achieving optimal performance ... More
Streaming 1.9 Billion Hypersparse Network Updates per Second with D4MJul 06 2019The Dynamic Distributed Dimensional Data Model (D4M) library implements associative arrays in a variety of languages (Python, Julia, and Matlab/Octave) and provides a lightweight in-memory database implementation of hypersparse arrays that are ideal for ... More
RegDem: Increasing GPU Performance via Shared Memory Register SpillingJul 05 2019GPU utilization, measured as occupancy, is limited by the parallel threads' combined usage of on-chip resources, such as registers and the programmer-managed shared memory. Higher resource demand means lower effective parallel thread count, and therefore ... More
Qualitative Benchmarking of Deep Learning Hardware and Frameworks: Review and TutorialJul 05 2019Previous survey papers offer knowledge of deep learning hardware devices and software frameworks. This paper introduces benchmarking principles, surveys machine learning devices including GPUs, FPGAs, and ASICs, and reviews deep learning software frameworks. ... More
Benchmarking Deep Learning Hardware and Frameworks: Qualitative MetricsJul 05 2019Jul 09 2019Previous survey papers offer knowledge of deep learning hardware devices and software frameworks. This paper introduces benchmarking principles, surveys machine learning devices including GPUs, FPGAs, and ASICs, and reviews deep learning software frameworks. ... More
Automatic Differentiation for Adjoint Stencil LoopsJul 05 2019Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, ... More
Energy of Computing on Multicore CPUs: Predictive Models and Energy Conservation LawJul 05 2019Energy is now a first-class design constraint along with performance in all computing settings. Energy predictive modelling based on performance monitoring counts (PMCs) is the leading method used for prediction of energy consumption during an application ... More
CloudCoaster: Transient-aware Bursty Datacenter Workload SchedulingJul 03 2019Today's clusters often have to divide resources among a diverse set of jobs. These jobs are heterogeneous both in execution time and in their rate of arrival. Execution time heterogeneity has led to the development of hybrid schedulers that can schedule ... More
Accelerator-level ParallelismJul 02 2019Jul 08 2019With the slowing of technology scaling, the only known way to further improve computer system performance under energy constraints is to employ hardware accelerators. Already today, many chips in mobile, edge and cloud computing concurrently employ multiple ... More
Bridging the Architecture Gap: Abstracting Performance-Relevant Properties of Modern Server ProcessorsJul 01 2019We describe a universal modeling approach for predicting single- and multicore runtime of steady-state loops on server processors. To this end we strictly differentiate between application and machine models: An application model comprises the loop code, ... More
A Fast-rate WLAN Measurement Tool for Improved Miss-rate in Indoor NavigationJun 30 2019Recently, location-based services (LBS) have steered attention to indoor positioning systems (IPS). WLAN-based IPSs relying on received signal strength (RSS) measurements such as fingerprinting are gaining popularity due to proven high accuracy of their ... More
Fast prototyping of an SDR WLAN 802.11b receiver for an indoor positioning systemJun 30 2019Indoor positioning systems (IPS) are emerging technologies due to an increasing popularity and demand in location based service (LBS). Because traditional positioning systems such as GPS are limited to outdoor applications, many IPS have been proposed ... More
Exploiting Acceleration Features of LabVIEW platform for Real-Time GNSS Software Receiver OptimizationJun 30 2019This paper presents the new generation of a LabVIEW-based GPS receiver testbed that is based on National Instruments' (NI) LabVIEW (LV) platform in conjunction with C/C++ dynamic link libraries (DLL) used inside the platform for performance execution. This ... More
Open-MPI over MOSIX: paralleled computing in a clustered worldJun 29 2019Recent increased interest in Cloud computing emphasizes the need to find an adequate solution to the load-balancing problem in parallel computing -- efficiently running several jobs concurrently on a cluster of shared computers (nodes). One approach to ... More
Pinpointing Performance Inefficiencies in JavaJun 28 2019Many performance inefficiencies such as inappropriate choice of algorithms or data structures, developers' inattention to performance, and missed compiler optimizations show up as wasteful memory operations. Wasteful memory operations are those that produce/consume ... More
State-of-the-Art on Query & Transaction Processing AccelerationJun 27 2019The vast amount of processing power and memory bandwidth provided by modern Graphics Processing Units (GPUs) make them a platform for data-intensive applications. The database community identified GPUs as effective co-processors for data processing. In ... More
One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-off in Machine Learning Cloud Service APIs via Tolerance TiersJun 26 2019Today's cloud service architectures follow a "one size fits all" deployment strategy where the same service version instantiation is provided to the end users. However, consumers are broad and different applications have different accuracy and responsiveness ... More
Stress-SGX: Load and Stress your Enclaves for Fun and ProfitJun 26 2019The latest generation of Intel processors supports Software Guard Extensions (SGX), a set of instructions that implements a Trusted Execution Environment (TEE) right inside the CPU, by means of so-called enclaves. This paper presents Stress-SGX, an easy-to-use ... More
FPGA-based Multi-Chip Module for High-Performance ComputingJun 26 2019Current integration, architectural design and manufacturing technologies are not suited for the computing density and power efficiency requested by Exascale computing. New approaches in hardware architecture are thus needed to overcome the technological ... More
Q-Learning Inspired Self-Tuning for Energy Efficiency in HPCJun 26 2019System self-tuning is a crucial task to lower the energy consumption of computers. Traditional approaches decrease the processor frequency in idle or synchronisation periods. However, in High-Performance Computing (HPC) this is not sufficient: if the ... More
Security Rating Metrics for Distributed Wireless SystemsJun 26 2019The paper examines quantitative assessment of wireless distribution system security, as well as an assessment of risks from attacks and security violations. Furthermore, it describes typical security breach and formal attack models and five methods for ... More
Straggler Mitigation at ScaleJun 25 2019Runtime performance variability at the servers has been a major issue, hindering the predictable and scalable performance in modern distributed systems. Executing requests or jobs redundantly over multiple servers has been shown to be effective for mitigating ... More
Fast Data: Moving beyond from Big Data's map-reduceJun 25 2019Big Data may not be the solution many are looking for. The latest rise of Big Data methods and systems is partly due to the new abilities these techniques provide, partly to the simplicity of the software design and partly because the buzzword itself ... More
Mirovia: A Benchmarking Suite for Modern Heterogeneous ComputingJun 25 2019This paper presents Mirovia, a benchmark suite developed for modern day heterogeneous computing. Previous benchmark suites such as Rodinia and SHOC are well written and have many desirable features. However, these tools were developed years ago when hardware ... More
EasyCrash: Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under FailuresJun 24 2019Emerging non-volatile memory (NVM) is promising for building future HPC. Leveraging the non-volatility of NVM as main memory, we can restart the application using data objects remaining on NVM when the application crashes. This paper explores this solution ... More
Platform Independent Software Analysis for Near Memory ComputingJun 24 2019Near-memory Computing (NMC) promises improved performance for the applications that can exploit the features of emerging memory technologies such as 3D-stacked memory. However, it is not trivial to find such applications and specialized tools are needed ... More
On The Performance of ARM TrustZoneJun 24 2019Jun 26 2019The TrustZone technology, available in the vast majority of recent ARM processors, allows the execution of code inside a so-called secure world. It effectively provides hardware-isolated areas of the processor for sensitive data and code, i.e., a trusted ... More
Retrial Queueing Models: A Survey on Theory and ApplicationsJun 23 2019Retrial phenomenon naturally arises in various systems such as call centers, cellular networks and random access protocols in local area networks. This paper gives a comprehensive survey on theory and applications of retrial queues in these systems. We ... More
On the Secrecy Rate of Spatial Modulation Based Indoor Visible Light CommunicationsJun 22 2019In this paper, we investigate the physical-layer security for a spatial modulation (SM) based indoor visible light communication (VLC) system, which includes multiple transmitters, a legitimate receiver, and a passive eavesdropper (Eve). At the transmitters, ... More
Optimal Message Bundling with Delay and Synchronization Constraints in Wireless Sensor NetworksJun 21 2019Message bundling is an effective way to reduce the energy consumption for message transmissions in wireless sensor networks. However, bundling more messages could increase both end-to-end delay and message transmission interval; the former needs to be ... More
A Beaconless Asymmetric Energy-Efficient Time Synchronization Scheme for Resource-Constrained Multi-Hop Wireless Sensor NetworksJun 21 2019The ever-increasing number of WSN deployments based on a large number of battery-powered, low-cost sensor nodes, which are limited in their computing and power resources, puts the focus of WSN time synchronization research on three major aspects, i.e., ... More
Toward a Standard Interface for User-Defined Scheduling in OpenMPJun 21 2019Jul 08 2019Parallel loops are an important part of OpenMP programs. Efficient scheduling of parallel loops can improve performance of the programs. The current OpenMP specification only offers three options for loop scheduling, which are insufficient in certain ... More
Using Machine Learning to Optimize Web Interactions on Heterogeneous Mobile Multi-coresJun 20 2019Aug 06 2019The web has become a ubiquitous application development platform for mobile systems. Yet, web access on mobile devices remains an energy-hungry activity. Prior work in the field mainly focuses on the initial page loading stage, but fails to exploit the ... More
Using Machine Learning to Optimize Web Interactions on Heterogeneous Mobile Multi-coresJun 20 2019Jul 19 2019The web has become a ubiquitous application development platform for mobile systems. Yet, energy-efficient mobile web browsing remains an outstanding challenge. Prior work in the field mainly focuses on the initial page loading stage but fails to exploit ... More
Enhancing Spectral Utilization by Maximizing the Reuse in LTE NetworkJun 20 2019Increased spectral efficiency is key to improving the quality of experience for next-generation wireless applications like online gaming, HD video, etc. In our work, we consider an LTE Device-to-device (D2D) network where LTE UEs have primary ... More
Collecting and Presenting Reproducible Intranode Stencil Performance: INSPECTJun 19 2019Jul 02 2019Stencil algorithms have been receiving considerable interest in HPC research for decades. The techniques used to approach multi-core stencil performance modeling and engineering span basic runtime measurements, elaborate performance models, detailed hardware ... More
MultiCloud Resource Management using Apache Mesos with Apache AiravataJun 18 2019We discuss initial results and our planned approach for incorporating Apache Mesos based resource management that will enable design and development of scheduling strategies for Apache Airavata jobs so that they can be launched on multiple clouds, wherein ... More
Monotonically relaxing concurrent data-structure semantics for performance: An efficient 2D design frameworkJun 17 2019There has been a significant amount of work in the literature proposing semantic relaxation of concurrent data structures for improving scalability and performance. By relaxing the semantics of a data structure, a bigger design space, that allows weaker ... More
Diffusing Your Mobile Apps: Extending In-Network Function Virtualization to Mobile Function OffloadingJun 14 2019Motivated by the huge disparity between the limited battery capacity of user devices and the ever-growing energy demands of modern mobile apps, we propose INFv. It is the first offloading system able to cache, migrate and dynamically execute on demand ... More
A JIT Compiler for Neural Network InferenceJun 13 2019This paper describes a C++ library that compiles neural network models at runtime into machine code that performs inference. This approach in general promises to achieve the best performance possible since it is able to integrate statically known properties ... More