Latest in cs.DC

A Blockchain-based Decentralized Self-balancing Architecture for the Web of Things (Apr 21 2019). Edge computing is a distributed computing paradigm that relies on computational resources of end devices in a network to bring benefits such as low bandwidth utilization, responsiveness, scalability and privacy preservation. Applications range from large ...
Data Races and the Discrete Resource-time Tradeoff Problem with Resource Reuse over Paths (Apr 19 2019). A determinacy race occurs if two or more logically parallel instructions access the same memory location and at least one of them tries to modify its content. Races often lead to nondeterministic and incorrect program behavior. A data race is a special ...
HEPCloud, an Elastic Hybrid HEP Facility using an Intelligent Decision Support System (Apr 18 2019). HEPCloud is rapidly becoming the primary system for provisioning compute resources for all Fermilab-affiliated experiments. In order to reliably meet the peak demands of the next generation of High Energy Physics experiments, Fermilab must plan to elastically ...
Harmonia: Near-Linear Scalability for Replicated Storage with In-Network Conflict Detection (Apr 18 2019). Distributed storage employs replication to mask failures and improve availability. However, these systems typically exhibit a hard tradeoff between consistency and performance. Ensuring consistency introduces coordination overhead, and as a result the ...
Memory and Parallelism Analysis Using a Platform-Independent Approach (Apr 18 2019). Emerging computing architectures such as near-memory computing (NMC) promise improved performance for applications by reducing the data movement between CPU and memory. However, detecting such applications is not a trivial task. In this ongoing work, ...
Investigating the Dirac operator evaluation with FPGAs (Apr 18 2019). In recent years the computational capacity of single Field Programmable Gate Array (FPGA) devices as well as their versatility has increased significantly. Adding to that the High Level Synthesis frameworks allowing to program such processors in a high ...
Terra: Scalable Cross-Layer GDA Optimizations (Apr 17 2019). Geo-distributed analytics (GDA) frameworks transfer large datasets over the wide-area network (WAN). Yet existing frameworks often ignore the WAN topology. This disconnect between WAN-bound applications and the WAN itself results in missed opportunities ...
Low-Latency Graph Streaming Using Compressed Purely-Functional Trees (Apr 17 2019). Due to the dynamic nature of real-world graphs, there has been a growing interest in the graph-streaming setting where a continuous stream of graph updates is mixed with arbitrary graph queries. In principle, purely-functional trees are an ideal choice ...
Truxen: A Trusted Computing Enhanced Blockchain (Apr 17 2019). Truxen is a Trusted Computing enhanced blockchain that uses the Proof of Integrity protocol for consensus. The Proof of Integrity protocol is derived from Trusted Computing and associated Remote Attestations, which can be used to vouch for a node's identity and ...
Improved Distributed Expander Decomposition and Nearly Optimal Triangle Enumeration (Apr 17 2019). An $(\epsilon,\phi)$-expander decomposition of a graph $G=(V,E)$ is a clustering of the vertices $V=V_{1}\cup\cdots\cup V_{x}$ such that (1) each cluster $V_{i}$ induces a subgraph with conductance at least $\phi$, and (2) the number of inter-cluster edges ...
PL-NMF: Parallel Locality-Optimized Non-negative Matrix Factorization (Apr 16 2019). Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including topic modeling, recommender systems and bioinformatics. Due to the compute-intensive nature of applications that ...
Distributed Computing in the Asynchronous LOCAL model (Apr 16 2019). The LOCAL model is among the main models for studying locality in the framework of distributed network computing. This model is, however, subject to pertinent criticisms, including the facts that all nodes wake up simultaneously, perform in lock steps, ...
Parallel Balanced Allocations: The Heavily Loaded Case (Apr 16 2019). We study parallel algorithms for the classical balls-into-bins problem, in which $m$ balls acting in parallel as separate agents are placed into $n$ bins. Algorithms operate in synchronous rounds, in each of which balls and bins exchange messages once. ...
Efficient Distributed Community Detection in the Stochastic Block Model (Apr 16 2019). Designing effective algorithms for community detection is an important and challenging problem in large-scale graphs, studied extensively in the literature. Various solutions have been proposed, but many of them are centralized with expensive procedures ...
Just-in-Time Dynamic-Batching (Apr 16 2019). Batching is an essential technique to improve computation efficiency in deep learning frameworks. While batch processing for models with static feed-forward computation graphs is straightforward to implement, batching for dynamic computation graphs such ...
Exploiting Computation Power of Blockchain for Biomedical Image Segmentation (Apr 15 2019). Biomedical image segmentation based on deep neural networks (DNNs) is a promising approach that assists clinical diagnosis. This approach demands enormous computation power because these DNN models are complicated, and the size of the training data is ...
White-Box Atomic Multicast (Extended Version) (Apr 15 2019). Atomic multicast is a communication primitive that delivers messages to multiple groups of processes according to some total order, with each group receiving the projection of the total order onto messages addressed to it. To be scalable, atomic multicast ...
Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory (Apr 15 2019). Intel Optane DC Persistent Memory is a new kind of byte-addressable memory with higher density and lower cost than DRAM. This enables affordable systems that support up to 6TB of memory. In this paper, we use such a system for massive graph analytics. ...
Specifying Concurrent Programs in Separation Logic: Morphisms and Simulations (Apr 15 2019). In addition to pre- and postconditions, program specifications in recent separation logics for concurrency have employed an algebraic structure of resources - a form of state transition systems - to describe the state-based program invariants that must ...
Distributed Matrix Multiplication Using Speed Adaptive Coding (Apr 15 2019). While performing distributed computations in today's cloud-based platforms, execution speed variations among compute nodes can significantly reduce the performance and create bottlenecks like stragglers. Coded computation techniques leverage coding theory ...
The DEEP-ER project: I/O and resiliency extensions for the Cluster-Booster architecture (Apr 15 2019). The recently completed research project DEEP-ER has developed a variety of hardware and software technologies to improve the I/O capabilities of next generation high-performance computers, and to enable applications to recover from the larger hardware ...
Repeat-Authenticate Scheme for Multicasting of Blockchain Information in IoT Systems (Apr 15 2019). We study the problem of efficiently disseminating authenticated blockchain information from blockchain nodes (servers) to Internet of Things (IoT) devices, through a wireless base station (BS). In existing blockchain protocols, upon generation of a new ...
Efficient Blockchain Synchronization for Internet of Things using Signature Amortization (Apr 15 2019, Apr 18 2019). We study the problem of efficiently disseminating authenticated blockchain information from blockchain nodes (servers) to Internet of Things (IoT) devices, through a wireless base station (BS). In existing blockchain protocols, upon generation of a new ...
See the World through Network Cameras (Apr 14 2019). Millions of network cameras have been deployed worldwide. Real-time data from many network cameras can offer instant views of multiple locations with applications in public safety, transportation management, urban planning, agriculture, forestry, social ...
Secure Consistency Verification for Untrusted Cloud Storage by Public Blockchains (Apr 14 2019). This work presents ContractChecker, a blockchain-based security protocol for verifying the storage consistency between the mutually distrusting cloud provider and clients. Unlike existing protocols, ContractChecker uniquely delegates log auditing ...
Got: Git, but for Objects (Apr 13 2019). We look at one important category of distributed applications characterized by the existence of multiple collaborating, and competing, components sharing mutable, long-lived, replicated objects. The problem addressed by our work is that of object state ...
Cryptocurrency with Fully Asynchronous Communication based on Banks and Democracy (Apr 13 2019). Cryptocurrencies came to the world in the recent decade and attempted to put a new order where the financial system is not governed by a centralized entity, and where you have complete control over your account without the need to trust strangers (governments ...
Evaluation of the RIKEN Post-K Processor Simulator (Apr 13 2019). For the purpose of developing applications for Post-K at an early stage, RIKEN has developed a Post-K processor simulator. This simulator is based on the general-purpose processor simulator gem5. It does not simulate the actual hardware of a Post-K processor. ...
Fast and Resource Competitive Broadcast in Multi-channel Radio Networks (Apr 12 2019). Consider a single-hop, multi-channel, synchronous radio network in which a source node needs to disseminate a message to all other $n-1$ nodes. An adversary called Eve, which captures environmental noise and potentially malicious interference, aims to ...
Parallel parametric linear programming solving, and application to polyhedral computations (Apr 12 2019). Parametric linear programming is central in polyhedral computations and in certain control applications. We propose a task-based scheme for parallelizing it, with quasi-linear speedup over large problems.
Management of mobile resources in Physical Internet logistic models (Apr 12 2019). This paper deals with the concept of a 'Physical Internet', the idea of building large logistics systems like the very successful Digital Internet network. The idea is to handle mobile resources, such as containers, just like Internet data packets. Thus, ...
ezBFT: Decentralizing Byzantine Fault-Tolerant State Machine Replication (Apr 12 2019). We present ezBFT, a novel leaderless, distributed consensus protocol capable of tolerating Byzantine faults. ezBFT's main goal is to minimize the client-side latency in WAN deployments. It achieves this by (i) having no designated primary replica, and ...
Survey of Major Load Balancing Algorithms in Distributed System (Apr 11 2019). The classification of the most used load balancing algorithms in distributed systems (including cloud technology, cluster systems, grid systems) is described. Comparative analysis of types of the load balancing algorithms is conducted in accordance with ...
Energy-Efficient High-Throughput Data Transfers via Dynamic CPU Frequency and Core Scaling (Apr 11 2019). The energy footprint of global data movement has surpassed 100 terawatt hours, costing more than 20 billion US dollars to the world economy. Depending on the number of switches, routers, and hubs between the source and destination nodes, the networking ...
Reducing Communication in Algebraic Multigrid with Multi-step Node Aware Communication (Apr 11 2019). Algebraic multigrid (AMG) is often viewed as a scalable $\mathcal{O}(n)$ solver for sparse linear systems. Yet, parallel AMG lacks scalability due to increasingly large costs associated with communication, both in the initial construction of a multigrid ...
FECBench: A Holistic Interference-aware Approach for Application Performance Modeling (Apr 11 2019, Apr 12 2019). Services hosted in multi-tenant cloud platforms often encounter performance interference due to contention for non-partitionable resources, which in turn causes unpredictable behavior and degradation in application performance. To grapple with these problems ...
Modular programming of computing media using spatial types, for artificial physics (Apr 11 2019). Our long term goal is to execute General Purpose computation on homogeneous computing media consisting of millions of small identical Processing Elements (PE) communicating locally. We proceed by simulating the Self-Development of a Network (SDN) of membranes, ...
Information Leakage in Encrypted Deduplication via Frequency Analysis: Attacks and Defenses (Apr 11 2019). Encrypted deduplication combines encryption and deduplication to simultaneously achieve both data security and storage efficiency. State-of-the-art encrypted deduplication systems mainly build on deterministic encryption to preserve deduplication effectiveness. ...
On Byzantine Fault Tolerance in Multi-Master Kubernetes Clusters (Apr 11 2019). Docker container virtualization technology is being widely adopted in cloud computing environments because of its lightweight nature and efficiency. However, it requires adequate control and management via an orchestrator. As a result, cloud providers are adopting ...
Locality of not-so-weak coloring (Apr 11 2019). Many graph problems are locally checkable: a solution is globally feasible if it looks valid in all constant-radius neighborhoods. This idea is formalized in the concept of locally checkable labelings (LCLs), introduced by Naor and Stockmeyer (1995). ...
Optimal Edge User Allocation in Edge Computing with Variable Sized Vector Bin Packing (Apr 11 2019). In mobile edge computing, edge servers are geographically distributed around base stations placed near end-users to provide highly accessible and efficient computing capacities and services. In the mobile edge computing environment, a service provider ...
Timely-Throughput Optimal Coded Computing over Cloud Networks (Apr 11 2019). In modern distributed computing systems, unpredictable and unreliable infrastructures result in high variability of computing resources. Meanwhile, there is significantly increasing demand for timely and event-driven services with deadline constraints. ...
Efficient Distributed Workload (Re-)Embedding (Apr 10 2019). Modern networked systems are increasingly reconfigurable, enabling demand-aware infrastructures whose resources can be adjusted according to the workload they currently serve. Such dynamic adjustments can be exploited to improve network utilization and ...
R-Storm: Resource-Aware Scheduling in Storm (Apr 10 2019). The era of big data has led to the emergence of new systems for real-time distributed stream processing; Apache Storm, for example, is one of the most popular stream processing systems in industry today. However, Storm, like many other stream processing systems ...
Performance Analysis of Linear Algebraic Functions using Reconfigurable Computing (Apr 10 2019). This paper introduces a new mapping of geometrical transformation on the MorphoSys (M1) reconfigurable computing (RC) system. New mapping techniques for some linear algebraic functions are recalled. A new mapping for geometrical transformation operations ...
Cross-Platform Performance Portability Using Highly Parametrized SYCL Kernels (Apr 10 2019). Over recent years heterogeneous systems have become more prevalent across HPC systems, with over 100 supercomputers in the TOP500 incorporating GPUs or other accelerators. These hardware platforms have different performance characteristics and optimization ...
Application performance on a Cluster-Booster system (Apr 10 2019). The DEEP projects have developed a variety of hardware and software technologies aiming at improving the efficiency and usability of next generation high-performance computers. They revolve around an innovative concept for heterogeneous systems: the Cluster-Booster ...
Analyzes of the Distributed System Load with Multifractal Input Data Flows (Apr 10 2019). The paper proposes a solution to an actual scientific problem related to load balancing and efficient utilization of resources of the distributed system. The proposed method is based on calculation of load CPU, memory, and bandwidth by flows of different ...
Applicability study of the PRIMAD model to LIGO gravitational wave search workflows (Apr 10 2019). The PRIMAD model with its six components (i.e., Platform, Research Objective, Implementation, Methods, Actors, and Data) provides an abstract taxonomy to represent computational experiments and enforce reproducibility by design. In this paper, we assess ...
A Three-Level Parallelisation Scheme and Application to the Nelder-Mead Algorithm (Apr 10 2019). We consider a three-level parallelisation scheme. The second and third levels define a classical two-level parallelisation scheme and some load balancing algorithm is used to distribute tasks among processes. It is well-known that for many applications ...
Knowledge Discovery on Blockchains: Challenges and Opportunities (Apr 10 2019). We study the applicability of blockchain technology for distributed event detection under resource constraints. Therefore we provide a test-suite with several promising consensus methods (Proof-of-Work, Proof-of-Stake, Distributed Proof-of-Work, and Practical ...
A Proposal for an Open Logistics Interconnection Reference Model for a Physical Internet (Apr 10 2019). This paper presents a New Open Logistics Interconnection (NOLI) reference model for a Physical Internet, inspired by the Open Systems Interconnection (OSI) reference model for data networks. This NOLI model is compared to the OSI model, and to the Transmission ...
Parallel Hardware for Faster Morphological Analysis (Apr 09 2019). Morphological analysis in the Arabic language is computationally intensive, has numerous forms and rules, and is intrinsically parallel. The investigation presented in this paper confirms that the effective development of parallel algorithms and the derivation ...
The Proceedings of First Work-in-Progress Session of The CSI International Symposium on Real-Time and Embedded Systems and Technologies (Apr 09 2019). The present volume contains the proceedings of RTEST WiP 2018, chaired by Marco Caccamo, University of Illinois at Urbana-Champaign. This event has been organized by the School of Electrical and Computer Engineering at the University of Tehran, in conjunction ...
Cold Storage Data Archives: More Than Just a Bunch of Tapes (Apr 09 2019). The abundance of available sensor and derived data from large scientific experiments, such as earth observation programs, radio astronomy sky surveys, and high-energy physics already exceeds the storage hardware globally fabricated per year. To that end, ...
Modeling Corruption in Eventually-Consistent Graph Databases (Apr 09 2019). We present a model and analysis of an eventually consistent graph database where loosely cooperating servers accept concurrent updates to a partitioned, distributed graph. The model is high-fidelity and preserves design choices from contemporary graph ...
Distributed Computation of Top-$k$ Degrees in Hidden Bipartite Graphs (Apr 09 2019). Hidden graphs are flexible abstractions that are composed of a set of known vertices (nodes), whereas the set of edges is not known in advance. To uncover the set of edges, multiple edge probing queries must be executed by evaluating a function $f(u,v)$ ...
Learning the undecidable from networked systems (Apr 08 2019). This article presents a theoretical investigation of computation beyond the Turing barrier from emergent behavior in distributed (or parallel) systems. In particular, we present an algorithmic network that is a mathematical model of a networked population ...
New Phenomena in Large-Scale Internet Traffic (Apr 08 2019). The Internet is transforming our society, necessitating a quantitative understanding of Internet traffic. Our team collects and curates the largest publicly available Internet traffic data containing 50 billion packets. Analysis of this streaming data ...
Evaluating the Arm Ecosystem for High Performance Computing (Apr 08 2019). In recent years, Arm-based processors have arrived on the HPC scene, offering an alternative to the existing status quo, which was largely dominated by x86 processors. In this paper, we evaluate the Arm ecosystem, both the hardware offering and the software ...
Distributed Edge Connectivity in Sublinear Time (Apr 08 2019). We present the first sublinear-time algorithm for a distributed message-passing network to compute its edge connectivity $\lambda$ exactly in the CONGEST model, as long as there are no parallel edges. Our algorithm takes $\tilde O(n^{1-1/353}D^{1/353}+n^{1-1/706})$ ...
Consensus-based Distributed Discrete Optimal Transport for Decentralized Resource Matching (Apr 08 2019). Optimal transport has been used extensively in resource matching to promote the efficiency of resource usage by matching sources to targets. However, it requires a significant amount of computation and storage space for large-scale problems. In this ...
Analysis of Commutativity with State-Chart Graph Representation of Concurrent Programs (Apr 08 2019). We present a new approach to check for commutativity in concurrent programs from their state-chart graphs. A set of operations is commutative if changing the order of their execution on an object does not affect the abstract state of the object and returns ...
A High-Performance Energy Management System based on Evolving Graph (Apr 08 2019). With the fast growth and large-scale integration of distributed generation, renewable energy resources, energy storage systems and load response, modern power system operation becomes much more complicated, with increasing uncertainties and frequent changes. ...
Accelerated Neural Networks on OpenCL Devices Using SYCL-DNN (Apr 08 2019). Over the past few years machine learning has seen a renewed explosion of interest, following a number of studies showing the effectiveness of neural networks in a range of tasks which had previously been considered incredibly hard. Neural networks' effectiveness ...
A Survey of Distributed Consensus Protocols for Blockchain Networks (Apr 08 2019). Since the inception of Bitcoin, cryptocurrencies and the underlying blockchain technology have attracted an increasing interest from both academia and industry. Among various core components, the consensus protocol is the defining technology behind the security ...
Smart systems, the fourth industrial revolution and new challenges in distributed computing (Apr 08 2019). Smart systems and the smart world concept are addressed in the framework of the fourth industrial revolution. New challenges in distributed autonomous robots and computing are considered. An illustration of a new kind of smart and reconfigurable distributed ...
A Survey on Parallel Genetic Algorithms for Shop Scheduling Problems (Apr 08 2019). There have been extensive works dealing with genetic algorithms (GAs) for seeking optimal solutions of shop scheduling problems. Due to the NP-hardness, the time cost is always heavy. With the development of high performance computing (HPC) in the last decades, ...
Criteria and Approaches for Virtualization on Modern FPGAs (Apr 08 2019). Modern field programmable gate arrays (FPGAs) can produce high performance in a wide range of applications, and their computational capacity is becoming abundant in personal computers. Regardless of this fact, FPGA virtualization is an emerging research ...
Higher-Level Hardware Synthesis of The KASUMI Algorithm (Apr 07 2019). Programmable Logic Devices (PLDs) continue to grow in size and currently contain several millions of gates. At the same time, research effort is going into higher-level hardware synthesis methodologies for reconfigurable computing that can exploit PLD ...
Obtaining Progress Guarantee and Greater Concurrency in Multi-Version Object Semantics (Apr 07 2019). Software Transactional Memory Systems (STMs) provide ease of multithreading to the programmer without worrying about concurrency issues such as deadlock, livelock, priority inversion, etc. Most of the STMs work on read-write operations known as RWSTMs. ...
Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code (Apr 07 2019). iPIC3D is a widely used massively parallel Particle-in-Cell code for the simulation of space plasmas. However, its current implementation does not support execution on multiple GPUs. In this paper, we describe the porting of the iPIC3D particle mover to GPUs ...
Improving Hyperconnected Logistics with Blockchains and Smart Contracts (Apr 07 2019). The Physical Internet and hyperconnected logistics concepts promise an open, more efficient and environmentally friendly supply chain for goods. Blockchain and Internet of Things technologies are increasingly regarded as main enablers of improvements ...
Fast Grid Splitting Detection for N-1 Contingency Analysis by Graph Computing (Apr 07 2019). In this study, a graph-computing based grid splitting detection algorithm is proposed for contingency analysis in a graph-based EMS (Energy Management System). The graph model of a power system is established by storing its bus-branch information into ...
An Asynchronous, Decentralized Solution Framework for the Large Scale Unit Commitment Problem (Apr 07 2019, Apr 12 2019). With increased reliance on cyber infrastructure, large scale power networks face new challenges owing to computational scalability. In this paper we focus on developing an asynchronous decentralized solution framework for the Unit Commitment (UC) problem ...
Load-Balanced Sparse MTTKRP on GPUs (Apr 06 2019). Sparse matricized tensor times Khatri-Rao product (MTTKRP) is one of the most computationally expensive kernels in sparse tensor computations. This work focuses on optimizing the MTTKRP operation on GPUs, addressing both performance and storage requirements. ...
Measuring scheduling efficiency of RNNs for NLP applications (Apr 05 2019). Recurrent neural networks (RNNs) have shown state of the art results for speech recognition, natural language processing, image captioning and video summarizing applications. Many of these applications run on low-power platforms, so their energy efficiency ...
Optimal Communication Rates for Zero-Error Distributed Simulation under Blackboard Communication Protocols (Apr 05 2019). We study the distributed simulation problem where $n$ users aim to generate the same sequences of random coin flips. Some subsets of the users share an independent common coin which can be tossed multiple times, and there is a publicly seen blackboard ...
Transfer Learning for Performance Modeling of Deep Neural Network Systems (Apr 04 2019). Modern deep neural network (DNN) systems are highly configurable with a large number of options that significantly affect their non-functional behavior, for example inference time and energy consumption. Performance models allow to understand and predict ...
Metabolomics in the Cloud: Scaling Computational Tools to Big Data (Apr 04 2019, Apr 09 2019). Background: Metabolomics datasets are becoming increasingly large and complex, with multiple types of algorithms and workflows needed to process and analyse the data. A cloud infrastructure with portable software tools can provide much needed resources ...
GraphCage: Cache Aware Graph Processing on GPUs (Apr 03 2019). Efficient graph processing is challenging because of the irregularity of graph algorithms. Using GPUs to accelerate irregular graph algorithms efficiently is even more difficult, since the GPU's highly structured SIMT architecture is not a natural fit ...
Recoverable Mutual Exclusion with Sub-logarithmic RMR Complexity on CC and DSM machines (Apr 03 2019). In light of recent advances in non-volatile main memory technology, Golab and Ramaraju reformulated the traditional mutex problem into the novel Recoverable Mutual Exclusion (RME) problem. In the best known solution for RME, due to Golab and Hendler ...
Stratum: A Serverless Framework for Lifecycle Management of Machine Learning based Data Analytics Tasks (Apr 03 2019). With the proliferation of machine learning (ML) libraries and frameworks, and the programming languages that they use, along with operations of data loading, transformation, preparation and mining, ML model development is becoming a daunting task. Furthermore, ...
DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis (Apr 02 2019). Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. In particular, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these ...
Nested Dithered Quantization for Communication Reduction in Distributed Training (Apr 02 2019). In distributed training, the communication cost due to the transmission of gradients or the parameters of the deep model is a major bottleneck in scaling up the number of processing nodes. To address this issue, we propose dithered quantization ...
BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services (Apr 02 2019, Apr 11 2019). Pre-trained deep learning models are increasingly being used to offer a variety of compute-intensive predictive analytics services such as fitness tracking, speech and image recognition. The stateless and highly parallelizable nature of deep learning ...
Customer churn prediction in telecom using machine learning and social network analysis in big data platform (Apr 01 2019). Customer churn is a major problem and one of the most important concerns for large companies. Due to the direct effect on the revenues of the companies, especially in the telecom field, companies are seeking to develop means to predict potential customer ...
Multigrid Solvers in Reconfigurable Hardware (Apr 01 2019). The problem of finding the solution of Partial Differential Equations (PDEs) plays a central role in modeling real world problems. Over the past years, Multigrid solvers have shown their robustness over other techniques, due to their high convergence rate ...
MESH: A Flexible Distributed Hypergraph Processing System (Apr 01 2019). With the rapid growth of large online social networks, the ability to analyze large-scale social structure and behavior has become critically important, and this has led to the development of several scalable graph processing systems. In reality, however, ...
A Comparative Study of Asynchronous Many-Tasking Runtimes: Cilk, Charm++, ParalleX and AM++ (Apr 01 2019). We evaluate and compare four contemporary and emerging runtimes for high-performance computing (HPC) applications: Cilk, Charm++, ParalleX and AM++. We compare along three bases: programming model, execution model and the implementation on an underlying ...
Achieving Greater Concurrency in Execution of Smart Contracts using Object Semantics (Mar 31 2019). Popular blockchains such as Ethereum and several others execute complex transactions in blocks through user defined scripts known as smart contracts. Normally, a block of the chain consists of multiple transactions of smart contracts which are added by ...
An Analysis Framework for Hardware and Software Implementations with Applications from Cryptography (Mar 30 2019). With the richness of present-day hardware architectures, tightening the synergy between hardware and software has attracted great attention. The interest in unified approaches paved the way for newborn frameworks that target hardware and software co-design. ...
A "poor man's" approach for high-resolution three-dimensional topology optimization of natural convection problems (Mar 30 2019). This paper treats topology optimization of natural convection problems. A simplified model is suggested to describe the flow of an incompressible fluid in steady state conditions, similar to Darcy's law for fluid flow in porous media. The equations for ...
Graph Computing based Fast Screening in Contingency Analysis (Mar 29 2019). During the last decades, contingency analysis has been facing challenges from significant load demand increase and high penetrations of intermittent renewable energy, fluctuant responsive loads and non-linear power electronic interfaces. It requires an advanced ...
SysML: The New Frontier of Machine Learning Systems (Mar 29 2019). Machine learning (ML) techniques are enjoying rapidly increasing adoption. However, designing and implementing the systems that support ML models in real-world deployments remains a significant obstacle, in large part due to the radically different development ...
Parallelizable global conformal parameterization of simply-connected surfaces via partial welding (Mar 29 2019). Conformal surface parameterization is useful in graphics, imaging and visualization, with applications to texture mapping, atlas construction, registration, remeshing and so on. With the increasing capability in scanning and storing data, dense 3D surface ...