No abstract available.
Cost-Optimal Execution of Boolean Query Trees with Shared Streams
The processing of queries expressed as trees of boolean operators applied to predicates on sensor data streams has several applications in mobile computing. Sensor data must be retrieved from the sensors, which incurs a cost, e.g., an energy expense ...
It's About Time: On Optimal Virtual Network Embeddings under Temporal Flexibilities
Distributed applications often require high-performance networks with strict connectivity guarantees. For instance, many cloud applications suffer from today's variations of the intra-cloud bandwidth, which leads to poor and unpredictable application ...
Exploiting Geometric Partitioning in Task Mapping for Parallel Computers
- Mehmet Deveci
- Sivasankaran Rajamanickam
- Vitus J. Leung
- Kevin Pedretti
- Stephen L. Olivier
- David P. Bunde
- Umit V. Çatalyürek
- Karen Devine
We present a new method for mapping applications' MPI tasks to cores of a parallel computer such that communication and execution time are reduced. We consider the case of sparse node allocation within a parallel machine, where the nodes assigned to a ...
Communication-Efficient Distributed Variance Monitoring and Outlier Detection for Multivariate Time Series
Modern scale-out services are composed of thousands of individual machines, which must be continuously monitored for unexpected failures. One recent approach to monitoring is latent fault detection, an adaptive statistical framework for scale-out, load-...
MobiStreams: A Reliable Distributed Stream Processing System for Mobile Devices
Multi-core phones are now pervasive. Yet, existing applications rely predominantly on a client-server computing paradigm, using phones only as thin clients, sending sensed information via the cellular network to servers for processing. This makes the ...
MapReuse: Reusing Computation in an In-Memory MapReduce System
The MapReduce programming model is being increasingly adopted for data-intensive high-performance computing. Recently, it has been observed that in data-intensive environments, programs are often run multiple times with either identical or slightly changed ...
PAGE: A Framework for Easy PArallelization of GEnomic Applications
With the availability of high-throughput and low-cost sequencing technologies, an increasing amount of genetic data is becoming available to researchers. There is clearly a potential for significant new scientific and medical advances by analysis of ...
Pythia: Faster Big Data in Motion through Predictive Software-Defined Network Optimization at Runtime
The rise of Internet of Things sensors, social networking and mobile devices has led to an explosion of available data. Gaining insights into this data has led to the area of Big Data analytics. The MapReduce framework, as implemented in Hadoop, is one ...
A Case for a Flexible Scalar Unit in SIMT Architecture
The wide availability and the Single-Instruction Multiple-Thread (SIMT)-style programming model have made graphics processing units (GPUs) a promising choice for high performance computing. However, because of the SIMT style processing, an instruction ...
Scalar Waving: Improving the Efficiency of SIMD Execution on GPUs
GPUs take advantage of uniformity in program control flow and utilize SIMD execution to obtain execution efficiency. In SIMD execution, threads are batched into SIMD groups to share a common program counter and execute identical instructions on SIMD ...
Power and Performance Characterization and Modeling of GPU-Accelerated Systems
Graphics processing units (GPUs) provide an order-of-magnitude improvement in peak performance and performance-per-watt as compared to traditional multicore CPUs. However, GPU-accelerated systems currently lack a generalized method of power and ...
Energy Efficient HPC on Embedded SoCs: Optimization Techniques for Mali GPU
A lot of effort from academia and industry has been invested in exploring the suitability of low-power embedded technologies for HPC. Although state-of-the-art embedded systems-on-chip (SoCs) inherently contain GPUs that could be used for HPC, their ...
Bursting the Cloud Data Bubble: Towards Transparent Storage Elasticity in IaaS Clouds
Storage elasticity on IaaS clouds is an important feature for data-intensive workloads: storage requirements can vary greatly during application runtime, making worst-case over-provisioning a poor choice that leads to unnecessarily tied-up storage and ...
Scibox: Online Sharing of Scientific Data via the Cloud
Collaborative science demands global sharing of scientific data. But it cannot leverage universally accessible cloud-based infrastructures like Dropbox, as those offer limited interfaces and inadequate levels of access bandwidth. We present the Scibox ...
CALCioM: Mitigating I/O Interference in HPC Systems through Cross-Application Coordination
Unmatched computation and storage performance in new HPC systems have led to a plethora of I/O optimizations ranging from application-side collective I/O to network and disk-level request scheduling on the file system side. As we deal with ever larger ...
Active Measurement of the Impact of Network Switch Utilization on Application Performance
Inter-node networks are a key capability of High-Performance Computing (HPC) systems that differentiates them from less capable classes of machines. However, in spite of their very high performance, the increasing computational power of HPC compute ...
Multi-resource Real-Time Reader/Writer Locks for Multiprocessors
A fine-grained locking protocol permits multiple locks to be held simultaneously by the same task. In the case of real-time multiprocessor systems, prior work on such protocols has considered only mutex constraints. This unacceptably limits concurrency ...
Remote Invalidation: Optimizing the Critical Path of Memory Transactions
Software Transactional Memory (STM) systems are increasingly emerging as a promising alternative to traditional locking algorithms for implementing generic concurrent applications. To achieve generality, STM systems incur overheads to the normal ...
Revisiting Asynchronous Linear Solvers: Provable Convergence Rate through Randomization
Asynchronous methods for solving systems of linear equations have been researched since Chazan and Miranker's pioneering 1969 paper. The underlying idea of asynchronous methods is to avoid processor idle time by allowing the processors to continue to ...
Accelerating MPI Collective Communications through Hierarchical Algorithms Without Sacrificing Inter-Node Communication Flexibility
This paper presents and evaluates a universal algorithm to improve the performance of MPI collective communication operations on hierarchical clusters with many-core nodes. This algorithm exploits shared-memory buffers for efficient intra-node ...