PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

SESSION: Concurrency

More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms

The SprayList: a scalable relaxed priority queue

Predicate RCU: an RCU for scalable concurrent updates

Automatic scalable atomicity via semantic locking

SESSION: Code Generation

A framework for practical parallel fast matrix multiplication

PLUTO+: near-complete modeling of affine transformations for parallelism and locality

Distributed memory code generation for mixed Irregular/Regular computations

SESSION: Transactional Memory

Software partitioning of hardware transactions

Performance implications of dynamic memory allocators on transactional memory systems

Low-overhead software transactional memory with progress guarantees and strong semantics

SESSION: Large Scale Parallelism

Barrier elision for production parallel programs

Scalable and efficient implementation of 3D unstructured meshes computation: a case study on matrix assembly

Diagnosing the causes and severity of one-sided message contention

SESSION: Verification and Accelerators

A parallel algorithm for global states enumeration in concurrent systems

Dynamic deadlock verification for general barrier synchronisation

VirtCL: a framework for OpenCL device abstraction and management

On optimizing machine learning workloads via kernel fusion

SESSION: Algorithms

NUMA-aware graph-structured analytics

SYNC or ASYNC: time to fuse for distributed graph-parallel computation

Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency

SESSION: Locking and Locality

High performance locks for multi-level NUMA systems

A library for portable and composable data locality optimizations for NUMA systems

MPI+Threads: runtime contention and remedies

SESSION: Poster Abstracts

Fence placement for legacy data-race-free programs via synchronization read detection

JAWS: a JavaScript framework for adaptive CPU-GPU work sharing

GStream: a graph streaming processing method for large-scale graphs on GPUs

SemCache++: semantics-aware caching for efficient multi-GPU offloading

An OpenACC-based unified programming model for multi-accelerator systems

The lazy happens-before relation: better partial-order reduction for systematic concurrency testing

Towards batched linear solvers on accelerated hardware platforms

A collection-oriented programming model for performance portability

Gunrock: a high-performance graph processing library on the GPU

Decoupled load balancing

Combining phase identification and statistic modeling for automated parallel benchmark generation

Optimization of asynchronous graph processing on GPU with hybrid coloring model

Efficient and reasonable object-oriented concurrency

A programming model and runtime system for significance-aware energy-efficient computing

The lock-free k-LSM relaxed priority queue

Static/Dynamic validation of MPI collective communications in multi-threaded context

CASTLE: fast concurrent internal binary search tree using edge-based locking

Section based program analysis to reduce overhead of detecting unsynchronized thread communication

A hierarchical approach to reducing communication in parallel graph algorithms

Tiles: a new language mechanism for heterogeneous parallelism

Are web applications ready for parallelism?