PPoPP 2015: Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
SESSION: Concurrency
More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms
Vincent Gramoli
The SprayList: a scalable relaxed priority queue
Dan Alistarh
Justin Kopinsky
Jerry Li
Nir Shavit
Predicate RCU: an RCU for scalable concurrent updates
Maya Arbel
Adam Morrison
Automatic scalable atomicity via semantic locking
Guy Golan-Gueta
G. Ramalingam
Mooly Sagiv
Eran Yahav
SESSION: Code Generation
A framework for practical parallel fast matrix multiplication
Austin R. Benson
Grey Ballard
PLUTO+: near-complete modeling of affine transformations for parallelism and locality
Aravind Acharya
Uday Bondhugula
Distributed memory code generation for mixed irregular/regular computations
Mahesh Ravishankar
Roshan Dathathri
Venmugil Elango
Louis-Noël Pouchet
J. Ramanujam
Atanas Rountev
P. Sadayappan
SESSION: Transactional Memory
Software partitioning of hardware transactions
Lingxiang Xiang
Michael L. Scott
Performance implications of dynamic memory allocators on transactional memory systems
Alexandro Baldassin
Edson Borin
Guido Araujo
Low-overhead software transactional memory with progress guarantees and strong semantics
Minjia Zhang
Jipeng Huang
Man Cao
Michael D. Bond
SESSION: Large Scale Parallelism
Barrier elision for production parallel programs
Milind Chabbi
Wim Lavrijsen
Wibe de Jong
Koushik Sen
John Mellor-Crummey
Costin Iancu
Scalable and efficient implementation of 3D unstructured meshes computation: a case study on matrix assembly
Loïc Thébault
Eric Petit
Quang Dinh
Diagnosing the causes and severity of one-sided message contention
Nathan R. Tallent
Abhinav Vishnu
Hubertus Van Dam
Jeff Daily
Darren J. Kerbyson
Adolfy Hoisie
SESSION: Verification and Accelerators
A parallel algorithm for global states enumeration in concurrent systems
Yen-Jung Chang
Vijay K. Garg
Dynamic deadlock verification for general barrier synchronisation
Tiago Cogumbreiro
Raymond Hu
Francisco Martins
Nobuko Yoshida
VirtCL: a framework for OpenCL device abstraction and management
Yi-Ping You
Hen-Jung Wu
Yeh-Ning Tsai
Yen-Ting Chao
On optimizing machine learning workloads via kernel fusion
Arash Ashari
Shirish Tatikonda
Matthias Boehm
Berthold Reinwald
Keith Campbell
John Keenleyside
P. Sadayappan
SESSION: Algorithms
NUMA-aware graph-structured analytics
Kaiyuan Zhang
Rong Chen
Haibo Chen
SYNC or ASYNC: time to fuse for distributed graph-parallel computation
Chenning Xie
Rong Chen
Haibing Guan
Binyu Zang
Haibo Chen
Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency
Yuan Tang
Ronghui You
Haibin Kan
Jesmin Jahan Tithi
Pramod Ganapathi
Rezaul A. Chowdhury
SESSION: Locking and Locality
High performance locks for multi-level NUMA systems
Milind Chabbi
Michael Fagan
John Mellor-Crummey
A library for portable and composable data locality optimizations for NUMA systems
Zoltan Majo
Thomas R. Gross
MPI+Threads: runtime contention and remedies
Abdelhalim Amer
Huiwei Lu
Yanjie Wei
Pavan Balaji
Satoshi Matsuoka
SESSION: Poster Abstracts
Fence placement for legacy data-race-free programs via synchronization read detection
Andrew J. McPherson
Vijay Nagarajan
Susmit Sarkar
Marcelo Cintra
JAWS: a JavaScript framework for adaptive CPU-GPU work sharing
Xianglan Piao
Channoh Kim
Younghwan Oh
Huiying Li
Jincheon Kim
Hanjun Kim
Jae W. Lee
GStream: a graph streaming processing method for large-scale graphs on GPUs
Hyunseok Seo
Jinwook Kim
Min-Soo Kim
SemCache++: semantics-aware caching for efficient multi-GPU offloading
Nabeel Al-Saber
Milind Kulkarni
An OpenACC-based unified programming model for multi-accelerator systems
Jungwon Kim
Seyong Lee
Jeffrey S. Vetter
The lazy happens-before relation: better partial-order reduction for systematic concurrency testing
Paul Thomson
Alastair F. Donaldson
Towards batched linear solvers on accelerated hardware platforms
Azzam Haidar
Tingxing Dong
Piotr Luszczek
Stanimire Tomov
Jack Dongarra
A collection-oriented programming model for performance portability
Saurav Muralidharan
Michael Garland
Bryan Catanzaro
Albert Sidelnik
Mary Hall
Gunrock: a high-performance graph processing library on the GPU
Yangzihao Wang
Andrew Davidson
Yuechao Pan
Yuduo Wu
Andy Riffel
John D. Owens
Decoupled load balancing
Olga Pearce
Todd Gamblin
Bronis R. de Supinski
Martin Schulz
Nancy M. Amato
Combining phase identification and statistic modeling for automated parallel benchmark generation
Ye Jin
Mingliang Liu
Xiaosong Ma
Qing Liu
Jeremy Logan
Norbert Podhorszki
Jong Youl Choi
Scott Klasky
Optimization of asynchronous graph processing on GPU with hybrid coloring model
Xuanhua Shi
Junling Liang
Sheng Di
Bingsheng He
Hai Jin
Lu Lu
Zhixiang Wang
Xuan Luo
Jianlong Zhong
Efficient and reasonable object-oriented concurrency
Scott West
Sebastian Nanz
Bertrand Meyer
A programming model and runtime system for significance-aware energy-efficient computing
Vassilis Vassiliadis
Konstantinos Parasyris
Charalambos Chalios
Christos D. Antonopoulos
Spyros Lalis
Nikolaos Bellas
Hans Vandierendonck
Dimitrios S. Nikolopoulos
The lock-free k-LSM relaxed priority queue
Martin Wimmer
Jakob Gruber
Jesper Larsson Träff
Philippas Tsigas
Static/Dynamic validation of MPI collective communications in multi-threaded context
Emmanuelle Saillard
Patrick Carribault
Denis Barthou
CASTLE: fast concurrent internal binary search tree using edge-based locking
Arunmoezhi Ramachandran
Neeraj Mittal
Section based program analysis to reduce overhead of detecting unsynchronized thread communication
Madan Das
Gabriel Southern
Jose Renau
A hierarchical approach to reducing communication in parallel graph algorithms
Harshvardhan
Nancy M. Amato
Lawrence Rauchwerger
Tiles: a new language mechanism for heterogeneous parallelism
Yifeng Chen
Xiang Cui
Hong Mei
Are web applications ready for parallelism?
Cosmin Radoi
Stephan Herhut
Jaswanth Sreeram
Danny Dig