- research-article, November 2016
Highly scalable near memory processing with migrating threads on the emu system architecture
- Timothy Dysart,
- Peter Kogge,
- Martin Deneroff,
- Eric Bovell,
- Preston Briggs,
- Jay Brockman,
- Kenneth Jacobsen,
- Yujen Juan,
- Shannon Kuntz,
- Richard Lethin,
- Janice McMahon,
- Chandra Pawar,
- Martin Perrigo,
- Sarah Rucker,
- John Ruttenberg,
- Max Ruttenberg,
- Steve Stein
IA^3 '16: Proceedings of the Sixth Workshop on Irregular Applications: Architectures and Algorithms, Pages 2–9
There is growing evidence that current architectures do not well handle cache-unfriendly applications such as sparse math operations, data analytics, and graph algorithms. This is due, in part, to the irregular memory access patterns demonstrated by ...
- Article, July 2015
Latency-tolerant software distributed shared memory
USENIX ATC '15: Proceedings of the 2015 USENIX Conference on Usenix Annual Technical Conference, Pages 291–305
We present Grappa, a modern take on software distributed shared memory (DSM) for in-memory data-intensive applications. Grappa enables users to program a cluster as if it were a single, large, non-uniform memory access (NUMA) machine. Performance scales ...
- research-article, October 2014
Alembic: automatic locality extraction via migration
OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, Pages 879–894
https://doi.org/10.1145/2660193.2660194
Partitioned Global Address Space (PGAS) environments simplify writing parallel code for clusters because they make data movement implicit - dereferencing global pointers automatically moves data around. However, it does not free the programmer from ...
Also Published in:
ACM SIGPLAN Notices: Volume 49 Issue 10
- Article, May 2012
Resilience to Various Failures for Read-mostly In-memory Data Structures
IPDPSW '12: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum, Pages 1572–1580
https://doi.org/10.1109/IPDPSW.2012.198
As massively parallel processing (MPP) machines and their associated applications become larger, more work on resiliency is needed if those applications are to have a chance of running for significant lengths of time in the face of the expected ...
- Article, May 2011
Crunching large graphs with commodity processors
- Jacob Nelson,
- Brandon Myers,
- A. H. Hunter,
- Preston Briggs,
- Luis Ceze,
- Carl Ebeling,
- Dan Grossman,
- Simon Kahan,
- Mark Oskin
Crunching large graphs is the basis of many emerging applications, such as social network analysis and bioinformatics. Graph analytics algorithms exhibit little locality and therefore present significant performance challenges. Hardware multithreading ...
- article, April 2004
Coloring heuristics for register allocation
We describe an improvement to a heuristic introduced by Chaitin for use in graph coloring register allocation. Our modified heuristic produces better colorings, with less spill code. It has similar compile-time and implementation requirements. We ...
- Article, November 2003
Early Experience with Scientific Programs on the Cray MTA-2
- Wendell Anderson,
- Preston Briggs,
- C. Stephen Hellberg,
- Daryl W. Hess,
- Alexei Khokhlov,
- Marco Lanzagorta,
- Robert Rosenberg
SC '03: Proceedings of the 2003 ACM/IEEE conference on Supercomputing, Page 46
https://doi.org/10.1145/1048935.1050196
We describe our experiences porting and tuning three scientific programs to the Cray MTA-2, paying particular attention to the problems posed by I/O. We have measured the performance of each of the programs over many different machine configurations and ...
- research-article, January 2003
Register allocation
Encyclopedia of Computer Science, January 2003, Pages 1516–1518
A register is one of a small number of high-speed memory locations in a computer's central processing unit (q.v.). Registers differ from ordinary memory locations in several respects:
- Article, November 1997
Tera hardware-software cooperation
SC '97: Proceedings of the 1997 ACM/IEEE conference on Supercomputing, Pages 1–16
https://doi.org/10.1145/509593.509631
The development of Tera's MTA system was unusual. It respected the need for fast hardware and large shared memory, facilitating execution of the most demanding parallel application programs. But at the same time, it met the need for a clean machine ...
- article, June 1997
- article, January 1996
- Article, June 1994
Effective partial redundancy elimination
PLDI '94: Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, Pages 159–170
https://doi.org/10.1145/178243.178257
Partial redundancy elimination is a code optimization with a long history of literature and implementation. In practice, its effectiveness depends on issues of naming and code shape. This paper shows that a combination of global reassociation and global ...
Also Published in:
ACM SIGPLAN Notices: Volume 29 Issue 6
- article, May 1994
Improvements to graph coloring register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 16, Issue 3, Pages 428–455
https://doi.org/10.1145/177492.177575
We describe two improvements to Chaitin-style graph coloring register allocators. The first, optimistic coloring, uses a stronger heuristic to find a k-coloring for the interference graph. The second extends Chaitin's treatment of rematerialization to ...
- article, March 1993
An efficient representation for sparse sets
ACM Letters on Programming Languages and Systems (LOPLAS), Volume 2, Issue 1-4, Pages 59–69
https://doi.org/10.1145/176454.176484
Sets are a fundamental abstraction widely used in programming. Many representations are possible, each offering different advantages. We describe a representation that supports constant-time implementations of clear-set, add-member, and delete-member. ...
- Article, July 1992
Rematerialization
PLDI '92: Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation, Pages 311–321
https://doi.org/10.1145/143095.143143
This paper examines a problem that arises during global register allocation – rematerialization. If a value cannot be kept in a register, the allocator should recognize when it is cheaper to recompute the value (rematerialize it) than to store and ...
Also Published in:
ACM SIGPLAN Notices: Volume 27 Issue 7