Kavli Affiliate: Ke Wang
| First 5 Authors: Anjiang Wei
| Summary:
Optimizing parallel programs for distributed heterogeneous systems remains a
complex task, often requiring significant code modifications. Task-based
programming systems improve modularity by separating performance decisions from
core application logic, but their mapping interfaces are often too low-level.
In this work, we introduce Mapple, a high-level, declarative programming
interface for mapping distributed applications. Mapple provides transformation
primitives to resolve dimensionality mismatches between iteration and processor
spaces, including a key primitive, decompose, that helps minimize communication
volume. We implement Mapple on top of the Legion runtime by translating Mapple
mappers into its low-level C++ interface. Across nine applications, including
six matrix multiplication algorithms and three scientific computing workloads,
Mapple reduces mapper code size by 14X and enables performance improvements of
up to 1.34X over expert-written C++ mappers. In addition, the decompose
primitive achieves up to 1.83X improvement over existing
dimensionality-resolution heuristics. These results demonstrate that Mapple
simplifies the development of high-performance mappers for distributed
applications.
| Search Query: ArXiv Query: search_query=au:"Ke Wang"&id_list=&start=0&max_results=3
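
To make the dimensionality-resolution idea in the summary concrete, here is a minimal Python sketch of one way a flat processor count could be factored into a grid that matches an iteration space. It is not Mapple's decompose primitive or its algorithm, which the summary does not specify. The function names (`factorizations`, `surface_cost`, `decompose_sketch`) are hypothetical, and the cost model, per-block surface area as a stand-in for communication volume, is an assumption chosen for illustration.

```python
from math import prod


def factorizations(p, dims):
    """Yield every tuple of `dims` positive integers whose product is p."""
    if dims == 1:
        yield (p,)
        return
    for f in range(1, p + 1):
        if p % f == 0:
            for rest in factorizations(p // f, dims - 1):
                yield (f,) + rest


def surface_cost(extents, grid):
    """Assumed proxy for communication: surface area of one block.

    Each block has side lengths extents[i] / grid[i]; the sum of its
    face areas approximates the data exchanged with neighbouring blocks.
    """
    block = [n / g for n, g in zip(extents, grid)]
    return sum(prod(block) / b for b in block)


def decompose_sketch(extents, num_procs):
    """Pick the processor grid with the smallest assumed surface cost."""
    return min(
        factorizations(num_procs, len(extents)),
        key=lambda grid: surface_cost(extents, grid),
    )


if __name__ == "__main__":
    # Square iteration space: a balanced grid gives the smallest blocks' perimeter.
    print(decompose_sketch((1024, 1024), 16))
    # Skewed iteration space: more processors go to the longer dimension.
    print(decompose_sketch((4096, 256), 16))
```

Under this assumed cost model, a square iteration space favors a balanced grid, while a skewed one assigns more processors to the longer dimension, which is the kind of shape-aware choice a fixed heuristic can miss.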