Will
10/01/2020, 12:04 AM

The issue with libraries of hand-optimized functions boils down to our inability to easily build composable building blocks that perform well. Current sparse linear algebra libraries do not let us compose expressions, data structures, optimization strategies, and architectures without sacrificing performance. The first performance loss is from lost temporal locality—a deficiency that is also present with dense operations. Second, sparse operations may operate on many different data structures, which are each designed to work well on one type of sparsity pattern. If two composed functions do not support the same data structure, then it becomes necessary to perform an expensive conversion between irregular data structures. But the most serious issue is that composing two sparse linear or tensor algebra functions may perform asymptotically worse than a fused function written to compute the entire expression at once.

This made me think more generally about the composability vs. performance trade-off. Being able to compose black box abstractions at a high level is at the very foundation of software engineering, enabling programmers to eliminate boilerplate and more easily use others’ work. Yet composition is the enemy of performance: a careful implementation that fuses two operations is often more efficient than a sequenced invocation of the two. However, most compilers today only offer `#[inline]` pragmas or other extremely shallow means of reducing abstraction/composition costs. Even the most advanced C++ template magic can’t do the necessary code-reordering to achieve the optimal composition that Fred describes.
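To make the asymptotic point concrete, here’s a rough sketch (my own toy code, assuming a simple COO sparse format and row-major dense matrices, not any real library) of sampled dense-dense matrix multiply, A .* (B*C) with A sparse. Composed as two black-box calls you pay O(rows·inner·cols) time plus a full dense intermediate; fused, you pay only O(nnz(A)·inner) with no intermediate at all. No inliner will rediscover the fused loop from the composed calls.

```rust
// Toy sketch only: a minimal COO sparse matrix and a row-major dense matrix.
struct Coo {
    rows: usize,
    cols: usize,
    entries: Vec<(usize, usize, f64)>, // (row, col, value)
}

struct Dense {
    rows: usize,
    cols: usize,
    data: Vec<f64>, // row-major
}

impl Dense {
    fn at(&self, i: usize, j: usize) -> f64 {
        self.data[i * self.cols + j]
    }
}

// Composed version: call a generic dense matmul, then sample it with A.
// Costs O(b.rows * b.cols * c.cols) time and materializes a dense
// intermediate, even if A has only a handful of nonzeros.
fn sddmm_composed(a: &Coo, b: &Dense, c: &Dense) -> Coo {
    // Step 1: full dense product B*C (the expensive intermediate).
    let mut bc = vec![0.0; b.rows * c.cols];
    for i in 0..b.rows {
        for k in 0..b.cols {
            let bik = b.at(i, k);
            for j in 0..c.cols {
                bc[i * c.cols + j] += bik * c.at(k, j);
            }
        }
    }
    // Step 2: elementwise multiply by the sparse sampler A.
    let entries: Vec<_> = a
        .entries
        .iter()
        .map(|&(i, j, v)| (i, j, v * bc[i * c.cols + j]))
        .collect();
    Coo { rows: a.rows, cols: a.cols, entries }
}

// Fused version: only compute inner products at A's nonzero positions.
// Costs O(nnz(A) * b.cols) time, with no dense intermediate.
fn sddmm_fused(a: &Coo, b: &Dense, c: &Dense) -> Coo {
    let entries: Vec<_> = a
        .entries
        .iter()
        .map(|&(i, j, v)| {
            let dot: f64 = (0..b.cols).map(|k| b.at(i, k) * c.at(k, j)).sum();
            (i, j, v * dot)
        })
        .collect();
    Coo { rows: a.rows, cols: a.cols, entries }
}
```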
Several programming systems have good ideas in this direction:
• Zero-cost abstractions in programming languages (eg Rust https://blog.rust-lang.org/2015/05/11/traits.html; see the sketch after this list)
• Separating algorithm from schedule (eg Halide http://halide-lang.org/)
• Using higher-order functions to express data parallelism (eg Spark https://spark.apache.org/)
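A tiny illustration of the first bullet (my own sketch, not from the linked post): Rust iterator adapters compose like black boxes, yet they monomorphize and inline down to the single loop you’d write by hand, with no intermediate collections. The limitation is that this only removes local call and allocation overhead; it doesn’t do the cross-operation restructuring the sparse case above needs.

```rust
// Composable form: reads like three passes, but the lazy iterator adapters
// typically compile down to one fused loop over `xs`.
fn sum_of_even_squares(xs: &[i64]) -> i64 {
    xs.iter()
        .filter(|&&x| x % 2 == 0) // no intermediate Vec of evens
        .map(|&x| x * x)          // no intermediate Vec of squares
        .sum()
}

// Roughly what the optimizer produces: a single hand-fused loop.
fn sum_of_even_squares_fused(xs: &[i64]) -> i64 {
    let mut total = 0;
    for &x in xs {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}
```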
Curious to hear others’ thoughts (how will we manage this trade-off in future langs/compilers?) and pointers to relevant work.

Doug Moen
10/01/2020, 2:00 AM

wtaysom
10/05/2020, 9:18 AM