# of-functional-programming
d
Anyone here implementing a functional language themselves? I'm interested in: how you do lazy evaluation and how you do any parallelisation (or is that concurrency?)
e
We're implementing an eager language (Lamdu), with explicit laziness (actually call by name) via ordinary lam-of-unit. For example, a lazy "List" type uses a lam-of-unit around the "rest of list elements". For concurrency, we compile to nodejs, and use its concurrency model behind the scenes. But we hide the callbacks behind our "IO" monad (called Mut) so that the callbacks are just the ordinary monadic bind arguments
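A rough sketch of the lam-of-unit trick in TypeScript (since Lamdu compiles to nodejs) — names here are illustrative, not Lamdu's actual representation. The "rest of list elements" is wrapped in a zero-argument function, so it isn't computed until forced:

```typescript
// Call-by-name laziness in an eager language: the tail is a () => thunk.
type LazyList<T> =
  | { tag: "nil" }
  | { tag: "cons"; head: T; rest: () => LazyList<T> };

const nil: LazyList<never> = { tag: "nil" };
const cons = <T>(head: T, rest: () => LazyList<T>): LazyList<T> =>
  ({ tag: "cons", head, rest });

// An infinite list of naturals: only the elements you force get computed.
const nats = (n: number): LazyList<number> => cons(n, () => nats(n + 1));

// Force the first n elements into an ordinary array.
const take = <T>(n: number, xs: LazyList<T>): T[] =>
  n <= 0 || xs.tag === "nil" ? [] : [xs.head, ...take(n - 1, xs.rest())];

console.log(take(3, nats(0))); // [0, 1, 2]
```

Note this is call by name, not call by need: forcing the same tail twice recomputes it, since nothing is memoized.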
👍 1
d
I am implementing a dynamically typed, pure functional language (github.com/curv3d/curv). It's a work in progress. Data parallelism and concurrency are quite different things. My language is data parallel: it compiles into data-parallel GPU code (GLSL shader programs) or C++. Curv is an array language with array operations that can be executed in parallel, e.g. using hardware vector instructions. And since all functions are pure, there is no problem with mapping a function over each element of an array and executing all of the function calls in parallel on different cores (calls to pure functions can't interfere with each other via side effects). General-purpose concurrent programming isn't a goal, although I am planning to add support for creating user interfaces via some form of Functional Reactive Programming, and UI programming normally involves concurrency.

Haskell-style lazy evaluation is incompatible with my design goals, so it probably won't be added to Curv. As a result, I'm biased, and I don't consider Haskell-style lazy evaluation to be essential to pure functional programming. The real essence is denotative semantics, referential transparency, and equational reasoning. A key problem is that Haskell's model of lazy evaluation is inherently single-threaded: it can't be implemented efficiently on a GPU, and it is incompatible with data parallelism. There are also problems relating to ease of use, debuggability, and the developer's ability to understand memory consumption and performance.

Function calls are strict in Curv. Aside from this being a requirement for GPU compilation, I also think that the downsides of lazy function calls outweigh the benefits, and I don't think that normal-order evaluation should be the default.

Haskell lazy lists are a special case of a more general idea of "procedural data structures": values that behave like data structures, but where code is executed when you access data structure elements. Haskell lazy lists are inherently sequential and single-threaded, and are impossible to implement on a GPU. I am more interested in exploring a general notion of procedural data that supports data parallelism and can be executed on a GPU.

@eyal mentions that explicit laziness in Lamdu is actually "call by name". This is a form of laziness that is compatible with data parallelism. There is some call-by-name semantics buried in the Curv implementation, but so far I haven't found the need to document it and expose it as a language primitive. That might happen once the language is fully designed and implemented.
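To illustrate the "procedural data" idea in a TypeScript sketch (hypothetical names, not Curv's actual representation): an "array" is just a length plus an index function, so mapping composes functions without materializing anything, and every element is independent of the others — which is what makes this shape data parallel:

```typescript
// A value that behaves like an array but runs code on element access.
interface ProcArray<T> {
  length: number;
  at: (i: number) => T;
}

const fromFn = <T>(length: number, at: (i: number) => T): ProcArray<T> =>
  ({ length, at });

// Map composes index functions; no intermediate array is built.
const mapP = <T, U>(f: (x: T) => U, xs: ProcArray<T>): ProcArray<U> =>
  fromFn(xs.length, (i) => f(xs.at(i)));

// Materialize. Because xs.at(i) depends only on i, each element could in
// principle be computed by its own GPU thread.
const force = <T>(xs: ProcArray<T>): T[] =>
  Array.from({ length: xs.length }, (_, i) => xs.at(i));

const squares = mapP((x) => x * x, fromFn(4, (i) => i));
console.log(force(squares)); // [0, 1, 4, 9]
```

Contrast with a lazy cons list, where reaching element i requires forcing elements 0..i-1 in order — the sequential bottleneck described above.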
d
Well, I'm still quite new to this in a way, but as I see it, both lazy evaluation and parallel term reduction are meant to increase efficiency (don't eval stuff you don't need, don't eval the same thing twice; eval in parallel if they're independent). So putting the specifics of Haskell and Lamdu choices and implementation aside, I was wondering if it's possible to have a model of graph/DAG reduction that is both lazy and parallelisable? First thought is that you need a results lookup that can be shared amongst threads. A thread would have to "claim" a result slot even before evaluating it, to prevent two threads seeing no result and eval'ing simultaneously. Anyway, just shower thoughts, I'm not building this yet..
I guess the lookup is like a bucket of thunks
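That "claim a slot before evaluating" idea can be sketched in TypeScript as a thunk table with three slot states (unevaluated / claimed / done) — names are made up for illustration. This single-threaded sketch only shows the state protocol; a real parallel version would need the claim to be an atomic compare-and-swap (e.g. `Atomics.compareExchange` over a `SharedArrayBuffer`), with a thread that loses the race blocking on, or helping with, the claimed slot:

```typescript
// A slot is an unevaluated thunk, a claimed ("blackholed") slot that some
// evaluator is working on, or a finished value.
type Slot<T> =
  | { state: "thunk"; compute: () => T }
  | { state: "claimed" }
  | { state: "value"; value: T };

class ThunkTable<T> {
  private slots = new Map<string, Slot<T>>();

  add(key: string, compute: () => T): void {
    this.slots.set(key, { state: "thunk", compute });
  }

  force(key: string): T {
    const slot = this.slots.get(key);
    if (!slot) throw new Error(`no slot ${key}`);
    if (slot.state === "value") return slot.value;   // computed once, shared
    if (slot.state === "claimed")                    // someone else got here first
      throw new Error(`slot ${key} is already being evaluated`);
    this.slots.set(key, { state: "claimed" });       // claim BEFORE computing
    const value = slot.compute();
    this.slots.set(key, { state: "value", value });
    return value;
  }
}
```

Incidentally, this "blackholing" of a slot before evaluating it is essentially what GHC's runtime does with thunks under its parallel evaluation model.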
e
I think the Haskell attempts at tackling auto-parallelism resulted in failures, and they found the difficult part to be the granularity of work. Cross-core communication is expensive, cache sharing and invalidation are expensive, locks/barriers are expensive -- all of this is overhead that had better be paid for by work that is large enough to justify it. Inferring which work is large enough to be worth sending to a different processor is not an easy problem
👍🏻 1
d
@Duncan Cragg Lazy evaluation does not increase efficiency. Haskell people often say this, but it is pure marketing. Haskell's lazy evaluation decreases efficiency, but this was an intentional tradeoff, because the goal (of the original language designers) was to increase expressive power.

There's so much to say about this. I'll drop one factoid: Idris is an offshoot of Haskell that is designed to have even more expressive power than Haskell, due to its dependent type system. But Idris uses strict evaluation by default. The Idris 2 compiler was rewritten in Idris (it was formerly written in Haskell), and now runs 15 times faster. Lazy evaluation in the old Haskell version was a major contributor to slowness. https://www.type-driven.org.uk/edwinb/why-is-idris-2-so-much-faster-than-idris-1.html

I'm not an expert on graph reduction engines, but I have to ask: how high a priority is efficiency for you, vs ease of programming, and vs expressive power?
d
Well the goal is to be transparent to the programmer, just optimisation
e
Lazy evaluation adds overhead (inefficiency), but often the efficient solution is lazy. Eager languages allow you to be lazy explicitly for the cases where it's needed (e.g. for efficiency), and that's why @Doug Moen is right -- pervasive / implicit laziness isn't efficient. But it can make code written with less effort perform better (if laziness is needed, you get it for no extra work)
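A tiny TypeScript sketch of "be lazy explicitly where it's needed" (hypothetical logging example): passing a thunk instead of a string means an expensive message is only built if it is actually used.

```typescript
let builds = 0; // counts how many times the expensive message gets built
const debugDump = (): string => { builds++; return "huge debug dump..."; };

// Taking `() => string` instead of `string` delays the expensive work:
const logIf = (enabled: boolean, msg: () => string): void => {
  if (enabled) console.log(msg());
};

logIf(false, debugDump); // debugDump never runs
logIf(true, debugDump);  // built exactly once, when actually logged
```

With an eager `string` parameter, `debugDump()` would run on every call site, even when logging is disabled — that's the "right amount of laziness" being just a few subexpressions.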
d
Data parallelism in Curv is explicit, because I don't know how to do otherwise while meeting my performance goals. But the intention is that most Curv users will not need to deal with performance annotations: these will be hidden away behind high-level interfaces, inside library code written by experts. So the language design goal is to allow non-experts to use optimized library abstractions without having to be explicitly aware of the optimization (you shouldn't need to change the way you write high-level code to get a performance benefit). High-level code is meant to be purely declarative.
d
C'mon, you can force strict eval in Haskell any time if you need to worry about performance! Dually, in OCaml you can decorate an expression as lazy when you want to avoid calculating values you may not need. The real answer to "how do you do laziness" is usually compiling to a smaller core language (System F in Haskell, OCaml bytecode, and QTT in Idris 2) that does the eval; otherwise you trip over corner cases.
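Worth noting that OCaml's `lazy` decoration is call by need, not call by name: the suspended computation is memoized and runs at most once. A TypeScript sketch of that behaviour (hypothetical helper, loosely mirroring `Lazy.force`):

```typescript
// Wrap a computation so it runs at most once; later calls reuse the result.
const memoLazy = <T>(compute: () => T): (() => T) => {
  let cell: { value: T } | null = null; // null = not yet computed
  return () => {
    if (cell === null) cell = { value: compute() };
    return cell.value;
  };
};

let runs = 0;
const expensive = memoLazy(() => { runs++; return 6 * 7; });
expensive(); // computes: 42
expensive(); // cached: 42, and runs is still 1
```

This memoization is exactly the sharing that plain lam-of-unit (call by name) lacks, and it's what makes hidden thunks in a pervasively lazy language hard to reason about: each one carries this extra state.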
e
You can, but then you can't re-use a whole lot of the standard library, which is pervasively lazy, and you have to pay attention to every single sub-expression, which can hide lazy thunks within it. I think (I have no empirical data to back this up) that starting from an overly strict solution, you typically need to decorate far fewer subexpressions as lazy to reach an efficient algorithm than vice versa. In other words, the right amount of laziness is usually within just a few subexpressions - those that are expensive to compute and not always needed. So a default where all subexpressions are lazy is a very difficult starting point.
Also: the "strictness analyzer" adds eagerness where it can detect that laziness won't help. That's useful, but it makes things even harder - innocent changes can break the optimization.