I am implementing a dynamically typed, pure functional language (
github.com/curv3d/curv). It's a work in progress.
Data parallelism and concurrency are quite different.
My language is data parallel. It compiles into data parallel GPU code (GLSL shader programs) or C++. Curv is an array language with array operations that can be executed in parallel, e.g. using hardware vector instructions. And since all functions are pure, there is no problem with mapping a function over each element of an array and executing all of the function calls in parallel on different cores (calls to pure functions can't interfere with each other via side effects).
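To make that concrete, here is a rough Haskell sketch of the same idea (not Curv code), using `parMap` from the `parallel` package; `square` is just an illustrative pure function:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- A pure function: its result depends only on its argument.
square :: Double -> Double
square x = x * x

-- Because `square` has no side effects, the individual calls cannot
-- interfere with each other, so they may safely run on different cores.
squaresInParallel :: [Double] -> [Double]
squaresInParallel = parMap rdeepseq square
```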
General purpose concurrent programming isn't a goal, although I am planning to add support for creating user interfaces via some form of Functional Reactive Programming, and UI programming normally involves concurrency.
Haskell-style lazy evaluation is incompatible with my design goals, so it probably won't be added to Curv. As a result, I'm biased, and I don't consider Haskell-style lazy evaluation to be essential to pure functional programming. The real essence is denotative semantics, referential transparency, and equational reasoning.
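As a small illustration of what I mean by equational reasoning (in Haskell syntax, with made-up names): because `f` below is pure, the two definitions of `g` are interchangeable, and either the programmer or the compiler can rewrite one into the other without changing the program's meaning.

```haskell
f :: Int -> Int
f x = x * x + 1

-- Referential transparency: `f x` can always be replaced by its value,
-- so g1 and g2 denote the same function.
g1, g2 :: Int -> Int
g1 x = f x + f x
g2 x = 2 * f x
```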
A key problem is that Haskell's model of lazy evaluation is inherently single threaded. It can't be implemented efficiently on a GPU, and it is incompatible with data parallelism. There are also problems relating to ease of use, debuggability, and the developer's ability to understand memory consumption and performance.
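A standard Haskell example of the memory-consumption problem (nothing Curv-specific): the lazy and strict left folds below compute the same sum, but behave very differently on the heap.

```haskell
import Data.List (foldl')

-- Lazy left fold: the accumulator is a growing chain of unevaluated
-- thunks, (((0 + 1) + 2) + 3) + ..., which is only collapsed when the
-- final result is demanded. For a long list this can exhaust memory.
lazySum :: [Int] -> Int
lazySum = foldl (+) 0

-- Strict left fold: the accumulator is forced at every step, so the
-- same computation runs in constant space.
strictSum :: [Int] -> Int
strictSum = foldl' (+) 0
```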
Function calls are strict in Curv. Aside from this being a requirement for GPU compilation, I also think that the downsides of lazy function calls outweigh the benefits, and I don't think that normal order evaluation should be the default.
Haskell lazy lists are a special case of a more general idea of "procedural data structures". These are values that behave like data structures, but code is executed when you access their elements. Haskell lazy lists are inherently sequential and single threaded, and are impossible to implement on a GPU. I am more interested in exploring a general notion of procedural data that supports data parallelism and can be executed on a GPU.
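A rough Haskell sketch of the contrast I have in mind (the `ProcArray` type is made up for illustration; it isn't a Curv construct):

```haskell
-- A lazy list is a chain of cons cells: to reach element n you have to
-- walk through the n cells before it, one at a time, so consumers are
-- inherently sequential.
squaresList :: [Int]
squaresList = map (\i -> i * i) [0 ..]

-- A "procedural" array can instead be a size plus an index -> value
-- function. Every element is independent of the others, so they can
-- all be computed at once, e.g. one GPU thread per index.
data ProcArray a = ProcArray { paSize :: Int, paIndex :: Int -> a }

squaresProc :: ProcArray Int
squaresProc = ProcArray 1000 (\i -> i * i)
```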
@eyal mentions that explicit laziness in Lamdu is actually "call by name". This is a form of laziness that is compatible with data parallelism. There is some "call by name" semantics buried in the Curv implementation, but so far I haven't found the need to document it and expose it as a language primitive. That might happen once the language is fully designed and implemented.
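The semantic difference, sketched in Haskell (modelling a call-by-name thunk as a unit-taking function; this is just an illustration, not how Curv or Lamdu is implemented):

```haskell
-- Model an unevaluated expression as a function from () to a value.
type Thunk a = () -> a

-- Call by name: the thunk is re-run at every use site. Nothing is
-- cached, so there is no shared mutable cell, which is why this form
-- of laziness coexists happily with data parallelism.
useByName :: Thunk Int -> Int
useByName t = t () + t ()   -- the expression is evaluated twice

-- Call by need (Haskell's default) also memoises: the first use
-- overwrites the shared thunk with its value, and later uses read it
-- back. That in-place update of a shared cell is the part that is
-- hard to do efficiently on a GPU.
useByNeed :: Int -> Int
useByNeed x = x + x         -- x is evaluated at most once, then shared
```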