Huge idea: what if tensors are the next-generation...
# thinking-together
n
Huge idea: what if tensors are the next-generation replacement for RAM? Classic RAM (I'm talking about the software abstraction, not the physical hardware) is just a vector with 2^64 cells, most of which are zero and not backed by physical memory. This is commonly known as a sparse vector. The current AI boom has made it obvious that higher-dimensional memory chunks, known as tensors, are an important idea, especially sparse ones. Other than being higher-dimensional, key differences between tensors and RAM include: • An AI app will typically work with multiple tensors, but a classical app will only work with one RAM. (Though Wasm can have multiple RAMs, known as "linear memories", and of course, you can pretend to have multiple memories using abstractions like malloc). • Tensors can be subjected to unary operations such as slicing, permuting, and aggregation (min, max, sum, product), that generalize the boring read and write operations on RAM. • Tensors can be subjected to binary operations such as multiplication/contraction (generalizing matrix multiplication), convolution, and element-wise addition. The data of everyday programs is often very heterogeneous, which corresponds to having lots of sparse tensors. Sparse tensors need good support in software and ideally in hardware. Thankfully, there is AI hardware being developed that is designed to operate on sparse tensors, by way of dedicated circuits that can compress and decompress them. Tenstorrent is probably the leader here. Here's a fun fact: multiplication of sparse Boolean tensors is equivalent to a database equi-join. So if you think databases are important, then maybe you should give tensors some thought. And relatedly: operations on tensors are typically massively-parallelizable, thus could be a good foundation for a high-performance programming language that compiles to AI hardware.
❤️ 1
s
And relatedly: operations on tensors are typically massively-parallelizable, thus could be a good foundation for a high-performance programming language that compiles to AI hardware.
You hooked me there
l
So we already store tensors in RAM and perform various operations on them. You said software not hardware, so is the difference here that the abstraction between the 1D (flattened) data and its higher dimensional form is provided at a lower level in the software?
👍 1
n
I mean the assembly language of the hardware should be phrased in terms of operations on tensors. 🙂 The programmer should not be concerned with whether the tensor is ultimately flattened into a linear array of SRAM or DRAM cells. (In AI hardware, they definitely won’t be.)
👍 1
My goal with this post is just to get people thinking a little differently about the memory model upon which a programming language is built. Tensor-based memory models are about to become mainstream (next 5 years) thanks to the AI boom. Could lead to some exciting new paradigms of programming.
Here’s a challenge for everyone: when you visualise the act of “allocating memory”, what do you see? If you see a big linear chunk that you can address with a pointer, then maybe you’re trapped in 1-dimensional thinking. I certainly was/am.
d
How's your memory mostly zeros? If it is, you should use smaller machines. I think the idea that RAM is similar to a sparse vector is often not right. At least not if you use Chrome for browsing.
n
I’m referring to virtual memory (i.e. what apps see). I guarantee you that your 64-bits of virtual memory are mostly zeroes! And with a paging system, you can write across vast swathes of the memory whilst only consuming physical resources for the pages you actually touch. That’s what I mean when I describe linear memory as a sparse vector.
Now imagine what it would be like to have a memory model where your memory is sparse at the byte-level, and is multidimensional, and you can have multiple memories and perform massively-parallel operations (such as aggregations) over them. This is what sparse tensors are.
k
N-dimensional arrays as a fundamental data representation? That's an idea that has been around since the days of Fortran and APL. The 1960s. Efficient parallelization has been investigated as well, with today's Fortran containing very good support, though it's less automatic / miraculous than people tend to expect.
👍 1
BTW, I avoid calling N-dimensional arrays tensors because a tensor for me is a algebraic and geometric object, not a data structure: https://en.wikipedia.org/wiki/Tensor
n
Does Fortran handle large and high-dimensional (10000x10000x10000x...) sparse tensors, though? That's the main enabler of a lot of interesting applications. Tenstorrent handles sparse tensors completely in hardware; as a programmer you work with them as if they were dense. For context: if you multiply a pair of 99% sparse tensors using a dense multiplication algorithm, you’re doing 10000x more work than you need to (repeatedly multiplying by 0). In general, the asymptotic complexity is different.
I'm aware of the more "mathematical" definition of tensor. But I believe the difference is just that the array representation is what you get once you've chosen a basis. You can also talk about tensors without reference to any basis.
k
Fortran doesn't support sparse arrays as a language feature, but library support has been around for decades, getting better all the time. As for the tensor, yes, once you pick a basis, you get an array representation. But the whole point of tensor algebra and tensor analysis is that the tensor has a meaning (and properties) independent of the choice of a basis.
👍 1
a
Have you heard about https://chapel-lang.org/ ? It natively supports not only sparse but distributed N-dimentional arrays. Still take a look at the problems and the means to make it efficient.
Besides, "tensor operations" are only good for numerical computations. I don't know how much numerical computation you develop, but I develop none. Haven't come across a single one in any of Web-dev projects I was involved. 🤷‍♀️
n
Thanks I’ll check out what Charity does. But you’re not correct with the claim that tensor operations are only for numerical computation. I gave a very important example in my original post: contraction of Bool-valued tensors is precisely an equi-join between two database tables. I guarantee you’ve come across databases in your web dev projects 😇
This isn’t merely a curiosity. Given hardware like Tenstorrent’s, we might now have the opportunity to radically rethink high-performance databases, including in-memory DBs (i.e. the memory model of arbitrary programs).
a
Yeah and my DBs were full of Strings, DateTimes and Foreign Keys -- good luck putting all of that into "tensors" and performing parallel operations on them! 😁
In reality your parallelism stops as soon as you encounter a fold with non-associative operations. And pretty much all operations performing side effects are non-associative. That's basically leaves you with numeric (plus boolean) computations.
n
Mate, foreign keys aren’t a problem, they’re the very thing you equi-join on. Have a play around with the idea in the 2D case (Boolean matrices); you should be able to figure out how it works. Stuff like strings aren’t a problem either, they’re just a bunch of bytes along one dimension of the tensor.
a
I know how it works. I know how it performs. I know how GPUs and "tensor processors" are implemented and what they are capable of. Do you? 🙂
n
I came to share some exciting ideas with the community; I’m not interested in having a pissing contest. Clearly you’ve come to this thread with an ulterior motive. I think we can leave it there.
a
@Nick Smith you clearly know my motives and intentions much better than I do so I'll take your word for it! 🙂
v
Very interesting - I’ve been looking at XLA and JAX as a means of abstraction over such hardware. I also think a number of problems can be decomposed into “tensor friendly” representations - lots of interesting work going on in this area.
👍 2
k
@Alex Chichigin ""tensor operations" are only good for numerical computations." No, that's merely what they have been used for most commonly. Recommended reading for this thread: "The memory models that underlie programming languages" (http://canonical.org/~kragen/memory-models/). The third model discussed in that overview is "parallel arrays", and it's very similar to what @Nick Smith has proposed when starting this thread. It can be combined with ideas from "nested records" (the first model), which is what NumPy's "structured arrays" are. And there is also overlap with "relations", the last model in the overview.
❤️ 3
a
@Konrad Hinsen yeah thanks for "The memory models that underlie programming languages", that's a good one as far as I remember. 🙂 I agree that as a model "tensors" (or multi-dimensional arrays, parallel or not, though J language was calling them "tensors" some decades before ML frameworks did 😁) can accommodate pretty much everything, including certain notion of "objects", "relations" and "tables" as evidenced by APL, K/Q, J again or even R, NumPy and Pandas. But @Nick Smith was talking about (sparse) "tensors" as a low-level model of actual hardware memory of "tensor processors", and in reality this processors can efficiently handle only numerical (and packed boolean, yes) computations. As soon as we throw pointers (or indexes into other tensors) in the mix and start chasing them the "model" breaks and efficiency plummets. Thus I claim "tensor operations are only good for numerical computations". 😁
n
On these new distributed memory tensor processors, you don't usually "point" at things (there is no global memory / address space), you usually join things (in the form of tensor contraction). These chips are specifically built to do insanely fast contractions. You won't be running Java code on these chips (the epitome of pointer-chasing), but that's fine, that's not the promise. Operations on databases are mostly equi-joins and aggregations, both of which are basic operations in the Numpy/Pytorch API (which Tenstorrent is essentially using as the initial "instruction set" for their hardware). Fancier kinds of joins (i.e. on predicates) are less-obviously translated to tensor operations. It would be fun to explore how to re-implement them efficiently in terms of lower-level ops (cross-join + filter is always an option, but probably not the smartest one). A good ISA would have the right primitives available.
Thanks @Konrad Hinsen. That article looks really interesting! I'll have a dive in.
k
This thread is definitely made for "thinking together" ❤️