# thinking-together
n
Has anyone thought about, implemented, or encountered higher-level abstractions of ALUs? i.e. the part of the hardware where actual computations are performed (as opposed to the miles of hardware dedicated to control-flow management). Almost every programming language has an ALU abstraction based upon fixed-width chunks of binary digits (32 or 64 bits wide), arithmetic operations (interpreting the chunks as integers or IEEE floats), and bitwise and bit-shift operations. Those fixed-width chunks are grouped into "allocations", which are either larger fixed-width containers (structs etc.) or dynamically-sized arrays.

Recently I've been thinking about a "clean slate" abstraction that still exposes the basic operations that ALUs are good at (integer arithmetic and bit manipulation) but without the fixed-width chunk limitations. Fixed-width chunks are purely a hardware design limitation and have no inherent value in a programmer's mental model; they just add complexity to data modelling. What DOES have value is the notion of a dynamically-sized bit sequence that can be manipulated via splicing operations (take, drop, insert, replace) that generalize bit shifts, bitwise operations (the same old &, |, ^), and the familiar arithmetic operations (add, sub, mul, div...). This is a natural foundation for arbitrary-size integers and sequences, but also for general computations that want an efficient mapping to hardware capabilities.

I want to take an ALU abstraction like this and build my way-out-there logic programming environment on top of it, so that you still have a conceptual bridge to hardware, and thus you can still reason about the efficiency of basic operations and use them to create efficient user-defined data types.
❤️ 3
👍 1
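A rough illustrative sketch of such a dynamically-sized bit sequence (a hypothetical API, not any existing implementation; Python's unbounded integers stand in for the underlying storage):

```python
class Bits:
    """A dynamically-sized bit sequence (hypothetical API, illustration only).
    Bit 0 is the least-significant end; a Python int is the storage."""

    def __init__(self, value=0, length=0):
        self.length = length
        self.value = value & ((1 << length) - 1)

    # --- splicing operations: these generalize shifts ---
    def take(self, n):                      # keep only the low n bits
        return Bits(self.value, min(n, self.length))

    def drop(self, n):                      # discard the low n bits (a right shift)
        return Bits(self.value >> n, max(self.length - n, 0))

    def insert(self, i, other):             # splice `other` in at bit position i
        low = self.value & ((1 << i) - 1)
        high = self.value >> i
        merged = low | (other.value << i) | (high << (i + other.length))
        return Bits(merged, self.length + other.length)

    # --- bitwise operations ---
    def __xor__(self, other):
        return Bits(self.value ^ other.value, max(self.length, other.length))

    # --- arithmetic, reading the sequence as an unsigned integer ---
    def __add__(self, other):
        total = self.value + other.value
        return Bits(total, total.bit_length())   # the result grows as needed

# A left shift by n is then just a splice of n zero bits at position 0:
x = Bits(0b1011, 4)
print(bin(x.insert(0, Bits(0, 3)).value))   # 0b1011000
```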
And yes, arbitrary-sized bit sequences have overhead, because you have to perform length checks before every operation. But I'm not too worried about this constant-factor overhead; I'm not competing with C. Also, many of those checks should be removable with some static analysis. I want to make this a language implementation problem, not a user's problem (and my users aren't compute-bound).
e
In Beads I have a variable-length bit string type (bits) and a variable-length byte string type (bytes), which are very useful for packing binary things or for doing byte manipulation, both common low-level operations. Some CPUs, such as the Motorola 68000 family, had variable-length bit and byte string operations. Intel has kinda-sorta byte manipulation with the REP MOVSB instruction, but the Motorola had extremely handy arbitrary bit string stuff. As for arithmetic, IEEE floating point is downright stupid and causes all sorts of problems. Some propose DEC64 as a superior alternative, and some languages support BCD with a specified number of decimal digits, which can be helpful in financial applications. No question that thinking about 32 vs 64 bits is mostly a waste of time unless you have an ungodly amount of data to process, which is why AS3, JS, and many other languages have a single numeric type based on 64-bit floating point.
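A concrete taste of the floating-point complaint (illustrative Python, with the `decimal` module standing in for a DEC64/BCD-style representation):

```python
from decimal import Decimal

# Binary floating point can't represent most decimal fractions exactly,
# which is the classic source of surprises in financial code:
print(0.1 + 0.2 == 0.3)   # False
print(0.1 + 0.2)          # 0.30000000000000004

# A decimal representation (in the spirit of DEC64/BCD) behaves as expected:
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))   # True
```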
n
Yes I'm planning to see how I go banning IEEE floats and instead exposing an opaque rational number type in my environment. As far as bit manipulation hardware goes, Intel's parallel bit deposit and extract for x86 seems really cool, but unfortunately isn't efficient on an AMD Zen, since they've inexplicably implemented it in microcode rather than as a native capability. I'm also saddened by the absence of a bit reversal instruction in x86... it seems to exist on every other major hardware platform!
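For anyone unfamiliar with PEXT (parallel bit extract), a rough software emulation of its semantics (illustrative Python; the hardware instruction does all of this at once):

```python
def pext(value: int, mask: int) -> int:
    """Gather the bits of `value` selected by `mask` and pack them
    contiguously into the low-order bits of the result."""
    result, out_pos = 0, 0
    while mask:
        lowest = mask & -mask          # isolate the lowest set bit of the mask
        if value & lowest:
            result |= 1 << out_pos
        out_pos += 1
        mask ^= lowest                 # clear that mask bit and continue
    return result

print(bin(pext(0b1101_0110, 0b1111_0000)))   # 0b1101: the top nibble, packed low
```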
r
A few thoughts from my instruction set design:

1. My uCISC instructions have an increment flag in them (see https://github.com/grokthis/ucisc/blob/master/docs/07_ALU.md#arguments). This allows you to chain arbitrarily long ALU operations back to back in 16-bit increments, since that is my word size. This works for addition, shifts, and similar operations. The increment points to the next address, and the carry flags tell the ALU how to adjust the next op.
2. You can generalize this by adding a repeater flag to repeat the same operation N times (see the repetition factor here: https://github.com/grokthis/ucisc/blob/master/docs/05_Instruction_Behaviors.md#flags-register).
3. I banned floating point math from my ALU. You can always attach custom hardware to speed up these cases if needed. The problem is that these operations tend to be highly bit-width dependent and also orders of magnitude slower in software.
4. However, repetition does NOT work for ALU operations where the first bit and the last bit affect each other. For example, in multiplication, each bit is effectively multiplied against every other bit. So, for arbitrarily sized numbers you'll need to make multiple passes. Using something like the Karatsuba algorithm (https://en.wikipedia.org/wiki/Karatsuba_algorithm) you can decompose larger operations into multiple sub-operations (see the sketch below), but the cost still scales super-linearly. I haven't done the math, but the computation gets out of control very quickly.
5. Verilog has ALU operations built in, with semantics controlling bit width, signedness, and their effects on the operation. I have found I need to be ultra careful with the results. Adding two 8-bit numbers results in a 9th bit. Multiplying two n-bit numbers results in a 2n-bit result. You also have to be careful about what happens when you operate on unequal bit lengths. Verilog's handling of these things might provide some inspiration on how to do what you want.
👍 1
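For reference, Karatsuba's decomposition trades one sub-multiplication for a few additions; a minimal Python sketch, with an arbitrary 64-bit cutoff standing in for the point where a plain hardware multiply would take over:

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative integers using 3 sub-multiplications instead of 4,
    giving roughly O(n^1.585) bit operations instead of the schoolbook O(n^2)."""
    if x < (1 << 64) or y < (1 << 64):
        return x * y                           # base case: built-in multiply stands in for hardware
    n = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> n, x & ((1 << n) - 1)        # split each operand into high/low halves
    yh, yl = y >> n, y & ((1 << n) - 1)
    a = karatsuba(xh, yh)
    b = karatsuba(xl, yl)
    c = karatsuba(xh + xl, yh + yl) - a - b    # equals xh*yl + xl*yh
    return (a << (2 * n)) + (c << n) + b

assert karatsuba(3**200, 7**150) == 3**200 * 7**150
```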
d
There's a proposal to put this into the C language: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2472.pdf
n
@Robert Butler Seems like a cool project, but beyond the chaining it looks like you're still sticking to the classical ALU operations. I'm interested in seeing a rethink of bit manipulation, and for that reason I'm planning on having more powerful primitives than just adds and shifts (but which can ultimately be compiled back down to them).
@Doug Moen Yeah LLVM's support for this is really cool, but having integer size be statically-specified seems like it's more limited than having all bit strings be dynamically-sized. In other words, LLVM doesn't seem to have BigInts built in.
Once you've decided to ban overflow, size annotations should be a performance optimisation (computed through static analysis), not a requirement for compilation.
r
I'm curious, why is banning overflow a useful abstraction? At the end of the day, it has to run on hardware and making use of arbitrary bit widths on each operation will work often (with a performance overhead) and sometimes crater the performance. Unless you limit yourself to ALU operations that scale O(n) where n is the bit width, you'll end up with a lot of edge cases.
There are plenty of languages (mostly scripting languages) in which you are safe to ignore bit widths in most cases, since 64-bit integers and double-precision numbers can hold a lot of range. Most programmers don't think about those things anyway.
If you do start to break out of 64 bits, you very much need to think about the hardware, because performance is going to kill the program pretty quickly on any computation that isn't O(n) with respect to bit width. I guess my question is: who is the target audience for this?
n
Integer wrapping is almost always a bug, and it’s one that can occur years or decades after a program is written. On the other hand, reduced performance is at best a mild nuisance and at worst a bug (if drastic). But (drastically) reduced performance will only occur in situations where wrapping would have occurred anyway. So the question really comes down to: when overflow DOES occur (if it ever does), do you want to produce a correct result (a bigint) or crash/produce garbage numbers? That’s the choice I’m making. And I’m sure that 99.9% of integer overflows in most programs would not cause any noticeable performance drop, and thus are an “obviously correct” choice rather than a tradeoff.
I did mention in my original post that the target audience is not people expecting uncompromised C-like performance. My primary goal is to extend the accessibility of developing nontrivial apps to those who cannot create them today.
d
Nick, you started talking about an ALU abstraction (bit strings with low-level ALU-like operations), but then you say that integer wrapping is almost always a bug, so now the abstraction we are talking about is integers, not bit strings. From my perspective, the confusion between bit strings and integers began in C (where it was appropriate) but has infected many modern high-level languages (where it is not appropriate). The two ideas should be kept separate, because the operations are distinct and the concepts are distinct. In my language (Curv), I started with a "number" abstraction, which only supports numeric operations, not bit string operations. Later, I found a need for bit strings (to write hash functions), so I added a separate bit string abstraction, which is actually an array of booleans. So bitwise `and`, `or`, `xor`, and `not` are just those boolean operations extended to work on boolean arrays. For hash functions, I also needed operations to convert between numbers and bit strings, and I needed a bitwise version of integer addition that wraps around (i.e. it has ALU semantics). Unlike C, Python or Javascript, my "bit add" operation, which only works on bit strings, is distinct from the "+" operator that only works on numbers, and they have different semantics.
👍 1
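A sketch of that separation (illustrative Python, not Curv's actual code): bit strings as boolean arrays, bitwise ops as boolean ops mapped over them, and a wrapping `bit_add` that is deliberately distinct from numeric `+`:

```python
def bit_xor(a, b):
    # Bitwise xor is just boolean inequality mapped over two equal-length arrays.
    return [x != y for x, y in zip(a, b)]

def bit_add(a, b):
    """Ripple-carry addition over equal-length, LSB-first boolean arrays.
    The final carry is discarded, i.e. the result wraps around like an ALU add;
    numeric '+' on numbers would be a separate operation with separate semantics."""
    out, carry = [], False
    for x, y in zip(a, b):
        out.append(x ^ y ^ carry)
        carry = (x and y) or (carry and (x or y))
    return out

# 0b0110 (6) + 0b0111 (7) = 0b1101 (13), as LSB-first boolean arrays:
six   = [False, True, True, False]
seven = [True, True, True, False]
print(bit_add(six, seven))   # [True, False, True, True], i.e. 0b1101
```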
r
@Nick Smith that's a really interesting perspective, and I can see where you are going. I see the same base information and come to different conclusions, at least in part because all of the situations I can think of where I have had to deal with overflows are precisely the ones where performance matters. I write a lot of Ruby in my day job, and every time I've had to think about precision or int vs. bigint, performance has been one of the walls I was up against. That said, that's just my experience and I'm curious to see where it goes and see if it opens up new ways of approaching these issues that changes the situation.
n
@Doug Moen You're right, I've mixed discussion of bit strings with that of integers, and I agree that bit strings should be considered something beyond integers. That said, I don't think they should be separate concepts; rather, I think integers should be considered one use case of bit strings. Addition is just xor with carry, and can be useful as a bit string operation, alongside general sequence operations like drop, insert, and replace.
The benefit of keeping the concepts unified is that there is less complexity in the ALU model, and you can mix integers into bit string encodings. However I expect a programming environment to default to a separate (opaque) abstraction for integers that hides their bit string representation and bitwise operations, exposing just arithmetic. The “raw” bit string representation of integers can be reserved for those wanting to do low-level coding with an ALU.
d
Think about how it's implemented. 64-bit integer/pointer unions, right? If it's small, it's an integer, if it's big it's a pointer. Well, first off, .NET can't support that, but maybe you don't mind. Anyway, the very fact that integers have variable size forces every "add" operation to become two checks for "is this first operand a big int, or is this other operand a big int" alongside each 64-bit add operation. It also precludes standard vectorization. Equality tests are trickier... if two 64-bit values are unequal, you have to then check whether they are pointers and if so, call an equality-testing function. Perhaps worst of all, every time an integer goes out of scope you have to check if integer memory has to be freed. This could be mitigated with alloca, but only for integers that are local to the function. What if a bigint is located inside a struct on the stack? It can't use the optimization unless it "knows" it is on the stack. Don't get me wrong, BigInt support is great, but in my career I've mostly written performance-critical code which I would usually design to avoid overflow reliably, rather than suffer the performance penalty of a BigInt.
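To make the per-operation cost concrete, here is roughly the branch structure every `+` acquires under such a scheme (illustrative Python; `is_small` and both "paths" are stand-ins, since Python integers are already arbitrary-precision):

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def is_small(n: int) -> bool:
    # Stand-in for the tag check: "is this a 64-bit integer or a bigint pointer?"
    return INT64_MIN <= n <= INT64_MAX

def checked_add(a: int, b: int) -> int:
    if is_small(a) and is_small(b):   # two tag checks before the add
        r = a + b                     # stands in for one raw 64-bit ALU add
        if is_small(r):               # plus an overflow check after it
            return r
    return a + b                      # stands in for the heap-allocated bigint path
```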
n
None of what you said is really relevant to me. I'm designing a language, not using BigInts as a library feature. Accordingly, situations where you can "avoid overflow reliably" are amenable to static analysis. It seems like it should be easy to identify sequences of operations where integer size won't change substantially, and you can perform a single size check before entering such a block of code, then branch to either a 64-bit version or a BigInt version as necessary. And memory management is also a language/runtime implementation issue which is going to have to be dealt with for general data structures either way.
r
Let's say you have the following code:
# Returns either x + y or x - y; either result may exceed any fixed bit width.
def do_it(x, y)
  if x > y
    x + y
  else
    x - y
  end
end

number_x = gets.to_i   # arbitrary user-supplied integers
number_y = gets.to_i
puts do_it(number_x, number_y)
This is completely arbitrary, of course, but at which location can you reliably know you can avoid overflow? Any code that takes input could be handed values which cause an overflow at some point, right? If you are duplicating logic (i.e. you have a 32-bit version, a 64-bit version, and perhaps more), how do you know which one to use? The only way to know whether it will overflow is essentially to do the mathematical operation, so checking which path to take is at least as complicated as just doing it, which means you might as well do it, check for overflow, and extend/retry at the next size. I'm having a hard time seeing how static analysis helps here, except in arbitrarily small cases where you can compute the entire possible result space and prove that it and all intermediate values fit within x bits. I'm not sure static analysis of this kind can really be made tractable; in general it runs up against the halting problem, which is undecidable.
n
Reading from stdin is a "gotcha" that isn't going to occur too often in the middle of a large program (e.g. a large GUI app, not a small command line utility). Often the numbers you'll be working with will be pieces of program state, or derivatives thereof. Either way, imagine number_x and number_y are used in a lot of different expressions (they're some key value like the position of a particle). Then we might be able to deduce the following: if number_x and number_y are <= 32 bits, then the calculations they are used within will not overflow a 64-bit register (in the above example they simply need to be <= 63 bits). That's the static analysis. What can we do then? Create a fast code path (no size checks, no BigInts, no DRAM access) for the small number case, and a slow code path for the big number case. Branch into the correct path based on an initial test of number_x and number_y.
I'm not saying this static analysis is going to be super effective, but I can't see any major problems with the idea yet. I'll figure it out when I try to implement it! That will be the distant future, because I'm far more focused on UX in the coming year. A good ALU abstraction is key to a good UX.
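A sketch of the compiled shape being described, applied to the earlier do_it example (illustrative Python; a real compiler would emit raw 64-bit arithmetic on the fast path):

```python
FAST_LIMIT = 2**62   # if |x| and |y| are below this, x + y and x - y fit in a signed 64-bit word

def do_it_fast(x, y):
    # Fast path: no per-operation size checks; raw 64-bit adds/subs in compiled code.
    return x + y if x > y else x - y

def do_it_big(x, y):
    # Slow path: arbitrary-precision arithmetic (Python ints stand in for bigints).
    return x + y if x > y else x - y

def do_it(x, y):
    # The single up-front check produced by the range analysis:
    if -FAST_LIMIT < x < FAST_LIMIT and -FAST_LIMIT < y < FAST_LIMIT:
        return do_it_fast(x, y)
    return do_it_big(x, y)
```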
e
The minute you throw a few IF statements into the mix, static analysis becomes impossible. I don't believe that static analysis could ever work. If you have a mysterious function for which you don't know the result range, then mystery(3) * 2 becomes unknowable in terms of required bit size. For a vast number of applications, double-precision floating point has proven adequate, which is why it is the main numeric type in JS and AS3. A long time ago, FLOAT64 was so slow (100x slower than integer arithmetic) that one would go to great lengths to avoid it, but IBM and Intel cracked the code on fast floating point, and now you don't even think about it. I think Motorola's inability to build a fast floating point unit may have been a factor in Apple ditching Motorola for PowerPC.
n
I'm only envisioning the analysis working locally. The goal is to avoid checks on every single arithmetic operation. What is this "mystery" function you're thinking of? It's going to consist of primitive operations like add and multiply.
e
Even local branches will create a combinatorial nightmare for your analysis. And if it is inside a while() loop, how do you know how many times it will execute? Everyone who has ever done this has done runtime checks.
n
Well, it won't be inside a while-loop because my language doesn't have those! Nor does it have general recursion. I'm introducing repetition capabilities VERY carefully. My theoretical foundation is Datalog, which is far more amenable to static analysis.
I'll probably be including escape hatches where "anything goes", but I'm yet to see how prevalent those will need to be. I think they'll be limited to low-level libraries.
And coincidentally, Datalog is itself a good language for implementing a static analyzer, so I'll probably have high-quality static analysis at my fingertips.
d
Well, there is no doubt a body of comp-sci research on this sort of thing that would be worth looking at. "value range analysis" is a term I've heard...
👍 1
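A toy version of that kind of value-range analysis (illustrative only): propagate [lo, hi] intervals through an expression and check whether every value fits in a signed 64-bit word:

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def add_range(a, b):
    return (a[0] + b[0], a[1] + b[1])

def sub_range(a, b):
    return (a[0] - b[1], a[1] - b[0])

def mul_range(a, b):
    corners = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(corners), max(corners))

def fits_in_int64(r):
    return INT64_MIN <= r[0] and r[1] <= INT64_MAX

# If x and y are known 32-bit inputs, does (x + y) * 3 - y ever need a bigint?
x_range = y_range = (-(2**31), 2**31 - 1)
result = sub_range(mul_range(add_range(x_range, y_range), (3, 3)), y_range)
print(fits_in_int64(result))   # True: the whole expression stays on the 64-bit fast path
```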
w
Love the idea of a PL where the basic types are variable-sized (and/or small scalar vectors), and I've made a few designs in that direction... the hard part (for me) is always making it fast. My last attempts try to solve that by putting such values on a stack (as opposed to falling back to dynamic allocation). Something similar to the "small object optimization" in e.g. Clang's std::string implementation is also a good direction.