#thinking-together
shalabh

08/24/2019, 11:20 PM
In most programming languages, when you define a composite type (e.g. `class User {string name}`), you define both - the business shape ('a user entity has one name') and also the memory layout used to represent it internally. Are there any languages that let you define these separately?
Dan Cook

08/25/2019, 1:02 AM
Classes in languages like C++/C#/Java define both the type of entity and the memory layout for it. Languages like Haskell treat types more like an abstract algebra over combinations of data.
Wouter

08/25/2019, 3:46 PM
In almost all languages you don't really define the memory layout, because you have only one choice. Languages that give you more than one choice of layout are typically very low level.. and in those, if you don't want to hard-code that choice, you use.. templates? 🙂
shalabh

08/25/2019, 4:35 PM
Yes that's what I mean. You don't get any choice and the 'definition' implies both - the data model and the memory layout. Even in Haskell, where you define only algebraic data types, an implicit layout is provided by the language. In C it seems like the memory layout is more explicit, though you're still defining a data model. Maybe some trickery is possible with C++ templates but it seems really clunky to do it everywhere. What I'm looking for is languages that explicitly support the notion of defining the data type only and separately mapping it to bits - potentially even supporting multiple representations at the same time in different places. The motivation is that our business logic should not be coupled to lower level details, but optimization should still be possible - separating 'design' from 'optimization' in some sense. Here's a related post from before: https://futureofcoding.slack.com/archives/C5U3SEW6A/p1562172714184400?thread_ts=1562010026.133300&cid=C5U3SEW6A
Wouter

08/25/2019, 4:43 PM
I'm not sure I follow.. in some sense, in Java, Python, Haskell.. you define the data type only, since the mapping to bits is either mostly invisible to you or not under your control anyway.. are you saying you want a low level language where flipping between, say, inline and heap allocation of members is more convenient than with templates or whatever?
4:49 PM
maybe you should give an example
Ivan Reese

08/25/2019, 7:59 PM
How about in Jai, where you choose whether to store collections as AOS vs SOA? That's a language-level feature (whereas in C, you still do AOS vs SOA, but you do it manually IIRC)
8:00 PM
I think that should count as separately defining the "business shape" — this is a collection of structs with these fields — and the memory layout — this is to be stored as a struct of arrays, since that better matches my access pattern and better leverages the cpu caches.
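To make the AOS vs SOA split concrete, here is a minimal sketch in Python rather than Jai. It does not reproduce Jai's syntax or semantics. The names `Particle`, `ParticlesAoS`, `ParticlesSoA`, and `center_of_mass` are hypothetical, invented for illustration. The point is only that the "business shape" (a particle has `x` and `mass`) is stated once, the two storage classes choose a layout, and the logic that consumes them does not change.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Particle:
    """Business shape: a particle has a position x and a mass."""
    x: float
    mass: float


class ParticlesAoS:
    """Array of structs: one Python object per particle."""

    def __init__(self):
        self._items = []

    def add(self, x, mass):
        self._items.append(Particle(x, mass))

    def xs(self):
        return [p.x for p in self._items]

    def masses(self):
        return [p.mass for p in self._items]


class ParticlesSoA:
    """Struct of arrays: one contiguous array of doubles per field."""

    def __init__(self):
        self._x = array("d")
        self._mass = array("d")

    def add(self, x, mass):
        self._x.append(x)
        self._mass.append(mass)

    def xs(self):
        return self._x

    def masses(self):
        return self._mass


def center_of_mass(particles):
    """Business logic: depends only on the shape, not on the layout."""
    total = sum(particles.masses())
    weighted = sum(x * m for x, m in zip(particles.xs(), particles.masses()))
    return weighted / total
```

Swapping `ParticlesAoS` for `ParticlesSoA` at the construction site is the only change a caller would need. A language-level feature would presumably remove even the hand-written accessor classes.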
8:05 PM
I've also seen something like this in Clojure. For instance, Karsten Schmidt's thi.ng geom library is full of functions that operate on various kinds of vector, which have an explicit sense of the data type (vec2 vs vec3 vs arbitrary dimension vector).. but the underlying bit representation in memory is left up to the programmer — it just needs to support positional access by conforming to the appropriate Clojure protocols. Also, what about swizzling? Would that count? That tends to be offered as a language level feature for handling separately the memory layout of fields and their meaning.
Nick Smith

08/26/2019, 12:15 PM
There is a seriously neglected research area that addresses the approach wherein you leave it to a compiler to select the byte layout for you (rather than making the choice yourself). The research area is "data structure synthesis" or "data representation synthesis". You can google these terms to find the relevant literature. I'd love to hear from anyone that knows something about how data structure synthesis has been (or could be) applied in a general purpose language.
shalabh

08/26/2019, 4:47 PM
@Ivan Reese yes, Jai's AOS/SOA is a great example and kind of the thing I'm looking for - I didn't know you could do that. @Wouter the main idea is that there's a 'business shape', as Ivan called it, and a separate mapping onto memory bits (and other implementation details). In languages where you don't define the latter, a default mapping is provided for you. Unfortunately this may not always be optimal. To take a simple example, objects are always stored as references in Python, so a string attribute is a pointer to a string object. Strings are immutable, so I should be able to easily say, for this class, store this attribute inline (i.e. embedded). So whenever you assign 'obj.strvalue = something', it would get copied (no refcount update). In Python you can write a custom C extension that does this, but it's too much work and not available within Python itself. C++ has copy constructors but they only work in very isolated cases because the machine types (`char *` etc.) are exposed and pervasive across APIs. The motivation behind this is that I should be able to design with and write 'pure business logic' (for lack of a better word) without consideration of memory layouts and other lower level details, but also be able to specify these details separately. So if I change the memory layout, my business logic code doesn't have to change one bit. The business logic should only be coupled to the business shape. Consider how often in C++ a bunch of code will need to change if I switch from using a pointer to a reference. I would also like to do more advanced things: taking the Python example again, I may want to say that one attribute is always stored in a specific arena in memory. So whenever you assign to this attribute the value gets copied to the arena and aggregate operations on this attr become very fast. This would have to be baked into the language from the start - so `map` etc. can be designed to be aware of and use the internal structure of things. The ultimate extension of this idea is that you can take the same 'business logic' code and run it in a distributed fashion, just by providing a different implementation mapping.
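The arena idea can be approximated in today's Python with a descriptor, although only as a rough sketch of the concept. It is not a language feature and does not avoid Python's boxing of values on read. `ArenaField`, `Account`, `apply_bonus`, and `total_balance` are hypothetical names. The sketch also assumes numeric values and never frees arena slots.

```python
from array import array


class ArenaField:
    """Hypothetical layout policy: keep this attribute's values in one shared array."""

    def __init__(self):
        self.arena = array("d")            # contiguous storage shared by all instances

    def __set_name__(self, owner, name):
        self.slot = "_" + name + "_index"  # per-instance attribute holding the arena index

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                    # class-level access returns the descriptor
        return self.arena[getattr(obj, self.slot)]

    def __set__(self, obj, value):
        index = getattr(obj, self.slot, None)
        if index is None:                  # first assignment: allocate a slot
            self.arena.append(value)
            setattr(obj, self.slot, len(self.arena) - 1)
        else:                              # later assignments overwrite in place
            self.arena[index] = value


class Account:
    balance = ArenaField()                 # the layout decision, made in one place

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance             # copied into the arena by the descriptor


def apply_bonus(account, amount):
    """Business logic: unaware of where `balance` is stored."""
    account.balance = account.balance + amount


def total_balance():
    """Aggregate over the arena directly, without visiting each object."""
    return sum(Account.balance.arena)
```

Replacing `balance = ArenaField()` with an ordinary instance attribute would leave `apply_bonus` untouched, which is the decoupling being asked for. Only `total_balance` knows about the arena.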
4:52 PM
@Nick Smith - thanks for the pointers - will check them out.
Wouter

08/27/2019, 2:26 AM
The problem with decoupling that more than, say, C++/Rust manage to do is that the representation matters a whole lot for how it is used.. if for example I pass on an array element to other code, its representation may determine its lifetime (who owns it), whether it is copied, or whether it has object identity in comparison, and in the case of SOA the fields may not even be adjacent in memory, which poses further challenges. And you don't necessarily want the above issues to be fully transparent either, given that you obviously must care about performance.
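A small sketch of this objection, again in Python with hypothetical names (`Point`, `PointsAoS`, `PointsSoA`, `nudge`). The same caller code behaves differently depending on representation, because the AOS container hands out the stored object while the SOA container has no stored object to hand out and must build a copy.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Point:
    x: float
    y: float


class PointsAoS:
    """Each element is a heap object with its own identity."""

    def __init__(self, pairs):
        self._items = [Point(x, y) for x, y in pairs]

    def __getitem__(self, i):
        return self._items[i]                 # hands out the stored object itself


class PointsSoA:
    """Fields live in separate arrays; there is no stored element object."""

    def __init__(self, pairs):
        self._x = array("d", [x for x, _ in pairs])
        self._y = array("d", [y for _, y in pairs])

    def __getitem__(self, i):
        return Point(self._x[i], self._y[i])  # builds a fresh copy on every access


def nudge(points):
    """Same caller code; the result depends on the representation."""
    p = points[0]
    p.x += 1.0            # mutates the element (AoS) or a temporary copy (SoA)
    return points[0].x
```

With the AOS container the mutation is visible on the next read; with the SOA container it is silently lost. A language that hides the layout choice would have to pick one of these semantics or expose the difference in some other way.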
shalabh

08/27/2019, 4:56 AM
Yes I agree lifetime is another gnarly issue related to this. It is one of the other implementation details often implied by the schema definition. I don't think C++/Rust have really explored everything though. While a simple lifetime policy that works for business models is 'anything reachable is alive' (Python, Java, etc.) it leads to the generally slower GC model. We do have to care about performance, the question is whether the performance related details can be separated from the business logic. Which is why I'm looking for other examples.
Wouter

08/27/2019, 5:18 AM
I don't think you can. But lifetimes being part of the business logic is actually kind of nice. Who really "owns" data? What business do you have looking at data that was owned after the parent has died?
10:04 AM
I’d like declarative data descriptions separate from where I ultimately decide how things are going to be laid out/allocated/packed.
10:05 AM
I was thinking perhaps some multistage programming could be leveraged here. Getting it to work nice and ergonomically could be a challenge.
yairchu

08/27/2019, 12:28 PM
Note that AoS/SoA can be solved in existing languages with type parameters / “Higher Kinded Data”