re symbolic execution in ruby, i'll put that in a ...
# thinking-together
h
re symbolic execution in ruby, i'll put that in a thread because i did NOT realize how much of a wall of text i was putting up just now (oooooops)
🍰 2
so, my technique is roughly this: • use the AST as a guide for what to model. i'm using whitequark's parser gem for now because it's more at the level of abstraction users care about than the RubyVM::AST module, and i'm aiming for usability over exhaustiveness rn • assume primitive datatypes exist and focus on modeling their semantics first, so, dual prioritization. AST nodes so we can tie things back to source locations trivially, and primitive datatypes because in dynamically typed languages with a propensity for duck typing, exhaustive exploration of the semantics of an AST node is intractable • pick out examples from rubyspec for that AST node that imply some semantics • either manually or by way of retooling the runner (more ideal) to do so automatically; symbolize the critical inputs • take inputs and outputs over to z3 and find the implied minimal semantics (this part isn't in ruby rn bcz the best z3 gem is broken) • manually, or with some tooling (obvs ideal) refine the implied minimal semantics. an example is that z3 couldn't infer boolean or semantics in a compact form at first, because it couldn't do some basic boolean rewrites that i expect something like coq etc would've been able to do. or at least, it can't do it as best i can tell yet. • test the semantics against symbolic execution of MRI/etc using symbolic execution engines targeted at lower level semantics. probably going to use triton for this? maybe klee. depends. triton would be more portable to other implementations. s2e is an option too, but triton really feels easiest atm. • then run the semantics against rubyspec to do a final test. the end goal is to be able to infer semantics for all dynamic languages somewhat efficiently.
@wtaysom since you asked about this
k
(wall of text is A-ok)
👍 1
c
In #C5T9GPWFL we're here for WoT 👏
1
s
Interesting stuff! I've been interested in using abstract interpretation for improving the programming experience (offload mental simulation, etc.). This seems to overlap, though I'm not sure I quite understand the difference between symbolic execution and abstract interpretation.
the end goal is to be able to infer semantics for all dynamic languages somewhat efficiently.
Is there a larger goal in mind here once you can infer these semantics?
h
yes, with said semantics in a separation logic friendly form, we can bring provable systems methods into the realm of dynamic languages with high confidence, i hope. also things like extracting TLA+ specs from implementations by way of rewriting/interactively abstracting with automated assistance away aspects of the separation logic spec of your program
💪 1
abstract interpretation and symbolic execution aren't really fundamentally distinct, but i found one approach much easier to self teach, and also it's just easier to apply symbolic execution directly when there's no language spec, at least if you want to machine verify your spec.
at least as far as i could learn so far, that is
there are ways to exchange information between the abstract interpretation and symbolic execution, at least theoretically, but i don't know the details at all, and i'm not aware of any research exploring that in particular.
AI emphasizes AST and lattices under the hood more whereas SE emphasis bytecode or instruction level semantics a bit more, while representing things in a first order logic generally. (technically i'm also drawing on counter example guided abstraction and there are other niches/terms some may not consider strictly SE that i'm including in it, such as dynamic symbolic execution, concolic testing, dynamic binary analysis. just in case anyone wants to follow up on any of this.)
👍 2
w
This is way, way cool. And for "the end goal is to be able to infer semantics for all dynamic languages," Ruby is a great place to start since its semantics are nuts. (I say this as a proud Rubyist since the turn of the millennium.) Ruby is going to keep you honest. I mean the very fact that people are productive in Ruby shows that one can model its semantics "isomorphic up to shenanigans," but my, my what shenanigans are available! Let me share a little bit about my interests adjacent to this: (1) treating execution traces as first-class objects for debugging, (2) refactoring. (1) When dealing with a relatively big framework ( let's call it Rails), its nice being able to see how all the bits are dynamically connected. And one step removed form the execution itself, it's useful to ask the what-if at specific branches. This leads to... (2) The most common useful refactor I see in practice is the business logic tangle of conditionals where a complicated joint space (let's say six booleans and four variables depending on them, so 256 cases to consider) can probably be factored in a reasonable way, but I'd really love tool support for navigating that space, making sure all the cases are accounted for.
❤️ 1
h
think i'm close to sharing something maybe around this! it's the hacky version using klee and the MJIT
❤️ 2
but it's a start
building a docker container based off the klee container