# thinking-together
r
Broad question here. Do people here know of any tools that separate the complexity component of a program from the underlying behavior it would eventually produce, and then let you manipulate the code while the behavior stays fixed? By behavior I mean something like the user-facing behavior of a program, or its effect on some data; it's a flexible concept in my mind. A large portion of programming seems to be rewriting code so it maintains the same behavior but is then also extensible in some way. Factoring code is an example of this activity, but you could also rewrite code to produce the same behavior in a way that is not a factorization of the original code. To be concrete, you could factor in two different ways, so each factorization produces the same behavior but neither is a factorization of the other. Moving back to the unfactored code and factoring the other way is then a means of transforming the code to produce the same behavior that isn't mere factoring. (Picture: code<--factoring<--code-->factoring-->code.) This is very reminiscent of factoring in abstract algebra, and you could imagine an algebra for manipulating code in this way; going down this road you can ask whether two programs producing the same behavior implies there is a common factorization, but that might be another conversation.

I'm curious about this question mostly as a proxy for a related question in math: how can you transform one proof into an equivalent proof? This is a slippery concept because nobody knows how to make precise the idea of "equivalence of proofs". If you know about Hilbert and his 23 problems, you might find it interesting that he originally had a 24th problem on the equivalence of proofs! Even though the idea is notoriously difficult to pin down, I think it is intuitive enough to take a pragmatic stance and ask how you could go about implementing technology to carry out these transformations. This is important to me because in math we "factor" proofs all the time and often compare proofs to determine the essential and incidental aspects of each. So what I'm really looking for is any techniques or perspectives in the domain of programming that could be taken back into mathematics. I've seen some approaches down at the level of the lambda calculus, but I haven't found them useful. I think a pragmatic/experimental approach is better than a theoretical approach at this point.
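(A minimal illustration of the two-factorings picture, in Python; the functions are made up for this thread and not taken from any existing tool. Both factorings preserve the original behavior, yet neither is a factorization of the other.)

```python
# Toy example (invented for illustration): one behavior, three programs.
# "Behavior" here is just the input/output relation on a list of order records.

def total_unfactored(orders):
    # Original: everything inline.
    total = 0
    for o in orders:
        if o["qty"] > 0:
            total += o["qty"] * o["price"]
    return total

# Factoring A: pull out the per-line computation.
def line_total(o):
    return o["qty"] * o["price"] if o["qty"] > 0 else 0

def total_factored_a(orders):
    return sum(line_total(o) for o in orders)

# Factoring B: pull out the filtering instead.
def valid_orders(orders):
    return [o for o in orders if o["qty"] > 0]

def total_factored_b(orders):
    return sum(o["qty"] * o["price"] for o in valid_orders(orders))

sample = [{"qty": 2, "price": 3.0}, {"qty": 0, "price": 9.9}, {"qty": 1, "price": 4.5}]
# Same behavior on this input; going back to the unfactored version and over to
# the other factoring is a behavior-preserving move that isn't itself a factoring.
assert total_unfactored(sample) == total_factored_a(sample) == total_factored_b(sample)
```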
j
The closest thing to what I think you're asking about is using Logic Programming to do program synthesis. Here's a video demoing a system of this kind:

https://www.youtube.com/watch?v=5vtC7WEN76w

👍 1
m
At the risk of coming across as very naive: what about tests on the practical side and formal verification on the other? Since those allow us to refactor code while ensuring (to varying degrees) its correctness and behaviour, how do they fit into what you are looking for?
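(A sketch of how tests can play that role, assuming the Hypothesis property-testing library; the functions are invented for illustration. The refactoring is accepted only if it agrees with the original on generated inputs.)

```python
# Sketch only: a property-based "equivalence" check between an original
# function and its refactoring, using the Hypothesis library.
from hypothesis import given, strategies as st

def dedupe_original(xs):
    seen, out = set(), []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def dedupe_refactored(xs):
    # A rewrite that should preserve behavior (order-preserving dedupe).
    return list(dict.fromkeys(xs))

@given(st.lists(st.integers()))
def test_refactoring_preserves_behavior(xs):
    # The test pins down the behavior; any rewrite must stay inside it.
    assert dedupe_refactored(xs) == dedupe_original(xs)
```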
h
I wonder how different proofs of the same theorem compare after resolving all abstractions and breaking them down to ZF. But I guess even proofs using a minimal set of axioms aren't unambiguous, as two different axioms could share a common idea. However, it is undecidable whether a set of axioms contains redundant axioms. In general, deciding whether two programs are semantically equivalent is undecidable too, so there cannot be a universal tool that enumerates all alternative function bodies with the same behavior.
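(For context, that undecidability claim follows from the standard reduction from the halting problem; here is a rough sketch of the argument in code. The `equivalent` decider is hypothetical, and the point is precisely that it cannot exist.)

```python
# Sketch: if we had a decider `equivalent(f, g)` for semantic equivalence of
# programs, we could decide the halting problem, which is impossible.
# `equivalent` is a hypothetical argument; this encodes the reduction only.

def would_decide_halting(program, input_value, equivalent):
    # Build a program that first runs `program` on `input_value`
    # (possibly diverging) and then returns 0 ...
    def candidate(_y):
        program(input_value)   # diverges iff `program` doesn't halt on `input_value`
        return 0

    # ... and compare it to the program that just returns 0.
    def constant_zero(_y):
        return 0

    # `candidate` is equivalent to `constant_zero` exactly when `program`
    # halts on `input_value`, so `equivalent` would decide halting.
    return equivalent(candidate, constant_zero)
```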
w
When dealing with undecidability, it's best to say, "well, we'll just handle the easy cases," and see how far you get. Compiler optimizations are certainly an exercise in factoring, as is partial evaluation, which is a good deal more fun. Come to think of it, conventional refactoring is sort of the opposite transformation: instead of removing indirection, add it so that the rest is more regular.
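(A toy illustration of partial evaluation in that spirit, invented for this thread: specializing a general power function at a known exponent produces a residual program with the same behavior on the remaining input.)

```python
# A toy partial evaluator (invented for illustration): specialize power(x, n)
# at a statically known n by generating a loop-free residual program.

def power(x, n):
    result = 1
    for _ in range(n):
        result *= x
    return result

def specialize_power(n):
    # Unroll the loop at specialization time; the residual function has the
    # same behavior as power(x, n) for this fixed n.
    body = "1" + " * x" * n          # e.g. n=3 -> "1 * x * x * x"
    src = f"def power_{n}(x):\n    return {body}\n"
    namespace = {}
    exec(src, namespace)
    return namespace[f"power_{n}"]

power_3 = specialize_power(3)        # residual: def power_3(x): return 1 * x * x * x
assert power_3(5) == power(5, 3) == 125
```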
r
@Martin I appreciate the answer. Talking about testing is more concrete than talking about an abstract idea of behavior, and as you said, the practice of testing is about manipulating code so that the same tests (at least) still pass. The picture I have in mind is that when you are coding you are moving around a space of strings of symbols; within that huge space is a space of valid programs, and once you write tests there is an even smaller space of programs that pass those tests. Is there any means of restricting code rewrites to discrete steps that take place entirely within the space of programs that pass the tests? Rewriting for-loops as while-loops wouldn't change any tests. Neither would renaming some variables, or refactoring. In a specific context you could possibly have more such rewrites. If you have various pieces of code that are interchangeable, then you have something like an algebra where you can substitute equivalent expressions. Tests tell you when code is interchangeable, so they give you some algebra-like structure; how can you manipulate code at that level? (See the sketch below.) So tests are relevant for what I'm looking for because they let you say when two programs are equivalent. I don't think proof verification does the same in math.
@Henning Dieterichs "In general, deciding whether two programs are semantically equivalent is undecidable too, so there cannot be a universal tool that enumerates all alternative function bodies with the same behavior." Neat! Good to know there isn't a universal tool. @wtaysom Easy cases is right! I haven't looked at compiler optimizations before. I might take a peek into that area, but I have a feeling I might get scarred.
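(The sketch referenced above: a fixed test suite pins down "same behavior", and a rewrite step, here a for-loop turned into a while-loop, is admitted only if every test still passes. Functions and tests are made up for illustration.)

```python
# Sketch: treat a fixed test suite as the definition of "same behavior",
# and accept a rewrite step only if every test still passes.

def running_max_v1(xs):
    # for-loop version
    best, out = None, []
    for x in xs:
        best = x if best is None else max(best, x)
        out.append(best)
    return out

def running_max_v2(xs):
    # The "rewrite step": same behavior expressed with a while-loop.
    best, out, i = None, [], 0
    while i < len(xs):
        best = xs[i] if best is None else max(best, xs[i])
        out.append(best)
        i += 1
    return out

TESTS = [
    ([], []),
    ([3], [3]),
    ([1, 3, 2, 5, 4], [1, 3, 3, 5, 5]),
]

def passes_tests(fn):
    return all(fn(inp) == expected for inp, expected in TESTS)

# Both versions sit inside the space of programs the tests accept,
# so the step v1 -> v2 never leaves that space.
assert passes_tests(running_max_v1) and passes_tests(running_max_v2)
```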
r
I agree with @wtaysom. This very much reminds me of compiler optimization. Classical compiler optimization makes use of many equivalence-preserving transformations, such as conversion to SSA form. These sorts of transformations, combined with heuristic-based analysis, are the bread and butter of compiler optimization. The GHC Haskell compiler is probably the epitome of the classical approach.

Then there are the recent results in applying machine learning and genetic algorithms to compiler optimization. These compilers often produce extremely novel, unexpected, even bizarre machine code, yet the results have been shown to be "equivalent" while being much more efficient in some dimension than known state-of-the-art classical approaches (often either better runtime or less resulting machine code). AFAIK, the way the ML compilers seem to work today is essentially unit testing and manual inspection against programs compiled with classical compilers.

https://arxiv.org/pdf/1805.03441.pdf
https://www.semanticscholar.org/paper/Compiler-Optimization%3A-A-Genetic-Algorithm-Approach-Ballal-Sarojadevi/6676a5489ced5412fa2ba3ecb76ca3e5ca2723e0

A key idea is that all these algorithms must have some notion of equivalence of programs. An optimization is only useful if the resulting behavior is equivalent. The real heart of this is the Church-Turing thesis: program optimization is essentially saying one program is equivalent to another program, just computed in a different way. The fact that such equivalent programs can even exist is a direct result of the Church-Turing thesis.

Your line of reasoning is related to some of the inspiration behind logic programming and proof assistants like Coq and Agda. If you can encode a proof in a programming language, you have essentially shown that the proof is computable. If it's computable, then there are infinitely many equivalent programs (via Church-Turing). You can then apply all the known computable transformations, like SSA conversion or graph pruning analysis, etc.
😎 1
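(A miniature version of that empirical equivalence checking, unrelated to the linked papers: a tiny constant-folding pass over Python ASTs, validated by running the optimized function against the original on sample inputs.)

```python
# Sketch: a tiny "optimizer" (constant folding over Python ASTs) whose output
# is validated the pragmatic way: run both versions and compare results.
import ast
import random

SRC = "def f(x):\n    return x * (2 + 3) + (10 - 4)\n"

class FoldConstants(ast.NodeTransformer):
    # Fold binary operations whose operands are literal constants, bottom-up.
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            ops = {ast.Add: lambda a, b: a + b,
                   ast.Sub: lambda a, b: a - b,
                   ast.Mult: lambda a, b: a * b}
            fold = ops.get(type(node.op))
            if fold is not None:
                return ast.copy_location(
                    ast.Constant(fold(node.left.value, node.right.value)), node)
        return node

def compile_fn(tree):
    namespace = {}
    exec(compile(tree, "<opt>", "exec"), namespace)
    return namespace["f"]

original = compile_fn(ast.parse(SRC))
optimized_tree = ast.fix_missing_locations(FoldConstants().visit(ast.parse(SRC)))
optimized = compile_fn(optimized_tree)

# "Equivalence" checked empirically on sample inputs, like testing an
# optimizing compiler against a reference implementation.
assert all(original(x) == optimized(x) for x in random.sample(range(-1000, 1000), 100))
```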
w
Wow, GCC flags @Ray Imber? I do like the idea of ML being applied at higher-level "moves" than it often is. I mean, for instance, allow only valid applications of inference rules rather than random term rewrites, and let the system play over that space.
amiga tick 1
d
"tools that separates the complexity component of a program from the underlying behavior it would eventually produce, and then let you manipulate code so the behavior is fixed" I'm not sure if optimization is on your mind here, but look up the Halide "language" (a C++ DSL), which allows the behavior to be specified separately from the execution plan. Also I heard TVM is related, as well as a new pair of languages called Rise & Elevate for algorithm optimization.