# thinking-together
n
Here's my perspective on LLMs and the future of programming. I don't believe that the introduction of LLMs that can write code is going to obviate programming. And I don't believe that it is now pointless to develop new programming languages. Instead, I think LLMs are going to make programming and FoC research better, by automating one of the least interesting parts of programming: fiddling with the minutiae of syntax, language constructs, and libraries. I think programmers will still have plenty of work to do. The profession is not doomed. But to justify this, we have to take a step back and consider all of the activities involved in programming.

Firstly, what is a "program"? A program is nothing more than:
• A formal specification of the behaviour of an interactive system
• ...that computer hardware can execute (after translating it into machine code).

To emphasise this, I will use the term "formal spec" in place of "program" for the remainder of this discussion. GPT-4 can understand formal specs, and also everyday English. Thus, if we can describe the functionality of a system in everyday English, GPT-4 can (attempt to) translate it into a formal spec. But writing the formal spec is just one activity of programming. Altogether, programming (or perhaps "software development") involves several activities:
1. Determining what functionality the system being developed "should" have. This is done either by talking with relevant stakeholders (e.g. the future users), or by directly observing deficiencies in their current practices.
2. Expressing that functionality as a formal specification, i.e. "coding".
3. Verifying that the specification correctly implements all of the functionality of step 1. This includes practices such as reading and reviewing the specification, as well as testing the software.
4. Validating that the implemented functionality addresses the stakeholders' problems.
5. Repeating the first 4 steps until the stakeholders are satisfied with what has been developed.

Here's my hypothesis: in the next 10 years, LLMs might radically reduce the amount of work required for step 2, but only step 2. Steps 1 and 4 are very human-centered, and thus can't be automated away — at least until we are at the point where we have an omnipresent AGI that observes all human practices and automatically develops solutions to improve them. Similarly, step 3 will not be automated any time soon, because:
• The plain-English descriptions that we give to LLMs will often be ambiguous, underspecified, and maybe even inconsistent. Thus the LLMs will have to make educated guesses at what we mean. (Even if they are able to ask clarifying questions, there will always be some choices that are automatically made for us.)
• LLMs will occasionally get confused or misinterpret what we say, even if we are clear and careful. We will not have infallible AIs any time soon.

So let's assume that LLMs can automate most of step 2. What does this mean for those of us developing tools and technologies to improve programming? Is our work obsolete now? Will the AI researchers and AI startups be taking the reins? I don't think so! There is still a huge opportunity to develop tools that address step 3, at the very least. (Steps 1 and 4 are harder to address with technology.) In particular, step 3 involves the task of reading source code. When an LLM spits out 1000 lines of JavaScript, how do you know that the code implements the functionality that you wanted?
You have to verify that it does, and for large programs, that will be an enormous amount of work! As we all know, no amount of testing can prove that a program is correct. Thus, we cannot verify AI-generated programs just by using them. Maybe the program has a subtle bug, such as a buffer overflow, that might only be triggered 5 years after the program is deployed. Or less insidiously: maybe the program just doesn't handle certain edge-cases in the way you would like it to (see the small sketch at the end of this post). Either way, a human should probably read through the entire program with a keen eye, to check that all of the logic makes sense.

There's clearly an opportunity for FoC researchers here: we can make languages and tools that make reading and verifying the behaviour of programs easier! Some examples:
• We can design programming languages that are vastly easier to read than traditional languages. How might we do that? Well, "higher-level" languages are likely easier to read, since they are likely to be more concise and focus on the end-user functionality. So work on higher-level programming models will continue to be valuable. To complement this, we can (and IMO, we should) invent new syntaxes that are closer to plain English, such that the specifications that LLMs produce are accessible to a wider audience.
• We can design programming languages where it is harder to write erroneous programs. For example, we can design programming languages that cannot crash or hang (i.e. Turing-incomplete languages), but which are still general-purpose. This reduces the kinds of errors that a human needs to consider as they verify a program.
• We can design better tools for reading and interrogating source code. (For example, better IDE support for navigating and understanding the structure of large codebases.)
• We can design better tools for exploring the space of behaviours of a running program. (Perhaps similar to the tools discussed in Bret Victor's "Ladder of Abstraction" essay.)

Overall, I think the future is bright! I'm going to continue my own PL research project (a very high-level language) with as much vigor as ever.
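To make the edge-case worry concrete, here's a minimal sketch (TypeScript, invented purely for illustration; it's not from any real LLM output) of the kind of code that looks right and passes a casual test, but quietly mishandles a case the stakeholder cares about:

```typescript
// Hypothetical LLM-generated helper: split a bill evenly among n people.
// Looks plausible, and splitBill(90, 3) returns [30, 30, 30], so a quick test passes.
function splitBill(totalCents: number, people: number): number[] {
  const share = Math.floor(totalCents / people);
  return Array(people).fill(share);
}

// The subtle edge case: 100 cents split 3 ways yields [33, 33, 33],
// silently losing 1 cent. Only someone reading the code is likely to
// ask where the remainder goes.
console.log(splitBill(100, 3)); // [33, 33, 33]  (one cent vanishes)
```

Tests only catch this if someone already thought to try a non-divisible total; reading the code surfaces the question immediately, which is exactly the step-3 work that stays with humans.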
w
Arguably with "coding" (2) assisted/automated, we can get on to the future part! Certainly people are playing with (3), getting Chat to output Agda code for example. But the simple fact is that if "getting it working" becomes easier, then we actually give attention to "get it right." I can also see AI potentially helping with the communication challenges of (4).
n
Your mention of Agda has me wanting to define the notion of "verification" a little more carefully. (I'm just thinking aloud here.) Agda and friends are often described as languages for writing "verified programs", but that term makes me uncomfortable, because it's prone to be misunderstood. (At the very least, it confused younger me.)

No programming language can verify that the functionality a programmer desires has been implemented correctly, and no programming language ever will. The programmer can always specify a valid program that is different to the one they had intended to specify, and the computer will happily accept it. The best you can do is to ask the computer to verify that your program has a particular property, e.g. "this variable is never null" or "this program doesn't crash". That's a merit of static typing in general.

But back to your comment: yes, I agree, Agda makes it easier for humans to verify LLM-generated code by reducing the kinds of errors that a program can have. It achieves this by having an expressive type system that can reject a lot of invalid programs. 🙂 (IMO, Agda's type system is too complicated for the average programmer. But I wholeheartedly believe that developing languages with expressive type systems is a worthy goal.)
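To illustrate the property-vs-intent distinction above, here's a minimal sketch in TypeScript (chosen only because it's widely readable; Agda would let you state much richer properties). The type-checker verifies the property "this value is never null", but it happily accepts a valid program that isn't the one the programmer intended:

```typescript
// Property the computer CAN verify: `name` is never null or undefined
// (with strictNullChecks enabled).
function greet(name: string): string {
  return `Goodbye, ${name}!`; // the programmer *meant* to write "Hello"
}

// Rejected at compile time, exactly as the type system promises:
// greet(null); // Error: Argument of type 'null' is not assignable to parameter of type 'string'.

// Accepted without complaint, even though it isn't the program that was intended.
// Whether the program matches the intent is still a human judgement (step 3).
console.log(greet("Ada")); // "Goodbye, Ada!"
```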
j
Most software development work is maintaining large codebases, and I'd say it's still largely unproven that LLMs can offer much help in these settings. Currently the limiting factors are at least the allowed input sizes and the cost and speed of the queries. It seems that iterating on large codebases would be very slow and difficult. Over time, it's of course very likely that these issues will be solved, but it's hard to say whether that will take two years or twenty. I think LLMs will certainly increase the productivity of software development and therefore make it cheaper. This will lead to software becoming economical in more use cases, so the demand for software will increase. It's hard to say how this will balance out against the increase in supply from the productivity gains. But it could also be that LLMs increase the demand for software developers, which could in turn increase the market opportunity for FoC tools. And as discussed, generating traditional code with LLMs still, at the very least, requires professional programming skills to verify the output. A (good) low/no-code tool would not.
But of course, if an LLM already performs better for the use case and audience of your FoC tool, it's probably a good idea to pivot. You could still try to compete on price, but it's difficult to say what the price of LLM queries will be in a year.
n
Why can't an AI skip the intermediate step and just output direct machine code? After all, do you verify what your compiler is spitting out?
People are still thinking in terms of the old write, compile, run paradigm. But a computer system where you can interact with an AI that is able to directly access input and output pretty much removes the need for any sort of programming. Programming is, in the end, a roundabout way of getting a computer to perform some action or compute some result.
l
GPT-4 excerpt:
> LLMs for bridging the gap between domain experts and developers: LLMs can help facilitate communication between domain experts and developers by translating domain-specific requirements into programming constructs or even generating code prototypes. This can reduce miscommunications and speed up the development process, especially for interdisciplinary projects.
@Nick Smith LLMs are now an essential component in most tools for thought; the interface and information organization/representation (e.g. higher-order language, PKM/graph, etc.) that facilitate both the LLM-TfT side and the TfT-human side are still up for a huuuge improvement. Feeling pretty good about working just in that zone :))
i
Porting my reply from @Jarno Montonen's question to build on what was already said here. I think there are broadly two categories of software "consumer": adapters and users. Users want it to just work, and adapters are interested in how. Maybe we could even map it to wanting the right thing vs. worse is better. For users, LLMs could be a more natural interface to existing tools, APIs, and automation. There the question might be, "Is this a better interface than what you get from other methods of interaction?" Sometimes?

For adapters, visibility and clarity are as important as the end product. As far as I've seen, using prompts to program a system builds a different skill from doing it manually. Similar to using an existing library or hiring a developer, you abstract the process by outsourcing to a third party. Whether that's valuable depends on what the task is. But good abstraction comes from being able to peel that back if needed. With current AI tools the person writing is also responsible for defining those layers. So speed could improve, but the required skill would still be similar.

Pessimist take: tooling tailored to AI programming seems more likely than AIs adopting existing tools. Historically that means a split, probably with more mainstream effort on the AI-language side. But people tend to do this a lot: cities rebuilt for cars, modern assembly areas too dangerous to walk around, optimized JavaScript limiting view-source's benefits. A billion-dollar SaaS black box as the best option for improving the usability of existing software seems like digging deeper into present coding's problems.
n
@Naveen Michaud-Agrawal You're asking "what is the value of reading a program that somebody else has written?", i.e. what is the value of code review. The answer is well-known, and the second half of my post addresses it in detail.
n
Sorry, I didn't make my point that clearly. What I'm saying is that if an AI is good enough to write code for a compiler (by code I mean an end-to-end system, not just a snippet of code similar to a Stack Overflow answer), it is probably good enough to write code for a machine architecture directly. In that case there is no code to review.
n
> In that case there is no code to review.
That's simply not true though. There is still code to review; the difference is that it's now much harder to understand. In general, we're not going to want AI to write opaque binary programs whose behaviour cannot be inspected. (Though it might be fine for certain toy or hobbyist programs.)
d
Well, it's possible that AI will earn our trust and we will indeed settle on machine code output. We trust programmers after all. But even then, we have to be able to build and iterate on a shared mental model with the AI as we see the result executing. So we could see what we think of as high level languages today splitting in two directions: the machine code part down and the dialogue-driven modelling up.
n
> it's possible that AI will earn our trust and we will indeed settle on machine code output. We trust programmers after all.
That's true, but I suspect that won't happen until we have "full AGI" (whatever that means). We won't be able to trust an AI to do the right thing for us until it is able to have detailed conversations with us about general requirements and come away with a true and complete understanding of what needs to be done — not just a smoke-and-mirrors "best guess", as happens with LLMs today. At the point where we have such AGI, there will be far more dramatic ramifications for society than the automation of programming jobs. We will likely have automated the vast majority of human labour. 🫢
d
Yup 👍
But anyway, the other half of my point is the issue of the nature of a formalism used to evolve a shared mental model with the AI ... One way or another we won't have to think about NFRs (non-functional requirements): either that's dealt with in the "AI's compile and commit latest machine code" loop, or the modelling language is already good enough to take pure domain functionality descriptions and pull together all the machine stuff needed to make them directly executable, optimising along the way. So what "cognitive modelling formalisms" are there for expressing pure domain structures and behaviours, that we can use to give robustness to our chat threads with an AI?
n
I still have hope that a high-level programming language can serve as the "cognitive modelling formalism". Indeed, as we speak, I'm working on a high-level PL whose syntax is more-or-less a formal subset of English, combined with some mathematical notation such as function calls `f(x, y)`.