A few shower thoughts on AI and end user programming
# thinking-together
d
A few shower thoughts on AI and end user programming.
• I imagine this group is more averse to AI research compared to baseline in tech/CS. The following comes to mind:
  ◦ Most algorithms today tend to be pretty opaque, whereas we value understandable systems.
  ◦ Current end user systems (e.g. Alexa) are super frustrating when they don’t give you what you want, since their behavior is hard to manipulate.
  ◦ There’s an uneasiness about AI systems replacing the things we DO want to control; we value systems that put people in control.
  ◦ There’s little value for human intuition in most current AI research, with some notable exceptions below.
• The deep learning gold rush has created a situation with interesting analogies to the “programmer/end user” dichotomy. Researchers who develop the expertise to design deep learning systems tend to “think in deep learning” to the exclusion of better-suited approaches in some contexts, while those without a deep understanding have little ability to modify their systems.
• Going further with this, two of the most interesting researchers right now IMO are François Chollet and Josh Tenenbaum, both of whom see program synthesis as the key to advancing AI in a more interesting and useful direction. There’s a parallel with the popular “Type I/Type II” thinking analogy: deep learning is great for Type I thinking (pattern matching, interpolation, “geometry”), while there hasn’t been much progress yet with Type II thinking (generalization, planning, “topology”). The idea is that you need both: a perception layer that goes from continuous to discrete, and a “programming” layer to work with the results. A recent example of program synthesis is DreamCoder. (A toy sketch of the program synthesis idea follows below.)
• IMO we’re about to see all the wrong assumptions that are encoded in current software systems recapitulated in this new trend in AI research, since these efforts will use existing programming languages for the program synthesis component (DreamCoder uses a lisp). I think there’s an opportunity for our group’s way of thinking to improve on the “program synthesis” part of the pipeline.
• Chollet’s “Abstraction and Reasoning Corpus (ARC)” paper and challenge is to me a very clever and appealing illustration of the limitations of deep learning and the potential of program synthesis.
  ◦ One “big idea” is to optimize for generalization by limiting the number of training examples available to an algorithm.
• “Building Machines that Learn and Think Like People” is another nice intro to this direction in AI research, linked in this rather delightful essay.
• Lick’s Man-Computer Symbiosis paper from 1960 is as fresh as ever (not to mention The Computer as a Communication Device).
• If this is all too nebulous, Chollet’s NeurIPS talk (the first 40 min) is a nice crisp intro.
Interested if anyone else is thinking about these topics.
🎉 1
👍 7
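To make the "perception layer → programming layer" idea and the ARC-style setup a bit more concrete, here is a minimal sketch. Everything in it is hypothetical and only for illustration: a four-primitive grid DSL, a made-up two-example task, and brute-force enumeration standing in for real synthesis (DreamCoder is far more sophisticated).

```python
# Toy program synthesis over a tiny, hypothetical grid DSL, ARC-flavoured:
# given only a couple of input/output examples, enumerate short compositions
# of primitives and keep one that is consistent with all examples.
from itertools import product

# Grids are tuples of tuples of ints; each primitive maps a grid to a grid.
def flip_h(g):    return tuple(row[::-1] for row in g)
def flip_v(g):    return g[::-1]
def transpose(g): return tuple(zip(*g))
def invert(g):    return tuple(tuple(1 - c for c in row) for row in g)

PRIMITIVES = [flip_h, flip_v, transpose, invert]

def synthesize(examples, max_len=3):
    """Return the first composition of primitives consistent with all examples."""
    for length in range(1, max_len + 1):
        for prog in product(PRIMITIVES, repeat=length):
            def run(g, prog=prog):
                for f in prog:
                    g = f(g)
                return g
            if all(run(inp) == out for inp, out in examples):
                return prog, run
    return None, None

# Only two training examples, as in ARC; the hidden rule is "flip horizontally".
examples = [
    (((1, 0), (0, 0)), ((0, 1), (0, 0))),
    (((1, 1), (0, 1)), ((1, 1), (1, 0))),
]
prog, run = synthesize(examples)
print([f.__name__ for f in prog])   # ['flip_h']
print(run(((0, 1), (1, 1))))        # applies the rule to a grid it has never seen
```

The point is just the shape of the pipeline: a few examples in, a small readable program out, and the program (unlike an interpolating lookup table) transfers to unseen inputs.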
s
I imagine this group is more averse to AI research compared to baseline in tech/CS.
Ha - I certainly am more averse to AI, specifically the trend of slapping ML on anything and everything. Consider Copilot, which autocompletes textual code without modeling any semantics. It kinda seems backwards: first we make something hard to do, then try to work around it with heavy ML / statistical models. No wonder that when it's wrong it is totally wrong: secret keys, joke code, etc. Having said that, I got a few very good responses to a tweet once https://twitter.com/chatur_shalabh/status/1312073013194493952 I'll summarize:
1. instead of putting ML into production directly, use it to find a function that does what you want, then render it out as human-readable code for deployment (sketch below)
2. exploratory programs - write a program to generate some shapes, then use a DNN to generate shapes you did not think of
3. given some requirements in a constraints language, do an ML-driven search in a very broad solution space
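A rough sketch of idea (1), assuming scikit-learn is available; the "shipping cost" data and feature names are invented. The learned function is rendered as nested if/else rules a person can read, edit, and deploy without the ML stack:

```python
# Fit a small model with ML, then "render it out" as human-readable rules.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
weight = rng.uniform(0, 10, 500)          # kg
distance = rng.uniform(0, 1000, 500)      # km
cost = 5 + 2 * weight + 0.01 * distance   # hidden pricing rule we want to recover
X = np.column_stack([weight, distance])

tree = DecisionTreeRegressor(max_depth=3).fit(X, cost)

# Human-readable rendering of the learned function: nested if/else thresholds.
print(export_text(tree, feature_names=["weight_kg", "distance_km"]))
```

A depth-3 tree won't recover the exact linear rule here, but the output is inspectable and editable, which is the point.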
d
Thanks Shalabh. I hope you’ll give the papers above a skim (On the Measure of Intelligence, Building Machines That Learn and Think Like People), as I think you’ll find some good responses to the question you posed. To me the responses you got are nice but just scratch the surface. For instance, differentiable programming is a very rich area to explore, even without introducing “heavy ML”, and can mesh very well with existing domain knowledge. I agree with your criticism of Copilot (and GPT-3 in general), and I think causal models have to be at the core of any collaborative human-computer system.
👍 1
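A minimal sketch of the differentiable-programming point above, with invented data: keep the domain model (exponential decay, y = a * exp(-k * t)) explicit and fit only its parameters by gradient descent. The gradients are derived by hand here; in practice an autodiff framework such as JAX or PyTorch would compute them for an arbitrary model.

```python
# Differentiable programming in miniature: a physical model with two parameters,
# fitted by gradient descent on a mean-squared-error loss.
import numpy as np

t = np.linspace(0, 5, 50)
y_obs = 3.0 * np.exp(-0.7 * t) + np.random.default_rng(1).normal(0, 0.02, t.size)

a, k = 1.0, 0.1        # initial guesses for amplitude and decay rate
lr = 0.05
for _ in range(5000):
    err = a * np.exp(-k * t) - y_obs
    # Hand-derived gradients of mean((a*exp(-k*t) - y_obs)**2) w.r.t. a and k.
    grad_a = 2 * np.mean(err * np.exp(-k * t))
    grad_k = 2 * np.mean(err * a * (-t) * np.exp(-k * t))
    a -= lr * grad_a
    k -= lr * grad_k

print(f"a = {a:.2f}, k = {k:.2f}")   # should land close to 3.0 and 0.7
```

The domain knowledge (the decay law) stays explicit and readable; only the unknowns are learned.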
w
"Anyone else is thinking about these topics?" In a word, yes.
🥁 1
💥 1
k
Thanks @daltonb for all the references you provided! Just what I have been looking for for a while: a high-level overview of the non-boring activities around AI. My own current view of AI: its role is in discovery, not decision making. I want AI tools to suggest things to me, much like code completion in an editor, but let me decide what to do with them.
☝️ 2
d
@Konrad Hinsen Thanks for the feedback! I only came across Tenenbaum’s and Chollet’s work recently, and was very pleasantly surprised since I haven’t historically found much to be excited about there. I definitely resonate with your view. If you end up experimenting in that direction, I’d love to hear more.
t
• The AI community is SOTA-oriented: if a method leads to good metrics, it will be adopted even though we do not know how it works. Human-readable programs generated by program synthesis cannot outperform unreadable black-box neural networks, so interest in it will be limited. It might be useful for distilling patterns to detect anomalies, such as seasonal effects in time series. Still, I do not know if any program synthesis approach can achieve SOTA on anomaly detection tasks.
• Program synthesis might be viewed as a form of unsupervised learning. It can help humans interpret the data, finding deep statistics about the pattern. But a synthesized program might not be a good representation of knowledge learned from vast amounts of data.
• The "programmer/end user" dichotomy to me is about the producers of big language models (such as GPT-3) and the users who fine-tune them for downstream tasks. Most "end users" will not have access to the software/hardware to produce another GPT-3; they will be forced to reuse whatever was given, due to cost. Just like we reuse Excel for everything.
🤔 1
d
Hi @taowen, have you looked at the ARC Challenge paper above? So far, program synthesis is the only competitive approach for that benchmark, which is part of the charm of how it was constructed. To me it’s a good illustration of the limits of today’s SOTA methods. The way I see it informally, whether you use program synthesis, NNs, or any other approach in designing your system, it needs both “lookup table” and “inductive reasoning” modes for robust real-world operation. Type I vs Type II thinking, if you will. Lookup tables are what you get from (nonrecurrent) NNs, and they beat inductive reasoning for performance (and usually accuracy) if things never change and you have enough data to train with. However, results from chaos theory show that lookup tables can’t be a universal approach when navigating real-world dynamical systems. On the other hand, causal chains are very effective tools for generalization under novel (sparsely sampled) conditions. So inductive reasoning beats lookup tables in new or changing environments, but in most current systems this is left to the engineers designing the system or offloaded to end users. As a brief counterexample, classical mechanics is a program synthesis approach (as nicely illustrated in Structure and Interpretation of Classical Mechanics), and is a preeminent example of knowledge representation learned from vast amounts of data.
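A toy contrast between the two modes, with invented numbers: training data covers x in [0, 10] and the hidden rule is y = 2x + 1. A nearest-neighbour "lookup table" is fine inside the training range but fails far outside it, while a miniature rule search recovers the structure and extrapolates:

```python
# "Lookup table" (nearest neighbour) vs a tiny synthesized rule.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(0, 10, 200)
y_train = 2 * x_train + 1                  # hidden causal rule

def lookup_table(x):
    """Pure memorization/interpolation: answer with the nearest training point."""
    return y_train[np.argmin(np.abs(x_train - x))]

def synthesize_rule():
    """Program synthesis in miniature: search a small space of linear rules."""
    best = min(
        ((a, b) for a in range(-5, 6) for b in range(-5, 6)),
        key=lambda ab: np.mean((ab[0] * x_train + ab[1] - y_train) ** 2),
    )
    return lambda x: best[0] * x + best[1]

rule = synthesize_rule()
print(lookup_table(5.0), rule(5.0))       # both ~11 inside the training range
print(lookup_table(100.0), rule(100.0))   # lookup stays ~21, the rule gives 201
```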
t
is ARC Challenge listed here https://paperswithcode.com/sota ?
k
@daltonb Seeing your reference to SICM, you might like this article about deriving mathematical models via machine learning (plus human-provided assumptions): https://arxiv.org/abs/1911.01429
d
Thank you. Very lucid paper, and still on the edge of my ability to read without spending all day on the math. Any main takeaways for you? Fig 3 will be a reference to hold onto. I’m still kind of hung up on “None of these diagnostics address the issues encountered if the model is misspecified and the simulator is not an accurate description of the system being studied”; I’m trying to get my head wrapped around the inductive process of developing the model in the first place and then adapting it as the world changes. On another math note, you may find this analysis of how NNs really work interesting. Nice 15-20 min discussion here, starting at minute 12.
k
The sentence you quote is actually one of the main takeaways for me. It's the same issue we have with Bayesian modelling and many other data-heavy methods: there is some basic layer of assumptions baked into the system before any data flows in, and those assumptions cannot be tested, no matter how much data is available. I suspect it's the same with human learning, but our basic assumptions about the world have been shaped by evolution to be reasonable (perhaps I am overly optimistic here!).
BTW, one of the authors has a nice Web site / blog with material on the use of AI in science: https://astroautomata.com/
I had an interesting exchange with him in the comments section of one blog post: https://astroautomata.com/paper/symbolic-neural-nets/
That video looks interesting indeed, the paper is on my reading list. The discussion reminds me of how NNs are introduced in Stéphane Mallat's excellent course (https://www.college-de-france.fr/site/stephane-mallat/_course.htm, in French).
d
@Konrad Hinsen nice comment exchange, thanks for the link. Everyone works for DeepMind these days huh? Reminds me of a similar project for symbolic learning from Max Tegmark’s group. His pithy take: “I think of all of physics as being lossy data compression.”

https://youtu.be/pkJkHB_c3nA?t=1877s

k
A physicist ought to know better! Data compression means you can recover your observations, and interpolate. Physical theories allow more, in particular generalization. Theories are transferable to situations that nobody has observed before. Which is why we can test them by doing experiments.
d
Heh I take him to be talking about a higher order of data compression. Causal composition allows us to navigate the curse of dimensionality and find higher ground when local conditions change, but doesn’t help with “global conditions” outside the model. Causal models still make regularity assumptions about unobserved future conditions; e.g., that no one’s about to shut down the simulation or tweak the speed of light.
k
I would indeed expect him to be aware of this, but he doesn't mention it in the talk. Viewers are thus likely to take his "physics is data compression" too literally.
👍 2
b
Thanks for all the references @daltonb! I need to look through them. What @Konrad Hinsen mentioned resonates with me and reminded me of my personal stance that a lot of modern tech development seems to have forgotten about computers and tech augmenting human abilities. We see this in “self-driving” cars: let’s just slap machine learning on a system designed for human use and replace the human. It doesn’t work, and it completely skips the step where we augment human abilities. I see the same in automated theorem proving. The headlines are always about replacing humans rather than framing those systems as they really are: automated proofs are developed by humans, in human-developed languages, and cross-referenced by humans against existing proofs or references. Those systems are augmenting human work, but the framing is always “replacing”. I’d personally like to see more concentration on augmenting human abilities, and I think that’s where machine learning can help out. (Apologies for rambling and the stream of consciousness. Just thinking and typing at the same time. 🙃)
👍 1