# thinking-together
j
Provenance of content will be a huge challenge due to recent advancements in AI/ML. My immediate thought was "but code is in Git etc. and we know the author", but that's all void if the actual author of the code was a tool like ChatGPT/Copilot and the dev was just the one who pushed it. Maybe AI will be what brings about next-gen versioning systems, where content provenance is managed at the AST node level during code authoring, and not just attributed to whoever pushed the code after the fact.
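Roughly something like this (a toy sketch; every name here is made up to illustrate the idea, not a real tool):
```python
import ast
from dataclasses import dataclass

@dataclass
class Provenance:
    author: str      # e.g. "human:jane" or "tool:copilot"
    timestamp: str   # when the subtree was authored

def annotate(tree: ast.AST, prov: Provenance) -> ast.AST:
    # An editor integration would call this on each freshly authored subtree,
    # so every AST node remembers who (or what) wrote it.
    for node in ast.walk(tree):
        node.provenance = prov
    return tree

snippet = annotate(
    ast.parse("def add(a, b):\n    return a + b"),
    Provenance(author="tool:copilot", timestamp="2022-12-05T12:00:00Z"),
)
print([(type(n).__name__, n.provenance.author) for n in ast.walk(snippet)])
```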
n
Why does provenance matter? If it's good enough to pass a human review, then it's good enough. So quality/accuracy will matter more than provenance. Or did you have copyright liability in mind?
j
I think review is the weak link in that chain. More people with less programming knowledge will be adding code that was written by AI, and even developers are busy and will accept code that looks correct. All that code then feeds into the training set for the next round, and you've got a positive feedback loop. It's the same as self-driving cars: humans learn to trust the system, relax, and then it's all fine until it's not. I do think we'll eventually get to a good place, but right now we're in the gap between two worlds, and that's when the damage happens.
n
Speaking only as a developer: we don't have to accept code that just looks correct, because we have quality-review infra in place. Any code needs to pass the test cases. Can it pass the test cases while being incorrect? Maybe, but that's equally likely with a human author. So generated code is fine. Generated prose is a different matter, though; for that we'll need accuracy-checkers, bullshit-detectors, and systems of that kind.
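To be concrete about that "maybe", here's a toy illustration (invented example): the tests on file pass, the code is still wrong, and the same trap exists for human-written code.
```python
def is_even(n: int) -> bool:
    # Bug: the author only thought about non-negative inputs.
    return n % 2 == 0 if n >= 0 else False

# The review-time test cases all pass, so the code "looks correct"...
assert is_even(4)
assert not is_even(7)

# ...but an untested input exposes the bug:
print(is_even(-2))  # prints False, even though -2 is even
```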
j
During training, how will the AI tell the difference between well-tested code and code that isn't?
n
Not the AI itself, but the AI training team will very likely put quality filters on the training set. Instead of training on all of GitHub or Stack Overflow, you train only on the highly-rated repos or answers.
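Something like this, very roughly (the field names and thresholds are made up; a real pipeline would use whatever metadata it actually has):
```python
def filter_training_repos(repos: list[dict]) -> list[dict]:
    # Keep only repos that clear simple quality proxies before they
    # enter the training corpus.
    return [
        r for r in repos
        if r["stars"] >= 100     # rating/popularity as a crude quality proxy
        and r["ci_passing"]      # the code at least passes its own tests
        and not r["is_fork"]     # skip near-duplicate content
    ]

corpus = filter_training_repos([
    {"name": "a/solid-lib", "stars": 2500, "ci_passing": True,  "is_fork": False},
    {"name": "b/toy-fork",  "stars": 3,    "ci_passing": False, "is_fork": True},
])
print([r["name"] for r in corpus])  # ['a/solid-lib']
```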
j
I see your point, but "highly-rated" is a crude metric that may track popularity/hype more than code quality or correctness. It's a super difficult problem.
n
My point is: when it comes to code, the vicious cycle of bad-quality content -> bad model -> bad-quality content just isn't there, because our quality/rating systems work. For prose, yes, it absolutely is.
j
Interesting. I think code and prose are the same in that regard 😄
n
Not really. Spam on the web is mostly prose: there are both the incentives to produce it and a lack of quality-filter tools for it. Nobody is spamming GitHub with incorrect code.
j
Ah, yep, agreed. They're different in terms of scale and incentives. I was thinking at a more fundamental level: quality assurance of language content at scale (both code and prose are symbolic, and both use and construct abstractions). They're similar problems in that regard, but we might be lucky that the incentives make training on code the less complicated practical problem.
n
Meanwhile, StackOverflow has temporarily banned answers by ChatGPT because they're too inaccurate: https://meta.stackoverflow.com/questions/421831/temporary-policy-chatgpt-is-banned
j
Haha, yep it sure is confident, even while wrong 😄
This actually feeds back to my initial worry. Code alone is not enough to train systems like ChatGPT; it won't learn how to map natural language onto code from code by itself. That comes from training on sources like Stack Overflow. See how this spam makes prose and code not so different after all? The social systems and incentives do appear to overlap. The policy change is needed, but difficult to enforce at scale, just like QA'ing spam/prose, because it takes real effort to know whether something is a good answer.
Humans talking about code -> Social systems/incentives -> Humans gaming the systems with AI
m
AI systems based on large language models are prone to human-like errors, both in solving logic puzzles and in writing program code. That gives me less confidence in code review as a quality check on AI-written code: both the AI and the human are okay with code that "looks good". The advances in the scientific method came from developing techniques to work around the limitations and drawbacks of human reasoning; we'll need to apply these (and new ones) to AI systems. In my view, the test suite is going to be the most important part of the code base, since an AI will write the code. (Until the AIs can write the test suite as well 🙂)
w
Things to keep in mind:
1. Part of what makes ChatGPT an improvement over plain GPT-3 comes from training for relevance rather than for the first thing you would think of. So adjusting the training criteria should then feed into...
2. If an AI is writing code, it should iterate with itself, running the code, before coming back with an answer. Basically, mix in some of what's done to play games. In fact...
3. We could see quick improvement in these chat systems if well-engineered prompts can go back into the training of the system. For example, one reason why "in the style of" prompts get more pleasant output from ChatGPT is that they steer the system away from its default middling-BS mode.
Curiously, one good use of ChatGPT is to help discover misconceptions that students learning technical subjects are likely to have. In conversations so far, I've seen ChatGPT be fuzzy on the distinctions between (1) continuity and uncountability in math, and (2) blocks and procs in Ruby. To the credit of incorporating so much training data, ChatGPT is conversant on a great many topics. It's kind of interesting how OpenAI has put a filter on this, so as to avoid answering questions where ChatGPT's knowledge might be limited.