# devlog-together
m
Hi everyone, I have a big question about a programming-related problem I only partly understand. I'm not hoping for an answer so much as for pointers to where I should be looking. Here it goes.

For the past two years I have been working on SplootCode, wanting to make code more accessible to non-engineers. The initial product (as it is today) is basically a structured editor with Scratch-style drag-and-drop capabilities, plus some other features that help non-engineers see and understand what the code is doing. You can see an example tiny program here.

During the interviews I conducted with several people, I came across a non-native English speaker who shared their frustration with having to learn English in order to code. After speaking to another dozen non-native speakers, some of whom teach programming in their native language in non-English-speaking countries, I figured that this is a really big problem for a lot of people, and I want to solve it. Since I have been deep into SplootCode for over two years, I am sure I am suffering from the sunk cost fallacy like crazy, which is why I am asking here: to get some outside perspective.

---

So, to solve the problem for people, I want to address the underlying problem that, AFAIK, the vast majority of programming languages, excluding purely symbolic ones, are effectively a subset of the English language. My first train of thought was: "Perfect use case for building on top of SplootCode! The structured editing part means half the work is done already, since I can label keywords however I want." Followed by: "But wait! How can this actually be solved so that anyone can use their own tool of choice, which is what people will need?"

---

Thus my question is: I believe the proper solution is to add a _layer_* to code, call it the `tongue` layer, that defines the _label_** for each `tongue`. What do I need to understand thoroughly in order to build such a solution? Who should I speak to?

*_layer_ intended as a standalone structure that doesn't interact with anything but the semantic label of any literal, variable, operator, or function (or any other atomic structure that may exist within a programming language).

**_label_ intended as the human-readable part of an atomic structure of a program. For example, the `if` in the structure `if () {}` in JavaScript would be `se` in Italian and `もし` in Japanese.
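Here is a minimal sketch of what such a `tongue` layer could look like. It assumes a toy AST in which keywords are never stored as spellings: an `if` node maps to a language-neutral semantic id (the made-up `kw.if`) only at render time, and identifiers keep the names their author gave them. `TONGUES`, the node shapes and `render` are hypothetical illustrations, not SplootCode's actual data model. The `se` and `もし` labels come from the example in the message.

```javascript
// Hypothetical tongue layer: semantic id -> human-readable label, per tongue.
// The AST never stores a keyword spelling; labels are chosen at render time.
const TONGUES = {
  en: { "kw.if": "if" },
  it: { "kw.if": "se" },
  ja: { "kw.if": "もし" },
};

// Toy AST for `if (ready) { start() }`. The `if` node maps to `kw.if` when
// rendered; identifiers keep the names their author gave them.
const program = {
  type: "if",
  test: { type: "identifier", name: "ready" },
  body: [{ type: "call", callee: { type: "identifier", name: "start" }, args: [] }],
};

// Label for a semantic id in a tongue, falling back to English when missing.
function label(id, tongue) {
  return TONGUES[tongue]?.[id] ?? TONGUES.en[id];
}

// Render a node as JavaScript-like text using the requested tongue's labels.
function render(node, tongue) {
  switch (node.type) {
    case "identifier":
      return node.name;
    case "call": {
      const args = node.args.map((arg) => render(arg, tongue)).join(", ");
      return `${render(node.callee, tongue)}(${args})`;
    }
    case "if": {
      const body = node.body.map((stmt) => render(stmt, tongue)).join("; ");
      return `${label("kw.if", tongue)} (${render(node.test, tongue)}) { ${body} }`;
    }
    default:
      throw new Error(`unknown node type: ${node.type}`);
  }
}

console.log(render(program, "en")); // if (ready) { start() }
console.log(render(program, "it")); // se (ready) { start() }
console.log(render(program, "ja")); // もし (ready) { start() }
```

Because the program never commits to a spelling, switching tongues is purely a rendering decision. Exporting "standard" code amounts to rendering with the `en` table. Identifiers such as `ready` are not translated here. Translating them would need a separate, per-program label table.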
k
One thought: maybe you need to understand the internationalization (i18n) process. As I understand it, there are libraries that let you wrap all string literals in your program in an annotation and then look up each literal in a language-specific list of translations. So one lens here is that you "just" need to apply this internationalization process to a compiler codebase.
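Under the i18n lens, each keyword becomes a message key in a per-locale catalog, and the canonical English spelling serves as the fallback. A lexer also needs the reverse map to turn a localized word back into its canonical keyword. The sketch below assumes that setup. `CATALOGS`, `t`, `reverseCatalog` and `classify` are hypothetical names, and the Italian words are illustrative. Real i18n tooling such as gettext does much more than this.

```javascript
// Hypothetical gettext-style catalogs: canonical keyword -> localized label.
// English spellings double as the canonical keys. Italian words are illustrative.
const CATALOGS = {
  en: { if: "if", while: "while", return: "return" },
  it: { if: "se", while: "mentre", return: "ritorna" },
};

// Forward lookup used when displaying code: fall back to the canonical key.
function t(key, locale) {
  return CATALOGS[locale]?.[key] ?? key;
}

// Reverse lookup used when lexing: localized label -> canonical keyword.
// A label must map back to exactly one keyword, so duplicates are rejected.
function reverseCatalog(locale) {
  const reverse = new Map();
  for (const [key, word] of Object.entries(CATALOGS[locale] ?? {})) {
    if (reverse.has(word)) {
      throw new Error(`label "${word}" used for both "${reverse.get(word)}" and "${key}" in ${locale}`);
    }
    reverse.set(word, key);
  }
  return reverse;
}

// Classify one word from localized source text as a keyword or an identifier.
function classify(word, reverse) {
  const keyword = reverse.get(word);
  return keyword ? { kind: "keyword", keyword } : { kind: "identifier", name: word };
}

const itReverse = reverseCatalog("it");
console.log(t("if", "it"));  // se
console.log(t("for", "it")); // for (no entry, falls back to the key)
console.log(JSON.stringify(classify("se", itReverse)));     // {"kind":"keyword","keyword":"if"}
console.log(JSON.stringify(classify("pronto", itReverse))); // {"kind":"identifier","name":"pronto"}
```

The duplicate check matters because a label shared by two keywords cannot be mapped back when parsing. A related hazard is collision with existing identifiers. For example, an English program with a variable named `se` would need that variable renamed before the program could be shown in Italian.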
g
It sounds, from my very biased perspective, like you are describing text-to-text translation. If so, then you want to dig into technologies that make text-to-text transformation easy, e.g. PEG parsing, LLMs (large language models, "AI"), and macros (you can learn about macros in Lisp, but Lisp uses lists instead of text for macros). For PEG, I would suggest OhmJS (pdubroy on the ohmland Discord https://discord.gg/7FqKRZdv, and ohmjs.org); for LLMs, elimisteve on the ohmland and Programming Simplicity Discords (https://discord.gg/ZEy2ajN3XQ). Also look at TXL (Source Transformation Language by James Cordy, txl.ca) and the fields of program transformation and meta-programming. I have been dabbling in this sort of stuff and would be happy to elucidate (including a DSL for t2t for use alongside OhmJS). (Regex and CFGs look tantalizingly close, but aren't as good at this as Ohm, PEG and LLMs.) Note that OhmJS can do more than just t2t transformation.
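Whatever drives the transformation (PEG, TXL or an LLM), a text-to-text translator between tongues has to be token-aware. It should relabel keywords while leaving string literals and identifiers alone. The sketch below uses a deliberately tiny regex tokenizer as a stand-in for a real grammar. It is not OhmJS or PEG, and it only recognizes double-quoted strings, words, whitespace and single punctuation characters. `KEYWORDS`, `tokenize` and `translate` are hypothetical, as are the Italian labels.

```javascript
// Illustrative keyword tables sharing semantic ids across tongues (hypothetical).
const KEYWORDS = {
  en: { "kw.if": "if", "kw.return": "return" },
  it: { "kw.if": "se", "kw.return": "ritorna" },
};

// Deliberately tiny tokenizer (a stand-in for a real grammar such as a PEG):
// double-quoted strings, Unicode words, whitespace runs, or any single character.
function tokenize(src) {
  return src.match(/"(?:[^"\\]|\\.)*"|[\p{L}_][\p{L}\p{N}_]*|\s+|./gu) ?? [];
}

// Relabel keyword tokens from one tongue to another; strings, identifiers
// and punctuation pass through unchanged.
function translate(src, from, to) {
  const idByLabel = new Map(Object.entries(KEYWORDS[from]).map(([id, word]) => [word, id]));
  return tokenize(src)
    .map((token) => {
      const id = idByLabel.get(token);
      return id ? (KEYWORDS[to][id] ?? token) : token;
    })
    .join("");
}

console.log(translate('se (pronto) { ritorna "se" }', "it", "en"));
// if (pronto) { return "se" }
```

The string `"se"` survives untouched because it is a single string token, not a word. A regex tokenizer like this breaks down quickly once comments, template strings or nested constructs appear, which is the gap a PEG grammar is meant to fill. The sketch also ignores collisions. An Italian program with an identifier named `return` would silently become a keyword when translated to English.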
m
Thanks @Kartik Agaram, sounds in line with it, I'll take a deep dive!
Indeed @guitarvydas, I will definitely leverage LLMs for the text-to-text translation part. Thanks for the links on transformations too; I'll dive into Ohm and read up on PEG parsing as well!
d
You might be interested in this thread: https://futureofcoding.slack.com/archives/C5U3SEW6A/p1706212534780349 The language there has a graphical symbolic representation and a text representation that, as far as I can tell, is not language-specific. I think WordPlay is also designed to be natural-language-agnostic? None of that is probably all that useful to you for your own language except for inspiration. :-)
m
Thank you @David Alan Hjelle, inspiration is always awesome :)
d
I'm not sure that "their own tool of choice...is what people need". A billion spreadsheet programmers are just fine with their spreadsheet editor and can always export their spreadsheets if they want to use a different one. The point of your structured editor is to make programming easier by the very fact that it is a different kind of editor. If someone wants to program with a non-structured editor, you can give them their standard code without the tongue layer and they can go. And if someone else develops a different structured editor with something like your tongue layer, you can decide whether to export to their format.
m
m
Didn't know they did, @Mariano Guerra! Thanks for sharing!
@Dan Swirsky that’s one of the strongest convictions I had before seeing how people interact with computers 😅