# thinking-together
j
I've put together some thoughts about syntax; they're pretty roughly defined at the moment, but I'd love to get your feedback! https://gist.github.com/jaredly/593d66a955b09572f3810b43b75a22a1
m
> The structured editor that I'm building
for which language(s)?
j
The idea is to be multi-modal, but I've started it with a clojure-esque language that I'm creating. Ultimately it's an editor for developing new languages, and I want to have various syntax options
m
Can you expand on what is *structured* for you? Syntax as in "commas, parens, semicolons" (collections and atoms) is just a part of the story. The other part is the semantic meaning of that syntax. For example, the difference in meaning of tokens in `(+ 1 2 3)` vs `(if 1 2 3)`, or the meaning of vector (`[]`) elements in `(defn foo [a b] ...)` vs `(let [a b] ...)`. This is why both "lisp has almost no syntax" and "lisp code is AST" are BS.
> Ultimately it's an editor for developing new languages
"editor for new clojure/lisp macros" would be a nice test/milestone/challenge for it. I tried to approach the same/similar problem recently as "DSL for custom macros support for a Clojure IDE", because kondo configs are a nightmare: https://clojurians.slack.com/archives/C06AH8PGS/p1713614069031559 (https://clojurians-log.clojureverse.org/instaparse/2024-04-20)
j
Yeah, so my structured editor works at a level between raw text and the AST. I've tried doing structured editors at the AST level, but it ended up being misaligned with the way I wanted to be inputting & manipulating code. So I think that the level that treats `(+ 1 2 3)` and `(if 1 2 3)` the same is the right spot for editor manipulation. And then a language's parser converts this "concrete syntax tree" into an AST.
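A minimal TypeScript sketch of a CST at that level (type and field names are illustrative, not the project's actual ones): collections and atoms only, so `(+ 1 2 3)` and `(if 1 2 3)` come out shaped identically.

```typescript
// A CST of collections and atoms, with no semantic knowledge:
// (+ 1 2 3) and (if 1 2 3) are both just a list of four atoms.
type CSTNode =
  | { type: "id"; text: string } // atoms: +, if, 1, foo
  | { type: "list"; kind: "(" | "[" | "{"; children: CSTNode[] };

const plus: CSTNode = {
  type: "list",
  kind: "(",
  children: [
    { type: "id", text: "+" },
    { type: "id", text: "1" },
    { type: "id", text: "2" },
    { type: "id", text: "3" },
  ],
};
```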
Yeah, I had a great chat with Peter Vilter a couple of years ago about his datalog stuff, it's very cool!
m
"paredit for js"? slurp, barf, wrap in parens/brackets/curlies, swap tokens (left-right, top-bottom)? Feels like not enough to make new editor. What else you have in mind? (if you don't mind ofc)
j
hah so, there's a variety of things going on in the project:
• structured editor is a part of it (I guess I haven't really gotten around to writing out my "why structured editors" thoughts, but one game-changer is persistent addressability in the midst of changes. For an editor to be able to have a durable location for e.g. "the `name` of the function `flatMap`" that's not a line/col pair that will break at the slightest touch unlocks a lot of nice things)
• jupyter/observable/etc.-style super-REPL/literate programming environment for pure functional languages
• unison-style "terms are referenced by the hash of their contents, stored and synced in a database"
• a Development Environment for programming languages themselves, making it easy to iterate and play with various aspects of a programming language (compilation targets, execution semantics, type inference algorithms) in relative isolation, as well as enabling the bootstrapping of self-hosted languages
It used to be "I want to make a programming language that has All The Best Features" and while I was at it I figured I'd make a structured editor for it at the same time, because I've tried making a structured editor for existing languages and concluded that it would work much better if the language (& compiler) were designed with structured editing in mind.... and then I got a little distracted by wanting to make "a minimally-featured language that is capable of self-hosting its own type inference while being nice to use", and so it has morphed into being an Editor Environment that can be used to make a variety of programming languages 🙃
Thus far the editor has only allowed clojure-style syntax, but the past few days I've been wondering what it would take to open it up to c-style languages, and if I'm going to do that might as well come up with a General Unified Theory of Syntax 😄
m
It seems to me #1 is at odds with `level that treats (+ 1 2 3) and (if 1 2 3) the same is the right spot for editor manipulation`, because all you get is `some-hash[0][1][0][6]` (a nested array address) or something, without knowledge of what `+` or `defn` means.
re unison: YES. designing a new lang without even giving "content-addressable" a try... it solves/simplifies/amplifies so much later on in the toolchain: deps, version control, diffs/reviews, (structural) editing. (in my like 4th spare time I'm trying to retrofit content-addressability onto at least a subset of clojure, which too started elsewhere: from custom macros, to kondo-config for it, to "screw it - I'm writing myself a clojure IDE with blackjack", to "might as well bake in addressability and distribution for source control, because git is both overkill and underwhelming (like any text-files-diffing SCM)")
relevant too: https://www.youtube.com/watch?v=GB_oTjVVgDc
j
Ah so `(+ a b c)` is actually a map of node-id to node: `0=list(1 2 3 4), 1=id(+, ref=hash of the + function), 2=id(a, ref=hash of the a term)`, etc. So the `+` in `(defn + [a b] ...)` is addressable as `some-toplevel-id : the-loc-of-that-id-node`, which ends up being nicely durable.
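A sketch of that node-id map in TypeScript, assuming hypothetical field names (`ref`, `nodes`) that mirror the notation above:

```typescript
// Each toplevel is a flat map of node-id -> node; children are ids, not
// nested objects, so any node has a durable (toplevel-id, node-id) address.
type NodeId = number;

type Node =
  | { type: "id"; text: string; ref: string | null } // ref = content hash, or null if unlinked
  | { type: "list"; children: NodeId[] };

type Toplevel = { id: string; root: NodeId; nodes: Record<NodeId, Node> };

// `(+ a b c)` as a node map:
const top: Toplevel = {
  id: "x91q",
  root: 0,
  nodes: {
    0: { type: "list", children: [1, 2, 3, 4] },
    1: { type: "id", text: "+", ref: "hash-of-plus-fn" },
    2: { type: "id", text: "a", ref: "hash-of-a-term" },
    3: { type: "id", text: "b", ref: "hash-of-b-term" },
    4: { type: "id", text: "c", ref: "hash-of-c-term" },
  },
};
```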
Yeah, Dion is super cool! I wish they'd produced more about it 😭
m
so essentially what I wrote? `hash[1st][0th][7th]`, or am I not seeing something? also, how is "hash of a func" different from "hash of a term"? so you have some "rule" that "1st item in a () list = function call"? that's the semantic knowledge I mentioned
j
nope nope
(sorry talking afk, 1 minute)
m
at the very least you need to differentiate "new name `N`" from "reference `R`", as in my example, `defn` vs `let`:
```
(defn foo [x y] ...)
 R    N    N N

(let [x y] ...)
 R    N R
```
and that is semantic knowledge, not just "collections and atoms"
j
yeah so in the editor identifiers are by default "unlinked", and there are editor affordances for "linking" an id to a definition
and the parser provides hints back to the editor about when to provide those affordances
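One hypothetical shape those hints could take (the thread doesn't specify the actual protocol):

```typescript
type NodeId = number;

// Hypothetical parser -> editor feedback: for each id node, the parser
// reports whether it introduces a new name (N) or should offer the
// link-to-a-definition affordance (R).
type Hint =
  | { kind: "binder" } // e.g. foo, x, y in (defn foo [x y] ...)
  | { kind: "reference"; scope: "local" | "global" }; // ids that may link

type ParseHints = Map<NodeId, Hint>;
```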
m
+ constants/literals `C`
+ scope (which `R` are known, and which are an error):
```
(defn foo [x y] ...)
 R    N    N N
(quote (defn foo [x y] ...)
 R      C    C    C C
```
j
C is the same as N, no need to distinguish
Importantly: linking an ID (turning an N into an R) is done by the user, not by some out-of-band algorithm
So an important difference from unison: I'm not trying to Normalize All The Things
m
> linking is done by the user
are you describing "when the user writes the grammar for a new lang"? or "when the user programs in the new lang"?
j
Writes programs using the new lang
it's autocomplete that actually means something
So more realistically, the toplevel `(defn a [b] c)` probably looks like `id=x45r, root: 37, nodes: 37=list(11 3 7 1), 11=id(defn, ref=builtin), 3=id(a, ref=null), 7=array(20), ...`. So "the name of that defn" is `(x45r, 3)`.
m
then you need "scoping rules", or rather "autocomplete needs to know scoping rules". that's again part of the semantics of a particular list of atoms (I might have tunnel vision, because I spend lots of time in clojure and see it from the clojure POV)
j
hahaha
Yeah, that's the part where the parser gives autocomplete hints back to the editor
m
"the name of that defn" -
defn
or
(defn ...)
?
j
sorry, `a` is what I meant to be referencing
the ID that defines the 'name' of the function that is produced by that toplevel
This gets more interesting when a toplevel can have multiple exports, for example with `(deftype (option a) (some a) (none))`. References to the type constructor `some` have a durable reference to the id that defines its name, so renames are trivial.
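A sketch of why that makes renames trivial, reusing the hypothetical node-map types from above:

```typescript
type NodeId = number;
type Node =
  | { type: "id"; text: string; ref: string | null }
  | { type: "list"; children: NodeId[] };
type Toplevel = { id: string; root: NodeId; nodes: Record<NodeId, Node> };

// With durable ids, a rename is one field update on the defining id node;
// every reference addresses (toplevel-id, node-id), not text, so nothing
// else has to be rewritten or re-resolved.
function rename(top: Toplevel, nameNode: NodeId, newText: string): void {
  const node = top.nodes[nameNode];
  if (node.type !== "id") throw new Error("can only rename id nodes");
  node.text = newText;
}
```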
m
how scope "spreads" is a semantic too. in clojure (again, sorry :D) - there are (at least) parallel scope, forward sequential scope, backward sequential scope: here, numbers is sequence of scope propagation (higher number gets its scope from prev number):
Copy code
forward + backward example:
0
 1   2
        3
(let [a x b a] [a b])
      4     
            5
          6    7
                8 8 ;; a and b have parallel scope at this point
 
parallel example:
0
 1       2
          4 3 4 3  5
                    6 6 
(binding [a 1 b 2] [a b])  
;;The new bindings are made in parallel (unlike let);
also notice, that in
let
,
b
ejects/exports its cope from vector to body
[a b]
, but body does not export scope outside let (propagation stops). So spread direction is based on the meaning of first symbol, and the fact that 1st symbol meaning is important - is a higher level semantic too
One instance of scope *export* is creating a global definition: in `(defn foo [a b] body)`, `defn` exports `foo` to the global scope, but not `a`, `b`, or `body`. which is solely the semantics of `defn`
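A hypothetical way a parser could declare these propagation rules to the editor/autocomplete (illustrative only; none of these names are from the thread):

```typescript
// Hypothetical descriptors for how bindings propagate, per special form.
type ScopeRule =
  | { form: "let"; bindings: "sequential" } // each binding sees the previous ones
  | { form: "binding"; bindings: "parallel" } // bindings see only the outer scope
  | { form: "defn"; exports: "global" }; // exports the name, not args/body

// Autocomplete inside (let [a x b a] ...) would consult the "sequential"
// rule to know that b's init expression may reference a.
const rules: ScopeRule[] = [
  { form: "let", bindings: "sequential" },
  { form: "binding", bindings: "parallel" },
  { form: "defn", exports: "global" },
];
```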
j
So locals aren't linked
Also the parser tells the editor what is exported
m
so the parser knows what's up (which list el is local and which is not), because you baked some semantics into it (for c-like langs: defined a set of keywords and what they mean: `if` `def` `for` `while`)
but if you allow user-defined macros in your lang, you need to provide a way for the user to let the parser know what's up
j
So macros work on the cst, and are expanded before the parser operates
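A sketch of that ordering, with assumed names (`expand`, `parse`); only the macros-before-parser pipeline is from the thread:

```typescript
// Macros transform CST -> CST; only afterwards does the language-specific
// parser turn the fully-expanded CST into an AST.
type CST = { type: "id"; text: string } | { type: "list"; children: CST[] };
type Macro = (form: CST) => CST;

function compileToplevel(
  cst: CST,
  macros: Map<string, Macro>,
  parse: (cst: CST) => unknown, // language-specific CST -> AST
) {
  return parse(expand(cst, macros));
}

function expand(cst: CST, macros: Map<string, Macro>): CST {
  if (cst.type === "id") return cst;
  const head = cst.children[0];
  const macro = head?.type === "id" ? macros.get(head.text) : undefined;
  if (macro) return expand(macro(cst), macros); // re-expand the result
  return { ...cst, children: cst.children.map((c) => expand(c, macros)) };
}
```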
m
basically you describe (bake in) N special forms and their semantics during the lang-design phase, and then rely on macroexpand for autocomplete?
j
Oh yeah so macros also can report autocomplete hints
m
> C is the same as N, no need to distinguish
a literal symbol `defn` is not the same as a `defn` which is meant to be looked up and resolved as e.g. `clojure.core/defn`. so either you need to prompt the user on every word ("is it a ref or is it static/literal?"), or forbid literals, or rely on the semantics of something in the text before the word, again, in clojure: `'` or `quote`, which is a semantic, not just "colls and atoms"
this is all a long-winded way of saying that "syntax families" seem incomplete w/o mentioning scope propagation rules and semantics
j
So when typing `defn`, if it autocompletes to link then it's an R, otherwise it's a C
Macros operate on Rs mostly tbh
But yeah only global scope references are linked
m
mostly≠only )
j
Also macros don't have access to any environment to resolve things
The only time they'd use a C is for a numeric literal or as the export-name for a new definition
Either a macro consumes core/defn as an R, or has it referenced in its definition, or it doesn't have access to it
m
how does a macro know it's an R when there is no env access to look it up?
j
an attribute on the node
Whether `node.ref` is null
m
so macro receives already resolved things?
j
Yup
Well it can also access "terms associated with resolved things it has received"
Where term associations are explicitly defined as a first class thing
This is all critical for term hashes to be useful. Can't depend on "the whole environment"
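A sketch of content-hashing under that constraint (illustrative; uses Node's built-in `crypto`): a term's hash covers its structure plus the refs it mentions, never a surrounding environment.

```typescript
import { createHash } from "node:crypto";

// A term's hash is derived from its own nodes and the hashes it explicitly
// references, so a term means the same thing wherever it is stored/synced,
// and unchanged dependencies keep their hashes.
type Term =
  | { type: "id"; text: string; ref: string | null }
  | { type: "list"; children: Term[] };

function hashTerm(term: Term): string {
  const h = createHash("sha256");
  const walk = (t: Term): void => {
    if (t.type === "id") h.update(`id:${t.text}:${t.ref ?? ""};`);
    else {
      h.update("list(");
      t.children.forEach(walk);
      h.update(")");
    }
  };
  walk(term);
  return h.digest("hex");
}
```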
m
ok, `defn` is a macro, it receives `foo` and `a b`. all 3 (`foo a b`) exist in global scope (= can be resolved, and are resolved before being passed into the macro). now `defn` throws away that resolution and assigns new roles of `global` to `foo` and `local` to `a` and `b`?
j
So def isn't a macro
m
or `foo a b` are not in the globals, resolution resolves to 'unknown', and passes that to `defn`?
j
Gotta be built in
Macros bottom out to def/deftype/etc
m
by *built in* you mean *semantics (scope propagation rules, locals/globals export) baked into the "parser"*?
ok, but what about `defn` being a macro in clojure (it bottoms out to `def`), and all those `prismatic.schema/defschema` etc., basically any macro exporting a new global?
j
I mean it (def) can't be a macro
defn produces a def, which the parser determines produces an export
This allows different parsers (e.g. languages) to have different forms for defining things
m
> defn produces a def, which the parser determines produces an export
yes, but `defn` knows what is the new global and what are the args (new locals). and you resolve them before `defn` gets them
j
"new global" is just an id with ref=null
m
ok
j
'fn' also can't be a macro
m
does the editor show it as an error? how does the editor know it's not an error, and `foo` is ok to be unresolved at this particular place: the second token inside the `(defn foo ...)` list?
j
Parser knows what ids need to be resolved
m
macroexpand + a "source map" from the exported new global back to `foo` in the `(defn foo)`?
j
No need to source map :) durable ids
m
I mean conceptually
"expand, and see that id=7 goes from unresolved to export-new-global, and its all good"?
j
I mean it's a parser error to use an id with ref=null as an expression if it's not resolvable with local scope
I can imagine a parser using an unresolved id in other ways that it would determine are valid
In fact it would be a parser error to use a resolved id as the "name of what I'm exporting"
So it's not a "how do I ensure unresolved ids eventually have a home" problem, it's more generally "do all these nodes make sense"
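A minimal sketch of those two checks as a "do these nodes make sense" pass (hypothetical roles and error messages):

```typescript
type NodeRole = "expression" | "binder"; // position the parser found the id in

// Returns an error message, or null if the node makes sense where it is.
function checkId(
  role: NodeRole,
  ref: string | null,
  inLocalScope: boolean,
): string | null {
  if (role === "binder" && ref !== null)
    return "a resolved id can't name a new export";
  if (role === "expression" && ref === null && !inLocalScope)
    return "unresolved id used as an expression";
  return null; // ok
}
```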
m
> In fact it would be a parser error to use a resolved id as the "name of what I'm exporting"
can't redefine things this way:
```
(def foo 1)
(def foo 2) ;; exports already resolved(able) global foo
```
j
Yeah so no name conflicts allowed in the same module
m
```
(def elsewhere/foo 1) ;)
```
j
And it wouldn't autocomplete to resolve that id
m
because it knows from hardcoded knowledge "no qualified symbols here"?
j
Parser decides what autocompletes
m
I understand, I'm just trying to zero in on "based on what"
j
When the parser is parsing, and sees the sibling to a 'def' in this case
At the top level
Btw ids with refs are underlined
Visually distinct
m
circling back to "IDE for defining new languages": that initial grammar/lang-description needs to provide that info for parser/autocomplete.
j
Yeah so the base lang is just raw js lol
In a big ol string literal
m
😄
j
And then you use that to make other languages
m
does that base-lang restrict what semantics are un/available to new langs? or is it too academic or hard to tell atm?
j
So the base lang doesn't produce any restrictions
The nature of the editor and such does produce limits though. For example, macros don't have access to the environment. The parser and compiler don't even have global access. Dependency graphs are calculated by the editor.
Also impurity is a no go for the repl to make sense
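A sketch of editor-side dependency calculation (hypothetical; it just walks the explicit `ref` fields from the earlier node-map sketch, which is what makes this possible without giving the parser or compiler global access):

```typescript
type NodeId = number;
type Node =
  | { type: "id"; text: string; ref: string | null }
  | { type: "list"; children: NodeId[] };
type Toplevel = { id: string; root: NodeId; nodes: Record<NodeId, Node> };

// Because every reference is an explicit `ref` hash on an id node, the
// editor can compute a toplevel's dependencies by scanning its nodes,
// without consulting any global environment.
function dependencies(top: Toplevel): Set<string> {
  const deps = new Set<string>();
  for (const node of Object.values(top.nodes)) {
    if (node.type === "id" && node.ref !== null) deps.add(node.ref);
  }
  return deps;
}
```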
a
Still reading this thread. But in response to the original post. I think it would probably be useful to talk about syntax in terms of formal grammars where you have terminal/non-terminal symbols. I think terminal symbols are your atoms and non-terminal symbols are your collections. Expressing these grammars in terms of a meta-syntax may also be useful for grouping languages together with similar structural editing affordances.
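A sketch of that framing (an illustrative meta-syntax, not a real grammar library): atoms as terminal symbols, collections as non-terminals, so languages with similar productions can share structural editing affordances.

```typescript
// Atoms are terminals, collections are non-terminals; a "syntax family"
// groups languages whose grammars share these shapes.
type GrammarSymbol =
  | { kind: "terminal"; name: string } // atoms: identifier, number, "("
  | { kind: "nonterminal"; name: string }; // collections: list, vector, expr

type Production = { head: string; body: GrammarSymbol[] };

// e.g. a lisp-family production: list -> "(" expr* ")"
const lispList: Production = {
  head: "list",
  body: [
    { kind: "terminal", name: "(" },
    { kind: "nonterminal", name: "expr*" },
    { kind: "terminal", name: ")" },
  ],
};
```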
j
That's a good tip, thanks!
m
btw, which structure-edit operations do you have in mind?
j
so I'm starting out building a structured editor that is "usable as a normal editor", such that most of the keystrokes would be the same as in a text editor for general "code input" operations. And then I'll work on layering structure editing on top (probably with a heavy emphasis on end-user-coded transforms)
a
That sounds similar to the work we're doing on Hazel using tylr https://hazel.org/papers/tiny-tylr-tyde2022.pdf
j
yup 🙂 I'm a huge fan
haven't read that paper though, thanks for the link
m
re syntax flavors: https://futureofcoding.slack.com/archives/C03RR0W5DGC/p1731679753126049 reminded me of:
• commenting out or ignoring blocks of code/text: `/* ... */` `//` `;` `#_` `(comment ...)`
• markers/indentation as denotation of nestedness (python): `>` `>>` `\t`