# thinking-together
n
Heretical idea: a function call syntax where the function name can appear anywhere in the call. First, presume we're using Haskell-style syntax, so that
f(2,3)
(C-style) is written as
f 2 3
(Haskell-style). Second, presume that parameters (in function signatures) must be prefixed with a
&
symbol (or whatever symbol you prefer; we'll need this later), so the definition of
f
would look something like
f &x &y = ...
. Now, imagine the definer of the function can choose where the function name is supposed to appear. So we could define the function f in several ways:
• f &x &y = , in which case a function call would look like f 2 3
• &x f &y = , in which case a function call would look like 2 f 3
• &x &y f = , in which case a function call would look like 2 3 f
Why would we want a syntax like this? Many reasons.
1. We get infix operators for free:
• &x + &y =
• &x mod &y =
2. We can have multi-word function names:
• &x is less than &y =
• if &cond then &a else &b =
3. This syntax erases the distinction between defining three separate functions returning one value each, and defining one function returning a record with three fields. I think this is nice; the distinction seemed arbitrary in the first place! Reducing the number of superficial choices a programmer needs to make helps reduce the cognitive burden of programming.
And probably more things 🙂.
What are the downsides of this syntax?
• Some of the "classic" syntax of programming languages now becomes ambiguous. In particular, we hit ambiguities when passing functions around as values. In Haskell you can write expressions like
map f list
where f is a function being passed as an argument. Given we could now define
f
as an infix or postfix function (see earlier), we need to make sure that we can refer to the function in an unambiguous manner. We could do this by writing something like
map (. f .) list
where the
.
symbol means an unbound parameter. For the multi-word functions, you'd write
map (. is less than 5) list
• There are a few other potential issues I'm working out. Regardless, this seems like an interesting idea, right? 🙂 How do people feel about it?
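A minimal sketch of the matching idea above, written in Python rather than in the proposed language. It assumes that arguments are integer literals and that a call must fit exactly one signature; the names `DEFINITIONS`, `match` and `call` are hypothetical, and the `.` placeholder syntax is not covered.

```python
# Hypothetical sketch: a signature is a sequence of tokens, and "&name" marks a parameter.
# Every other token is part of the function's "name", wherever it appears.

def match(signature, call_text):
    """Return the argument list if call_text fits the signature exactly, else None."""
    sig, toks = signature.split(), call_text.split()
    if len(sig) != len(toks):
        return None
    args = []
    for s, t in zip(sig, toks):
        if s.startswith("&"):
            # Assumption: only integer literals are accepted as arguments here.
            if not t.lstrip("-").isdigit():
                return None
            args.append(int(t))
        elif s != t:
            return None
    return args

DEFINITIONS = {
    "&x + &y": lambda x, y: x + y,             # infix
    "&x mod &y": lambda x, y: x % y,           # infix, word-named
    "&x is less than &y": lambda x, y: x < y,  # multi-word name
    "double &x": lambda x: 2 * x,              # prefix
    "&x squared": lambda x: x * x,             # postfix
}

def call(call_text):
    """Evaluate a flat call by finding the single signature it fits."""
    hits = []
    for signature, fn in DEFINITIONS.items():
        args = match(signature, call_text)
        if args is not None:
            hits.append((fn, args))
    if len(hits) != 1:
        raise LookupError(f"{len(hits)} definitions match ({call_text})")
    fn, args = hits[0]
    return fn(*args)

print(call("2 + 3"))
print(call("2 is less than 3"))
print(call("4 squared"))
```

In this sketch the position of the name tokens is fixed by whoever writes the definition, so a call such as `2 3 f` fails unless some definition has exactly that shape.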
❤️ 4
a
I think you will open the door to a lot of ambiguities. It might be necessary to guard against them automatically, the way Bison reports grammar conflicts.
n
Yeah, I'm taking that as a challenge. If I can find an unambiguous syntax and semantics, the payoff could be huge 🙂. Think about all the fun natural-language tricks you could do when defining functions.
One constraint I'm going to be using is to demand each "function call" be delimited by brackets
(...)
. The contents of every pair of brackets will need to fit a function signature exactly (not a prefix or a subset of it). That means there will be no concept of operator precedence or associativity.
But I may be able to recover associativity by adding a separate feature that captures it.
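One way to picture the bracket constraint, as a rough Python sketch. It assumes integer literals as the only values and a hypothetical table `DEFINITIONS` keyed by the "shape" of a bracket's contents, with `&` standing for an argument slot. Because a shape has to match exactly, there is nothing for precedence or associativity to decide.

```python
# Hypothetical sketch: every call is a bracketed group whose contents must fit
# one definition exactly; nesting replaces precedence and associativity.

def parse(text):
    """Parse '(2 + (3 * 4))' into nested lists: [2, '+', [3, '*', 4]]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # skip the closing bracket
            return items
        return int(tok) if tok.isdigit() else tok

    tree = read()
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the closing bracket")
    return tree

DEFINITIONS = {
    ("&", "+", "&"): lambda x, y: x + y,
    ("&", "*", "&"): lambda x, y: x * y,
    ("negate", "&"): lambda x: -x,
}

def evaluate(node):
    """Evaluate inner brackets first, then look up the exact shape of this one."""
    if not isinstance(node, list):
        return node
    items = [evaluate(item) for item in node]
    shape = tuple("&" if isinstance(item, int) else item for item in items)
    if shape not in DEFINITIONS:
        raise SyntaxError(f"no definition matches {shape}")
    args = [item for item in items if isinstance(item, int)]
    return DEFINITIONS[shape](*args)

print(evaluate(parse("(2 + (3 * 4))")))
print(evaluate(parse("((2 + 3) * 4)")))
```

An unbracketed mix such as `(2 + 3 * 4)` is rejected here, since no definition has the five-token shape `& + & * &`.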
a
Multi-word function names look cool, but I'm not sure they are useful. For example, wouldn't your if...then...else function always evaluate all three arguments? Many lisps implement DSLs using macros, including some that appear more like English sentences, e.g. the Common Lisp loop macro (see http://cl-cookbook.sourceforge.net/loop.html)
n
The syntax concept I’m presenting can be contemplated separately from the operational semantics of the language. So it could be lazy, or strict 🤷‍♀️. I have some ideas about the semantics I want, but that’s for a separate thread 🙂.
👍 2
a
Ah, with parentheses around every call, you might be halfway there. Will you use the
&
prefix for every argument in function calls also?
n
Nah, I think the prefix is only necessary to disambiguate the definitions. It shouldn’t be necessary at the call site!
a
Consider any user defined functions
(fun1 &X &Y)
and
(&X fun2 &Y)
. What prevents the user from naming variables to make the ambiguous call
(fun1 fun2 var3)
?
👍 1
o
I played a bit with this idea a few years ago, to use "symbols with spaces" for JavaScript functions. And I ended up using parentheses. I also tried some other weird symbols, just to see if I could find one that is better visually (but failed). https://stackblitz.com/edit/symbols-with-space
❤️ 1
I once thought to use a syntax like the_product_of_$1_by_$2 for the name of the function, to tell where exactly you take the parameters.
On this topic, maybe some inspiration can be taken from the text syntax used to write Scratch Blocks programs as plain text, which is only used on the Scratch forum. Interestingly, it is not designed to be executed directly, only to generate an image of a Scratch program that might be executed (or not). https://en.scratch-wiki.info/wiki/Block_Plugin/Syntax
❤️ 2
n
@Axel Svensson
Consider any user defined functions 
(fun1 &X &Y)
 and 
(&X fun2 &Y)
. What prevents the user from naming variables to make the ambiguous call 
(fun1 fun2 var3)
?
There should be some rules for how and when names/words can be re-used. The simplest is probably that if a word appears in a function name, then it can't also be the full name of a variable (nullary function). Assuming
fun1
and
fun2
aren't pre-defined variables, your example doesn't violate the aforementioned rule. However, that call would match nothing, so you'd get a compile time error.
@ogadaki Cool! There are probably a million directions you can go with the idea of space-separated symbols, especially if you start treating them as lists, i.e. data, as Lisp does. I'm being conservative right now though, and just thinking about this as a "generalized function call syntax". I'm going to check out that Scratch syntax 🙂. (Regarding brackets: I actually prefer the square bracket syntax partly because it's list-like, but also because you don't have to hold shift to type them 😇).
g
Check out objective-c’s inline arguments function syntax. https://www.tutorialspoint.com/objective_c/objective_c_functions.htm
(int)max:(int)num1 andNum:(int)num2 {…}
👍 1
o
(Regarding brackets: I actually prefer the square bracket syntax partly because it's list-like, but also because you don't have to hold shift to type them 😇)
That depends on your keyboard layout. On mine (AZERTY, French), it is the parentheses that don't need an extra key, and both curly and square brackets require the "AltGr" key. That is annoying when you code, and it's why some French devs prefer using a QWERTY keyboard (so without "éàè..." keys).
n
Ah, I didn’t realise they did that for brackets! I’d love to get some statistics on (for each ASCII symbol) the percentage of people on earth who can type that symbol without any modifier keys 🧐. It’s really important for a language designer to know 😅
k
I wanted something like this for my digital scientific notation Leibniz (https://github.com/khinsen/leibniz) and found it very easy to implement since Leibniz is a term rewriting system. There are no function calls, only rewrite rules, so it's no problem to make the equivalent of function-call syntax function-specific. For the parentheses issue, I adopted a feature from the Pyret language (https://www.pyret.org/index.html): it requires parentheses around any operator/function call, the one exception being chained use of the same operator. So you can write
2 + 3 + 4
, meaning
(2 + 3) + 4
, which is the most frequent situation where parentheses can become visually dominant. It looks to me as if this approach could be transposed to more standard programming languages, but that's a bit outside of my expertise.
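The chaining exception described above can be sketched in a few lines of Python. This is not Pyret's or Leibniz's implementation; `OPS` and `eval_group` are hypothetical names, and a bracketed group is assumed to arrive as an already-parsed list of alternating operands and operators.

```python
import operator

# Hypothetical operator table for the sketch.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_group(items):
    """Evaluate one bracketed group such as [2, '+', 3, '+', 4].

    A group may chain one operator any number of times (folded left to right),
    but mixing different operators requires explicit nesting.
    """
    operands, operators = items[0::2], items[1::2]
    if len(items) % 2 == 0:
        raise SyntaxError("expected operands separated by operators")
    if len(set(operators)) > 1:
        raise SyntaxError("different operators in one group need parentheses")
    values = [eval_group(x) if isinstance(x, list) else x for x in operands]
    result = values[0]
    for op, rhs in zip(operators, values[1:]):
        result = OPS[op](result, rhs)
    return result

print(eval_group([2, "+", 3, "+", 4]))      # same operator chained
print(eval_group([2, "+", [3, "*", 4]]))    # mixed operators, nested explicitly
```

The left-to-right fold matches the reading of `2 + 3 + 4` as `(2 + 3) + 4` given in the message.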
m
I created a language with similar ideas a while ago, it's almost complete: https://github.com/marianoguerra/interfix
👌 1
😮 2
a
I'm sorry but did you reinvent Agda's Mixfix syntax? 😅
m
never saw it, so coinvented 😛
👌 1
interfix doesn't require the underscores
d
You can call me a convert and subscribe me to your heresy 😛 I've always been convinced of the need to reduce the syntax needed in formal end-user languages, and decoupling parameters from their exact position is a basic tool for that; passing parameters by name instead of position was an improvement in post-C languages, so it's easy to imagine what benefits that style can bring. I'll go deeper into heresy: in a visual environment (think e.g. a spreadsheet or graphical design tool), _you don't even need the function name to be adjacent to the parameters_; the environment itself can suggest the list of nearby locally available values that are compatible with the function's input types, and the user chooses which one is appropriate. Applying functions becomes a point-and-click interaction rather than an exercise in recalling which syntax is needed. This can even depend on named parameters. Think of an IDE where, when you're passing a parameter to a function, "intellisense" suggests all nearby variables in scope with the right type that could be used for that parameter. Conversely, you could select a value and type a parameter name over it, and the IDE could suggest functions that accept a value of that type under that parameter name, so that the user can choose the most suitable one. How many fewer API reference lookups would a user of this system need? In this style of coding, the structure of connections is more important than the concrete syntax of the text you create them with. In my opinion visual languages are the ones that can benefit the most from this separation between structure and placement, as perfect placement of values w.r.t. functions in a graphical environment is more difficult than with raw text.
🎯 1
a
However, that call would match nothing, so you'd get a compile time error.
@Nick Smith could you explain how? I imagined it'd match both.
d
Natural languages split into synthetic (eg Russian, Turkish) and analytic (eg English, Chinese) languages (very roughly). Synthetic languages do a lot of agglutination and morphology; for example, they have cases like accusative or locative and lots of agreement. Analytic languages don't bother much with agreement and morphology (which is also why NLP on English always starts off so easily). But that comes at a price: English has a relatively fixed word order, whereas in Russian you can basically order the words in your sentence almost any way you like. So from that we learn that in order to avoid ambiguity, if we could tag the tokens in our programming language with a 'case', we could get away with much more free token order in each statement.
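To make the 'case' idea concrete, here is a small Python sketch. The `role:value` notation and the function `parse_tagged` are invented for illustration: each argument carries its role as a tag, the single untagged token is the function name, and token order stops mattering.

```python
def parse_tagged(call_text):
    """Split a call into (function name, {role: value}), ignoring token order.

    Assumption: arguments are written as role:value and exactly one token is untagged.
    """
    name, roles = None, {}
    for tok in call_text.split():
        if ":" in tok:
            role, value = tok.split(":", 1)
            if role in roles:
                raise SyntaxError(f"role {role!r} given twice")
            roles[role] = value
        else:
            if name is not None:
                raise SyntaxError("more than one untagged token")
            name = tok
    if name is None:
        raise SyntaxError("no function name found")
    return name, roles

# Both orderings normalise to the same call.
print(parse_tagged("give to:Ann what:book"))
print(parse_tagged("what:book to:Ann give"))
```

This is close to keyword arguments with the function name allowed to float, which is one reading of the suggestion rather than the only one.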
❤️ 2
n
@Denny Vrandečić Do you mean we could let the programmer choose the order of tokens for each function call? If so, I'd consider that an anti-feature, because now there are many syntaxes for the same call, and we can no longer exploit our "shape recognition" capabilities to quickly identify a function. There's merit to only allowing the definer of the function to choose the token order. But perhaps I'm missing your point.
d
Yes, that was my suggestion.
n
Do you think there are benefits to that level of freedom? Would it help Russians write more readable code?
@Axel Svensson We've probably got different ideas about the underlying semantics. If I had to guess, you're thinking of each symbol in the sequence
(a b c)
as a value. In contrast, I was thinking of each symbol as a "piece of syntax" that helps identify which function to call. Under your interpretation,
(fun1 fun2 var3)
is ambiguous, because it could be interpreted as passing the symbolic value "fun2" to the fun1 function, or as passing the symbolic value "fun1" to the fun2 function. Under my interpretation, there is no function definition with the signature
(fun1 fun2 .)
, so you get a compile-time error. But since starting this thread, I've actually been trending towards the first interpretation. It might turn out to be "nicer" in practice, especially as an alternative to passing strings around everywhere. But I think you'd now want/need variable references to be distinguished from symbols, using the
&
syntax as you mentioned, or even just the parentheses (after all, a variable reference is just a function call with zero arguments). So a function call would look like
fun1 (a) (b)
or
fun1 &a &b
. Though I've been thinking about a more lightweight syntax inspired by Rust's lifetime syntax:
fun1 'a 'b
. It's probably necessary to avoid parentheses/syntax hell.
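The difference between the two interpretations can be checked mechanically. The Python sketch below uses the two signatures from the question and a hypothetical set of known variables; the flag `symbols_are_values` switches between "every word is syntax" and "a bare word can be passed as a symbolic value".

```python
SIGNATURES = ["fun1 &X &Y", "&X fun2 &Y"]
VARIABLES = {"var3"}  # assumption: the only variable in scope

def matches(signature, call_text, symbols_are_values):
    """Check whether a call fits a signature under the chosen interpretation."""
    sig, toks = signature.split(), call_text.split()
    if len(sig) != len(toks):
        return False
    for s, t in zip(sig, toks):
        if s.startswith("&"):
            # A parameter slot takes a variable, or any bare word when
            # words are allowed to act as symbolic values.
            if not (t in VARIABLES or symbols_are_values):
                return False
        elif s != t:
            return False
    return True

def candidates(call_text, symbols_are_values):
    return [s for s in SIGNATURES if matches(s, call_text, symbols_are_values)]

# Words as syntax: nothing matches, so the call is an error.
print(candidates("fun1 fun2 var3", symbols_are_values=False))
# Words as values: both signatures match, so the call is ambiguous.
print(candidates("fun1 fun2 var3", symbols_are_values=True))
```

Marking variable references explicitly, as suggested with `&a` or `'a`, would amount to making the parameter-slot test depend on the marker instead of on a lookup in `VARIABLES`.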
👍 1
a
Since we're all wild heretics here: How about capitalization carrying syntactic significance? Small initial letter means it's part of the invoked function name, e.g.
fun1 Fun2 Var3
means invoke fun1 with fun2 and var3 as arguments, while
Fun1 fun2 Var3
means to invoke fun2.
So essentially, capital first letter instead of
&
. The heretical part would be to have the first character (or all characters) in identifiers be case insensitive.
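A quick Python sketch of the capitalization rule, with the hypothetical helper `split_call`: tokens starting with a lowercase letter form the function name, everything else is an argument, and `_` marks an argument slot in the resulting shape.

```python
def split_call(call_text):
    """Return (shape of the function name, list of arguments) for one call."""
    shape, args = [], []
    for tok in call_text.split():
        if tok[0].islower():
            shape.append(tok)          # part of the function name
        else:
            shape.append("_")          # argument slot
            # Identifiers are case-insensitive in their first character,
            # so Fun2 refers to fun2.
            args.append(tok[0].lower() + tok[1:])
    return " ".join(shape), args

print(split_call("fun1 Fun2 Var3"))
print(split_call("Fun1 fun2 Var3"))
print(split_call("2 + 3"))
```

The last call shows a limit of the rule: `+` has no case, so this sketch treats it as an argument rather than as a function name.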
n
Would that be giving special treatment to Latin-based alphabets though? It would be nice to support arbitrary Unicode. Imagine asking someone writing in Hiragana to start their words with an English letter!
👍 1
Also, that proposal wouldn't work with symbols like
+
and
*
. But I do like the sneakiness of it 😉.
a
True. "First character in Katakana or upper-case Latin!" :-D
n
@dialmove I think there's a lot of merit to ensuring identical function calls have an identical visual structure. Readability/skimmability is one of my key concerns. I am definitely interested in smart IDEs, but I'm not sure that requiring the use of a smart IDE to specify arguments (rather than having a rigid syntax) is a good thing. If you had this, I'd hope the manipulations would be immediately serialized back into a rigid textual form that is easily skimmable.
@Konrad Hinsen To enable the writing of expressions like
(2 + 3 + 4)
, I'm now wondering if I can extend my proposed function call syntax to something based on regexes. For example, to parse the aforementioned expression you might be able to write a function signature like:
&x (+ &y)* = ...
And then
&y
would be assigned a list of numbers, rather than a single number. Of course, regular expressions aren't known for being easy to understand. Perhaps I could find a somewhat simpler pattern language that still allows the description of arbitrary-length expressions.
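For a feel of how such a signature could behave, here is a Python sketch using the standard `re` module. The signature `&x (+ &y)*` is translated by hand into a regular expression; an actual implementation would have to derive this from the definition, which is not attempted here. Arguments are assumed to be non-negative integer literals separated by single spaces.

```python
import re

# Hand-written translation of the signature  &x (+ &y)*
CALL = re.compile(r"(\d+)((?: \+ \d+)*)")

def match_sum(call_text):
    """Return (x, ys) where ys is the list bound to &y, or None if there is no match."""
    m = CALL.fullmatch(call_text)
    if m is None:
        return None
    x = int(m.group(1))
    # A repeated group only remembers its last match, so the tail is re-scanned.
    ys = [int(n) for n in re.findall(r"\d+", m.group(2))]
    return x, ys

x, ys = match_sum("2 + 3 + 4")
print(x, ys, x + sum(ys))
```

With zero repetitions the same signature also accepts a lone `2`, binding `&y` to an empty list.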
k
@Denny Vrandečić There's Perligata for a transposition of the analytic vs. synthetic concept to programming: https://users.monash.edu/~damian/papers/HTML/Perligata.html
😲 2
@Nick Smith Pattern matching looks like an idea worth exploring. But as you said, it needs to be more human-friendly than regex.
@Mariano Guerra Was interfix in any way inspired by Smalltalk? Smalltalk keyword messages feel very similar.
m
@Konrad Hinsen yes, Smalltalk and Dylan were some of the inspirations
d
@Nick Smith not sure if there's a benefit. In natural language it has the advantage of emphasis. It can give some flexibility w.r.t. visual layout of the code. But in the end, it might make it more complex without a clear win. I am not advocating it, just an idea 😄
👍 1
n
Having thought about things further, I think my proposal ends up being about metaprogramming and homoiconic languages pretty quickly. Perhaps I should search for inspiration from those.
b
I also took a crack at this 2 years ago. We called it "anyfix"/"omnifix". It's fun to see these other implementations with names like Interfix and Mixfix! The use case for us was talking to medical care providers and discovering that many would invent their own written shorthands (grammars) for jotting down EMR data while working with patients, which would often have out-of-order parameters (20 inches 10lbs or 10lbs 20 inches etc). IIRC ambiguity was the exception rather than the rule. We figured that if the intent is very clear to humans, the parser should be able to figure it out too. It was rather easy to implement in Tree Notation, which is a whitespace-based syntax where each row is a node split into cells. The line parser first determines what kind of node the line is, and then a cell parser determines the type of each cell. A node definition states the cell types expected and whether to use a prefix/postfix/omnifix parser. Simple tests (such as regex) are then used to detect the type of each word in the omnifix case. In theory this puts the onus of avoiding ambiguity on the language designer, but in practice (IIRC from our small experiments) it was surprisingly easy to avoid. It seemed that when you have 2 params with the same type, it was often better to take a list instead. A really dumb toy demo for our implementation: https://jtree.treenotation.org/designer/#standard%20poop
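The omnifix step described above, where each cell's type is detected by a simple test, might look roughly like this in Python. This is not the Tree Notation or jtree API; the cell types, the unit spellings (`20in`, `10lbs`, written without a space so that each measurement is one cell) and the function `parse_line` are assumptions made for the sketch.

```python
import re

# Hypothetical cell types: each is recognised by a regex, not by its position.
CELL_TYPES = {
    "length_in": re.compile(r"(\d+(?:\.\d+)?)in"),
    "weight_lbs": re.compile(r"(\d+(?:\.\d+)?)lbs"),
}

def parse_line(line):
    """Assign every cell to the first type whose pattern it matches, in any order."""
    record = {}
    for cell in line.split():
        for name, pattern in CELL_TYPES.items():
            m = pattern.fullmatch(cell)
            if m:
                if name in record:
                    raise ValueError(f"two cells of type {name}")
                record[name] = float(m.group(1))
                break
        else:
            raise ValueError(f"unrecognised cell {cell!r}")
    return record

print(parse_line("20in 10lbs"))
print(parse_line("10lbs 20in"))
```

Two cells of the same type are rejected here, which lines up with the observation that same-typed parameters are often better modelled as a list.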
👍 1
d
My language works this way because it's not based on functions as such, but on a lower-level mechanism: rewriting after pattern matching. The "function" symbol is just another pattern element, not distinguished from "data" or "parameters".
n
@Duncan Cragg Yeah, I'm thinking of this in a similar way: a "function signature" is just a pattern, where none of the tokens are special "function name" tokens. Function calling is just pattern matching. However, I'm not thinking about rewriting, since that is conflating operational semantics with the mere syntax presented here. The actual means by which a computer executes a program consisting of these patterns can be considered separately. I'm aiming for a very non-conventional operational semantics, inspired by relational programming.
k
@Nick Smith I see your point but I think you would probably gain from studying term rewriting systems. A main feature distinguishing term rewriting from traditional function-calling approaches is precisely the separation of syntax and evaluation. Term rewriting systems start from a term algebra, which just says what valid terms look like. The next layer adds patterns and pattern matching, the result of which is a substitution. With patterns you can define rules. Rewriting is yet another layer, in which you have to decide on a strategy to select and apply rules. If you want different operational semantics, you just change the last one or two layers, but you can keep the lower ones. In contrast, with traditional PL design, there is no useful layer between the AST and the full language, evaluation semantics included.
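The layering described above can be sketched in Python in well under a hundred lines. Everything here is illustrative: terms are nested tuples, pattern variables are strings starting with `?`, the rules encode Peano addition, and the strategy (innermost, repeated until nothing changes) is just one possible choice for the top layer.

```python
# Layer 1: terms are atoms (strings) or tuples of terms, e.g. ("succ", "zero").

# Layer 2: pattern matching produces a substitution (dict) or None.
def match(pattern, term, subst=None):
    subst = dict(subst or {})
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in subst and subst[pattern] != term:
            return None
        subst[pattern] = term
        return subst
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            subst = match(p, t, subst)
            if subst is None:
                return None
        return subst
    return subst if pattern == term else None

def substitute(template, subst):
    if isinstance(template, tuple):
        return tuple(substitute(t, subst) for t in template)
    return subst.get(template, template)

# Layer 3: rules are (left-hand pattern, right-hand template) pairs.
RULES = [
    (("add", "zero", "?y"), "?y"),
    (("add", ("succ", "?x"), "?y"), ("succ", ("add", "?x", "?y"))),
]

# Layer 4: a strategy. Here: rewrite subterms first, then try rules at the root.
def rewrite_once(term):
    if isinstance(term, tuple):
        term = tuple(rewrite_once(t) for t in term)
    for lhs, rhs in RULES:
        subst = match(lhs, term)
        if subst is not None:
            return substitute(rhs, subst)
    return term

def normalize(term):
    """Apply rewrite_once until a fixpoint is reached."""
    while True:
        rewritten = rewrite_once(term)
        if rewritten == term:
            return term
        term = rewritten

one = ("succ", "zero")
two = ("succ", one)
print(normalize(("add", two, one)))
```

Swapping `rewrite_once` or `normalize` for a different strategy leaves `match`, `substitute` and `RULES` untouched, which is the separation the message points to.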
n
What if my semantics involves the equivalent of database equi-joins and aggregation? Surely those can’t be described in terms of term rewriting.
i.e. my operational semantics has nothing to do with substitution of terms
k
I am not a database expert, but my first reaction is that everything can be encoded in terms. Terms are isomorphic to s-expressions, XML trees, etc. You may in the end want specialized hand-written implementations for some of those terms, for performance reasons, but I'd be surprised if your semantics were impossible to express as terms.
🤔 1
d
Rewriting is Turing Complete so you can do anything you want. 😃👍
The actual transformation mechanism from before-state to after-state in the rewrite needn't be TC, though, and is probably better off not, so it terminates!
Primitive Recursive perhaps? Need a fixpoint each time: when applying the rewrites has no effect or there are no matching rewrites.
a
@Duncan Cragg
My language works this way because it's not based on functions as such, but on a lower-level mechanism: rewriting after pattern matching.
Sounds like Markov's Normal Algorithms and Refal programming language. 🙂
d
Wow, that Ruskie Functional language is new to me! Thanks! I'll dig into that. The Markov thing is also new to me but I can't find any intro to it that's not behind a paywall!
a
@Duncan Cragg that's strange, I thought every decent "introduction to the theory of computation" book has at least a chapter on it... 🤔
d
Markov Chains maybe?
Or HMMs
k
Not the same Markov. Markov chains were invented by the father of the inventor of Markov algorithms.
d
Crikey, I'm learning a lot today! Two Markovs!