# thinking-together
s
Lately I've been thinking about syntax highlighting and how helpful it is for developers. Recent research on this seems to be mostly focused on students and beginners, but the consensus (if there is any) seems to be that SH has negligible effects on source code comprehension (some examples: 1, 2, 3). A 2015 paper found a positive benefit, but the study had a small sample size and found that the effect is strongest for beginners. The authors of the 2018 paper (linked above) made the following claim:
Our findings indicate that current IDEs possibly waste a feedback channel to the developer with an ineffective code highlighting scheme. This feedback channel could convey more meaningful information, for example the font colour could encode the type of function in terms of its namespace.
In other words, "semantic highlighting" could be more beneficial for programmer productivity, a paradigm that "attempts to reveal the meaning of the code" instead of just "identifying syntactic elements" [source]. This can mean something simple like giving each variable its own colour, but I think it can also incorporate more creative ideas. I found two IDE packages for semantic highlighting: SemanticColorizer for Visual Studio and semanticolor for Atom. Has anyone here used those packages (or something similar) and found them useful? I'm also interested in what opinions you have about syntax highlighting in general (I've already read Rob Pike's opinion). 🙂
i
I use this occurrence-based highlighting (giving it the semantic attribute seems a bit of an overstatement to me) in IntelliJ IDEA with Scala. Honestly, I haven't felt any improvement, but I'm also not bothered by it, so I kept it active. I guess I should disable it and see whether I'll be missing it. What I'd love to see, though, is a plugin that colors identifiers based on types, so that I can see how types flow through an implementation. I created a couple of images at some point to show this idea. I needed this when I was attempting to explain the implementation of a Cartesian product function to some people at a coding dojo. The language in the screenshots is Haskell.
In the example above, the type-based highlighting makes it easier to notice that the first and second pattern matches of `[]` are on different type occurrences.
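A rough sketch of what such a type-based colouring pass might look like, assuming the type of each token is already known (in practice it would have to come from the compiler or a language server). The palette, the function name `colors_by_type`, and the sample tokens are all hypothetical and are not taken from the screenshots.

```python
# Hypothetical palette; a real plugin would take colours from the editor theme.
PALETTE = ["orange", "skyblue", "violet", "lime"]


def colors_by_type(typed_tokens):
    """Assign one colour per distinct type, in order of first appearance."""
    color_of_type = {}
    colored = []
    for token, type_name in typed_tokens:
        if type_name not in color_of_type:
            # Reuse the palette cyclically if there are more types than colours.
            color_of_type[type_name] = PALETTE[len(color_of_type) % len(PALETTE)]
        colored.append((token, color_of_type[type_name]))
    return colored


# Made-up example: two occurrences of [] with different types get different colours.
tokens = [("[]", "[[a]]"), ("[]", "[a]"), ("xs", "[a]")]
print(colors_by_type(tokens))
```

Because the colour follows the type rather than the spelling, two textually identical tokens only look the same when they also have the same type.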
❤️ 1
t
A few ideas: The value for beginners shouldn't be underestimated, because anybody in a new codebase or language is like a beginner. Having colors for keywords and meaning makes it easier to learn the language. Highlighting also helps with writing code: I find it helps me identify typos while typing because the word is the wrong color. Similarly, it helps me identify keywords I forgot were keywords. Semantic highlighting can be thought of as something to strengthen code smell. If code looks the wrong color, it has a very bad smell that's easy to spot, whereas code with a single letter missing has a very subtle bad smell. I'd imagine highlighting would be most helpful for debugging.
s
I find it varies with language. In C, for example, I find it very helpful: quite often I'll find myself in a conversation of sorts with my editor to make sure the highlighting matches up, and if it doesn't highlight like I expect, I know I've made a syntactic error somewhere. On the other hand, take any Lisp: I don't find highlighting tokens to be too useful at all, but I find myself doing a similar thing with the autoindenter, though rainbow parens are also great. I personally find tooling quite important.
f
My experience is that syntax highlighting really helps with spotting some errors quickly. If the syntax highlighting is even slightly off, it really confuses me. For example, when I started learning Rust, VS Code incorrectly highlighted some regular names as built-ins. So before thinking of "semantic highlighting", I think it's more important to do "correct highlighting" first. For most languages, this means that you can't simply use regex highlighting rules. I've used individually-colored names in JetBrains products and liked it, but it wasn't a huge productivity boost. Rainbow parens are also helpful IMO. When talking about syntax highlighting, it's usually about text / background color and font variants (bold/italic) only. My hypothesis is that source code readability could benefit from using different fonts for different syntactic elements. Identifiers and literals could use a sans-serif font, documentation a serif one and keywords / punctuation a monospace font. I'm currently building a prototype that uses different fonts depending on the element type. I can hopefully show something next week.
👍 1
x
Douglas Crockford has a similar opinion to Rob Pike about normal syntax highlighting. He discusses his system in his "monads and gonads" video. In it he uses color to signify scope. Here is a vscode plugin based on that idea: https://github.com/azz/vscode-levels I haven't used it myself, but I wonder whether non-traditional syntax highlighting schemes wouldn't have the same level of utility as traditional ones (which I agree is low; though I also use the "wrong" color as a signal that something is syntactically wrong with my code, there are better ways my editor can and does use to inform me of such problems). There is only a small set of colors that can be unambiguously used at the same time (say max 7ish), incidentally about the same as the limit of how many things I can hold in my head at the same time (7, plus or minus two). This color limit is especially true if I'm using different colors for the same syntax elements (to indicate something other than syntax, such as scope or other semantics). In order to program I need to build up a mental model of the code. Highlighting will only help me if it can extend the capability of my brain to do that beyond its natural limits, e.g. work with a partial mental model, supported by "color context", or help me build up a regular mental model more quickly.
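To make the "color signifies scope" idea concrete, here is a minimal sketch that colours each character by brace-nesting depth. It is an assumption-laden simplification: it counts `{` and `}` only, ignores braces inside strings and comments, and is not how Crockford's scheme or the vscode-levels plugin is actually implemented. The seven-entry palette and the function names are hypothetical, chosen to mirror the "max 7ish" colour limit mentioned above.

```python
# Hypothetical palette of seven colours, reused cyclically for deeper scopes.
PALETTE = ["white", "green", "yellow", "blue", "red", "cyan", "magenta"]


def scope_depths(source: str) -> list[int]:
    """Return the brace-nesting depth of every character in `source`."""
    depths = []
    depth = 0
    for ch in source:
        if ch == "}":
            # A closing brace is shown at the depth of the scope it returns to.
            depth = max(depth - 1, 0)
        depths.append(depth)
        if ch == "{":
            # Everything after an opening brace is one level deeper.
            depth += 1
    return depths


def color_for_depth(depth: int) -> str:
    """Pick the colour for a nesting depth, wrapping around the palette."""
    return PALETTE[depth % len(PALETTE)]


for ch, d in zip("a{b{c}d}e", scope_depths("a{b{c}d}e")):
    print(ch, d, color_for_depth(d))
```

With only seven colours, the eighth nesting level reuses the first colour, which is one way the colour limit shows up in practice.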
💡 1
f
Another thought on this: even if studies showed that there was no productivity increase when using syntax highlighting, I'd still want to use it just for the aesthetics.
👍🏻 2
👍 1
d
I originally had the same opinion as Rob Pike. Now, I'm programming in C++, and I sometimes disable blocks of code using ``#if 1 try this #else old code #endif`` when I am experimenting with a new implementation. Vim colours the disabled ``old code`` differently, so I can quickly see that a large block of code is disabled even if I can't see the preprocessor ``#`` tags that disabled it. That's useful. Ditto for colouring block comments and multiline string literals. On the other hand, giving each token on a line a different colour is just visual noise.
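As a sketch of how an editor might decide which lines to dim, the following flags lines that sit in a branch disabled by a literal `#if 0` or `#if 1`. This is not how Vim does it; it is a toy under stated assumptions: directives are written without a space after `#`, `#elif` is not handled, and any other condition (`#ifdef FOO`, `#if X > 2`) is treated as unknown, so both of its branches stay undimmed. The function name `inactive_lines` is made up for illustration.

```python
def inactive_lines(lines):
    """Return one bool per line: True if the line is in a disabled branch."""
    flags = []
    stack = []  # one (condition_known, branch_active) pair per open #if

    def dimmed(entries):
        # A line is dimmed if any enclosing branch is inactive.
        return any(not active for _, active in entries)

    for line in lines:
        words = line.split()
        directive = words[0] if words else ""
        if directive in ("#if", "#ifdef", "#ifndef"):
            flags.append(dimmed(stack))
            if directive == "#if" and words[1:] in (["0"], ["1"]):
                stack.append((True, words[1] == "1"))
            else:
                stack.append((False, True))  # unknown condition: assume active
        elif directive == "#else" and stack:
            known, active = stack.pop()
            flags.append(dimmed(stack))
            # Flip only when the condition was a literal 0 or 1.
            stack.append((known, (not active) if known else True))
        elif directive == "#endif" and stack:
            stack.pop()
            flags.append(dimmed(stack))
        else:
            flags.append(dimmed(stack))
    return flags


example = ["#if 1", "try_this();", "#else", "old_code();", "#endif", "rest();"]
print(inactive_lines(example))
```

The directive lines themselves are left undimmed unless an outer branch is disabled, so the boundaries of a disabled block remain visible.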
☝️ 2
👍 1
k
The synthesis of many positions is that conventional syntax highlighting is extremely useful for highlighting two things: comments and literals (especially string literals). Beyond that the returns start diminishing quite quickly. I believe this so strongly that I disable highlighting for most things but have 4(!!) colors of comments based on different leaders. (I also like to highlight early exits: `break`/`continue`/`return`.) And then I find the colors I save to be occasionally useful for highlighting individual variables on demand but randomly and persistently (across restarts), giving me a sort of synesthesia where I start to expect certain variable names in certain parts of a codebase to be colored a certain way. Invaluable for highlighting dataflow and side-effects. Here's a couple of screenshots from various points in the past that show off all these features. (I love talking about this topic but often repeat myself.)
* https://mastodon.social/@akkartik/101163809901430347
* https://i.imgur.com/EmFMTtv.png
Here's what I use for "dataflow highlighting": https://www.reddit.com/r/programming/comments/1w76um/coding_in_color/cezpios
👍 3
These days Mu has slightly more conventional highlighting of keywords just because I expect most people to be unfamiliar with the syntax: http://akkartik.github.io/mu/html/apps/arith.mu.html. But you can still see multiple colors of comments here.
i
I've been using semanticolor for the past few days, and I don't dislike it, so I'll probably keep using it. I look forward to offering some deeper n=1 self-evaluation the next time this subject comes up :)
👍 1
Semanticolor update: I like it and will probably continue using it for a year or so, then switch back to syntax highlighting for a bit, so that I can more acutely appreciate the differences. Some immediate shortcomings, which may be avoidable with a richer implementation (I'm using Semanticolor in Atom):
• The dynamic colors seemingly don't reflect my theme. They're ugly colors. They don't feel designed.
• The colors collide like crazy. I'll post a screenshot below.
• When making up a variable name, I now have an additional consideration: what color do I want the name to be? That's even more of an incidental complexity and source of bias than choosing a name already is inherently.
Not to mention...
Invites the question: "semantic" to who, exactly?
k
I'm not sure what you mean by semanticolor anymore. What editor are you using, how many colors does it provide, what's the flow for setting a new color, does it persist across restarts?
i
Sorry — from the OP:
I found two IDE packages for semantic highlighting: SemanticColorizer for Visual Studio and semanticolor for Atom.
I'm using the Atom plugin. It automatically generates colors based on the text of each word, using some sort of hashing I presume. The colors are consistent between files, across restarts, etc. The plugin, as I have it configured, ignores language keywords, literals, operators, comments, and a few other things.
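For illustration, here is one way hash-based colouring could work. This is a guess at the general technique, not semanticolor's actual algorithm: the choice of MD5, the fixed lightness and saturation values, and the function name `color_for_identifier` are all assumptions.

```python
import colorsys
import hashlib


def color_for_identifier(name: str) -> str:
    """Map an identifier to a stable hex colour derived from a hash of its text."""
    # MD5 serves only as a stable, well-spread hash here. Python's built-in
    # hash() is salted per process for strings, so it would not give the
    # same colour across restarts.
    digest = hashlib.md5(name.encode("utf-8")).digest()
    hue = int.from_bytes(digest[:2], "big") / 65536.0  # 0.0 <= hue < 1.0
    # Lightness and saturation are fixed (assumed values); only the hue varies.
    r, g, b = colorsys.hls_to_rgb(hue, 0.60, 0.55)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


for word in ["total", "index", "result"]:
    print(word, color_for_identifier(word))
```

Since the colour depends only on the text of the word, it stays the same between files and across restarts. The flip side is that the author has no control over it, and unrelated names can land on hues that are hard to tell apart.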
k
Cool. And are there 16-bit colors?
Oh I see it's similar colors.
This is where my solution is superior, IMO. Not every word needs a color, that's just a recipe for angry fruit salad. If we use fewer colors it becomes more feasible to give up control over what colors to use.
x