I have a simple, yet annoying validation problem. ...
# present-company
w
I have a simple, yet annoying validation problem. Can any of you think of a language/framework that makes validating the following sort of schema as clear (or clearer) than the English language specification? Suppose we have a playhouse with a bunch of rooms. In each room, we want there to be a box, a ball or two, and no other toys. If there are two balls, one should be red and the other should be blue. Report all the rooms that have the wrong toys and exactly what's wrong.
And yes, coding this up earlier today fizzed my buzz: 34 lines, half an hour, several mistakes.
e
relaxng (XML validator) has an interleave operator for this sort of thing. Its been a while since I last wrote a gramamr but I think it would look something like this:
Copy code
grammar {
  start = element house { room*}
  room = element room { box & ((red & blue?) | (blue & red?)) }
  box = elem box { "box" }
  red = elem ball { "red"}
  blue = elem ball { "blue" }
}
👍 2
or else any other grammar would help (ANTLR, PEG, etc):
Copy code
grammar = "house" room*
room = "box" ("red ball" "blue ball"? | "blue ball" "red ball"?)
I think it is useful to think in terms of grammars because then becomes obvious when the thing you are trying to validate is not context free
👍 1
in XML world, once you need to validate something were a context free grammar is not enough, you reach for Schematron.
hah had to edit my grammars. Not sure if they match your requirements exactly but you get the idea 😛
j
+1 @Emmanuel Oga's suggestion of grammars, to which I'd add any other automata generating-abstractions. For example, regular expressions over the domain of toys (as opposed to the domain of characters).
d
This writeup expresses the cleanest approach I know of: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/
g
the trickiest part of this is the “report all rooms that have the wrong toys and exactly what’s wrong”—i’d be tempted to reach for the really dynamic grammars that let you pretty easily insert whatever side effects/logic you want, like OMeta/ohm or rebol/red’s parse dialect. do you just want console output or structured data about the errors?
🤔 1
j
After someone asked in private what I was on about up above, I offer as an example this "regular expressions over sequences" library in Clojure: https://github.com/cgrand/seqexp The relationship between formal languages and automata is a worthwhile area of computer science study for practical minded practitioners. 🙂
v
Wouldnt a datalog or mini-kanren style approach be useful?
👍 1
w
Ah @Garth Goldwater, OMeta haven't heard that in a while! Yes friends, an important part here is the detailed error reporting. I'll need to read Parse, don't validate @Don Abrams. Seems many non-trivial programming problems have a parsing (state machine) feel, and at one level we have good parsing tools though they seem to be underutilized in practice. One interesting thing here that counts matter and order does not. (Of course, one quick fix is to sort the list of toys in a room and then match on that sequence.) As for error reporting, here's an idea. Imagine an interactive grammar generating tool. We start with golden path of validity (imagine a sorted list), something like:
Copy code
Room = "box" ("blue ball" "red ball" | Color " ball")
Then the tool would propose invalid sequences for which we have to generate error description productions. That would be neat. I suppose other perennial parsing problems are tokenizing nicely so that you can operate on higher-level constructs, and I mean once you are acting on tree-ish data rather than streams, it begins to feel like regular functional programming...
b
I am still learning it myself, but would Prolog be able to do this?
w
Short answer: yes-ish. Prologs can be a bit odd in what they can and cannot do.