# two-minute-week
a
Lately I've been experimenting with recognition of hand-drawn symbols in embedded structures, working with Luke Iannini in Realtalk. Thinking about how perceptually salient properties (such as spikiness/roundness, wonkiness) could be taken into account in a kind of analogue interpretation of the shapes, alongside discrete symbol recognition as 'signposts' in feature space, and what happens to those features when some symbols are marked out as higher-order functions. Thinking about syntax based on proximity and containment rather than adjacency. Also what happens when the parser itself is part of the scene, e.g. how does its orientation change the parsing of the symbols? Would love to hear about other projects that have explored this kind of area!
k
I'm just starting to dip my toes into recognizing gestures on a touchscreen, which feels like a kiddie pool for your project.
s
CC @wolkenmachine @Ivan Reese
g
FWIW: I’ve been playing with parsing non-textual language. Conclusion: grade school math plus relational languages (PROLOG in my case) plus only a small handful of relations goes a long way towards something usable:
• /recognizing/ symbols and gestures is orthogonal to /parsing/ symbol-based languages; “editing” and “parsing” and “semantics checking” should be completely orthogonal (accidental complexity abounds when the above is conflated together)
• containment
• connections
• bigger/smaller
• colour/shape/etc
• SVG has just everything needed (in fact, I ignore a lot of SVG, no swoopy art, just boxes and arrows and ellipses)
• blocks of text are “symbols”, their contents can be written in any 3GL, then parsed as black boxes along with the above
• real humans think of rectangles as isolated, stand-alone thingies --> isolation is key (I call it “0D”)
• currently using draw.io to edit drawings, then manually-written code to parse (e.g. hand-written semantics code using XML parser which sucks symbols (and some relationships) out of diagrams)
• ignore stuff that is hard to parse (swoopy stuff can remain in the diagrams, but is treated like comments)
• I transpile the hybrid diagrams to code in some existing language(s), or, run them directly without transpilation
• my friends: Ohm-JS, PROLOG, PEG, backtracking parsing, (I think miniKanren would work, too)
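To make the "grade school math plus a handful of relations" idea concrete, here is a minimal sketch in Python rather than PROLOG. It is not the code described above: the `Box` type, the `contains` and `facts` helpers, and the sample rectangles are all hypothetical, and the boxes are assumed to have already been pulled out of a diagram as axis-aligned rectangles.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle (hypothetical type), e.g. taken from an SVG <rect>."""
    name: str
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


def contains(outer: Box, inner: Box) -> bool:
    """True when `inner` lies strictly inside `outer`."""
    return (outer.x < inner.x and outer.y < inner.y
            and inner.x + inner.w < outer.x + outer.w
            and inner.y + inner.h < outer.y + outer.h)


def facts(boxes):
    """Derive relation tuples from geometry, in the spirit of Prolog facts."""
    out = set()
    for a, b in permutations(boxes, 2):
        if contains(a, b):
            out.add(("contains", a.name, b.name))
        if a.area > b.area:
            out.add(("bigger", a.name, b.name))
    return out


# Illustrative data only: one container, one box inside it, one box elsewhere.
boxes = [
    Box("outer", 0, 0, 100, 100),
    Box("inner", 10, 10, 20, 20),
    Box("other", 200, 0, 50, 50),
]
relations = facts(boxes)
```

A later parsing stage could then query `relations` without knowing anything about coordinates, which is one way to keep recognizing and parsing separate.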
c
Thinking about syntax based on proximity and containment rather than adjacency.
Could you expand on the difference and/or tension here?
a
@Christopher Shank Yes definitely! Conventional text-based programming languages are generally built on adjacency. This is a visual property in a way, but I suppose a discrete one -- two things (characters, or words) are either adjacent or not -- so we don't tend to think of it as visuospatial. Box-and-wire dataflow languages like Max/Pure Data are built on connectedness, another discrete property. We tend to think of these as more visuospatial because you can arrange things how you like in a 2D arrangement, but this is all secondary notation rather than syntactical or semantic.
Proximity is where the arrangement enters the core language: things connect if they are proximal. This means you can assign additional meaning from how proximal things are. A really nice, successful example of this is the Reactable. I'm not exploring this in the particular demo at the top of the thread, but I think there are a lot of possibilities here. I explored this sort of thing some years ago with a Haskell-based FRP front-end.
Containment I guess is actually a separate issue. In this demo I'm exploring just drawing around groups of glyphs. In text-based systems we use parentheses for this.
One nice thing about exploring proximity is the possibilities for collaboration. The Reactable is again a really nice case study for this: they made it circular so there's no way 'up' and people can collaborate by standing around it. Way ahead of its time really.
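As a rough illustration of proximity entering the core language, here is a small Python sketch. It is not how the Reactable or the Haskell front-end works: the `proximity_links` function, the `radius` threshold, the linear falloff, and the object names are all assumptions chosen for the example.

```python
import math


def proximity_links(objects, radius):
    """Connect every pair of objects closer than `radius`.

    objects: dict mapping a name to an (x, y) position.
    Returns (a, b, strength) tuples, where strength is 1.0 at zero distance
    and falls linearly towards 0.0 at `radius` (an assumed falloff).
    """
    names = sorted(objects)
    links = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.dist(objects[a], objects[b])
            if d < radius:
                links.append((a, b, 1 - d / radius))
    return links


# Illustrative positions: "osc" and "filter" are 5 units apart, "delay" is far away.
objects = {"osc": (0.0, 0.0), "filter": (3.0, 4.0), "delay": (30.0, 40.0)}
links = proximity_links(objects, radius=10.0)
```

With adjacency or wires, a pair is either linked or not; here the link also carries a continuous strength, which is the "additional meaning from how proximal things are".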
n
Hi @Alex McLean, this looks very interesting although it's a bit tough following what bits constitute 'programs' and what constitutes structural properties. Are you able to describe the setup a bit more?
a
Hi @Naveen Michaud-Agrawal, you mean the hand-drawn thing in the video? That's just showing embedded sequences, no higher-order stuff.
n
Ah so the nested groupings are automatically recognized by the system? Do you have any examples of what the RealTalk code in the editor looks like?
a
It then builds Realtalk claims from the OpenCV contours:
When /tool/ is a "embedded shape recognizer", /tool/ points "up" at /p/ within (2) inches, /p/ recognized contours /cs/ with origin /origin/:
    local boxes = {}
    for i,c in ipairs(cs) do
        local tl = c.region[1]
        local br = c.region[3]
        local w,h = br.x - tl.x, br.y-tl.y
        local box = create_id(p, i)

        -- Assumes parent contours are always earlier in the table
        local parent_box = c.parent_index > 0 and boxes[c.parent_index] or nil
        Claim (origin) has box2 (box) with position (tl) width (w) height (h).
        Claim box (box) is in a stack with children count (c.children_count) parent (parent_box) centroid (c.centroid).
        Claim box (box) has contour (c).
        table.insert(boxes, box)
    end
End
and separately does a hack to discard the inner contours of drawn containers:
-- Find outer shapes
When /origin/ has box2 /box/, box /box/ is in a stack with parent (nil):
    Claim (box) is a shape with level (0).
End

-- Find inner shapes
When /parent/ is a shape with options /o/, box /child/ is in a stack with parent /parent/,
    box /inner_child/ is in a stack with parent /child/:
    Claim (inner_child) is a shape with level (o.level+1) container (parent).
End
a bit hacky but does the job
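For readers without Realtalk, the same idea can be sketched in plain Python. This is an illustration under assumptions, not the code above: it assumes each drawn stroke yields an outer and an inner contour, so contours at even nesting depth are treated as shapes and those at odd depth are discarded. The `shape_levels` function and the `-1` "no parent" convention are hypothetical (the Lua above uses 1-based indices with 0 for no parent).

```python
def shape_levels(parents):
    """Map contour index -> shape level, keeping only outer contours.

    parents[i] is the index of contour i's parent, or -1 if it has none.
    Unlike the Lua version, parents need not appear earlier in the list.
    """
    def depth(i):
        d = 0
        while parents[i] != -1:
            i = parents[i]
            d += 1
        return d

    levels = {}
    for i in range(len(parents)):
        d = depth(i)
        if d % 2 == 0:  # even depth: assumed to be the outer edge of a stroke
            levels[i] = d // 2
    return levels


# Illustrative hierarchy:
# 0: outer edge of a drawn container, 1: its inner edge,
# 2 and 3: outer edges of two glyphs inside it,
# 4: inner edge of glyph 3 (say, the hole in an "o").
parents = [-1, 0, 1, 1, 3]
levels = shape_levels(parents)
```

Contours 1 and 4 drop out as inner edges, leaving the container at level 0 and the two glyphs at level 1.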
n
Thanks! I'm quite interested in the idea of interpreters of hand-drawn languages. I think I understand now what your video is showing.
a
@Naveen Michaud-Agrawal it might be clearer in this

https://youtu.be/TFQCVk7Iyyk

n
So it looks like you are selecting the parent contour to interpret using the small RealTalk object. Is the order of symbol interpretation dependent on where that page is pointing and its orientation?
a
Yes that's right, the symbols are read from left to right from the perspective of the tool, based on each one's centroid.
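One way to picture that ordering is to project each centroid onto the tool's "right" axis and sort. The Python sketch below is only an assumption about how this could be done, not the Realtalk implementation: `reading_order` and `tool_angle` are hypothetical names, and coordinates are assumed to be mathematical (y pointing up, angles counter-clockwise from the +x axis).

```python
import math


def reading_order(centroids, tool_angle):
    """Sort symbol names left to right as seen from the tool.

    centroids: dict mapping a name to an (x, y) centroid.
    tool_angle: direction the tool's "up" points, in radians.
    """
    # The tool's "right" is its "up" direction rotated 90 degrees clockwise.
    right = (math.sin(tool_angle), -math.cos(tool_angle))

    def along_right(name):
        x, y = centroids[name]
        return x * right[0] + y * right[1]

    return sorted(centroids, key=along_right)


# Illustrative centroids only.
centroids = {"a": (0.0, 1.0), "b": (1.0, 0.0), "c": (2.0, 2.0)}
facing_up = reading_order(centroids, math.pi / 2)     # tool points along +y
facing_down = reading_order(centroids, -math.pi / 2)  # tool rotated half a turn
facing_east = reading_order(centroids, 0.0)           # tool points along +x
```

Rotating the tool by half a turn reverses the sequence, so the same drawing reads differently depending on where the parser sits in the scene.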
n
Thanks! I've been playing with Folk, which has a similar design to RealTalk, and have been interested in Bret's goals for universal scientific literacy, so it's very interesting to see all the dimensions being explored with Dynamicland.
v
If you are parsing the shapes why do you need the colored realtalk object? Can the parent contour not just have a shape annotation on it to play?
n
From what I understand the parsed shapes become true RealTalk objects
a
@Vijay Chakravarthy yes it would be lovely if it could stand alone as its own sort of language. At this stage of exploring ideas it's just super handy to point some Lua code at things in this way though. Especially as, when I'm not manipulating the notation itself, I don't have to worry about hands and pens getting in the way of the camera. I need to grapple with that problem at some point though!