# thinking-together
I've been thinking a lot about the patterns and architectures we're going to see emerge that lend themselves well to being written by generative AI. I came across a technique used by a library called Marvin (https://github.com/PrefectHQ/marvin): they limit the tokens the LLM can respond with to a single token corresponding to a value in an enum, then have it return that value as the answer to a natural language query. This is extra interesting because responding with a single token is relatively fast and cheap. The example they give uses it for routing:
```python
from enum import Enum
from marvin import ai_classifier

@ai_classifier
class AppRoute(Enum):
    """Represents distinct routes in an application"""
    USER_PROFILE = "/user-profile"
    SEARCH = "/search"
    NOTIFICATIONS = "/notifications"
    SETTINGS = "/settings"
    HELP = "/help"
    CHAT = "/chat"
    DOCS = "/docs"
    PROJECTS = "/projects"
    WORKSPACES = "/workspaces"

AppRoute("update my name")
# AppRoute.USER_PROFILE
```
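To make the mechanics concrete (this is a rough sketch of the general idea, not Marvin's actual implementation; the prompt wording and helper names are mine): enumerate the routes under single-character indices, ask the model to answer with the index only, and decode the one-token reply back to the enum. With a real API call you'd pair this with `max_tokens=1` and a `logit_bias` that only permits the digit tokens, so the reply is guaranteed to decode.

```python
from enum import Enum

class AppRoute(Enum):
    USER_PROFILE = "/user-profile"
    SEARCH = "/search"
    NOTIFICATIONS = "/notifications"
    SETTINGS = "/settings"
    HELP = "/help"
    CHAT = "/chat"
    DOCS = "/docs"
    PROJECTS = "/projects"
    WORKSPACES = "/workspaces"

def build_prompt(query: str) -> str:
    """List each route under a single-character index so the model
    can answer with exactly one token."""
    options = "\n".join(f"{i}. {m.name}" for i, m in enumerate(AppRoute, 1))
    return (
        "Route the user's request to the best option. "
        "Answer with the option number only.\n"
        f"{options}\nRequest: {query}\nAnswer:"
    )

def decode(single_token_reply: str) -> AppRoute:
    """Map the model's one-character answer back to an enum member."""
    index = int(single_token_reply.strip()) - 1
    return list(AppRoute)[index]

# In practice: send build_prompt(...) to the completion API with
# max_tokens=1 and a logit_bias restricted to the digit tokens "1".."9".
```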
But I feel like there's a seed of an idea here that points to what a piece of an LLM-core architecture may look like. I experimented with the idea a bit in ChatGPT earlier today (screenshots attached), and I'd love to know if anyone finds this interesting or has any thoughts/opinions.
Just for clarification - is the idea that the LLM can be used as a sort of router?
Yeah, they're using it as a natural-language router, which is super interesting in itself. That "update my name" could also be the output of another LLM (e.g., as the result of a user action in a UI: the user fills in a "name" field and clicks the "save" button, and the router POSTs that message to the API). You can take that further on its own: could an LLM route messages in a Smalltalk/Ruby/Objective-C message-passing style? But where my head went is: given a specific context in a prompt, we can encode a bunch of information (in binary, hex?, emoji?) and enable new patterns that give us cost/speed benefits from LLMs while also not having to explicitly write as much code. I need to do a bigger experiment, but my screenshots show it actually encoding and decoding the information, at least with a simple binary encoding. A while back I followed a tutorial for building a chess engine in C, which used a similar technique with 64-bit integers (U64s) and a lot of bitwise operations, bit masks, etc., and I'm thinking there's a way to take inspiration from those techniques and use them with an LLM...
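To show the kind of encoding I mean, here's a minimal bitboard-style sketch (the flag names are made up for illustration): pack boolean state into one integer with bit masks and round-trip it through a compact hex string, which is the sort of short string you could prompt an LLM to read or emit.

```python
# Hypothetical UI-state flags, one bit each, packed bitboard-style.
FLAGS = ["logged_in", "admin", "dark_mode", "notifications_on", "beta_features"]

def encode(state: dict) -> str:
    """Pack a dict of booleans into a hex string."""
    bits = 0
    for i, name in enumerate(FLAGS):
        if state.get(name):
            bits |= 1 << i          # set bit i, like a chess-engine bit mask
    return format(bits, "x")

def decode(hex_str: str) -> dict:
    """Unpack the hex string back into named booleans."""
    bits = int(hex_str, 16)
    return {name: bool(bits & (1 << i)) for i, name in enumerate(FLAGS)}
```

Five flags fit in a single hex character, so the whole state costs roughly one token, which is where the cost/speed benefit would come from.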
I have been playing with the -0613 OpenAI models, which have features for forcing them to return JSON that adheres to a provided schema, and it has been working well so far, even with relatively complicated schemas and relatively complicated requests. So limiting the model to function-like inputs and outputs feels like asking too little of it per call. But then again, other models may not be as good at it.
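For context, here's the general shape of what I'm passing: a JSON Schema for one function in the `functions` list the -0613 chat models accept, plus the arguments string the model sends back (the function name and fields here are made up):

```python
import json

# A single entry for the `functions` parameter; names are hypothetical.
route_fn = {
    "name": "route_request",
    "description": "Choose the app route for a user request.",
    "parameters": {
        "type": "object",
        "properties": {
            "route": {
                "type": "string",
                "enum": ["/user-profile", "/search", "/settings"],
            },
        },
        "required": ["route"],
    },
}

# The model replies with a JSON string of arguments, e.g.:
reply_args = '{"route": "/user-profile"}'
parsed = json.loads(reply_args)
assert parsed["route"] in route_fn["parameters"]["properties"]["route"]["enum"]
```

The `enum` constraint in the schema is doing the same job as Marvin's single-token trick, just at the JSON level instead of the token level.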
have you looked at how lmql.ai does token masking and constrained decoding?
Ooh, I'm not familiar with lmql.ai. Thanks for sharing that! I'll check out more about what they're doing.