I’ve been somewhat frustrated by two limitations of AI agents: how they decide which web resources might be relevant, and their inability to retrieve private data. If something isn’t public, you have to find some way to expose it to (for example) Claude Desktop or similar - and it ends up living in a different silo. Rebuilding all that context time and again is also a pain.
With all that in mind, and with some downtime in hand, I’ve put together Sombra - a tool that combines traditional web scraping techniques (the original arc90 readability algorithm) with a modern, authenticated remote MCP connection, consumable by compatible clients.
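To give a feel for the scraping side, here’s a rough TypeScript sketch of the general approach - it isn’t Sombra’s actual code, and it uses @mozilla/readability (the maintained descendant of the arc90 algorithm) plus turndown for the HTML-to-markdown step:

```ts
// Rough sketch only - not Sombra's code. @mozilla/readability is the maintained
// descendant of the arc90 algorithm; turndown converts the extracted HTML to markdown.
import { Readability } from "@mozilla/readability";
import TurndownService from "turndown";

function capturePage(): { title: string; markdown: string } | null {
  // Readability mutates the document it is given, so parse a clone.
  const article = new Readability(document.cloneNode(true) as Document).parse();
  if (!article) return null;

  const markdown = new TurndownService().turndown(article.content);
  return { title: article.title ?? document.title, markdown };
}
```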
Web pages that you save are stored as markdown and can be organised into collections - and those collections are then available via MCP resources. Scraping happens client-side, so if you can see the content in Chrome, you can save it to your collection.

I added screenshot capture too, but haven’t exposed it via MCP yet. I’d be curious whether that would be helpful to any of you - it feels like it might be too much when the markdown is available - maybe the visual references could be another resource?

About the name - why Sombra? I was thinking of sci-fi references such as Peter Hamilton’s u-shadow, or the idea of a “shadow” in Silo - I’d like to evolve this concept further in the future.
The stack is Clojure/Datomic on the backend with a TypeScript Chrome extension - the early release is now publicly available.
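For a sense of what the MCP side looks like, here’s a minimal TypeScript sketch of exposing a collection as a resource using the official MCP SDK - the real server is Clojure, so the names, URI scheme, and in-memory store below are purely illustrative:

```ts
// Illustrative sketch using the official TypeScript MCP SDK - Sombra's real
// server is Clojure/Datomic, so the store and URI scheme here are made up.
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";

const server = new McpServer({ name: "sombra-sketch", version: "0.1.0" });

// Hypothetical store: collection name -> concatenated markdown of saved pages.
const collections = new Map<string, string>([
  ["reading-list", "# Saved pages\n\n…markdown of each scraped page…"],
]);

// Each collection is addressable as a resource, e.g. collection://reading-list.
server.resource(
  "collection",
  new ResourceTemplate("collection://{name}", { list: undefined }),
  async (uri, { name }) => ({
    contents: [
      {
        uri: uri.href,
        mimeType: "text/markdown",
        text: collections.get(String(name)) ?? "Collection not found.",
      },
    ],
  })
);

// Sombra serves this over an authenticated remote transport; stdio is shown
// here only because it is the simplest way to run the sketch locally.
await server.connect(new StdioServerTransport());
```

A client that supports resources can then read collection:// URIs and pull the saved markdown straight into its context.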
If any of this sounds interesting, I’d love some feedback! It’s one of those projects that scratched a personal itch and then possibly got a bit out of hand - but having built it, it feels like it would be a shame not to put it out there, in case it helps others. Thanks!