tl;dr: At the Green Tech Hackathon in Zurich, Fabio, Oleg, David and I built Simplon Off in a single day. It is a small web app that recommends a right-sized open AI model and helps you run it locally - on the laptop you already own, with no data centre in the loop. You can try it or read the code.
Last week was Climate Week in Zurich, a major milestone for our new company, Resilens. We demoed our decision platform for the first time to an audience of more than 70 people during a joint session with CLIMADA Technologies — but that’s a story for another time.
On the Friday of Climate Week, I spent the day at the Green Tech Hackathon. I decided to join the GenAI track, which asked an honest and slightly uncomfortable question: generative AI became a foundational tool astonishingly fast, but serving all those prompts - and cooling the data centres that do it - carries a real environmental cost. How do we make AI itself a bit greener, instead of only pointing it at green problems?
Together with a great bunch of people, Fabio, Oleg and David, I picked what is almost a boring angle on that question: most everyday AI tasks do not need a frontier model.
Drafting an email, fixing the grammar in a paragraph, turning a few bullet points into a diagram - none of that requires the largest model running in the largest data centre. A much smaller model, running on the laptop you already own, is often good enough. That is the whole idea behind Simplon Off. The name is a small pun: simple, and off - as in offline, and as in flipping the lights off when you are done. And it's a nod to our Swiss heritage.
How to make responsible AI accessible
In our design notes, we framed "responsible AI" as a triangle with three corners: privacy, sustainability, and ethical use. What we liked about local-first AI is that it sits fairly comfortably in the middle of all three. Your prompts never leave your device, so privacy is the default rather than a setting. A small local model uses a fraction of the energy of a hyperscale inference call, so sustainability comes mostly for free. And running the tool yourself keeps a bit of control and accountability close to home.
The catch is accessibility. Running a model locally is still a tech-savvy thing to do, and the people who would benefit most are usually the least likely to wade through quantisation formats and cryptic file names. So we wrote down four personas - from Petra, who does not follow AI at all, to Delia, an impatient teenager who closes anything slow - and tried to design for them, not for ourselves.
What we built
Simplon Off is a small web app with a deliberately short path through it:
- Tell it what you want help with - writing, learning, coding, a creative task, or your own description.
- Tell it about your device - roughly how much memory you have, your operating system, and how comfortable you are with setup.
- Get a recommendation - one specific open model, a download, and short setup instructions tailored to your machine.
The download is a llamafile: a single executable, from Mozilla's project of the same name, that bundles the model and everything needed to run it. No Python environment, no toolchain - you download one file, run it, and a local chat interface opens in your browser. For a non-technical user, that is roughly the simplest "local AI" experience that currently exists¹.
Behind the recommendation is a small ranking we worked out and wrote down in
the repo. Rather than ask people about VRAM, we translate hardware into
four plain tiers - Basic, Balanced, Power, Advanced - and match each to
a sensible model. A Basic machine is pointed at a tiny Bonsai model that
starts fast and sips memory; a Balanced one gets a 4B-class model like
Qwen3-4B as a solid default; stronger machines can step up to Apertus, the
openly licensed model developed here in Switzerland, when quality matters more
than footprint. The bias throughout is towards the smallest model that does
the job well, which is also the most sustainable one.
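The tiering idea above can be sketched in a few lines. To be clear, this is an illustration, not the ranking from the repo: the RAM thresholds and the exact model picked per tier are assumptions I am making for the example.

```python
# Illustrative sketch of the four-tier recommendation idea.
# The RAM thresholds and model picks below are assumptions for
# this example, not the exact values from the Simplon Off repo.

TIER_MODELS = {
    "Basic": "Bonsai (tiny, starts fast, sips memory)",
    "Balanced": "Qwen 4B-class (solid default)",
    "Power": "Apertus 8B (openly licensed, Swiss-made)",
    "Advanced": "Apertus 70B (quality over footprint)",
}

def recommend(ram_gb: int) -> tuple[str, str]:
    """Map rough system memory to a hardware tier and a suggested model."""
    if ram_gb <= 4:
        tier = "Basic"
    elif ram_gb <= 8:
        tier = "Balanced"
    elif ram_gb <= 16:
        tier = "Power"
    else:
        tier = "Advanced"
    return tier, TIER_MODELS[tier]
```

The small-model bias shows up in the thresholds themselves: a machine only steps up a tier when it clearly has the headroom, so the default answer is always the lightest model that should still do the job.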
What Simplon Off is not (yet)
I want to be honest about the edges, because it was a one-day build and it shows.
The web app itself was vibe-coded with Lovable and deployed on Cloudflare Workers - fast to get up and running, but the device "detection" is still partly simulated, and the model metadata is a curated mock rather than a live catalogue. Audio transcription is our most honest weak spot: general chat LLMs are not speech-to-text models, so instead of pretending otherwise, the app says so and recommends treating transcription as a separate local step.
And the hard part - actually putting a number on the grams of CO₂ you save by going local - we did not solve. That is genuinely difficult; the hackathon brief itself called the inference footprint a "black box". What we built is a nudge in the right direction, not a measurement tool.
Why this matters (to me)
I keep coming back to a pattern I wrote about recently: the most interesting AI projects, to me, are the useful, bounded, low-risk ones. Simplon Off fits that. It does not try to be an everything tool. It just helps someone pick the smallest model that solves their problem and run it on hardware they already own.
That is not an anti-AI position - it is a right-sized one. If even a fraction of everyday prompts moved off the hyperscale path and onto a laptop, the privacy and energy maths would look a lot better. A one-day hackathon prototype will not move that fraction. But it was a good, concrete thing to point a day at - a small antidote to the AI emotional loop of being alternately amazed and alarmed.
Stay connected
Thanks for reading. You can try Simplon Off, browse the code and design notes on GitHub, and read more about the hackathon. All credit to the team - Fabio, Oleg and David - for a fun and focused day.
As always, if something here resonates, subscribe for more, or get in touch. And a plug for my company: have a look at the Resilens website and our careers site, and get in touch if you are interested in joining our team.
-
¹ Although I admit that marking the file as executable is not exactly user friendly. For this step you might want to ask your local tech support.