LLMs are productivity multipliers for software engineers. Over the last few years, the quality of their output has reached a professional level. They cannot one-shot a complete SaaS product (yet), but they’re more than just fancy autocomplete. What used to take an hour now takes minutes. Once you get used to building with AI-powered tools, the alternative feels awkward.
I remember developing software in the early 2000s. The internet existed, but treasure troves like Stack Overflow didn’t. Building something with a new framework meant buying and reading the physical book. Before hoodies were in, the sign of the real developer was a stack of O’Reilly books on your desk.
Once Google made it trivial to find answers, those books went out of style, and searching for the right answer online became the norm. Developing without internet access turned into a clumsy curiosity. Every engineer who has tried coding offline knows how much they depend on the internet.
We’ve reached a similar point with AI tools.
During my holiday, I found myself with a few hours to kill on an airplane. I’m working on a side-project, but developing without AI-powered tools just feels off.
So, I tried running a local LLM to see if I could “vibe code” in the air: the less-cool, nerdy version of the Mile High Club.
The latest version of PHPStorm has an AI assistant that allows you to connect to your LLM of choice. You can use classics like OpenAI, Claude or Gemini to develop at a professional level. But it also supports Ollama, which is, without a doubt, the easiest way to run a local model.
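For the curious: once Ollama is installed and running, it exposes a small HTTP API on localhost, which is what tools like the IDE assistant talk to. A quick sketch (assuming Ollama’s default port, 11434) to see which models you’ve already pulled:

```python
# Minimal sketch: ask the local Ollama server which models are installed.
# Assumes Ollama is running on its default port (11434); no internet needed.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp).get("models", [])

for model in models:
    print(model["name"])  # e.g. "llama3.2:latest", "deepseek-r1:8b"
```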
While my MacBook M2 Pro is a capable machine, it’s no tricked-out GPU powerhouse. It can run a large 24-billion-parameter Mistral model, but only at a glacial 20 seconds per token. I had more success coding with the smaller models I had on my machine (DeepSeek R1 and Llama 3.2). Generating code and asking questions was definitely possible. They’re not at the same level as their cloud-based brethren, but they’re not completely useless either. A decent ChatGPT replacement when you don’t have an internet connection.
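If you want to poke at a local model outside the IDE, the same local server answers plain HTTP requests, fully offline. A rough sketch, with the model name and prompt as placeholders for whatever you have pulled locally:

```python
# Rough sketch: send a prompt to a locally running Ollama model, no internet required.
# The model name is whatever you pulled beforehand (e.g. "llama3.2").
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.2",  # any small model available on the machine
    "prompt": "Explain Laravel service providers in two sentences.",
    "stream": False,      # ask for one complete response instead of a token stream
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])
```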
I see another parallel with the early internet days. We used to run a strict client-server architecture where powerful servers generated documents to be consumed on weak clients. Companies had mainframes in their basements that could handle massive amounts of data. Gradually, however, as client hardware got stronger, more responsibility moved away from the server. Modern-day React applications run locally, and their servers do only a fraction of the work. We might be moving in that direction with AI.
It might take some time before we ditch our GPU mainframes for local AIs, but it feels like the logical progression for most use cases. My laptop AI only needs to be able to discuss software concepts and hunt down Laravel bugs; it doesn’t need to hold a fluent conversation about everything else. The future AI in my stovetop will only need to understand when the milk is about to boil over. I don’t need to discuss the history of the Panama Canal with it.
Smaller, dumber local LLMs with limited but dedicated knowledge of a niche topic: adding intelligence to our tools and products doesn’t require AGI.
Microsoft recently released BitNet, a model that can run on a classic CPU, skipping the need for fancy GPUs altogether. Nvidia sells the Jetson Nano, AI hardware for low-powered edge computing. It seems like a trend we can embrace.
Which products can we build if we can have a capable AI that doesn’t require a connection to the 2025 equivalent of a mainframe? What can we make if we decentralise AI?
IoT devices seem like a good target. The aforementioned stovetop is such an example. But software products will benefit as well. Smaller models can run on the user’s device instead of in the cloud.
There is an opportunity for a faster, more privacy-centric, and cost-aware solution.
