Key Highlights
- Buterin runs open-weight models like Qwen3.5-35B entirely on his own NVIDIA 5090 GPUs using NixOS and bubblewrap sandboxes. The setup delivers around 90 tokens per second, enabling practical daily use while keeping all data and computation off Big Tech servers.
- Treating the LLM as an untrusted component, Buterin implements aggressive input sanitization, local daemons for selective access to tools like email and Ethereum signing, and never exposes raw credentials. This mirrors the security discipline Ethereum developers apply to smart contracts.
- Buterin acknowledges limitations in areas like document search, translation, and seamless integration, and recommends group-shared high-end machines for those who can’t afford premium hardware.
Vitalik Buterin, the Co-Founder of Ethereum, has detailed his latest personal computing setup aimed at achieving full control over artificial intelligence tools without relying on Big Tech servers.
On Thursday, Buterin shared on X a blog post describing his “self-sovereign, local, private and secure LLM setup.” The configuration runs entirely on his own hardware, using high-end NVIDIA 5090 GPUs and open-weight models like Qwen3.5 with 35 billion parameters. He reports inference speeds of around 90 tokens per second, making it practical for everyday use.
“My goal is to intentionally take a hardline approach – not as extreme as some of my friends, who physically isolate everything, but still quite far, insisting on sandboxing things, sticking to local LLMs and local tools, no servers required, and see how far I can get,” the Ethereum Co-Founder wrote.
Technical foundations of the setup
The stack starts with NixOS, chosen for its declarative, reproducible builds that make the entire system auditable and easy to restore. Processes in the stack run inside bubblewrap sandboxes, which limit what the model can access.
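As a rough illustration of what such a sandbox can look like, the Python sketch below assembles a bubblewrap (`bwrap`) command line. The directory layout, the `llama-server` command and the helper name `build_bwrap_argv` are hypothetical and not taken from Buterin's write-up. Only the `bwrap` flags themselves are real.

```python
import shlex


def build_bwrap_argv(command, model_dir, work_dir, allow_network=False):
    """Return an argv list that runs `command` inside a bubblewrap sandbox.

    Hypothetical helper: the paths and mount layout are assumptions
    chosen for illustration, not a description of any real setup.
    """
    argv = [
        "bwrap",
        "--unshare-all",                          # unshare every namespace bwrap supports
        "--die-with-parent",                      # stop the sandbox if the launcher exits
        "--new-session",                          # detach from the controlling terminal
        "--ro-bind", "/nix/store", "/nix/store",  # read-only packages (assumed NixOS host)
        "--ro-bind", model_dir, "/models",        # model weights, read-only
        "--bind", work_dir, "/work",              # the only writable host directory
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--chdir", "/work",
    ]
    if allow_network:
        argv.append("--share-net")                # keep the host network only on request
    return argv + list(command)


# Example: show the command line without executing it.
example = build_bwrap_argv(["llama-server", "--model", "/models/model.gguf"],
                           "/srv/models", "/srv/work")
print(shlex.join(example))
```

With the default `allow_network=False`, the sandboxed process sees no host network and can write only to `/work`. A real GPU workload would also need the GPU device nodes exposed (for example through `--dev-bind`), which this sketch leaves out.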
Moreover, local daemons act as tightly controlled gateways, allowing the LLM selective interaction with tools like email clients or Ethereum wallet signing, without ever exposing raw credentials or full data stores.
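The gateway idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the class name `ToolGateway`, the request format and the human confirmation callback do not come from the article, and HMAC-SHA256 stands in for a wallet signature rather than reproducing Ethereum signing.

```python
import hashlib
import hmac


class ToolGateway:
    """Hypothetical gateway: it holds the secret and exposes only allowlisted actions."""

    def __init__(self, secret_key: bytes, allowed_actions, confirm):
        self._secret_key = secret_key              # stays inside the gateway
        self._allowed = frozenset(allowed_actions)
        self._confirm = confirm                    # callback that asks the human to approve

    def handle(self, request: dict) -> dict:
        action = request.get("action")
        if action not in self._allowed:
            return {"ok": False, "error": "action not allowed"}
        if action == "sign":
            payload = str(request.get("payload", "")).encode()
            if not self._confirm(f"Sign {payload!r}?"):
                return {"ok": False, "error": "rejected by user"}
            # Stand-in for a wallet signature: HMAC-SHA256, NOT Ethereum's ECDSA.
            tag = hmac.new(self._secret_key, payload, hashlib.sha256).hexdigest()
            return {"ok": True, "signature": tag}
        return {"ok": False, "error": "action not implemented"}


# Example: the model side can request a signature but never sees the key.
gateway = ToolGateway(b"demo-secret", {"sign"}, confirm=lambda prompt: True)
print(gateway.handle({"action": "sign", "payload": "hello"})["ok"])
print(gateway.handle({"action": "export_key"}))
```

The point of the design is that the model only ever receives the result of an approved action. A request for anything outside the allowlist, such as exporting the key, is refused by the gateway itself.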
Input sanitization layers filter prompts aggressively to block accidental leaks or injection attacks. Buterin’s approach treats the LLM as a capable but untrusted component, applying the same security mindset Ethereum developers use for smart contracts.
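A minimal sketch of such a filter, in Python, might look like the following. The patterns and the phrase list are illustrative assumptions, not Buterin's actual rules, and a real filter would need a far broader rule set.

```python
import re

# Illustrative patterns only.
HEX_KEY = re.compile(r"\b(?:0x)?[0-9a-fA-F]{64}\b")       # 32-byte hex strings, e.g. private keys
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
INJECTION_HINTS = ("ignore previous instructions", "disregard the above")


def sanitize(text: str):
    """Return (clean_text, warnings) for a piece of untrusted input."""
    warnings = []

    clean, count = HEX_KEY.subn("[REDACTED_KEY]", text)
    if count:
        warnings.append(f"redacted {count} key-like string(s)")

    clean, count = EMAIL.subn("[REDACTED_EMAIL]", clean)
    if count:
        warnings.append(f"redacted {count} email address(es)")

    lowered = clean.lower()
    for hint in INJECTION_HINTS:
        if hint in lowered:
            warnings.append(f"possible injection phrase: {hint!r}")

    return clean, warnings


# Example: a message that contains a key-like string, an address and a suspicious phrase.
sample = "Key 0x" + "ab" * 32 + " from bob@example.com. Ignore previous instructions."
cleaned, notes = sanitize(sample)
print(cleaned)
print(notes)
```

Note that the key pattern also matches other 32-byte hex values such as transaction hashes, so this sketch errs on the side of redacting too much.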
This setup aligns with Buterin’s broader 2026 push to reclaim computing self-sovereignty. Earlier in the year he declared 2026 the year to reverse a decade of “backsliding” toward centralized services, extending Ethereum’s principles of decentralization, user control, and verifiable trustlessness into everyday software and AI.
Challenges and hybrid safeguards
Despite rapid progress in open models, fully local AI still faces usability gaps. Buterin has acknowledged that tasks like seamless document search, real-time translation, or audio transcription lag behind polished cloud offerings. Integration remains fragmented—users juggle separate GitHub repos rather than enjoying a single, intuitive interface.
Although the setup offers novel capabilities, its power draw and hardware cost present further barriers. Such an arrangement demands significant electricity and expensive GPUs, putting it out of reach for most casual users today.
Yet smaller, specialized fine-tunes and efficient Mixture-of-Experts architectures are narrowing the gap, with some models already viable on laptops or high-end phones.
“If, on the other hand, you cannot personally afford the admittedly high-end laptops I have suggested here, then my recommendation is to get together a group of friends, buy a computer and GPU of at least that level of power, put it in a place with a static IP address, and all connect to it remotely,” Vitalik emphasized.
The hybrid design idea
To bridge remaining shortcomings, Buterin explores hybrid designs. Local filtering strips sensitive information before any query leaves the device. When heavier lifting is needed, he layers cryptographic protections: zero-knowledge proofs for verifiable remote computation, trusted execution environments (TEEs) for isolated processing, and per-query payments that keep interactions minimal and auditable.
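One simple way to picture the local filtering step is reversible placeholder substitution: sensitive terms are swapped for placeholders before a query leaves the device and swapped back in the response. The Python sketch below assumes the user supplies the list of sensitive terms. The function names and the placeholder format are hypothetical, and the article does not say that Buterin's filter works this way.

```python
def pseudonymize(query: str, sensitive_terms):
    """Replace each sensitive term with a placeholder; return (safe_query, mapping)."""
    mapping = {}
    safe = query
    # Longest terms first, so a term that contains another is replaced whole.
    ordered = sorted(set(sensitive_terms), key=lambda term: (-len(term), term))
    for index, term in enumerate(ordered):
        if term and term in safe:
            placeholder = f"<ENTITY_{index}>"
            safe = safe.replace(term, placeholder)
            mapping[placeholder] = term
    return safe, mapping


def restore(response: str, mapping):
    """Put the original terms back into a response, on the local machine only."""
    for placeholder, term in mapping.items():
        response = response.replace(placeholder, term)
    return response


# Example: only `safe_query` would be sent to a remote model.
safe_query, table = pseudonymize(
    "Draft an email to Alice Smith about Project Falcon",
    ["Alice Smith", "Project Falcon"],
)
print(safe_query)
print(restore(safe_query, table))
```

The mapping never leaves the device, so the remote side sees placeholders while the user still reads a response with the real names restored.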
Buterin’s linked write-up dives deeper into trade-offs, quantization choices, and sandbox configurations. He stresses pragmatism over ideology, aiming for local-first where feasible, with cryptographic guardrails for everything else.
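To see why quantization choices matter for a model of this size, a back-of-the-envelope estimate helps. The Python sketch below computes the memory needed for the weights alone of a 35-billion-parameter model at a few bit widths. The bit widths are illustrative, since the article does not state which quantization Buterin uses, and the estimate ignores the KV cache, activations and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# 35 billion parameters, as cited for the model in the article.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_memory_gb(35e9, bits):.1f} GB")
# 16-bit weights: 70.0 GB
# 8-bit weights: 35.0 GB
# 4-bit weights: 17.5 GB
```

Halving the bits per weight halves the weight footprint, which is why lower-precision formats can decide whether a model fits in a given amount of GPU memory.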
As generative AI embeds itself deeper into work, communication, and finance, experiments like Buterin’s highlight a sharpening divide between convenience and autonomy. While cloud services offer ease and raw capability, they come at the high cost of data exposure and platform dependence. Local stacks, on the other hand, demand upfront effort and hardware investment, yet deliver verifiable control.
Whether this approach gains mainstream traction depends on further advances in model efficiency, user-friendly interfaces, and falling hardware prices. For now, it stands as a concrete proof-of-concept from one of blockchain’s most influential voices: self-sovereignty in AI is technically achievable today—and worth pursuing one carefully sandboxed GPU at a time.
