What is a local LLM? in Talk Radio
- June 4, 2025, 10:29 p.m.
- Public
You may or may not know that services like ChatGPT run on hardware that is essentially an extremely powerful version of a GPU, or graphics processing unit.
The GPU in a computer grinds on graphics, video, and sometimes parts of the user interface like menus. A GPU’s task is drawing things, billions of things per second. GPUs require significant cooling, large heatsinks with many fins or liquid circulation, similar to what CPUs need. Since power draw turns into heat, you can roughly gauge a component’s performance by looking at its cooling setup.
The RTX 5090, an abomination before the lord:
This is about as powerful as retail GPUs get, but it’s not even close to the chips driving digital capital today. When you use an LLM cloud service, a data center does the heavy lifting hardware-wise, then sends back some surface information for display and interaction on your dumb terminal.
In spite of consumer hardware regressing to old-timey terminals, it’s possible to run an LLM on a retail GPU, like my RTX 2070 Super with 8GB of VRAM. The calculations grind right through the same part typically used to draw things for Baldur’s Gate 3. The model isn’t just run on my own silicon at home; its weights are stored there too. After setup, it runs without an internet connection. There’s no middleman, fewer privacy concerns, and it’s just interesting. Think of a ChatGPT session as streaming content, while running the LLM locally is like having the physical media.
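If you’re wondering how a model fits on an 8GB card at all, here’s a rough back-of-the-envelope sketch, assuming the usual 4-bit quantized weights (the overhead number is a loose guess for illustration, not a measurement):

```python
# Back-of-the-envelope VRAM estimate for a quantized local model.
# All numbers are assumptions for illustration, not exact requirements.
params_billion = 8        # e.g. an 8B-parameter model
bits_per_weight = 4       # common 4-bit quantization
overhead_gb = 1.5         # rough allowance for context cache and runtime buffers

weights_gb = params_billion * bits_per_weight / 8    # billions of params x bytes per weight
print(f"~{weights_gb + overhead_gb:.1f} GB of VRAM")  # about 5.5 GB, which fits in 8 GB
```

A 12b or 14b model at the same quantization is tighter, so part of it spills over into ordinary RAM and things slow down, but it still runs.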
I am granting my own computer an approximately parrot-plus-Wikipedia persona. What a time to be alive.
Gamers are frustrated with NVIDIA over price gouging, while commercial data centers are insatiable for VRAM. NVIDIA doesn’t want to waste one second on distractions like VRAM for the normies; it has been criticized for prioritizing AI and cloud service providers over gamers, producing larger and more expensive graphics cards every year.
New budget GPU brands are nonexistent. The best value for the price now comes from older, second-hand models from two to four years ago.
ChatGPT et al. get a lot of negative attention for being obnoxious and selfish, which they are. But they are also vulnerable to rapid changes in the market. Their terrible behavior is desperate, the inevitable crisis of venture-capital tech subscriptions.
ChatGPT et al. are digging around in the dirt. NVIDIA is the one selling them shovels.
Someday the bubble will burst and the free and cheap LLM frontends to those data centers will cease, but the GPUs behind them will eventually be replaced and resold, and so become ours. There will be amazing things we can do with billions upon billions of things being drawn every second, but we can’t limit ourselves to chasing profits.
Here’s what I’ve been using locally with ollama: llama3.1:8b, mistral-nemo:12b, and a qwen 14b.
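For anyone curious what that looks like in practice, here’s a minimal sketch using the ollama Python client (pip install ollama), assuming the ollama server is already running locally; the model tag and prompt are just examples:

```python
# Minimal sketch: chat with a local model through a locally running ollama server.
# Assumes `ollama serve` is running and the model tag exists in the ollama library.
import ollama

ollama.pull("llama3.1:8b")  # download the weights to local disk once

reply = ollama.chat(
    model="llama3.1:8b",
    messages=[{"role": "user", "content": "Explain what a local LLM is in one paragraph."}],
)
print(reply["message"]["content"])  # generated entirely on my own GPU, no cloud round trip
```

The command-line equivalent is just ollama pull llama3.1:8b followed by ollama run llama3.1:8b.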
Does anyone know how to pull CausalLM/14B-DPO-alpha with this thing? lol.
Last updated June 04, 2025
Sleepy-Eyed John ⋅ June 05, 2025
Hmmmm