Running an LLM on your daily PC, and why it is not a good idea

Two days ago I started playing a bit with my own AI assistant. It has been a great and fun experience so far, but with a few wrinkles. For example:

  • Performance of new models on a previous-generation GPU (Radeon RX 7900 XTX) is surprisingly good, averaging 28 t/s, but it takes the whole VRAM and puts a noticeable load on the PC overall.
  • It takes a whole lot of disk space. For example, a single vLLM Docker image needs about 18 GB to download and extract; this seems a bit excessive.
  • The ROCm installation needed around 9 GB for all of its binaries. How much code do you need to produce 9 GB of compiled binaries? This is crazy.
  • I have a 16c/32t CPU, which is enough for most of my workload, but with an LLM running in the background most of it is sometimes in use.
  • 32 GB of RAM is not enough, but at current prices I can suffer for now. 🙂
  • If you want to swap models a lot, you need a lot of disk space, since bigger models take several gigabytes each.
  • You need decent internet speed to download them in the first place.
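The disk and VRAM numbers above follow directly from parameter counts. Here is a rough back-of-the-envelope sketch; the model sizes and quantization widths are illustrative assumptions, not measurements from my setup:

```python
# Rough estimate of an LLM's weight footprint: parameters * bytes per parameter.
# Real files add some overhead (tokenizer, metadata, KV cache at runtime).

def weight_size_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 2**30

# Illustrative model sizes (hypothetical, not from my setup):
print(f"7B  @ FP16: {weight_size_gb(7, 16):5.1f} GiB")   # ~13 GiB
print(f"7B  @ Q4  : {weight_size_gb(7, 4):5.1f} GiB")    # ~3.3 GiB
print(f"70B @ Q4  : {weight_size_gb(70, 4):5.1f} GiB")   # ~33 GiB
```

This is also why swapping models eats disk so quickly: every variant you keep around is another multi-gigabyte file.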

You may say that this is fine in 2026, everybody has a fiber connection now! Sure, but when you want to test something and a new Docker image is downloading at 1 MB/s, you can go for a walk in the meantime.
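That walk can be a long one. The arithmetic for the 18 GB image mentioned earlier is simple:

```python
# Time to download an 18 GB Docker image at various link speeds.
image_gb = 18

for speed_mb_s in (1, 10, 100):  # MB/s
    seconds = image_gb * 1024 / speed_mb_s
    print(f"{speed_mb_s:3d} MB/s -> {seconds / 3600:.2f} h")
# At 1 MB/s that is roughly 5 hours for a single image.
```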

I was also not able to determine the exact source of another issue, but when I work with the following open:

  • 3 browsers
  • 3 IDE instances
  • mail client
  • Signal client
  • Matrix client
  • media application
  • secret manager
  • remote desktop client
  • one VM running in the background
  • some other tools

it gets a bit crowded and my PC freezes from time to time. Not completely, more like very slow responses and stuttering. The funny thing is that there seem to be no sudden spikes in CPU activity or memory usage during these episodes. It is highly annoying, though, because you can't even use the mouse properly. Right now my best bet is that the PC is swapping memory between RAM and the swap partition on the SSD, and the constant shuffling of data here and there causes it. Buying a few more sticks of RAM would probably fix that, but with current RAM prices… Well…
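The swap-thrashing guess is at least checkable. A minimal sketch that parses `/proc/meminfo`-style text to see how deep into swap the machine is; the sample text below is made up, and on a real Linux box you would read the actual file instead:

```python
# Parse /proc/meminfo-style text and report swap usage. Heavy swap use
# combined with low available RAM is consistent with stuttering under load.

def swap_usage(meminfo_text: str) -> dict:
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields[key] = int(rest.split()[0])  # values are in kB
    used_kb = fields["SwapTotal"] - fields["SwapFree"]
    return {
        "swap_used_gb": used_kb / 2**20,
        "mem_available_gb": fields["MemAvailable"] / 2**20,
    }

# Made-up sample; on Linux use open("/proc/meminfo").read() instead.
sample = """MemTotal:       32768000 kB
MemAvailable:    1048576 kB
SwapTotal:      16777216 kB
SwapFree:        4194304 kB"""

print(swap_usage(sample))  # 12 GB in swap, 1 GB available: thrashing territory
```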

Another weird issue is that from time to time my main monitor, a 49″ Samsung, stops responding to GPU updates. I can still use the PC with the two other monitors, so it is not frozen completely; only the main monitor no longer refreshes. The funny part is that it is really easy to fix: I turn the monitor off and wait for the OS to register that it is gone. GNOME shuffles things around and the two other monitors receive the orphaned windows. Then I just turn the monitor back on and everything is back to normal. Even all the windows return to their places. Magic!

I have no idea why this happens, but it already happened a few times before I started using the GPU for running models, so it may be an issue with the card, the monitor, or the drivers. It is just more apparent under higher load.

I have no idea what the cause is. It is a bit annoying.

But the biggest disadvantage of such a solution is that your daily PC is usually the most powerful machine you have, and by extension very power hungry. My PC with all the external devices and monitors needs a lot. I have not metered it, but I estimate it draws at least 500 W. To avoid running it nonstop, you power it off. The constant sound of fans is also annoying. So again, you power it off.
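At a 500 W draw, leaving the machine on 24/7 adds up quickly. A quick sketch; the electricity price is an assumption, plug in your own tariff:

```python
# Yearly energy and cost of an always-on 500 W PC.
draw_w = 500
price_per_kwh = 0.30  # assumed tariff, in your local currency

kwh_per_year = draw_w / 1000 * 24 * 365
print(f"{kwh_per_year:.0f} kWh/year, ~{kwh_per_year * price_per_kwh:.0f} per year")
# At 500 W that is 4380 kWh a year before you even start generating tokens.
```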

When it is powered off, you can't use it. Of course, you can juggle the power switches a bit to turn off everything you do not need for the night, but that gets very annoying after a few days. The simplest way to save energy and get rid of the fan noise and the faint glow is to put everything behind one master switch, but then you can't use the machine at all.

The simplest solution to this is to have your server for self-hosted services running somewhere you can't hear or see it, on some device that is not that power hungry. I already did this with my main server, which runs my self-hosted services for file sharing, media sharing, backups, messaging, network control, private DNS, smart-home device control, authentication, git and so on. That server first lived in the living room (where it was noisy and too easily accessible to children), then in the attic (fine, but a bit too hot), and finally in the basement. Now it sits at an almost constant temperature all year and is basically invisible to the children.

Since this setup works well, I wanted the same for my LLM setup. A few months ago I bought a new server based on an AMD Threadripper PRO. It has 4 free PCIe slots that I intended for GPUs. But it is still a very pricey setup, it is not easy to set up, and it takes a lot of power. Mostly, though, it is very pricey. Running, for example, four R9700 cards would cost me about $6000 and would most probably require extra spending on PCIe extenders and an additional PSU. And that would still be only around 192 GB of VRAM, while the biggest models need hundreds of gigabytes. So with the motherboard, case, RAM, disks, PSU, cards and so on, a whole beefy server like that would probably cost me more than $10k. And it still would not be able to run the biggest models.
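The mismatch between a multi-GPU budget and frontier model sizes is easy to quantify. A sketch with assumed numbers: 48 GB per card (consistent with the 192 GB figure above) and an illustrative ~670B-parameter model at 8 bits per weight:

```python
# How many 48 GB cards would a frontier-scale open model need just for weights?
# The 670B @ 8-bit model is an illustrative assumption, not a specific release.
card_vram_gb = 48
model_gib = 670e9 * 1 / 2**30  # 8 bits = 1 byte per weight, ~624 GiB

cards_needed = -(-model_gib // card_vram_gb)  # ceiling division
print(f"model ~{model_gib:.0f} GiB -> needs {cards_needed:.0f} cards")
# That is 13 cards before counting KV cache or activation memory.
```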

This is really something you can't just do in your homelab. There is no way to actually compete with datacenters at home.

You can also run big models on your CPU using RAM, but it is slow. Very slow. Unless you are running agents where speed does not matter that much, it is just a fun toy. And I wanted something usable: an assistant that I can write or talk to and get a response to a question or a task within a few to twenty seconds. That is bearable.
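Why CPU inference is so slow is mostly a memory-bandwidth story: for every generated token, the whole set of weights has to stream through RAM. A rough upper bound, with assumed numbers for bandwidth and model size:

```python
# Token-rate upper bound for CPU inference: bandwidth / bytes read per token.
# ~80 GB/s for dual-channel DDR5 is an assumption, as is the 40 GB model.
bandwidth_gb_s = 80
model_gb = 40  # quantized large model resident in RAM

tokens_per_s = bandwidth_gb_s / model_gb
print(f"~{tokens_per_s:.0f} tokens/s best case")
# ~2 t/s even in the ideal case, versus the 28 t/s I get on the GPU.
```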

For that you need a separate device built specifically for running LLMs, sitting in some closet in your home.

Your daily PC is not the best place to run these things 24/7.
