Running LLMs on your daily PC and why it is not a good idea

Two days ago I started playing a bit with my own AI assistant. It has been a great and fun experience so far, but with a few wrinkles. For example:

  • Performance of new models on a previous-generation GPU (Radeon RX 7900 XTX) is surprisingly good, averaging 28 t/s, but it takes the whole VRAM and puts a load on the PC overall.
  • It takes a whole lot of disk space. For example, one vLLM Docker image needs about 18GB to download and extract; this seems a bit excessive.
  • The ROCm installation needed about 9GB for all of its binaries. How much code do you need to produce 9GB of compiled code? This is crazy.
  • I have a 16c/32t CPU and it seems to be enough for most of my workload, but with an LLM in the background, sometimes most of it is in use.
  • 32GB of RAM is not enough, but with current prices I can suffer for now. 🙂
  • If you want to swap models a lot, you need a lot of disk space, since bigger models take several gigabytes each.
  • You need a decent internet connection to be able to download all of that.

Yes, you may say that this is fine in 2026, everybody has a fiber connection now! Yes, but when you want to test something and a new Docker image is downloading at 1MB/s, you may as well go for a walk in the meantime.

Also, I was not able to determine the exact source of the issue, but when I work and have open:

  • 3 browsers
  • 3 IDE instances
  • mail client
  • Signal client
  • Matrix client
  • media application
  • secret manager
  • remote desktop client
  • one VM running in the background
  • some other tools

it gets a bit crowded and my PC freezes from time to time. Not completely, it is more like very slow responses and stuttering. The funny thing is that there seem to be no sudden spikes in CPU or memory usage during that. It is highly annoying, though, because you can’t even use the mouse properly while it lasts. Right now my best bet is that the PC is swapping memory between RAM and the swap space on the SSD, and constantly shuffling data back and forth causes it. Buying a few sticks of RAM would probably fix that, but with current RAM prices… Well…
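
One way to test the swap theory is to count how many pages the kernel swaps in and out while the stutter is happening. A minimal sketch, assuming a Linux kernel that exposes the cumulative pswpin/pswpout counters in /proc/vmstat; the function names are my own:

```shell
#!/usr/bin/env bash
# Sample swap activity by diffing the kernel's cumulative page counters.
# pswpin/pswpout count pages swapped in/out since boot (see /proc/vmstat).
set -euo pipefail

# Print the current value of one /proc/vmstat counter, e.g. "pswpin".
vmstat_counter() {
  awk -v key="$1" '$1 == key { print $2 }' /proc/vmstat
}

# swap_activity SECONDS: print the pages swapped in and out during the interval.
swap_activity() {
  local interval="${1:-5}"
  local in0 out0 in1 out1
  in0=$(vmstat_counter pswpin)
  out0=$(vmstat_counter pswpout)
  sleep "$interval"
  in1=$(vmstat_counter pswpin)
  out1=$(vmstat_counter pswpout)
  echo "in=$((in1 - in0)) out=$((out1 - out0))"
}

swap_activity 1
```

Run it while the desktop stutters: sustained non-zero numbers support the swap theory, while zeros point somewhere else. The si and so columns of vmstat 1 show the same information continuously.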

Another weird issue is that from time to time my main monitor, a 49″ Samsung, stops responding to GPU updates. I can still use my PC with the 2 other monitors, so it is not frozen completely: everything still works, the main monitor just no longer refreshes. Funnily enough, it is really easy to fix. I need to turn the monitor off and wait for the OS to register that it is gone. GNOME shuffles things around on the desktop and the 2 other monitors get the new windows. After that I just turn the monitor back on and everything is back to normal. Even all the windows are back in their places. Magic!

I have no idea why this is happening, but it already happened a few times before I started using the GPU for running models, so it may be an issue with the card, the monitor or the drivers. It is just more apparent under higher load. A bit annoying.

But the biggest disadvantage of such a setup is that it is usually the most powerful PC you have, and by extension it is very power hungry. My PC with all its external devices, monitors etc. draws a lot. I have not metered it, but I estimate at least 500W. To avoid having it run nonstop, you power it off. The constant sound of the fans is also annoying. So again, you power it off.

When you power it off, you can’t use it. Of course, you can juggle the power switches a bit to power off everything you do not need for the night, but that gets very annoying after a few days. The simplest way to save energy and not hear the fans or see the faint glow is to power everything via one general switch, but then you can’t use it.
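
To put the estimate into numbers: if the PC really drew 500W around the clock (an upper bound, since it does not run at full load all the time), the energy adds up quickly:

```latex
E_{\text{day}} = 0.5\,\text{kW} \times 24\,\text{h} = 12\,\text{kWh}, \qquad
E_{\text{month}} \approx 12\,\text{kWh} \times 30 = 360\,\text{kWh}
```

At a price of p per kWh that is 360·p per month, which is a strong argument for either switching the machine off or moving the always-on workload to something more efficient.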

The simplest solution to this is to have your server for self-hosted services running somewhere you can’t hear or see it, on a device that is not that power hungry. I already did that with my main server, which runs my self-hosted services for file sharing, media sharing, backups, messaging, network control, private DNS, smart home device control, authentication, git etc. I had a server like that running in the living room (where it was noisy and too easily accessible to children), then in the attic, where it was fine but a bit too hot, and finally in the basement. Now it has an almost constant temperature the whole year and is basically nonexistent to children.

Since this is a good setup, I wanted the same for my LLMs. A few months ago I bought a new server based on AMD Threadripper PRO. It has 4 free PCIe slots that I intended for GPUs. But it is still a very pricey setup and not easy to put together. It also takes a lot of power. But mostly, it is very pricey. Running, for example, four R9700 cards would cost me about $6000 and would most probably require additional spending on PCIe extenders and an additional PSU. And it would still give me only 128GB of VRAM (4 × 32GB). The biggest models need several datacenter cards with close to 200GB each. So with the motherboard, case, RAM, disks, PSU, cards etc., a beefy server like that would probably cost me more than $10k. And it still would not be able to run the biggest models.

This is really something you can’t just do in your homelab. There is no way to actually compete with datacenters in your home.

You can also, for example, run big models on your CPU using RAM. But it is slow. Very slow. Unless you are running some agents where speed does not matter that much, it is just a fun toy. And I wanted something usable: an assistant that I can write or talk to and get a response to a question or a task within a few to 20 seconds. That is bearable.

For that you need a separate device built specifically for LLMs, running in some closet at home.

Your daily PC is not the best place to run those things 24/7.

Connecting Nanobot to Matrix

I started digging into the Nanobot code to check why it could not connect to my Matrix server, even though the config was correct and nanobot gateway was running. Just in case I had messed something up, I recreated the entire nanobot workspace by removing its directory and running nanobot onboard again. It did not help.

A closer inspection of the code shed some light on the problem: the config was not being used because the channel was never instantiated. That part of the code was gone for some reason, even though the README states that it is possible to integrate nanobot with Matrix.

I started changing some code to make it work. It did not seem too complex to fix. I got it almost working, but I am not great with Python, so I had some problems with tests not passing on my branch. It seemed a bit strange that I broke tests unrelated to my own changes, but I was unfamiliar with the repository and the last time I worked with Python was around 2022, so… Who knows? Maybe I did break them.

But then I thought: ‘Hmm, this seems like an obvious problem and an easy fix! Maybe someone has already done it!’ And that was actually true: there was an open PR for this.

I was happy to run this branch, so I set up a virtual environment with uv, which I still had around from playing with the vLLM setup. I downloaded the branch, built nanobot and reran the gateway. Now it was able to connect to the Matrix server, though it still had some issues with sending encrypted messages, and I had to play a bit with the Matrix API to get an access token for the bot.
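
The steps above can be sketched as a small script. This is my reconstruction, not something from the nanobot docs: the PR number is a hypothetical placeholder you pass in, it assumes origin points at the GitHub repository (GitHub exposes every pull request as pull/NUMBER/head) and that nanobot installs as a regular Python package with a nanobot command. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: check out a GitHub pull request and run it from a uv virtualenv.
# The PR number is a hypothetical placeholder; DRY_RUN=1 (default) only prints.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# try_pr NUMBER: fetch refs/pull/NUMBER/head into a local branch and run it.
try_pr() {
  local pr="${1:?usage: try_pr PR_NUMBER}"
  local branch="pr-$pr"
  run git fetch origin "pull/$pr/head:$branch"
  run git switch "$branch"
  run uv venv                     # creates ./.venv
  run uv pip install -e .         # editable install into ./.venv
  run .venv/bin/nanobot gateway   # same command as before, now from the PR code
}

# Only do something when a PR number is given, e.g. PR_NUMBER=123 ./try-pr.sh
if [ -n "${PR_NUMBER:-}" ]; then
  try_pr "$PR_NUMBER"
fi
```

The editable install means that pulling new commits into the branch is enough to pick up changes; only new dependencies require running the install again.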

Also, I think it is a bit annoying that it responds to every message in the channel. For example, it would make sense to create a room for a bigger audience to share ideas or discuss something, and a bot responding to everything there would be very annoying and disruptive. It would make more sense if a message were only treated as a prompt when it mentions @nanobot.
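
The mention-only behaviour I would like can be sketched in a few lines. This is not nanobot’s code, just an illustration under my own assumptions: the bot’s Matrix ID and name are hypothetical values, and the check only looks at the plain-text body, where clients usually put the name or full user ID of the mentioned user. Newer Matrix clients also list mentioned users in an m.mentions field of the event, which a real implementation should prefer.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a room message should be treated as a prompt.
# BOT_MXID and BOT_NAME are hypothetical; adjust them to the real bot account.
set -euo pipefail

BOT_MXID="${BOT_MXID:-@nanobot:matrix.domain}"
BOT_NAME="${BOT_NAME:-nanobot}"

# is_prompt BODY: succeed only if the body mentions the bot, either by full
# user ID, by "@name" anywhere, or by "name:" at the start of the message.
is_prompt() {
  local body="${1,,}"                          # lower-case: case-insensitive match
  local mxid="${BOT_MXID,,}" name="${BOT_NAME,,}"
  [[ $body == *"$mxid"* || $body == *"@$name"* || $body == "$name:"* ]]
}

# Example: ./is-prompt.sh "@nanobot what is on my calendar?"
if [ "$#" -ge 1 ]; then
  if is_prompt "$1"; then echo "prompt"; else echo "ignore"; fi
fi
```

In a bot loop this check would sit right before a message is handed to the model, so ordinary conversation in a shared room is simply ignored.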

But for now I am happy to be able to chat with my new bot in my own Matrix channel and ask it to do some stuff for me.

Adding the bot to Matrix server

I run my own Matrix server for my own use. It is used by me, a bit by my family, and most of all by my own services to send me notifications.

Lately I started playing with nanobot. It is a personal AI assistant like OpenClaw. I wanted to be able to chat with it on my own Matrix server from my own phone. To do that, I needed to create a new user dedicated to the bot.

I find it a bit confusing that there is no central web UI that allows you to do that. OK, since privacy and security are their utmost concern, maybe they did it like that so that you need direct control of the server that is running it. That is one way to do it securely, but it is also a bit obscure. You have to remember where it is, how the Docker container is named (there are several) and the exact command you have to run, with the exact parameter names. Since it is a very rare need (it is not like I am constantly adding users), I have a hard time remembering all that.

I created a bash snippet that needs to be run to do that. I have Matrix running on Docker Compose, with a separate directory for all the data.

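# navigate to the Matrix directory
cd /opt/matrix
# open a shell inside the Synapse container
sudo docker compose exec matrix-synapse /bin/bash
# inside the container: create the user (-c reads the registration shared secret from the config)
register_new_matrix_user -u newusername -p very-secure-password1 -c /data/homeserver.yaml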

This creates the user. To connect to the server as the new user, you still need to log in. In theory it should be possible to do that via a web client and extract the device ID and access token from the client itself. That is possible and might actually work. But you can also do it via the API, which is much better, since you can easily regenerate that data. And you will probably need to: the token stays valid while it is in use, but if the user is not active via that token for some time, it may get invalidated. Then using a client is much more inconvenient than a single API call, e.g. with curl.

Here is another bash snippet that retrieves an access token via the API:

curl -XPOST -d '{"type":"m.login.password", "user":"newusername", "password":"very-secure-password1"}' "https://matrix.domain/_matrix/client/r0/login"

Of course, the user name and password need to be the same as in the previous snippet.

This will return JSON similar to:

{
    "access_token": "QGV4YW1wbGU6bG9jYWxob3N0.vRDLTgxefmKWQEtgGd",
    "home_server": "matrix.domain",
    "user_id": "@newusername:matrix.domain"
}

That is all. Though I am not sure how the device ID should be retrieved or regenerated without a client, or whether it needs to be communicated to the server at all before logging in. Anyway, logging in once via a client and taking the device ID from it is enough. It won’t change, and the access token can be regenerated fairly easily by running the snippet again.
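
For the record, the Matrix client-server specification lets the login request carry an optional device_id and returns a device_id in the response, so a script can keep reusing the same device instead of creating a new one on every login. A minimal sketch, assuming curl is available and that the user name, password and device ID contain no double quotes or backslashes (the JSON is built with printf, not a real JSON encoder); HOMESERVER is a placeholder. It uses the current v3 endpoint and the identifier form of the request, which supersede the r0 endpoint and top-level user field from the snippet above.

```shell
#!/usr/bin/env bash
# Sketch: log in to a Matrix homeserver with a password and print the
# access token and device ID. Assumes values without quotes/backslashes.
set -euo pipefail

HOMESERVER="${HOMESERVER:-https://matrix.domain}"   # placeholder

# login_body USER PASSWORD [DEVICE_ID]: print the JSON body for POST /login.
login_body() {
  local user="$1" pass="$2" device="${3:-}" extra=""
  if [ -n "$device" ]; then
    extra=",\"device_id\":\"$device\""
  fi
  printf '{"type":"m.login.password","identifier":{"type":"m.id.user","user":"%s"},"password":"%s"%s}\n' \
    "$user" "$pass" "$extra"
}

# json_field KEY: extract a string value from the JSON response on stdin.
# Good enough for the flat login response; not a general JSON parser.
json_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# matrix_login USER PASSWORD [DEVICE_ID]: POST the body, print the raw response.
matrix_login() {
  login_body "$@" |
    curl -sS -X POST -H 'Content-Type: application/json' --data @- \
      "$HOMESERVER/_matrix/client/v3/login"
}

# Only talk to the server when credentials are passed on the command line.
if [ "$#" -ge 2 ]; then
  response="$(matrix_login "$@")"
  echo "access_token: $(printf '%s' "$response" | json_field access_token)"
  echo "device_id:    $(printf '%s' "$response" | json_field device_id)"
fi
```

Usage would be something like ./matrix-login.sh newusername very-secure-password1 BOTDEVICE, where BOTDEVICE is any stable ID you pick. Keeping the bot on one device also matters for encrypted rooms, since encryption keys are tied to devices.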

Running my own AI assistant

I recently started experimenting with my own AI assistant. I decided to skip OpenClaw for being a massive slop: a 400k LOC vibe-coded monstrosity. Looking through the web, I found that there is also NanoClaw, which seems much better, but it looks like it is tied to Anthropic’s service, and I like owning my own data and self-hosting what I can.

That left me with Nanobot as my only option among the bigger open source projects that are (relatively) well known, under active development and have a big user base.

First, I did some research into integrating a locally hosted model. Nanobot has configuration options for a custom, OpenAI-compatible provider, which should work with the llama.cpp server. It also has a dedicated provider for vLLM, which is also compatible with the OpenAI API.

I also looked into which model would be suitable to run in agentic mode, and Qwen 3.5 has very good reviews. Unfortunately, it is fairly new and not yet supported by all the tools; at the time of writing I could not make it work with llama.cpp. This is not terrible, since the vLLM server seems to be a better choice for running a server anyway: it is more performant and has dedicated Docker images. AMD also has a page about running vLLM in Docker with their ROCm libraries, so it seemed like the better choice for tests on my PC with a Radeon RX 7900 XTX with 24GB of VRAM.
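
For reference, a sketch of starting vLLM’s OpenAI-compatible server from a ROCm Docker image and checking that it answers. Treat the details as assumptions to verify against AMD’s vLLM Docker page: the image name is a placeholder, the device and group flags are the ones commonly used to expose an AMD GPU to a container, MODEL defaults to the quantized model from the list below, and MAX_LEN is an illustrative context limit rather than a tested value. Port 8000 is vLLM’s default. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: serve a model with vLLM in a ROCm container and smoke-test the API.
# IMAGE, MODEL and MAX_LEN are assumptions; DRY_RUN=1 (default) only prints.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
IMAGE="${IMAGE:-rocm/vllm:latest}"
MODEL="${MODEL:-cyankiwi/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit}"
MAX_LEN="${MAX_LEN:-8192}"
HF_CACHE="${HF_CACHE:-${HOME:-/root}/.cache/huggingface}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

serve() {
  # /dev/kfd and /dev/dri expose the AMD GPU; the cache mount avoids re-downloads.
  run docker run --rm -d --name vllm \
    --device /dev/kfd --device /dev/dri --group-add video --ipc=host \
    -p 8000:8000 -v "$HF_CACHE:/root/.cache/huggingface" \
    "$IMAGE" vllm serve "$MODEL" --max-model-len "$MAX_LEN"
}

smoke_test() {
  # Run once the model has finished loading (watch: docker logs -f vllm).
  # Lists served models; OpenAI-compatible clients use the same /v1 base URL.
  run curl -sS http://localhost:8000/v1/models
}

serve
smoke_test
```

An OpenAI-compatible provider, like the one in Nanobot’s config, would then point at http://localhost:8000/v1.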

Qwen 3.5 did not run with vLLM in its official Docker images; it failed with an ‘unknown’ model architecture error. I did not want to set up my own local vLLM environment with the latest library versions, because that also requires AMD Radeon drivers and ROCm libraries to be installed, which for now is a terrible experience on Debian. AMD officially supports only Ubuntu and Fedora.

Because of that I decided to run another model from the Qwen family. I did some tests:

  • Qwen 3 0.6B: very fast, but it also felt a bit dumb.
  • Qwen 3 1.7B and Qwen 3 4B: felt much smarter, but they are still pretty small and I wanted to try bigger models.
  • Qwen 3 Next: I could not fit it into the GPU.
  • Qwen 3 Coder 30B: it ran only with a very small context.
  • Qwen3-VL-30B Thinking and Qwen3-VL-30B Instruct: running image capabilities seemed a bit wasteful, since I did not have a use case for that. Also, while it is fun to read through the thinking version’s reasoning to learn how these models operate, it is slow (because of the number of generated tokens) and would be very annoying for agentic use.
  • Qwen3-30B-A3B-Instruct-2507: felt about right for testing a few things. Its quantized version, cyankiwi/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit, ran pretty fast, gave capable responses and fit in my 24GB of VRAM with a smaller context.
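
Why a 4-bit 30B model fits in 24GB only with a smaller context comes down to simple arithmetic: the weights need roughly parameters × bits / 8 bytes, and the KV cache needs 2 (one key and one value) × layers × KV heads × head dimension × bytes per value for every token of context. A sketch with awk; the example architecture numbers (48 layers, 4 KV heads, head dimension 128, 16-bit cache) are my assumption of Qwen3-30B-A3B’s config and should be checked against the model’s config.json. The estimate ignores activations and runtime overhead, so real usage is higher.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope VRAM estimate for an LLM: weights + KV cache.
# Ignores activations, compute buffers and allocator overhead.
set -euo pipefail

# weights_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT -> gigabytes (10^9 bytes)
weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# kv_cache_gb LAYERS KV_HEADS HEAD_DIM BYTES_PER_VALUE CONTEXT_TOKENS -> GB
# Factor 2: one key and one value vector per layer, KV head and token.
kv_cache_gb() {
  awk -v l="$1" -v h="$2" -v d="$3" -v s="$4" -v t="$5" \
    'BEGIN { printf "%.1f\n", 2 * l * h * d * s * t / 1e9 }'
}

# Example: ~30B parameters at 4 bits, assumed Qwen3-30B-A3B-like attention shape.
echo "weights:      $(weights_gb 30 4) GB"
echo "KV @ 32k:     $(kv_cache_gb 48 4 128 2 32768) GB"
echo "KV @ 256k:    $(kv_cache_gb 48 4 128 2 262144) GB"
```

With these assumptions the weights alone take about 15GB, a 32k-token cache about 3.2GB, and a 256k-token cache about 25.8GB, more than the whole card. That is why the context has to be limited (in vLLM, via the --max-model-len option).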

Unfortunately, right now Nanobot has some problems running on Matrix, which is a bit sad, since I run my own server and it would be the perfect integration. But I can still chat with it, and I also set up a separate email account as an alternative way of communicating: I can always send an email from anywhere!

Right now I have only been able to run a few tests, but it feels great to have my own personal assistant, a virtual entity living on my own hardware, waiting for me to ask it for help!

Moving /boot partition

For a while I had a problem with my boot partition being too small. It was fine for one kernel image, but not for two, so whenever the kernel was updated, the update failed for lack of space. Five years ago 500MB was enough, but not nowadays.

Today I decided to finally fix that.

First, I created a new LVM volume for /boot:

sudo lvcreate -L 1G -n boot root-vg

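Then I formatted it, mounted it and copied the contents of the old /boot into it:

sudo mkdir -p /mnt/boot
sudo mount /dev/mapper/root--vg-boot /mnt/boot
# trailing slashes copy the contents of /boot, not the directory itself; -x skips the separately mounted /boot/efi
sudo rsync -avx /boot/ /mnt/boot/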

After that, I edited /etc/fstab to mount the new volume as /boot instead of the old partition, and rebooted the machine.

Then I updated GRUB:

sudo update-grub

Everything went well, so I removed the old partition as no longer necessary.

And this time the PC failed to start.

Dropped into the GRUB shell, I managed to boot my machine with:

set root=(lvm/root--vg-boot)
linux /vmlinuz-6.18.9+deb14-amd64 root=/dev/mapper/root--vg-root
initrd /initrd.img-6.18.9+deb14-amd64
boot

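This meant that the system itself was fine, but GRUB was misconfigured. Most likely the GRUB image on the EFI partition still looked for its files on the old /boot partition, which update-grub does not change: it only regenerates grub.cfg.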

I tried to fix that by reinstalling GRUB (which, according to a quick web search, should fix the issue), but I could not find an example that worked on my setup. I kept getting:

grub-install: error: cannot find EFI directory.

Finally, after reading the grub-install man page, I added one more option:

sudo grub-install --efi-directory=/boot/efi /dev/nvme0n1

This worked: I finally saw the GRUB menu again and my PC booted just fine.
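
Looking back, the whole move works in one go if GRUB is reinstalled before the old partition is removed. A sketch of the procedure as I would run it now, under several assumptions: ext4 for the new volume (any filesystem GRUB can read will do), the EFI system partition mounted at /boot/efi, and the volume group and disk names from above. Editing /etc/fstab stays a manual step; the script only prints the line to add. Without arguments it only prints an overview of the commands; prepare and grub run the two real phases (as root, ideally with a rescue USB at hand).

```shell
#!/usr/bin/env bash
# Sketch: move /boot from a partition to a new LVM logical volume (EFI system).
# Assumptions: ext4, ESP at /boot/efi, names below. DRY_RUN=1 only prints.
set -euo pipefail

DRY_RUN="${DRY_RUN:-0}"
VG="${VG:-root-vg}"
LV="${LV:-boot}"
SIZE="${SIZE:-1G}"
DISK="${DISK:-/dev/nvme0n1}"
ESP="${ESP:-/boot/efi}"
DEV="/dev/$VG/$LV"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Phase 1: create, format and fill the new volume.
prepare_volume() {
  run lvcreate -L "$SIZE" -n "$LV" "$VG"
  run mkfs.ext4 "$DEV"
  run mkdir -p /mnt/boot
  run mount "$DEV" /mnt/boot
  # Trailing slashes copy the contents of /boot; -x keeps the ESP's files out
  # (its mount point is still copied as an empty directory).
  run rsync -ax /boot/ /mnt/boot/
  run umount /mnt/boot
}

# Print the /etc/fstab line for the new /boot (add it by hand).
fstab_line() {
  local uuid="<UUID of $DEV>"
  if [ "$DRY_RUN" != 1 ]; then
    uuid=$(blkid -s UUID -o value "$DEV")
  fi
  printf 'UUID=%s /boot ext4 defaults 0 2\n' "$uuid"
}

# Phase 2, after rebooting with the new /boot and the ESP mounted: regenerate
# grub.cfg AND the EFI image, which can still point at the old /boot.
reinstall_grub() {
  run update-grub
  run grub-install --efi-directory="$ESP" "$DISK"
}

case "${1:-all}" in
  prepare) prepare_volume; fstab_line ;;
  grub)    reinstall_grub ;;
  all)     DRY_RUN=1; prepare_volume; fstab_line; reinstall_grub ;;  # overview only
  *)       echo "usage: $0 [prepare|grub|all]" >&2; exit 2 ;;
esac
```

The order is the point: run prepare, edit /etc/fstab, reboot, run grub, and only remove the old partition after the machine boots cleanly from the new /boot.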