I recently started experimenting with my own AI assistant. I decided to skip OpenClaw, a massive 400k LOC vibe-coded monstrosity. Looking through the web I found NanoClaw, which seems much better, but it looks tied to the Anthropic service – and I like owning my own data and self-hosting what I can.
That left Nanobot as my only option among the bigger open source projects that are (relatively) well known, under active development and have a large user base.
First I did some research on integrating a locally hosted model. Nanobot has configuration options for a custom OpenAI-compatible provider, which should work fine with the llama.cpp server. It also has a dedicated provider for vLLM, which is likewise compatible with the OpenAI API.
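What "OpenAI-compatible" means in practice is that all these servers accept the same chat completions request shape at `/v1/chat/completions`. A minimal sketch of that payload (the model name and temperature here are placeholders, not anything from my actual config):

```python
import json

# Standard OpenAI-style chat completions payload; both llama.cpp's
# llama-server and vLLM expose a /v1/chat/completions endpoint that
# accepts this shape.
payload = {
    "model": "local-model",  # placeholder; the real name is server-dependent
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "temperature": 0.7,
}

# Serialize to the JSON body you would POST to the local server.
body = json.dumps(payload)
print(body)
```

Because the request shape is shared, Nanobot only needs a base URL and a model name to talk to either backend.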
I did some research into which model would be suitable for agentic use, and Qwen 3.5 has very good reviews. Unfortunately it is fairly new and not yet integrated with all the tools – at the time of writing I could not make it work with llama.cpp. This is not terrible, since vLLM seems to be the better choice for running a server anyway – it is more performant and has dedicated Docker images. AMD also has a page for vLLM on Docker with their ROCm libraries, so it seemed like the better choice for tests on my PC with a Radeon RX 7900 XTX with 24GB of VRAM.
Qwen 3.5 did not run with vLLM on its official Docker images – it failed with an 'unknown' model architecture. I did not want to set up my own local vLLM environment with the latest versions of the libraries, because that also requires having the AMD Radeon drivers and ROCm libraries installed – which, for now, is a terrible experience on Debian. AMD officially supports only Ubuntu and Fedora.
Because of that I decided to run another model from the Qwen family. I did some tests:
- Qwen 3 0.6B – very fast, but it also felt a bit dumb.
- Qwen 3 1.7B and Qwen 3 4B – felt much smarter, but they are still pretty small and I wanted to try bigger models.
- Qwen 3 Next – I could not fit it into the GPU.
- Qwen 3 Coder 30B – it only ran with a very small context.
- Qwen3-VL-30B Thinking and Qwen3-VL-30B Instruct – running a model with image capability seemed a bit wasteful, since I had no use case for that. The Thinking version is fun to read through in its responses, to learn how these models operate, but it is slow (because of the number of generated tokens) and would be very annoying for agentic use.
- Qwen3-30B-A3B-Instruct-2507 – felt about right for testing a few things. A quantized version of it, cyankiwi/Qwen3-30B-A3B-Instruct-2507-AWQ-4bit, ran pretty fast, felt capable in its responses, and fit into my 24GB of VRAM with a smaller context.
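A rough back-of-envelope check shows why the 4-bit quantized 30B model fits only with a reduced context. The bits-per-parameter and KV-cache numbers below are simplified assumptions for illustration, not measured values:

```python
# Rough VRAM estimate for a 4-bit quantized 30B-parameter model.
# All constants are ballpark assumptions, not measurements.
params = 30e9                # 30B parameters
bits_per_param = 4.5         # AWQ 4-bit plus some overhead for scales
weights_gb = params * bits_per_param / 8 / 1e9

kv_gb_per_1k_tokens = 0.1    # assumed KV-cache cost; model-dependent
context_tokens = 16_000      # a "smaller context"
kv_gb = context_tokens / 1000 * kv_gb_per_1k_tokens

total_gb = weights_gb + kv_gb
print(f"weights ~{weights_gb:.1f} GB, KV cache ~{kv_gb:.1f} GB, "
      f"total ~{total_gb:.1f} GB")
```

With the weights alone taking roughly 17 GB, the remaining headroom under 24 GB has to cover the KV cache, activations and framework overhead, which is why the context has to stay small.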
Unfortunately, right now Nanobot has some problems running on Matrix, which is a bit sad since I run my own server, so it would be the perfect integration. But I can always chat with it directly, and I also set up a separate email account as an alternative way of communication – I can always send an email from anywhere!
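Sending a task to the assistant's mailbox can be done with nothing but the Python standard library. The addresses and SMTP host below are made-up placeholders, not my actual setup:

```python
import smtplib
from email.message import EmailMessage

# Compose a task for the assistant; addresses and host are placeholders.
msg = EmailMessage()
msg["From"] = "me@example.com"
msg["To"] = "assistant@example.com"
msg["Subject"] = "Reminder"
msg.set_content("Please remind me to water the plants tomorrow.")

# Uncomment to actually send via your own mail server:
# with smtplib.SMTP("mail.example.com", 587) as server:
#     server.starttls()
#     server.login("me@example.com", "app-password")
#     server.send_message(msg)
```

The assistant side then only needs to poll that mailbox (e.g. over IMAP) and feed new messages into the model as prompts.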
For now I have only been able to run a few tests, but it feels great to have my own personal assistant – a virtual entity living on my own hardware, waiting for me to ask it for help!

