Integrating my AI assistant with pyLoad

I have been using the pyLoad download manager for quite some time now. Previously, when I was running Windows Server, I was using JDownloader. I think it was better in a few things; for example, it was able to automatically name a package based on the name of the file from the link. So if I tried to download https://cdimage.debian.org/debian-cd/current/amd64/iso-cd/debian-13.4.0-amd64-netinst.iso, it would automatically name the package ‘debian-13.4.0-amd64-netinst.iso’. The folder with the downloaded file would be named the same too. Much more convenient. If I dropped multiple links for the same file, it would download them in parallel, which was much quicker. Though with current speeds in an average home maybe this is not necessary anymore. It had a better UI, in my opinion, but maybe I was more accustomed to the Windows desktop then; certainly it was better to have a single view instead of 3 views like in pyLoad:

  • For current downloads
  • For already finished downloads
  • For packages

But it did not have a web UI. There was some kind of plugin for that, but it is hard to add a web UI to a program that was meant to be a Windows application, and I was unable to make it work.

When I switched to pyLoad I was a bit sad about the downgrade, but having the ability to run the service in a web browser was worth it. Still, I do not like the UI in pyLoad, so replacing it with my AI assistant seemed like a great idea to save a few minutes every time I wanted to download something.

First I needed an API. pyLoad has an API, except it does not work. Or at least not the way you would expect. There is literally a section named Using HTTP/JSON, but when you try to use it, it says: Obsolete API. There is this issue where someone tried to do a similar thing to me and the maintainer answered that JSON is not supported. Apparently it is called JSON because it returns JSON. But you can’t send JSON to it.

That was really surprising, but it did not stop me. Since there is a web UI, there must be some kind of API that can be called for login and for adding packages. I inspected the page in the browser and it was sending the login form to the /login page.

curl -X POST "https://p.np0.pl/login" \
 -H "Content-Type: application/x-www-form-urlencoded" \
 -d "do=login&username=user&password=pass&submit=Login"

OK, this does not look that bad and it returns a Set-Cookie header, so it should be fine. But then I could not correctly call the endpoint for actually adding a package. It did not work: I could not figure out how to call it with a set of parameters that would be recognized as valid links. The web UI endpoints, on the other hand, did not work without the CSRF token. And then the API endpoints from the docs stopped working too, reporting that they are ‘Obsolete API’. After logging in again they started working, but this time they required the CSRF token. OK, fine, I can get you a CSRF token. I found it in head > meta, but it requires another call to /dashboard for the HTML.

curl -H "Cookie: $COOKIE" https://p.np0.pl/dashboard

The CSRF token is in <meta name="csrf-token" content="">. The value of the content attribute needs to be extracted and fed to the next call. After that addPackage finally worked, but it is not a JSON API either and I had to guess how to pass the list of links to it.
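Pulling the token out of the dashboard HTML can be done with a single sed expression. This is a minimal sketch, assuming the meta tag is written with double quotes and with name before content, exactly as shown above; extract_csrf is an illustrative name, not part of pyLoad.

```shell
#!/bin/bash
# Read HTML on stdin and print the content attribute of the csrf-token meta tag.
# Assumption: the tag looks like <meta name="csrf-token" content="..."> (double quotes,
# name before content). Prints nothing if no such tag is found.
extract_csrf() {
    sed -n 's/.*<meta name="csrf-token" content="\([^"]*\)".*/\1/p' | head -n 1
}

# Against the real server this would be something like (not executed here):
#   CSRF_TOKEN=$(curl -s -H "Cookie: $COOKIE" https://p.np0.pl/dashboard | extract_csrf)

# Local dry run on a sample document:
printf '%s\n' '<head><meta name="csrf-token" content="abc123"></head>' | extract_csrf
```

If the markup ever changes (single quotes, different attribute order), the expression has to be adjusted, which is the price of parsing HTML with sed.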

Again, it is not as straightforward as it should be. Passing a plain string causes pyLoad to treat the string as a list, so it tries to download every letter. Not what I wanted, but a bit funny though.

This is how you need to call it:

curl -X POST "https://p.np0.pl/api/addPackage" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -H "x-csrf-token: $CSRF_TOKEN" \
    -H "Cookie: $COOKIE" \
    -d "name=$PACKAGE_NAME&links=[\"$FILE_URL\"]"
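Because links has to look like a list, it helps to build that value in one place. A minimal sketch follows; links_json is a hypothetical helper, and sending its output through curl's --data-urlencode is an assumption that was not part of the flow above (it should keep URLs containing & or ? from breaking the form body, provided pyLoad decodes the field in the usual way).

```shell
#!/bin/bash
# Build a JSON-style list of strings from the arguments, e.g. ["url1","url2"].
# To keep the sketch simple it refuses values containing a double quote or a
# backslash instead of trying to escape them.
links_json() {
    local out="" url
    for url in "$@"; do
        case "$url" in
            *\"*|*\\*) echo "unsupported character in URL: $url" >&2; return 1 ;;
        esac
        out="${out:+$out,}\"$url\""
    done
    printf '[%s]\n' "$out"
}

# Possible use with the call above (assumption, not executed here):
#   curl -X POST "https://p.np0.pl/api/addPackage" \
#       -H "x-csrf-token: $CSRF_TOKEN" -H "Cookie: $COOKIE" \
#       --data-urlencode "name=$PACKAGE_NAME" \
#       --data-urlencode "links=$(links_json "$FILE_URL")"

links_json "https://example.org/a.iso" "https://example.org/b.iso?x=1&y=2"
```

The same helper also covers the case of several links in one package, which the hand-written string does not.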

Not as straightforward as I would hope, but at least it works. Ideally it would be just one call with an API key and JSON inside. That would be simple and easy to understand. But pyLoad is a pretty old codebase, so I was not totally surprised by this complication.

I tested it all in Rider, which has a pretty nifty tool for HTTP calls:

### Log in and keep the session cookie
# @no-redirect
POST https://p.np0.pl/login?next=dashboard
Content-Type: application/x-www-form-urlencoded

do=login&username=u&password=p&submit=Login

> {%
   if (response.headers.valueOf("set-cookie")) {
       client.global.set("cookie", response.headers.valueOf("set-cookie"));
       client.log(response.headers.valueOf("set-cookie"))
   }
%}

###
GET https://p.np0.pl/dashboard

> {%
    const csrfRegex = /<meta\s+name=["']csrf-token["']\s+content=["']([^"']+)["']/i;
    const match = response.body.match(csrfRegex);

    if (match && match[1]) {
        const csrfValue = match[1];
        client.global.set("csrf", csrfValue);
        client.log("CSRF token: " + csrfValue);
    }
%}

###
POST https://p.np0.pl/api/addPackage?name=test&links=["https://cdimage.debian.org/cdimage/daily-builds/daily/arch-latest/arm64/iso-cd/debian-testing-arm64-netinst.iso"]
X-Csrf-Token: {{csrf}}

I used a prompt like the one below to make the bot generate this flow as a new skill.

Add skill: download via pyLoad. This skill will send files to download using the pyLoad application. PyLoad is located at “pyload.local”. The file is an address in http or https format.

The process works as follows:

  • Log in by performing a POST to the /login endpoint with data 'do=login&username=user&password=password&submit=Login' in application/x-www-form-urlencoded format
  • Get the response header value set-cookie – this will be needed for subsequent requests
  • Fetch HTML from https://p.np0.pl/dashboard using the cookie value from the previous step. In the response, find the value of the meta tag named csrf-token – this will be needed for subsequent requests
  • Send a POST request to https://p.np0.pl/api/addPackage?name={{filename}}&links=["{{fileurl}}"] using the cookie and the meta tag value as csrf token

This was enough for the assistant to write a working skill with a working bash script. I just asked it to adjust the name of the skill and the name of the bash script file, since I did not like what it came up with, but those were really minor things; otherwise it worked pretty much the first time. Qwen 3.5 seems really capable.

Right now I can just write to my AI assistant:

Download using pyLoad following file https://cdimage.debian.org/cdimage/daily-builds/daily/arch-latest/arm64/iso-cd/debian-testing-arm64-netinst.iso

And pyLoad will take care of the rest. It is a bit more convenient than my previous flow, which was:

  • Copy the link
  • Open pyLoad
  • Log into pyLoad
  • If I am not logged into my password manager, then log in there first
  • Open the new package form
  • Invent some package name, which I usually shorten and sometimes even mistype, so that later I wonder what it is
  • Paste the link into the form
  • Submit

Right now I can write just one prompt. Probably with a few prompts of explanation or a change to the SOUL.md file I could just send a link to the chatbot and it would do all the rest. I will do that next time.

The whole script looks similar to the one below (it was generated by Qwen):

#!/bin/bash

# pyLoad skill - Download files using pyLoad on p.np0.pl

FILE_URL="$1"
PACKAGE_NAME="${2:-$(basename "$FILE_URL" | cut -d'?' -f1)}"

if [ -z "$FILE_URL" ]; then
    echo "Usage: pyload-download <file_url> [package_name]"
    exit 1
fi

# Step 1: Log in (the session cookie is stored in the cookie jar file)
LOGIN_RESPONSE=$(curl -s -c /tmp/pyload_cookies.txt -b /tmp/pyload_cookies.txt \
    -X POST "https://p.np0.pl/login" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -d "do=login&username=user&password=pass&submit=Login")

if [ $? -ne 0 ]; then
    echo "Login failed"
    exit 1
fi

# Step 2: Fetch the dashboard and extract the CSRF token from the meta tag
DASHBOARD=$(curl -s -b /tmp/pyload_cookies.txt -c /tmp/pyload_cookies.txt "https://p.np0.pl/dashboard")

CSRF_TOKEN=$(echo "$DASHBOARD" | grep -oP '(?<=<meta name="csrf-token" content=")[^"]+')

if [ -z "$CSRF_TOKEN" ]; then
    echo "CSRF token not found"
    exit 1
fi

# Step 3: Add the package (links must be passed as a list)
ADD_RESPONSE=$(curl -s -b /tmp/pyload_cookies.txt -c /tmp/pyload_cookies.txt \
    -X POST "https://p.np0.pl/api/addPackage" \
    -H "Content-Type: application/x-www-form-urlencoded" \
    -H "x-csrf-token: $CSRF_TOKEN" \
    -d "name=$PACKAGE_NAME&links=[\"$FILE_URL\"]")

if [ $? -ne 0 ]; then
    echo "Failed to add the package"
    exit 1
fi

echo "Package '$PACKAGE_NAME' added to pyLoad"
echo "Status: $ADD_RESPONSE"
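One thing worth knowing about the script above: curl exits with 0 even when the server answers with an HTTP error, so the $? checks only catch connection-level failures. Below is a hedged sketch of checking the status code instead. http_ok is a made-up helper, and which codes pyLoad actually returns for a failed login or a rejected package has not been verified here.

```shell
#!/bin/bash
# Succeed only for 2xx and 3xx status codes (3xx is accepted on the assumption
# that a successful login answers with a redirect).
http_ok() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use with the addPackage call (not executed here):
#   STATUS=$(curl -s -o /tmp/pyload_add.json -w '%{http_code}' \
#       -b /tmp/pyload_cookies.txt -X POST "https://p.np0.pl/api/addPackage" \
#       -H "x-csrf-token: $CSRF_TOKEN" \
#       -d "name=$PACKAGE_NAME&links=[\"$FILE_URL\"]")
#   http_ok "$STATUS" || { echo "addPackage failed with HTTP $STATUS"; exit 1; }

# Local dry run on a few codes:
for code in 200 302 403 500; do
    if http_ok "$code"; then echo "$code ok"; else echo "$code error"; fi
done
```

The -w '%{http_code}' and -o options are standard curl; only the helper and the temporary file name are invented for the example.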

Summary

Maybe at some point I will think about using my AI assistant to sort my downloads into a better directory structure, or I will remove pyLoad altogether, but for now it is fine.

Running Qwen 3.5 on AMD Ryzen AI Max+ 395

I recently bought a Framework Desktop with the intention of running Qwen 3.5 as the model for my AI assistant on Nanobot. At first I could not run this model on this hardware because of some weird bug in one of the libraries. I explained why and how I fixed it here. In this post I will just list the packages that I finally used to run Qwen, along with the vLLM command switches and parameters.

Here is the list of packages that I used to finally get it working:

  • vllm 0.17.1+rocm700
  • amd-aiter 0.1.10.post2
  • torch 2.9.1+git8907517
  • triton 3.4.0
  • rocm 7.2.0.70200-43~24.04

And here is the script that I am using:

TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 \
  VLLM_ROCM_USE_AITER=1 \
  vllm serve \
  cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit \
  --host 0.0.0.0 \
  --port 8000 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --dtype float16 \
  --max-model-len 128k \
  --gpu-memory-utilization 0.33
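For a rough feel of what --gpu-memory-utilization 0.33 means here: vLLM treats the value as the fraction of GPU memory it may use. A small arithmetic sketch, assuming vLLM sees the whole 120GB assigned to the GPU as device memory (that part is an assumption, not something measured); budget_gb is just an illustrative helper.

```shell
#!/bin/bash
# Memory budget in GB for a given total and --gpu-memory-utilization fraction.
budget_gb() {
    awk -v total="$1" -v frac="$2" 'BEGIN { printf "%.1f\n", total * frac }'
}

budget_gb 120 0.33   # about a third of the memory, roughly 39.6 GB
```

That budget has to hold the model weights plus the KV cache for the 128k context, and it leaves the rest of the memory free for a second model.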

Happy hacking!

Switching my AI assistant OS to Ubuntu

I made a mistake trying to run a vLLM server on Debian. In theory everything should be OK: Debian is stable and has Docker, Docker has images for vLLM, so I should be able to run all the models in Docker on Debian. Also, it should not really matter what the distro is as long as it has a new kernel and Docker works. That is the theory. In practice I could not force some models to run, and Qwen 3, even if it was capable enough, felt limiting at times. For example, I was unable to force it to work in agentic loops. I am not sure what the problem was, probably some configuration issue that I am still unable to understand and fix. Still, that was just a simple problem that required me to give my assistant a nudge with another prompt. It was not as bad as another issue though.

For example, when I was running Qwen 0.6B, I taught it to use some of the smart devices that I made myself, like my smart gate controller. It was working nicely until it somehow started to get confused and repeatedly asked me to give it an API key. But I already had. All it needed was already in the skill file. I tried to explain it to the model, but without luck. It felt like talking to a parrot: it kept repeating the same sentence over and over again.

“The gate has been opened! 🚪

If you have the API key secret, I can do that. If you want, I can help with other tasks. 😊”

I just could not explain to it that it was wrong. Also, those emojis everywhere were annoying. But let's leave that for another time.

I tested a few other things. For example, asking it to send me an email with a reminder to do a thing, let’s call it A. It was either sending me a Matrix message with the reminder to do A or reminding me to send an email with A. Neither was correct. After that I decided to check out Qwen 3.0 27B. Even if it was still incorrect sometimes, I was able to steer it onto the correct path with a few more prompts.

And it was fine until it again started misbehaving in the same way. It kept asking me to give it an API key for my own devices.

I have no idea why. I did not make any changes or adjustments. At some point it just forgot what it should do. I tweaked the skill files, but without luck. I think it needs to be made aware of the changes; I did not know that before. I manually edited the history and memory to remove any mentions of API keys. Again no change, even after a restart. I decided that the easiest solution would be to run Qwen 3.5.

I played with my vLLM Docker images a bit, trying to debug why I could not run this model. There was no stack trace. No meaningful error of any kind. Just some logs saying that the main process crashed.

I fed the logs to Gemini and asked what might be the cause. Searching the internet for it turned up nothing. Gemini first wrote that it is “classic OOM exception behavior” and that it is caused by my system not having enough VRAM. The thing is that I was already running a Framework PC with 120GB of GPU memory. Running Qwen 3.5 in the 30B size should be totally safe and should leave some space too.

I explained that I have 120GB free, and then Gemini confirmed that it should not be the problem and started to say that it is probably a problem with AMD's CUDA implementation. It asked me to add the --enforce-eager flag to see if it fixes it. I think the knowledge of those models is outdated from the very start of their existence, since they are trained on data gathered months or even years before. It takes time to scrape, label, clean, organize and censor this data and then train the model, so given the rapid evolution of ROCm and LLMs in general, this information was probably old and outdated… But still worth a try! Adding a flag and launching a model takes 2 minutes.

But it did not fix it.

Gemini was so convinced about this solution that it gave me an alternative in the same message, in case it did not work: it asked me to enable the vLLM debug log flag.

Again, I did. I relaunched vLLM and saw new errors connected to the Hugging Face API. I asked about those, but the answer was that this is normal; sometimes some models have some files missing in Hugging Face storage.

I tried to switch to other flavors of Qwen 3.5, other sizes, quants, etc. It did not work. I fed the entire output with all the debug logs into the Gemini chat. It said that I need to open an issue in the vLLM GitHub repo.

Well, that was useless.

At this point it was a bit late and I needed to take care of the kids, so a break sounded good.

The next day I downloaded the ISO for Ubuntu Server and started from scratch. I had used Ubuntu previously as a server and as my daily PC, but I did not like it, mostly because of forcing everything via snap and because of the upgrade process. I did the Debian upgrade from 11 to 12 on 4 machines and had no problems. I later upgraded all of them from 12 to 13 and again, no problems. I upgraded an Ubuntu server to a new version a few years back and it stopped booting. It was not terribly broken; it just lost the boot partition with the Linux image and booted only to the GRUB emergency terminal. A few adjustments and I was able to fix it in a few minutes. The thing is, it was a headless PC that I was using for a few self-hosted applications that my wife and I were using on a daily basis, and having no access to them in the morning usually makes your life a bit worse. You depend on something just being there, and Ubuntu broke it for me. After that Debian was the way to go for me because of its stability. It is much harder to break.

Or it was me, a silly person doing silly things to my Linux server, who broke it. That is also a possibility, though I remember that I just did sudo do-release-upgrade prior to the reboot 🙂

Installation was pretty quick, so in a few minutes I was able to log in via SSH. After that I did the usual process of updating everything, installing the usual packages (tmux, mosh, docker), configuring the environment, SSH keys etc. Then I installed the AMD GPU drivers and ROCm. Good thing it just worked on Ubuntu and there were no problems; I just followed this tutorial. Even amd-ttm worked and I was able to set the VRAM limit to 120GB. I guess the tool is fine, it is just designed to work on Ubuntu.

After everything was configured and the libraries were installed, I copied a few scripts that I had saved from the Debian installation and that I used for running models via Docker. I executed the one intended for Qwen 3.5 and vLLM… And it failed. Exactly the same way.

That was a bit of a letdown.

I tried SGLang and it did not work. I do not remember why, though; I just remember that the image was enormous, like 25 GB. It failed exactly the same way: no meaningful error, it just stopped. I tried to run one of the AMD images that do not start vLLM directly as an entrypoint; instead you can run bash and then experiment with the environment. After installing a new PyTorch, I was able to see some logs that had some meaning.

It was trying to allocate 250GB of VRAM! Why? Is there even a GPU with that much memory? The biggest I saw were 200GB enterprise cards. Was it in a way ensuring that the engine fails for some reason? Anyway, it was like that for the full Qwen 3.5 model and for its quantized 4-bit version from cyankiwi. That was very weird and I began to suspect it was just a bug in vLLM or one of the libraries.

Anyway, that was something I was able to use to search the web. I found this issue on GitHub. So it seems that to actually run Qwen 3.5 on Strix Halo you have to run it with some experimental flag. I tried to do that and it failed again.

But at least now it was complaining about some problem with the ROCm aiter library.

What now? Since I had ROCm installed, I could try to run vLLM directly. I could also try to change the Docker image to install and configure what was missing. The thing is, I was not sure what was missing. And even if I knew what was missing, I was not sure which version exactly was necessary. And I do not like building Docker images right away, without first testing the software directly on the system: you may waste time installing everything in Docker and it will fail anyway.

So I decided to install vLLM directly in the Ubuntu OS and try to run the model from that. I followed this tutorial for building vLLM from source and it failed midway. I had installed triton in version 3.6 and vLLM needs 3.4. I hate Python packages.

Anyway, I corrected the triton version, and I had already installed aiter. Maybe that is enough? I tried to run Qwen 3.5 and it worked!

Finally!

I tried to ask my AI assistant a few questions to check if it works. It was fine, though it felt a bit strange, but that happens when you change models on an AI assistant that already has memory and some knowledge about your past interactions.

Right now my script to run Qwen 3.5 runs on:

  • vllm 0.17.1+rocm700
  • amd-aiter 0.1.10.post2
  • torch 2.9.1+git8907517
  • triton 3.4.0
  • rocm 7.2.0.70200-43~24.04

and looks like this:

TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1 \
  VLLM_ROCM_USE_AITER=1 \
  vllm serve \
  cyankiwi/Qwen3.5-35B-A3B-AWQ-4bit \
  --host 0.0.0.0 \
  --port 8000 \
  --reasoning-parser qwen3 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --dtype float16 \
  --max-model-len 128k \
  --gpu-memory-utilization 0.33

In retrospect, switching the OS of the device that runs the models was a good idea. AMD produces its software with Ubuntu and Fedora in mind. I could probably get the ROCm installation working on Debian too with some work. But I do not think it is worth my time. At least not now. With ROCm installed I should be able to install vLLM in a virtual environment too. Or install all the libraries inside Docker and run this image on Debian; that should be possible too.

Futuristic API design

A few days ago I was bragging here about how great I am at teaching my AI assistant to operate my property gate.

Today I noticed that it no longer works. I asked it again to:

Open the gate

while wondering why I am actually doing it, since I can just press the button on my phone… but well, people are sometimes ridiculous like that.

Anyway, I wrote the message above and got an answer like the one below, which means that it does not work because the API key is not authorized:

No credentials to open the gate.

Error: Unauthorized – API key is not correct or does not have an access.

But it worked before! And I pressed the button in my mobile app after all; it uses the same API and it actually worked! Since I had not changed anything lately, it did not make any sense.

I did not have any time to investigate at the time, but when I did, a few hours later, the logs explained what was wrong.

nanobot.agent.loop:_run_agent_loop:230 - Tool call: exec({"command": "curl -s -X POST https://gate.local/gate/open -H \"X-API-Key: 5a3b8c1f-9d2e-4f6a-8b7c-3d1e9f2a4b6c\""})

It was hallucinating the usage and the API key.

Even worse, all of it was explained inside the skill file. The only thing it got right was the URL of the endpoint.

I changed the skill file and asked it again to try to open the gate. And it refused again. What was most peculiar was the fact that the hallucinated API key was the same in every retry. But I did not provide an API key like that anywhere.

I asked my assistant about it: “Where does the API key come from?”. It responded with something strange.

The API key 5a3b8c1f-9d2e-4f6a-8b7c-3d1e9f2a4b6c was detected in the SKILL.md file for open-gate and close-gate.
It is not publicly available — it was loaded from the system’s configuration.

If you want, I can check whether the key is still valid or update it in the file.
Would you like me to do that?

This is funny because I did not provide anything like that in any of those files. Nor in any others. Then I asked it to show me the memory entries. And now we are getting somewhere…

Saved facts: API key 5a3b8c1f-9d2e-4f6a-8b7c-3d1e9f2a4b6c

It had it in memory. But… again, I did not provide anything like that. So it must have hallucinated it at some point and added it to memory.

I asked it to remove the API key from memory and it worked from that point on.

What does it have to do with API design in the future?

API design for the models

Imagine you are running a big API for millions of users. You may have some legacy code that has been running for ages and is still used, so it can't be changed easily: you can’t just remove a part of your product that is used by a significant part of your customers just because it is old and not up to modern standards.

Or maybe there is some new part of your API that has non-standard headers for some reason. Maybe you want to test a new flow or a design that you believe is, for some reason, better than what the rest of the world is doing. You want to give it a try and show it to the world.

Or maybe your API returns a response in a format other than JSON because it lets you convert strings better.

But then you see that a significant part of the requests are failing, because your API differs from what most of the world is doing now and those requests are trying to use what the common standard dictates. But your docs clearly state something else. There was never any version of your system that was meant to be called like that.

What is happening? You reach out to some of those users, the bigger customers, and you learn that those are requests from automated systems: LLMs trying to use your API. Some requests get corrected and then revert to being invalid because the models ‘forgot about the fix’. Week after week, month after month, the percentage of invalid calls rises. At some point you start to wonder if it would not be easier to just change your API to make this way of calling valid. Maybe you can’t fight the tide.

You roll out a new version and the erroneous calls go away. Until another weird usage pops up, because apparently nobody reads the documentation anymore; people ask their models to read it and write some code to call your API. But models being models, they get it slightly wrong and some edge case causes requests to fail.

Do you change your API again? Will your users complain that they can’t use it? Will you then explain that it is not *THE RIGHT* way?

What about the case when you do not have some functionality in your API but you are seeing thousands of requests trying to use it? Maybe it will be worthwhile to actually add it and then charge for it? After all, errors do not bring revenue.

Maybe you will have a model fixing stuff based on errors in your logs, and it will be adding features and capabilities to your product that nobody really asked for, but that some models hallucinated.

Or maybe your API was written by an AI. And then another AI wrote the documentation. And then another AI wrote the client. And another AI is reading whatever that client returns and presenting it to the user. And nobody is sure what is going on anymore with anything.

Future design

I see a couple of possibilities here.

This will significantly hinder how we can evolve software. After all, why bother trying something new and exciting if your users will be using the ‘standard usage’ that their models hallucinated? ‘There is really no point in doing that. It won’t be used by AIs’. The new thing could even be better, but until a significant part of the world is using that technology, it won’t be in the training data. And if it is not in the training data, then this technology won’t be used.

Another possibility is that we will create a feedback loop of models feeding on themselves. Some models will be writing and improving code and others will be trying to use it, sometimes incorrectly, and this usage will be fed into the models that modify the software that is being used incorrectly. Maybe it will cause rapid evolution of such systems into something very different from what we are used to. Like creating an entirely new content type that is a binary serialization of the memory representation of tokens, normalized for transport via HTTP.

I think the first one is the more likely possibility. The second one would require rapid improvement in the capabilities of such automated systems. So far I have not seen any convincing example of any bigger product that was written by “AI”.

If that is true, then *future designs of APIs* will be rather safe, boring and more of ‘whatever the rest of the world is doing’. Which is how it already is now, with the ‘this is not RESTful’ or ‘do what Google and Meta are doing’ that you hear or read occasionally.

It is possible that innovation will slow down a bit, but on the other hand… maybe it is for the better? Sometimes it feels like everyone in the software development industry is chasing some vague goal of THE Perfect software, but nobody really knows what that ideal piece of code would look like. For now some people may think that it will be a new model, a new AI system or GenAI. I am not sure about that. After all, the term ‘human error’ comes from somewhere: from the problem with our own ‘design’, that we make mistakes, forget stuff and tend to take shortcuts hoping that ‘it will be fine’. And now we are teaching our computers to do the same, but much, much worse, slower and less effectively.

So what will be the future design of APIs?

Seems like it will be: whatever works for models, or you will be DDoSed with wrong calls.

Assigning VRAM on AMD AI Max+ 395

A few days ago I started playing with an AI assistant and I decided to buy new hardware dedicated to running LLMs. I bought a Framework Desktop board. So far it is really great, but running more than one model is a bit difficult, and I wanted to test a few things while my nanobot keeps running undisrupted.

I did a quick search on my phone before buying and people were saying that it is possible to assign up to 96GB to the GPU via BIOS settings.

When I finally got it and installed a basic system on a USB stick (I did not have a spare NVMe disk at the time), I tested vLLM performance first. It was OK.

But when I tried to load another model (a slightly bigger one) I hit an OOM exception.

I checked the BIOS settings and there was only a setting for assigning 64GB of dedicated RAM to the GPU, which is not what I wanted.

I reset those settings to default and tried to use this solution with the AMD helper script. The solution looked sensible, even if the installation is via pip, which is always a bit problematic on Debian.

After installation I ran the command:

amd-ttm
💻 Current TTM pages limit: 16469033 pages (62.82 GB)
💻 Total system memory: 125.65 GB

So far so good!

I tried changing it to another value:

❯ amd-ttm --set 100
🐧 Successfully set TTM pages limit to 26214400 pages (100.00 GB)
🐧 Configuration written to /etc/modprobe.d/ttm.conf
○ NOTE: You need to reboot for changes to take effect.
Would you like to reboot the system now? (y/n): y

And rebooted!

And guess what? It did not work!

I should have known. From the amount of emojis in there, the whole thing smells of vibe coding. I tried it a few more times, but constant restarts of a headless machine get annoying real quick.

I uninstalled it and tried to look for another solution, using the options command, which I think I saw on some forum before I bought the Framework Desktop ($2000 is not exactly cheap!).

But I could not find it and I gave up for a few days.

A few days later I tried to test another big model, needed more memory and had to revisit this problem. Luckily I was able to find this thread and this actually worked. I created a file called /etc/modprobe.d/amdgpu_llm_optimized.conf with the following content:

options amdgpu gttsize=120000
options ttm pages_limit=31457280
options ttm page_pool_size=15728640
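The two ttm values are counted in memory pages, not bytes. A quick sketch of the conversion, assuming the usual 4 KiB page size (getconf PAGESIZE shows the real one); gib_to_pages and pages_to_gib are illustrative helper names. The gttsize parameter is, as far as the amdgpu module documentation goes, given in MiB, so 120000 is a bit under 120 GiB.

```shell
#!/bin/bash
# Assumed page size in bytes; verify on the target machine with: getconf PAGESIZE
PAGE_SIZE=4096

# Convert whole GiB to a number of pages.
gib_to_pages() {
    echo $(( $1 * 1024 * 1024 * 1024 / PAGE_SIZE ))
}

# Convert a number of pages back to GiB with two decimals.
pages_to_gib() {
    awk -v p="$1" -v s="$PAGE_SIZE" 'BEGIN { printf "%.2f\n", p * s / (1024 * 1024 * 1024) }'
}

gib_to_pages 120        # 31457280, the pages_limit value above
gib_to_pages 60         # 15728640, the page_pool_size value above
pages_to_gib 16469033   # 62.82, the default limit amd-ttm reported earlier
```

So the config asks for a 120 GiB page limit with a 60 GiB page pool; for a different target size only the argument to gib_to_pages changes.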

After that I ran:

sudo update-grub

though I must say I am not really sure if this was necessary, but since it is really quick and does not break anything, I am including it in this solution.

After that I just rebooted the machine and it worked. Running amd-smi showed:

+------------------------------------------------------------------------------+
| AMD-SMI 26.2.1+fc0010cf6a    amdgpu version: Linuxver ROCm version: 7.2.0    |
| VBIOS version: 00107962                                                      |
| Platform: Linux Baremetal                                                    |
|-------------------------------------+----------------------------------------|
| BDF                        GPU-Name | Mem-Uti   Temp   UEC       Power-Usage |
| GPU  HIP-ID  OAM-ID  Partition-Mode | GFX-Uti    Fan               Mem-Usage |
|=====================================+========================================|
| 0000:c1:00.0    AMD Radeon Graphics | N/A        N/A   0                 N/A |
|   0       0     N/A             N/A | N/A        N/A              153/512 MB |
+-------------------------------------+----------------------------------------+
+------------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU        PID  Process Name          GTT_MEM  VRAM_MEM  MEM_USAGE     CU % |
|==============================================================================|
|    0       4927  python3.12             5.9 MB   62.5 KB    16.0 EB  N/A     |
|    0       5497  python3.12           106.3 GB    5.6 MB   108.8 GB  N/A     |
+------------------------------------------------------------------------------+

I uninstalled amd-ttm, so I do not know if this would be shown by that tool, but I have more trust in amd-smi as of now. The limit itself can be read from the /sys/module/ttm/parameters/pages_limit file, which you can check yourself with cat, and it was showing the correct value now.
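Reading that file by hand gives a raw page count, so a tiny wrapper that prints GiB can be handy. A minimal sketch, again assuming 4 KiB pages; ttm_limit_gib is a hypothetical helper, and the optional path argument exists only so it can be tried on a machine without the ttm module.

```shell
#!/bin/bash
# Print the TTM pages limit in GiB. Defaults to the kernel parameter file;
# a different file can be passed for a dry run. Assumes 4 KiB pages.
ttm_limit_gib() {
    local file="${1:-/sys/module/ttm/parameters/pages_limit}"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    awk '{ printf "%.2f GiB\n", $1 * 4096 / (1024 * 1024 * 1024); exit }' "$file"
}

# On the machine itself:  ttm_limit_gib
# Dry run with the value configured above:
tmp=$(mktemp)
echo 31457280 > "$tmp"
ttm_limit_gib "$tmp"    # prints "120.00 GiB"
rm -f "$tmp"
```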

And that is it! Happy playing with your models!