“Instead of stiffly suppressing yourself, relax and live without care. Who knows what tomorrow may bring? Instead of expecting the void of tomorrow, enjoy your life today!” – Axis Cult teachings (KonoSuba)

The incredibly dumb AI chatbot Kazuma

Updated February 20, 2022 • Published February 19, 2022
8 minutes read • By MarsRon

Kazuma shiny ehe face

I’ve always wanted to create an AI chatbot of my own, mainly because I was curious what responses the artificial intelligence would give me. Now that I’ve bodged together a DialoGPT-based AI chatbot of my own, let me show you how I created Kazuma, the incredibly dumb AI chatbot. Mind you, I had pretty much no Python knowledge at all before working on this project lol.


Thanks YouTube recommendation

A few months back, YouTube randomly recommended me this YouTube video, and after some digging I found this tutorial blog post by Lynn Zheng. I suggest you read through her blog post, because most of what I did was based on it.

Inspiration struck, and then I wanted to create my own AI chatbot based on the character Kazuma from KonoSuba.

Why Kazuma?

Kazuma portrait

I wanted the AI to be Kazuma because KonoSuba is my favourite anime and Kazuma is the main character of its hilarious story. Here’s a summary of what happened to him in the first episode of the anime lmao:

One day after going outside, Kazuma died attempting to save a girl from being hit by an oncoming truck. The truck was, in fact, a slow-moving tractor, but thinking that he was run over, Kazuma wet himself and died from shock.

But how?

But how would I create it? I had no experience with Python, the programming language used to train and deploy the DialoGPT model, nor did I really understand how any of this worked. I can’t just headbutt my way through this, right?

And I did just that: headbutt my way through. It was a slow process. I worked on it for a while, gave up, then found the motivation to continue, and with all my procrastination it took a few months to finally complete. I also faced a few problems that I didn’t know how to fix until a few days before publishing this project to GitHub. (Thanks, kind stranger, for replying to an old question I posted and reminding me of this project!)

Gathering training data

GitHub repository

By the way, here’s the GitHub repo of the project if you wanna take a look at my ugly code. Feel free to steal take inspiration from this project in any way you want; it’s under the MIT license, so you can do whatever you want ;)


If I reference something like folder/file.txt, it’s a file from the repo, so go check it out if you need to.

The source of training data

Anyway, the first step to training an AI is to gather training data. Unfortunately, I couldn’t find a KonoSuba or Kazuma dataset on Kaggle, so I had to gather the data myself.

Lucky for me, I was reading the KonoSuba light novel on a website called CG Translations, so I decided to webscrape the data from there.

How the webscraper works

After a few weeks of slow progress, I finally got the webscraper working lmao. This was also one of my first attempts at webscraping, so the code doesn’t look very neat at all, but hey, it works. Here’s a summary of what data/webscrape.js does:

  1. Download and parse the main page at https://cgtranslations.me/konosuba
  2. Search for links that direct to a blog post/article
  3. Download and parse all the articles with a slight delay between each request
  4. Concatenate the articles into a single string
  5. Do some post-processing to remove unwanted elements and write to data/full-data.txt
  6. Finally, run another script which concatenates 5000 lines of dialogue and writes the final training data to data/training-data.txt.
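The real script is JavaScript (data/webscrape.js), but here’s a rough Python sketch of the same pipeline. The helper names are mine, and the link pattern and post-processing are guesses at what the real script does:

```python
import re
import time
import urllib.request

BASE_URL = "https://cgtranslations.me/konosuba"

def fetch(url):
    # Download a page's HTML, with a polite delay between requests
    time.sleep(1)
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="ignore")

def extract_article_links(html):
    # Naively grab hrefs that look like dated chapter posts
    return re.findall(r'href="(https://cgtranslations\.me/20\d\d/[^"]+)"', html)

def post_process(text, limit=5000):
    # Strip blank lines, then keep only the first `limit` lines of dialogue
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(lines[:limit])

# Putting it together (needs network access, so not run here):
#   html = fetch(BASE_URL)
#   text = "\n".join(fetch(link) for link in extract_article_links(html))
#   open("training-data.txt", "w").write(post_process(text))
```

The real script additionally writes the intermediate result to data/full-data.txt before producing data/training-data.txt.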

Training data regrets

I had to limit the training data to 5000 lines because of this error that I was constantly getting :(

Although this works pretty well, some chapters and volumes are skipped for various reasons that I was too lazy to worry about. I believe this doesn’t affect the final training data that much, since it was limited to 5000 lines of dialogue anyway.

Another concern I had was that the training data consists of dialogue from all the characters in KonoSuba, most of which isn’t actually from Kazuma (the character). This is fine for KonoSuba, because Kazuma and most of the other cast members share the same carefree demeanour. It worked out okay, and I wouldn’t want to spend another few months trying to associate each line of dialogue with its character just to get the perfect “Kazuma” lol (cuz I’m lazyyy)

Anyway, now that we have our training data, it’s time to train Kazuma!

Training Kazuma

Using Google Colab

To train Kazuma with our newly acquired dataset, I got some help from Google Colab, a totally free Google service for running Jupyter notebooks. It also provides free access to GPUs to accelerate the training of AI models, perfect for training Kazuma. (Thank you, people at Google Colab!)

I’m using train/train-kazuma.ipynb, a modified version of Lynn Zheng’s training notebook. All you need to do is:

  1. Create a folder in your Google Drive
  2. Upload train/train-kazuma.ipynb to the folder
  3. Upload data/training-data.txt to the same folder
  4. Open the Jupyter notebook; it will automatically open in Google Colab
  5. Run all the code inside the notebook

How does train-kazuma.ipynb work?

This notebook took me a while to understand (mainly because of my inexperience with Python) and modify to suit my needs. Basically, it does the following:

  1. Install the needed Python packages and libraries
  2. Parse training-data.txt that was uploaded to Google Drive
  3. Download the microsoft/DialoGPT-small model
  4. Train the downloaded model with the training data
  5. Save the trained model to the Google Drive
  6. Let you have a few conversations with the newly trained AI model
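For the curious, the core of steps 2–5 boils down to something like the sketch below. The paths and hyperparameters are hypothetical, and the real notebook (based on Lynn Zheng’s) has far more configuration; the imports are kept inside the function so the sketch reads standalone:

```python
def train_kazuma(data_path="training-data.txt", out_dir="kazuma-small"):
    # Imports live inside the function so this sketch can be pasted
    # anywhere; transformers is only needed when you actually train
    from transformers import (AutoTokenizer, AutoModelForCausalLM,
                              TextDataset, DataCollatorForLanguageModeling,
                              Trainer, TrainingArguments)

    # Step 3: download the base DialoGPT-small model
    tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
    model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

    # Step 2: read the dialogue file as fixed-size blocks of tokens
    dataset = TextDataset(tokenizer=tokenizer, file_path=data_path,
                          block_size=128)
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    # Step 4: fine-tune on the KonoSuba dialogue
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=out_dir, num_train_epochs=3,
                               per_device_train_batch_size=2),
        data_collator=collator,
        train_dataset=dataset,
    )
    trainer.train()

    # Step 5: save the fine-tuned model (to Google Drive in the notebook)
    trainer.save_model(out_dir)
```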

If you have any questions regarding this notebook, feel free to not ask me, because I also have no idea how the training code works lmao.

Why small version of DialoGPT?

Anyway, you can actually choose which model size you want in the configuration section of the Jupyter notebook. I chose small because it’s already big enough at 500MB, and I don’t really need Kazuma to be very smart. All it needs to do is give responses that are funny and human-like.

Even if it’s extremely dumb, I like the hilarious responses it gives that have nothing to do with the context; I find it quite comedic.

If you want to create your own chatbot AI, I would suggest reading Lynn Zheng’s tutorial blog. It’s an actual tutorial, unlike this blog, which is just about what I did to create Kazuma.

Finally finished training

After a ton of CUDA out-of-memory errors, a ton more configuration changes and re-training, and 2 hours of actual training, I finally got a working Kazuma model. I tried having some conversations with Kazuma, and not surprisingly, it’s utterly stupid lol.

But still, after so many retries, I was very hyped about deploying and using Kazuma. I was fantasizing about integrating Kazuma with Phobos, a Discord.js bot I was working on at the same time, and then it hit me. I didn’t know shit about deploying PyTorch models.

Deploying Kazuma

First obstacle

All the previous steps were done in late 2021. I was so close to finishing this project, yet I had no idea how to deploy it.

I didn’t want to host the model on Hugging Face, because I was afraid I would hit the free-tier API’s monthly request limit fairly quickly. A 30k tokens/characters per month quota is very little for a chatbot; it would be easy to reach in just a few hours.

I then found out you can use the following code, extracted from the Jupyter notebook, to run the model. A reminder that I had almost no knowledge of Python at the time, so the code looks ugly :p

from transformers import AutoTokenizer, AutoModelWithLMHead

tokenizer = AutoTokenizer.from_pretrained('kazuma-small')
model = AutoModelWithLMHead.from_pretrained('kazuma-small')

for step in range(10):
    message = tokenizer.encode(input(">> User: ") + tokenizer.eos_token, return_tensors='pt')

    response = model.generate(
        message,
        max_length=200,
        pad_token_id=tokenizer.eos_token_id
    )

    # Skip the input tokens and decode only the generated reply
    print(f"Kazuma: {tokenizer.decode(response[0, message.shape[-1]:], skip_special_tokens=True)}")

As I was hosting Phobos on Repl.it back then, I couldn’t host Kazuma there because it would exceed Repl.it’s file size limit.

I then lost interest and didn’t work on the project for a while, until I got a Raspberry Pi in November 2021.

Raspberry Pi to the rescue… or not

After I moved Phobos and my personal website to my new Raspberry Pi 4B, I was thinking about moving Kazuma over as well. That’s when I learnt that Kazuma would not work on the Raspberry Pi, because PyTorch doesn’t support 32-bit ARM systems.

And I lost interest again :/

64-bit Raspi OS beta didn’t help

A few days later, I found out about the beta 64-bit version of Raspi OS (Raspbian has been rebranded as Raspberry Pi OS, Raspi OS for short). I decided to install it on my Raspberry Pi, since PyTorch should theoretically work on 64-bit Raspi OS.

And it worked!… until it errored 😔 PyTorch output Illegal instruction every time I tried to run the code.

I posted a question about it on the PyTorch Discourse site, and forgot about it after a few days of inactivity…

Learning Python and Turning Kazuma into an API

In the meantime, I was still hopeful about deploying Kazuma in the future. I learnt more about Python and took a lot of inspiration from https://github.com/polakowo/gpt2bot to create a better version of what I have.

I also learnt about virtual environments; they’re similar to node_modules, except the Python interpreter itself is also bundled inside the virtual environment.

I then learnt about FastAPI and used it to turn Kazuma into a RESTful API.

After a few days of slow progress, I got the final result you see in api/, and now all it needed was a host to deploy to. I didn’t work on Kazuma since then, until…

64-bit Raspi OS out of beta

Just a few days (maybe weeks, I forgor 💀) ago, the 64-bit version of Raspi OS got out of beta. I didn’t think much of it at first, since I already had the beta version running on my Raspberry Pi and didn’t think I needed the update.

Then I got a reply from the PyTorch forum. And I was like: “Oh right I have this project that I abandoned a while back lmao”

Then I realised I should try out the new 64-bit Raspi OS release, hoping it might solve the Illegal instruction problem I was facing.

And it actually did work! Kazuma was now running perfectly… and then I noticed something.

Performance issues

I noticed that RAM usage increased by 700MB when I started up Kazuma. “Sheeesh, that’s a lot” was my initial reaction, but I have 4GB of RAM and was only using around 400MB, so RAM wasn’t an issue for me (for now).

I also noticed that all 4 cores of the Raspberry Pi were fully utilised (100% utilization) when generating a response, yet the compute time was far from good.

Each response took more than 1 second to compute, and the longer the response, the longer the compute time; it could easily reach 10 seconds. ARM CPUs weren’t meant for machine learning, so yeah.

"I've won, but at what cost?"

Just keep pushing

Even with Kazuma’s inefficiency, I still wanted to have fun with it. I didn’t want to give up on all the hours I’d sunk in just because of performance issues >:(

Thus, I started refactoring my code and getting it ready to be pushed to GitHub and open-sourced. I’d like others to benefit from this project as well :) As always, the source code is at https://github.com/MarsRon/kazuma, feel free to take a look.

Now that I have a working API, I just need to actually deploy it.

Deploy Kazuma using a process manager

Since I was already using PM2 to run my other projects, I naturally wanted to use it to run Kazuma as well. Luckily, PM2 does support Python, and you can even use virtual environments (thank you, Stack Overflow user).
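I don’t remember my exact setup, but a PM2 config along these lines should work; the paths here are hypothetical, so adjust them to your own layout:

```javascript
// ecosystem.config.js — hypothetical paths, adjust to your setup
module.exports = {
  apps: [{
    name: "kazuma",
    script: "api/main.py",
    // Point PM2 at the virtual environment's own Python binary
    interpreter: "./venv/bin/python"
  }]
}
```

Then a `pm2 start ecosystem.config.js` brings the API up under PM2’s supervision.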

And after some setting up, Kazuma was running successfully. I then added a .kazuma command to Phobos and it worked, slowly but surely, on Discord!

All the command does is send an HTTP request to Kazuma, which is running on port 8002 (don’t ask why 8002), wait for Kazuma to generate a response, and send it back to the user.

Good job writing such a complicated explanation


Though a bit slow and stupid, I’m proud that Kazuma is happily responding to the users of Phobos, and of course to me.

It’s not perfect, and there is a lot of room for improvement, but I’ll just leave the project alone for now.

It’s been a journey, and I hope this blog helped you (probably didn’t lol) learn how I made Kazuma a working, living artificial intelligence!

Here’s an example conversation with Kazuma:

Example conversation with Kazuma

P.S. If I screwed up somewhere, either tell me or ignore it, k thx :p


© MarsRon 2021-2022