In 2023 I designed and built Pug Witch, an AI fortune teller that lives in a 1920s GE fridge and can listen to, understand, and respond to participants' inquiries. This is how it works:
1. A background Python process detects when a participant opens the door, using a small limit switch, and launches an Unreal Engine application (a minimal sketch of this loop follows the list).
2. The Unreal Engine application connects to the local Python process through an I/O pipe.
3. A cached text-to-speech audio clip is played out of two 50-watt speakers to greet the participant.
4. The participant speaks to the fridge, where a microphone is placed.
5. The microphone continuously listens to the participant and live-transcribes their speech into text.
6. The transcribed text is streamed from the Python background process to the Unreal Engine application, where it's displayed to the participant as a Niagara particle system.
7. Once detectable speech stops for a threshold of time, the text is fed into a 13B-parameter LLaMA model that's cached in memory, and inference is performed using llama.cpp.
8. The output of the LLaMA model is streamed live from the Python background process to the Unreal Engine application and displayed on screen using a Niagara particle system.
9. The text output is also fed into a text-to-speech model that generates a series of WAV audio clips, which are automatically played out of the 50-watt speakers.
10. The participant can continue to interact with the AI, repeating steps 4-9.
11. When the participant closes the door, we kill the Unreal Engine application and any other background processes to save battery.
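Here's roughly what that door-triggered lifecycle looks like. This is a minimal sketch, assuming the limit switch state is reported by the Arduino Uno over USB serial as "OPEN"/"CLOSED" lines; the port name, message format, and executable path are placeholders, not the production code.

```python
# Hypothetical door-watcher loop: assumes the Arduino reports the limit
# switch over USB serial as "OPEN"/"CLOSED" lines. Port, baud rate, and
# the packaged Unreal binary path are all placeholders.
import subprocess
import serial  # pyserial

PORT = "/dev/ttyACM0"        # assumed Arduino serial port
UNREAL_APP = "./PugWitch.sh" # assumed packaged Unreal Engine binary

def main() -> None:
    ue_process: subprocess.Popen | None = None
    with serial.Serial(PORT, 9600, timeout=1) as arduino:
        while True:
            line = arduino.readline().decode(errors="ignore").strip()
            if line == "OPEN" and ue_process is None:
                # Door opened: launch the Unreal Engine frontend.
                ue_process = subprocess.Popen([UNREAL_APP])
            elif line == "CLOSED" and ue_process is not None:
                # Door closed: kill the frontend to save battery.
                ue_process.terminate()
                ue_process.wait()
                ue_process = None

if __name__ == "__main__":
    main()
```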
- 2× 24V Battle Born lithium batteries connected in series to make 48V
- Batteries charged by 1400 watts of solar
- 800-watt 48V Victron Energy Direct inverter
- Nvidia RTX 3060 GPU with a custom-made copper water block
- AMD Ryzen 7 CPU
- 32GB of RAM
- 32-inch 4K Samsung LCD panel
- Arduino Uno
- Python 3.10.x
- Unreal Engine 5.x C++
First the fridge was captured and measured using photogrammetry via AliceVision. About 100 photos were taken using a Nikon D5100 DSLR camera.
This scan was then imported into Blender to generate a low-resolution model and to take measurements for an Onshape CAD file.
The CAD file was created to determine the placement of all the components and how they could slide out for battery changes. The CAD file can be accessed here:
This is the module that slides into the fridge from the top.
The software is split into two components:
- Python backend
- Unreal Engine frontend
Originally I started the project trying to implement everything in C++ by integrating various AI/ML libraries directly into the engine as a C++ plugin. However, all of the best AI/ML libraries expose Python APIs, and the iteration time was quite slow even with hot reloading, so I ended up rewriting most of it in Python and essentially streaming content back and forth between the Python backend and the Unreal frontend using pipes.
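The framing over that pipe can be as simple as a length prefix plus JSON. This is a hedged sketch of the idea, not the project's actual wire format: the 4-byte prefix, the JSON schema, and the field names are all my own assumptions.

```python
# Hypothetical message framing between the Python backend and the Unreal
# frontend: each message is a 4-byte big-endian length prefix followed by
# UTF-8 JSON. Only the length-prefix pattern is being illustrated here.
import json
import struct
from typing import BinaryIO

def send_message(pipe: BinaryIO, payload: dict) -> None:
    """Frame and write one JSON message to the pipe."""
    data = json.dumps(payload).encode("utf-8")
    pipe.write(struct.pack(">I", len(data)) + data)
    pipe.flush()

def recv_message(pipe: BinaryIO) -> dict:
    """Read one length-prefixed JSON message from the pipe."""
    header = pipe.read(4)
    if len(header) < 4:
        raise EOFError("pipe closed")
    (length,) = struct.unpack(">I", header)
    return json.loads(pipe.read(length))

# e.g. streaming a transcription fragment to the frontend:
# send_message(pipe, {"type": "transcript", "text": "tell me my fortune"})
```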
I'm not the biggest fan of Python, as I prefer statically typed languages. However, I learned that you can run a static type checker in strict mode along with a linter, which improves the overall development experience of the language.
Another frustrating aspect of the language is threading versus multiprocessing. Python's threading implementation is slow due to the GIL, so if you're doing any heavy computation, loading, or waiting across multiple threads, the main thread's behavior becomes extremely slow and inconsistent. So I ended up doing all the heavy processing in completely separate processes using Python's multiprocessing module, and managing the inputs/outputs with a series of queues and threads that stage/unstage the data.
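In outline, that pattern looks like the following. This is a minimal stdlib-only sketch; the `heavy_worker` body is a stand-in for the real transcription and inference jobs.

```python
# Heavy work runs in a separate process, outside the main interpreter's
# GIL, while a daemon thread unstages results into a queue that the main
# loop can poll without blocking.
import multiprocessing as mp
import queue
import threading

def heavy_worker(inputs, outputs):
    while True:
        job = inputs.get()
        if job is None:  # sentinel: shut down cleanly
            break
        outputs.put(job.upper())  # placeholder for real heavy processing

def main() -> None:
    inputs, outputs = mp.Queue(), mp.Queue()
    worker = mp.Process(target=heavy_worker, args=(inputs, outputs))
    worker.start()

    results: queue.Queue = queue.Queue()

    def unstage() -> None:
        # Drain worker output into a thread-safe queue for the main loop.
        while True:
            results.put(outputs.get())

    threading.Thread(target=unstage, daemon=True).start()

    inputs.put("tell me my fortune")
    print(results.get())  # -> "TELL ME MY FORTUNE"
    inputs.put(None)
    worker.join()

if __name__ == "__main__":
    main()
```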
The token output rate of the llama-cpp-python implementation was not bad. To make text-to-speech faster, I split the model's output live into sentence chunks, then ran each chunk through TTS individually, generating audio clips of those sentences before the model was done talking.
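Here's a sketch of that sentence-chunking idea, assuming llama-cpp-python's streaming completion API; `synthesize` is a hypothetical stand-in for the TTS model, and the regex sentence boundary is a simplification.

```python
# Tokens stream out of llama-cpp-python, get buffered until a sentence
# boundary, and each complete sentence is handed to TTS while generation
# continues, so audio playback starts before the model finishes.
import re
from llama_cpp import Llama

SENTENCE_END = re.compile(r"[.!?]\s")

def speak_streaming(llm: Llama, prompt: str, synthesize) -> None:
    buffer = ""
    for chunk in llm(prompt, max_tokens=256, stream=True):
        buffer += chunk["choices"][0]["text"]
        # Flush every completed sentence to TTS immediately.
        while (match := SENTENCE_END.search(buffer)):
            sentence, buffer = buffer[: match.end()], buffer[match.end():]
            synthesize(sentence.strip())  # generates + queues a WAV clip
    if buffer.strip():
        synthesize(buffer.strip())  # whatever trails the last boundary
```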
On the frontend there is still a C++ plugin that stages/unstages the data streaming back and forth from the Python backend; this is done across two separate send/receive threads that parse and interpret the messages.
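The actual plugin is C++, but the staging pattern translates directly. Here's a Python analog of the idea, with the framing helpers from the earlier sketch passed in as callables; the function and queue names are my own.

```python
# Hypothetical analog of the plugin's staging pattern: one thread drains
# an outbound queue into the pipe, the other parses inbound messages and
# queues them for the main loop to consume.
import queue
import threading
from typing import BinaryIO, Callable

def start_staging_threads(
    pipe: BinaryIO,
    send: Callable[[BinaryIO, dict], None],  # e.g. send_message above
    recv: Callable[[BinaryIO], dict],        # e.g. recv_message above
) -> tuple[queue.Queue, queue.Queue]:
    outbound: queue.Queue = queue.Queue()
    inbound: queue.Queue = queue.Queue()

    def sender() -> None:
        while True:
            send(pipe, outbound.get())  # blocks until a message is staged

    def receiver() -> None:
        while True:
            inbound.put(recv(pipe))  # blocks on the pipe

    threading.Thread(target=sender, daemon=True).start()
    threading.Thread(target=receiver, daemon=True).start()
    return outbound, inbound
```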
At first I didn't really enjoy Niagara because I preferred Unity's Visual Effect Graph workflow and UX design. With Niagara it feels like you're constantly entering and exiting different fullscreen UX contexts, which can sometimes make it difficult to understand the overall flow of things. But after a while I became attuned to the madness.
Construction took several months, as I was also working full time at Unity throughout the project. I learned a lot and made a ton of mistakes, which required rebuilding and replacing several components throughout the module. Here are some photos from the build:
The chassis module that slides into the fridge from the top which will mount all the components.
Laying out the components and starting to think about how I'm gonna fit everything inside the volume.
Replacing the air cooling heat sink with the custom copper water cooling block mount. I also added some additional aluminum heatsinks to all the memory modules.
The computer module on a mini-ITX board.
Quick-release water-cooling connectors so the module can be separated from the fridge.
Filling the tubes with water and testing the pump.
The whole module completed 🙃
Rewiring/soldering fans into series.
Top of the fridge running.
The system is completely sealed, but the processor, GPU, power supply, and inverter generate a lot of heat. To keep the system sealed while still dissipating that heat, we use two radiators: an internal radiator and an exterior one.
The internal radiator keeps the interior air circulating and cools it with the cold water coming in from the exterior radiator, while the exterior radiator cools the warm water coming up from the interior.
The project was taken to a music festival where there was no power or internet and the conditions were very dusty. The entire system was sealed using caulk and a series of gaskets across the doors.
It's a bit heavy.
The conditions were a bit rough, but the internals are completely sealed from the outside and the heat is dissipated through the radiators.
There is no doubt that this project was challenging and completely overkill for the application. I could have done it all on a server and streamed the content down to a Raspberry Pi through a Starlink antenna. But I learned a ton throughout the project and I'm glad I took the harder road. That said, next time I will just do all the processing on a server and beam it down 😉
I predicted that I might struggle with noise isolation and consistent speech-to-text. Even with Nvidia's RTX Voice and muffling the microphone, the system still struggled to capture prompts consistently. I would like to try OpenAI's latest Whisper implementation, as that might improve consistency (a quick sketch of what that swap could look like is below). Next time, though, I'd probably need to build some physical device that covers the user's mouth when speaking to further isolate their voice.
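For reference, swapping in Whisper via the open-source `whisper` package would look roughly like this; the model size and audio path are placeholders, and this is the published API, not what actually ran in the fridge.

```python
# Hypothetical Whisper-based transcription: load the model once at
# startup, then transcribe each captured utterance.
import whisper

model = whisper.load_model("medium")          # cached once at startup
result = model.transcribe("participant.wav")  # transcribe one utterance
print(result["text"])
```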
Even with 100 amp-hours of lithium battery (48 V × 100 Ah, roughly 4.8 kWh) and the various optimizations I made across the software and OS, it would only run for about 3-5 hours depending on how often people used it. The processing requirements of running an Unreal Engine application and a 13B-parameter LLaMA model are just too great.
Furthermore, I didn't have time to implement a charging port for the fridge, so taking the batteries out to charge them from the solar panels and putting them back in was labor-intensive. Doing the processing in the cloud would have greatly improved the overall operation of the project.
I rigged the characters on screen, but I ran out of time to do any facial animation or hand gestures, so aside from the transcription particle system, the screen was pretty static.
I definitely struggled with some race conditions while sharing state between the Python backend and the Unreal frontend. There were also some very challenging bugs in some of the libraries I used.
Often users would just open the door and not understand what to do, so I had to write some code on site to have the AI greet users, explain that they're talking to an AI, and ask them to speak clearly into the microphone.