Alex's Bin of Thoughts

I Built a Transcription App Over Christmas PTO

Published on Sunday, January 4, 2026 by Alexander Mason

Image provided by ChatGPT

I’ve always enjoyed good transcription software.

Sometime in the early 2000s, I was introduced to a strange little gem called Talking Snap! Crackle! Pop! Voice-Activated Software. I didn’t find it online or through school. I found it the way many questionable pieces of software were discovered back then: a CD slipped inside a box of Kellogg’s cereal.

I have no idea why Kellogg’s thought this was a good idea. But I remember being fascinated by it. Not because it was polished or accurate. It wasn’t. I was fascinated because the computer listened. I could say something out loud and the machine would respond. That alone felt magical. I’d envisioned a world where you could speak out loud and things happened on your behalf, not through snapping your fingers, but through devices and robots.

That interest and desire never really went away.

Over the years, real-life voice systems matured. Alexa. Google Assistant. Siri. Each iteration got better, more capable, more normalized. And with the recent wave of AI, it started to feel like we were approaching a new threshold altogether.

However, something just feels off, especially on the PC.

Consumer dictation tools never quite hit for me. Sure, Dragon exists, but by any reasonable measure it’s a hard financial justification. Cloud-first tools felt invasive and somehow unnecessary in this world of powerful computers. If my phone can do talk-to-text for me, then my PC should be able to as well. And it does exist today, to some degree: Windows offers an out-of-the-box speech-to-text tool. But that’s it; it’s just the standard, boring, you-speak-words-appear experience. And it’s a pain in the ass to use. If you don’t speak, it disappears after 10 seconds… why?

So like any rational tinkerer would do, I started playing with an idea.

Just testing it out, seeing what was possible, not committing to anything, and exploring the inner workings of how to build such an app.

I worked within pretty loose constraints for the V1 prototype. I just wanted something that was practical and easy to use, ran offline, and supported some type of voice actions.

With that, I was off to the races to build something. I worked at it for a few weeks or so in my free time; I learned parts of NVIDIA’s NeMo toolkit and Python’s PySide6 library, and got really intimate with my OpenAI Codex / ChatGPT window 😅.

Over time I learned many speech-to-text concepts, such as voice activity detection, contextual biasing, pre-roll buffers, decoding parallelization, and much, much more.
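
To make one of those concrete: the pre-roll buffer is the reason the first syllable of a sentence doesn’t get clipped. You keep a small ring of recent audio frames at all times, and the moment voice activity is detected you hand the decoder those buffered frames too. Here’s a minimal sketch of the idea (hypothetical names, with a crude energy threshold standing in for a real VAD; not my actual implementation):

```ts
// Pre-roll buffer sketch: retain the last N audio frames so the decoder
// also sees the audio from just before voice activity was detected.

type Frame = Float32Array; // one chunk of PCM samples

class PreRollBuffer {
  private frames: Frame[] = [];
  constructor(private readonly maxFrames: number) {}

  push(frame: Frame): void {
    this.frames.push(frame);
    if (this.frames.length > this.maxFrames) this.frames.shift(); // drop oldest
  }

  drain(): Frame[] {
    const out = this.frames;
    this.frames = [];
    return out;
  }
}

// Crude energy-based VAD stand-in; a real system would use something smarter.
function isSpeech(frame: Frame, threshold = 0.01): boolean {
  let energy = 0;
  for (const s of frame) energy += s * s;
  return energy / frame.length > threshold;
}

const preRoll = new PreRollBuffer(15); // ~300 ms of pre-roll at 20 ms frames
let speaking = false;

function onAudioFrame(frame: Frame, transcribe: (frames: Frame[]) => void): void {
  if (!speaking && isSpeech(frame)) {
    speaking = true;
    transcribe([...preRoll.drain(), frame]); // include audio from before the trigger
  } else if (speaking) {
    transcribe([frame]);
    if (!isSpeech(frame)) speaking = false; // real code would debounce this
  } else {
    preRoll.push(frame); // silence: just keep the ring buffer warm
  }
}
```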

In short order, I had a working prototype. It was basic, nothing fancy; it transcribed as you spoke and it handled commands. It worked, and I was very excited and happy with it. I could write emails, format text, and do limited PC navigation with my voice alone. My idea had shifted from my brain into code, and it was working like I had intended. There is something so satisfying about that.
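
For a flavor of the “handled commands” part, a toy version of the dispatch looks something like this (simplified and hypothetical; the real parser does more):

```ts
// Toy voice-action dispatch: match a final transcript against trigger
// phrases, otherwise treat it as plain dictation.

type Action = () => void;

const actions: Record<string, Action> = {
  "new line": () => insertText("\n"),
  "select all": () => sendShortcut("Ctrl+A"),
  "undo that": () => sendShortcut("Ctrl+Z"),
};

function handleTranscript(transcript: string): void {
  const phrase = transcript.trim().toLowerCase();
  const action = actions[phrase];
  if (action) {
    action();
  } else {
    insertText(transcript); // no command matched: plain dictation
  }
}

// Stubs standing in for the real OS integration layer.
function insertText(_text: string): void { /* type text at the cursor */ }
function sendShortcut(_combo: string): void { /* synthesize a key chord */ }
```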

My wife eventually caught on to what I was doing and found it interesting as well. That’s when it clicked that this idea had grown some legs. I started asking myself a ton of questions. I’ve been at this inflection point before, and usually my wife shoots the idea down with some sound logic. But this time she didn’t; whether that was a lack of care or she thought the idea was sound, I took her acceptance for what it was, barreled forward, and decided it was best to ask for forgiveness in this scenario if need be.

To bring my idea from prototype to app, I had a lot of unknown work in front of me. That’s where Christmas PTO crunch time came in. I had already taken two weeks off work, so I decided to let the creativity flow. And flow it did.

I ended up rebuilding the prototype in a different programming language due to the UI and packaging issues I kept encountering with Python. I settled on Node, Electron, and electron-builder. The distributables are a much more manageable ~130 MB, compared to the 500 MB+ installers I was getting with Nuitka and PyInstaller. Not to mention the UI developer experience is vastly better.

During the reimplementation I had to make many technical design decisions driven by language differences, namely around threads/workers and the ecosystem support available for AI development in Node.js. In the end I ironed it all out and ended up with an app that was vastly superior to my prototype implementation.
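
For the curious, the general worker pattern in Node is to push decoding into a worker_threads worker so the Electron UI stays responsive. A rough sketch of that shape (file names and message format are made up; assumes an ESM project):

```ts
// main.ts (sketch): keep heavy decoding off the UI thread.
import { Worker } from "node:worker_threads";

const worker = new Worker(new URL("./transcribe-worker.js", import.meta.url));

worker.on("message", (msg: { text: string }) => {
  console.log("partial transcript:", msg.text);
});

export function sendAudio(chunk: Float32Array): void {
  // Transfer the underlying ArrayBuffer instead of copying it.
  worker.postMessage(chunk, [chunk.buffer]);
}
```

```ts
// transcribe-worker.ts (sketch): receive audio, run the (stubbed) decoder.
import { parentPort } from "node:worker_threads";

parentPort?.on("message", (chunk: Float32Array) => {
  const text = decode(chunk); // stand-in for the real speech model call
  parentPort?.postMessage({ text });
});

function decode(_chunk: Float32Array): string {
  return "…"; // placeholder partial result
}
```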

With the app code out of the way, I now needed to focus on getting users.

Building and distribution are taken care of using electron-builder and Cloudflare R2.
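
Since R2 is S3-compatible, electron-builder’s S3 publisher can point straight at it. Roughly like this (all values are placeholders; credentials come from the usual AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables):

```yaml
# electron-builder.yml (sketch): publish artifacts to Cloudflare R2
# via its S3-compatible API.
publish:
  provider: s3
  bucket: vyvoice-releases        # placeholder bucket name
  endpoint: https://<account-id>.r2.cloudflarestorage.com
  region: auto
```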

Logically, in my brain, I started with the name and branding. What’s the point of an app or product without a solid identity? I found a name and domain that were available, settled on a theme and a logo, designed a website, and created email accounts. At this point I have everything branding-wise short of a brand book.

Now that I had a brand, I needed to figure out how to get it out there, sell it to others, and give them access to their purchases. I wanted some type of license system, but I didn’t want to get burned by handling payments myself, so I settled on Paddle to handle my subscriptions and payments. In the process I dove down the rabbit hole of trying to set up reasonable terms and conditions, a privacy policy, and a refund policy that wouldn’t bite me in the ass later.
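
The main plumbing with Paddle is its signed webhooks. Based on my read of Paddle’s docs (treat the header format as an assumption and verify it against the documentation yourself), verification boils down to an HMAC-SHA256 over the timestamp plus the raw request body:

```ts
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify a Paddle Billing webhook. Assumption: a "Paddle-Signature" header
// of the form "ts=<unix>;h1=<hex>", with the HMAC computed over
// `${ts}:${rawBody}` using the endpoint's secret key.
export function verifyPaddleWebhook(
  rawBody: string,
  signatureHeader: string,
  secret: string,
): boolean {
  const parts = Object.fromEntries(
    signatureHeader.split(";").map((kv) => kv.split("=") as [string, string]),
  );
  const { ts, h1 } = parts;
  if (!ts || !h1) return false;

  const expected = createHmac("sha256", secret)
    .update(`${ts}:${rawBody}`)
    .digest("hex");

  // Constant-time comparison to avoid timing leaks.
  const a = Buffer.from(expected, "hex");
  const b = Buffer.from(h1, "hex");
  return a.length === b.length && timingSafeEqual(a, b);
}
```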

I also needed to be able to send emails to my users and customers. I’ve locked in Resend and React Email for that at the moment. I tried to set up Amazon SES, but in the end they rejected my production upgrade for sending limits… probably because I have no established history.
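
Sending through Resend’s Node SDK is pleasantly small; with React Email you hand it a component instead of an HTML string. Something like this (the template and addresses are placeholders, not my real setup):

```tsx
// send-welcome.tsx (sketch): Resend's Node SDK plus a React Email template.
// "WelcomeEmail" is a hypothetical template component.
import { Resend } from "resend";
import { WelcomeEmail } from "./emails/welcome";

const resend = new Resend(process.env.RESEND_API_KEY);

export async function sendWelcome(to: string, name: string) {
  const { data, error } = await resend.emails.send({
    from: "vyvoice <hello@example.com>", // placeholder sender
    to,
    subject: "Welcome to vyvoice",
    react: <WelcomeEmail name={name} />,
  });
  if (error) throw new Error(`email failed: ${error.message}`);
  return data?.id;
}
```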

So where am I right now? I still need to build out the account portal and the API connections between Paddle and the app’s backend to handle license management automatically. But I expect to wrap those up in the next couple of weeks.
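
None of that plumbing exists yet, but the shape I have in mind is conventional: a verified Paddle webhook mints a license key, and the desktop app checks it against a small endpoint. In pseudo-backend terms (everything here is hypothetical):

```ts
import { randomUUID } from "node:crypto";

// Hypothetical in-memory license store; the real backend would persist this.
const licenses = new Map<string, { email: string; active: boolean }>();

// Called after a verified subscription-activated webhook from Paddle.
export function issueLicense(email: string): string {
  const key = randomUUID();
  licenses.set(key, { email, active: true });
  return key; // emailed to the customer
}

// Called by the desktop app on startup / periodically.
export function validateLicense(key: string): boolean {
  return licenses.get(key)?.active ?? false;
}
```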

I have so many ideas of where I want to go next. I have a working prototype of agentic AI assistants, and MCP servers are the next logical step. I also want to improve keyword detection and potentially get started training some of my own models to offer offline. And I have a few low-hanging pre-launch tasks like building docs as well.

Keep an eye out for vyvoice early in Q1. If you’d like to stay in touch with us and get notified when we’re ready to launch, you can sign up for the waiting list here; you’ll be the first to know!

-Alex

Add my blog to your RSS feed!