How do you talk to your computer?

Spokenly

I came across Spokenly via this YouTube video.

At first I was reluctant to install it, assuming that I would need to create yet another account just to use the app, but that was not the case.

They have two completely free options. In the first one, Whisper and Parakeet models run locally. There are other models that can be run locally, but the size of these models keeps growing.

In the second free option, if you do not like these models, you can choose another API provider and use your own API keys, and that's all.

The last one is the paid subscription model, where you get:

managed cloud transcription with premium models plus hosted AI text processing and priority support.

I do not think anyone needs these features for personal use, so either of the free options is fine. I went with the first one. Everything local.

If your machine is not powerful enough, you can choose the second free option, where the processing happens over the internet via frontier models or your own API provider. I have a feeling this will be slightly slower, but I have not tested that option.

Spokenly is supported on Mac, Windows and iOS.

Handy

It was recommended on Mastodon recently.

I had seen this recommended somewhere a long time ago, but I never tried it ¹ until I came across Spokenly and decided to give both Handy and Spokenly a try.

Handy is supported on macOS (both Intel and Apple Silicon), Linux (Intel and ARM), and Windows, but not on mobile platforms like iOS.

Neither one is supported on Android.

Using the tools

I started with Handy. (At that time I did not know about Spokenly, only Handy.)

Both apps have a hotkey that you hold down while you are talking to your computer.

When you release the key, the audio is transcribed by your local models and the text is inserted in your current app at the position of the cursor.
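The hold-to-talk flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Handy or Spokenly are actually implemented. The class `PushToTalk` and the `transcribe` and `insert_text` callbacks are hypothetical names, and the "model" here is a stand-in that just reports how much audio it received.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PushToTalk:
    """Hold-to-talk flow: record while the hotkey is down, transcribe on release."""
    transcribe: Callable[[bytes], str]   # hypothetical: audio bytes -> text
    insert_text: Callable[[str], None]   # hypothetical: puts text at the cursor
    listening: bool = False
    _chunks: List[bytes] = field(default_factory=list)

    def key_down(self) -> None:
        # Start a fresh recording; this is when an on-screen indicator would appear.
        self.listening = True
        self._chunks.clear()

    def audio_chunk(self, chunk: bytes) -> None:
        # Audio that arrives while the key is not held is ignored.
        if self.listening:
            self._chunks.append(chunk)

    def key_up(self) -> None:
        # Releasing the key ends the recording and triggers transcription.
        if not self.listening:
            return
        self.listening = False
        text = self.transcribe(b"".join(self._chunks))
        if text:
            self.insert_text(text)


# Usage with stand-ins for the local model and the focused text field.
typed: List[str] = []
ptt = PushToTalk(
    transcribe=lambda audio: f"<{len(audio)} bytes of speech>",
    insert_text=typed.append,
)
ptt.audio_chunk(b"ignored")  # key not held yet, so this is dropped
ptt.key_down()
ptt.audio_chunk(b"hello ")
ptt.audio_chunk(b"world")
ptt.key_up()
print(typed)  # ['<11 bytes of speech>']
```

Nothing is transcribed until the key is released, which matches what I see in both apps: the text appears in one go after I stop talking.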

At first I was not sure what key to use for transcribing. So I picked some weird combination of three or four keys that is not used anywhere else, and I kept forgetting it. 😂

For Handy, I settled on the right Option key. I hardly ever use this key, so it seemed like a perfect job for it.

For Spokenly, I use the right Command key.

Both tools show a visual indicator on your screen while they are listening. The indicators are slightly different, but that does not matter.

Both tools work okay, but I think Handy works a little better than Spokenly. Here are some examples that will give you an idea of what I mean.

Samples

First let us start with Spokenly.

Transcribes Spokenly as Спокон ли?

Once it transcribed Spokenly as spokan lee.

Transcribes Transcribes as Транскрайбс

I wonder why it switches to what seems like Russian.

At first I thought it had accidentally switched the language to Russian, but that was not the case.

I think it has problems with stand-alone words, because it correctly transcribed Spokenly in the second sentence above.

Экзам <- This was the word exam.
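If you want to catch such script switches automatically, a small check on the transcribed text is enough. The sketch below is a rough heuristic, not something either app offers as far as I know: it relies on the fact that, for most letters, the Unicode character name starts with the script name. The function names `scripts_in` and `unexpected_script` are made up for this example.

```python
import unicodedata
from typing import Set


def scripts_in(text: str) -> Set[str]:
    """Return the scripts of the letters in text, e.g. {"LATIN"} or {"CYRILLIC"}."""
    scripts = set()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits, punctuation and combining marks
        name = unicodedata.name(ch, "")
        if name:
            # For most letters the name starts with the script,
            # e.g. "CYRILLIC CAPITAL LETTER ES" or "LATIN SMALL LETTER A".
            scripts.add(name.split()[0])
    return scripts


def unexpected_script(text: str, expected: str = "LATIN") -> bool:
    """True if any letter in text belongs to a script other than expected."""
    return bool(scripts_in(text) - {expected})


for sample in ["Спокон ли?", "spokan lee", "Транскрайбс", "Экзам"]:
    print(sample, scripts_in(sample), unexpected_script(sample))
```

With the samples above, the three Cyrillic outputs are flagged and "spokan lee" is not. The latter is still wrong, of course, so this only catches the script switch, not ordinary mistranscriptions.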

Handy seems to be better. Here are the same words, transcribed using Handy.

Transcribes Spokenly As Spokenly.

Transcribes transcribes as transcribe.

I tried acronyms like SIG, and they worked well. Spokenly, on the other hand, transcribed it as some Russian-looking word. 🤷‍♂️

I also tried Devanagari words like Spardha Pariksha, which also worked well.

Interesting tidbits

I used Handy and Spokenly for dictating this blog entry.

Where possible, I used Handy when I was talking about Handy and Spokenly when I was talking about Spokenly. For the common parts, I preferred and used Handy.

I have hardly typed anything in this blog entry, except for minor corrections.

Tips

Speak slowly, at least if you are an Indian user 😄. We tend to speak very fast, and the tool probably cannot keep up. You do not need to be artificially slow, but speaking a little bit slower helps, I think. I never felt that the tool was making me slower. It is still faster to talk than to type it out, and with no mistakes 😜
At first I was talking almost right next to the screen, but it seems that is not required. Depending on how good your mic is, you can talk from a normal distance as well and it works quite well.

¹ I think at that time I was using my older machine and did not think it was capable enough to run the local models. Now that I have a machine with 32GB of memory, I was okay with trying local models.