Some people know that I'm deaf and really struggle with understanding speech, so figuring out a way to do live, on-device captioning has been my holy grail. There are some apps for this, like MyEar, but they sucked ass. Most of them also only worked in fixed chunks, with no way to run continuously and uninterrupted for tens of minutes.
After stumbling over https://github.com/ggerganov/whisper.cpp/ and its Objective-C example at https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.objc, I decided to check how it runs on my iPhone 13 and try to make it usable for myself.
What I found is that while the base model is unacceptable for me, taking ~3s to process a full 30-second chunk, tiny.en made it workable, with processing times of 0.5-0.7s. Its transcription quality was also noticeably better than MyEar and the like; it worked OK for me.
Some UI changes, making it run continuously so it doesn't stop after 30 seconds, and so on, and this is the result:
Sorry for the vertical video lmao. I'll probably revisit the base model once the iPhone 15 Pro is released; hopefully it'll lower processing times further.
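For the curious, here's roughly what the continuous part boils down to. This is just a minimal C sketch against whisper.cpp's C API, not my actual app code: `fill_audio_chunk` is a hypothetical stand-in for the Core Audio capture the whisper.objc example does, and the chunk size and parameters are illustrative.

```c
// Minimal sketch: keep feeding short audio chunks into whisper.cpp in a loop
// instead of stopping after one 30-second window.
// NOTE: fill_audio_chunk() is a hypothetical placeholder for real microphone
// capture (the whisper.objc example uses Core Audio for this on iOS).
#include <stdbool.h>
#include <stdio.h>
#include "whisper.h"

#define SAMPLE_RATE   16000               // whisper expects 16 kHz mono float PCM
#define CHUNK_SECONDS 5                   // short chunks keep captions responsive
#define CHUNK_SAMPLES (SAMPLE_RATE * CHUNK_SECONDS)

// Hypothetical: blocks until n new samples have been captured from the mic.
extern void fill_audio_chunk(float *samples, int n);

int main(void) {
    struct whisper_context *ctx = whisper_init_from_file("ggml-tiny.en.bin");
    if (!ctx) return 1;

    struct whisper_full_params params =
        whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
    params.print_progress = false;        // keep output clean for captioning
    params.single_segment = true;         // one caption line per chunk
    params.language       = "en";         // tiny.en is English-only anyway

    static float samples[CHUNK_SAMPLES];
    while (true) {                        // run until the app is closed
        fill_audio_chunk(samples, CHUNK_SAMPLES);
        if (whisper_full(ctx, params, samples, CHUNK_SAMPLES) != 0) break;

        // Print whatever whisper heard in this chunk as the next caption line.
        const int n = whisper_full_n_segments(ctx);
        for (int i = 0; i < n; i++)
            printf("%s\n", whisper_full_get_segment_text(ctx, i));
    }

    whisper_free(ctx);
    return 0;
}
```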