one of my strong example cases for the argument that "actually, everything collapsed a long time ago" is speech recognition.
20 years ago you could buy Dragon NaturallySpeaking, and it was... pretty good, especially for the time. i had a pirated copy and i was very impressed by it, but I wasn't writing anything at the time.
well, since i now do a lot of writing, I figured, hey, now's a good time to save some RSI and buy a speech recognition app. ah. Hm.
it turns out you can't. They aren't sold. this isn't a product. oh, there's Dragon (they dropped the NaturallySpeaking, because that was too recognizable a brand) but it starts at $700, and there's no trial version whatsoever, you just have to buy it. Let's just say I've trialed the current version anyway, and it's unusably bad.
It's astonishing how bad it is. I swear, it's gotten worse since 1999. It makes constant, egregious errors, and also whatever mechanism it uses to hook into apps makes the entire machine DOG SLOW the whole time it's open. That would be forgivable if it at least had good recognition but it doesn't, it's trash, it's absolute trash.
"Windows has speech recognition built in!" it's worse than Dragon, also incredibly slow, EXTREMELY inconvenient to use, and has obviously not been updated since it first came out in Vista in 2006.
"No no, Microsoft has a NEW speech recognition feature!" I think it's part of Office, or a Store App, something like that. Either way, I've tried it, and it's also incredibly shitty. Can't handle basic sentences. As with all of these, I'm using a broadcast-grade Audio-Technica headset and speaking with excellent diction, and it still turns "I only found two laptop models that could do this" into "I owe laptop models that code is." They're all this bad.
The speech rec on my Android phone, which I do not pay for, is some clown service that probably relies a lot on mechanical turking somewhere, and, that sucks? But like, it's there. It's already there, I have it, and it has at least a 95% accuracy rate. But it's not designed for long-form writing, mixed with keyboard input (for quick corrections and formatting and linebreaks and whatnot) using a headset for input. It's meant for exactly what it was made for: letting you quickly dash off a text message. It doesn't scale well.
I'm sure Macs have some feature that works better. I don't own one.
Everything else on the market is an API. There are dozens of services, some fairly good, and every single one is meant solely for being integrated into some other app. Nobody will simply sell this to me. Capitalism literally does not work, it does not lead to producing products that we want. Every time we think that's happened, it's only because we've bought something that rolls up a thing we want, and we're willing to close our eyes and pretend the unwanted husk wrapped around it isn't there.
I hate my voice. I've had a low register ever since I was like 11 and it's a source of constant anxiety. But the thing is, I've wanted to do some vtuber shit and figured "what if I super distorted the audio and generated some 90s ass closed captions" like white on black OSD looking shit
All I wanted was a way to somewhat accurately generate text from what I was saying. My requirements were the following:
- not a resource hog
- fast text generation
- runs entirely locally
That last one is important. I don't want to deal with something where if a webservice somewhere goes down I suddenly lose any and all communication options. I'd even have been fine with an API! I'm a developer by day and sex freak by afternoon, evening, and night, so I can handle some API hooks! But all the APIs are for accessing online services! But it's like this post says, we HAD Dragon NaturallySpeaking ages ago and it worked! It ran on consumer hardware from the MID 2000S AND WAS MORE RELIABLE AND PERFORMANT THAN OTHER OPTIONS TODAY, EVEN ITS OWN MODERN INCARNATIONS
And yeah, the Windows thing SUCKS. It's bad, it's sluggish, it collapses before the end of a sentence, its sentence detection is atrocious, not suitable for anything even remotely real time
Shit SUCKS now
