psilocervine


cathoderaydude
@cathoderaydude

one of my strongest example cases for the argument that "actually, everything collapsed a long time ago" is speech recognition.

20 years ago you could buy dragon naturallyspeaking, and it was... pretty good, especially for the time. i had a pirated copy and i was very impressed by it, but I wasn't writing anything at the time.

well, since i now do a lot of writing, I figured, hey, now's a good time to save some RSI and buy a speech recognition app. ah. Hm.

it turns out you can't. They aren't sold. this isn't a product. oh, there's Dragon (they dropped the NaturallySpeaking, because that was too recognizable a brand) but it starts at $700, and there's no trial version whatsoever, you just have to buy it. Let's just say I've trialed the current version anyway, and it's unusably bad.

It's astonishing how bad it is. I swear, it's gotten worse since 1999. It makes constant, egregious errors, and also whatever mechanism it uses to hook into apps makes the entire machine DOG SLOW the whole time it's open. That would be forgivable if it at least had good recognition but it doesn't, it's trash, it's absolute trash.

"Windows has speech recognition built in!" it's worse than Dragon, also incredibly slow, EXTREMELY inconvenient to use, and has obviously not been updated since it first came out in Vista in 2006.

"No no, Microsoft has a NEW speech recognition feature!" I think it's part of Office, or a Store App, something like that. Either way, I've tried it, and it's also incredibly shitty. Can't handle basic sentences. As with all of these, I'm using a broadcast grade Audio Technica headset and speaking with excellent diction, and it still turns "I only found two laptop models that could do this" into "I owe laptop models that code is." They're all this bad.
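To put a rough number on that example, here is a minimal sketch of word error rate (WER): word-level edit distance divided by the number of words in what was actually said. The helper names `words` and `word_error_rate` are made up for this illustration and don't come from any product mentioned here. Lowercasing and stripping punctuation before comparing is also an assumption.

```python
import re


def words(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = words(reference), words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the reference words seen so far
    # into the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # word dropped by the recognizer
                cur[j - 1] + 1,                        # extra word inserted
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


said = "I only found two laptop models that could do this"
heard = "I owe laptop models that code is."
print(word_error_rate(said, heard))  # 0.6
```

Six of the ten spoken words are wrong or missing, so that one sentence scores a 60% word error rate.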

The speech rec on my Android phone, which I do not pay for, is some clown service that probably relies a lot on mechanical turking somewhere, and, that sucks? But like, it's there. It's already there, I have it, and it has at least a 95% accuracy rate. But it's not designed for long-form writing, mixed with keyboard input (for quick corrections and formatting and linebreaks and whatnot) using a headset for input. It's meant for exactly what it was made for: letting you quickly dash off a text message. It doesn't scale well.

I'm sure Macs have some feature that works better. I don't own one.

Everything else on the market is an API. There are dozens of services, some fairly good, and every single one is meant solely for being integrated into some other app. Nobody will simply sell this to me. Capitalism literally does not work, it does not lead to producing products that we want. Every time we think that's happened, it's only because we've bought something that rolls up a thing we want, and we're willing to close our eyes and pretend the unwanted husk wrapped around it isn't there.


psilocervine
@psilocervine

I hate my voice. I've had a low register ever since I was like 11 and it's a source of constant anxiety. But the thing is, I've wanted to do some vtuber shit and figured "what if I super distorted the audio and generated some 90s ass closed captions" like white on black OSD looking shit

All I wanted was a way to somewhat accurately generate text from what I was saying. My requirements were the following:

  • not a resource hog
  • fast text generation
  • runs entirely locally

That last one is important. I don't want to deal with something where if a webservice somewhere goes down I suddenly lose any and all communication options. I'd even have been fine with an API! I'm a developer by day and sex freak by afternoon, evening, and night, so I can handle some API hooks! But all the APIs are for accessing online services! But it's like this post says, we HAD dragon naturally speaking ages ago and it worked! It ran on consumer hardware from the MID 2000S AND WAS MORE RELIABLE AND PERFORMANT THAN OTHER OPTIONS TODAY, EVEN ITS OWN MODERN INCARNATIONS

And yeah, the windows thing SUCKS. It's bad, it's sluggish, it collapses before the end of a sentence, its sentence detection is atrocious, not suitable for anything even remotely real time

Shit SUCKS now



in reply to @cathoderaydude's post:

I'm sure Macs have some feature that works better. I don't own one.

nah they removed the good speech-to-text in 10.13 or so, i was exploring this the other day and comparing my old macs with my new ones. i can use the old ones entirely with voice control if i must, i can't use the new ones at all because they removed critical features

I don't have a Mac either but I have an iPhone and HomePods and let me tell you: Siri's speech recognition is terrible too. The interface for it on the iPhone isn't so bad, but the actual speech recognition is awful, you're lucky to get through one sentence without an obvious error.

I noticed this phenomenon a few years ago when a friend was looking for a cheap and cheerful way to draw on a screen attached to a desktop computer. Like, they wanted a touchscreen with pen support; an art tablet with a display. And it seemed like there was nothing short of a Cintiq. Cheap tablets seemed to have eaten the whole bottom end of that market. I think it might be a little better now, but... It's like the death of point-and-shoot digital cameras, but worse.

Huion is probably the next best. Possibly the current best; Cintiq seemed to get complacent with its market dominance several years back and started suffering this same problem and getting worse about compatibility and customer support, while getting more expensive. Huion is what I see recommended now. I don't know if it's necessarily better with those problems, but at least it costs less to deal with them. (Apparently all full screen tablets tend to have driver problems because operating systems are still just Bad at handling them.)

The supposedly pro-level speech recognition the doctors at my work use is so bad it's a health hazard - it can turn "asymptomatic" into "a symptomatic" or just go completely off the rails and turn "bloodborne pathogen" into "both-bone pack it in". You sort of learn the typical errors over time but we really shouldn't be putting up with that.

oh i straight up think speech recognition for medical records should be illegal, which is why it's funny that clearly 99% of Dragon's market (and all other products that exist) is medical / legal, nearly the WORST places it could be used.

is this why I can't find a transcription job

God it would be so much harder to proofread this kind of drivel vs just having a human transcribe it to begin with, when you're listening live you can check for medication names and such as you go

I'm not sure if Teams live captioning is an nth separate version from Microsoft or the same thing as one of the others but it's not even fit for the purpose of helping someone who can hear but is only vaguely following along, I can't imagine what it's like for someone who is actually deaf. And that's not even counting the small fraction of non-english words used in my workplace which produce universally hideous results

i briefly used Kaldi and Caster to get out in front of a potential RSI issue and it was pretty ok? i wouldn’t say it’s great but it’s definitely better than cubital tunnel syndrome. pretty easy to tune up to handle specific bad recognitions i ran into and it was plenty sufficient to do my actual fucking programming job for a few weeks

You think that example was bad? That one is more of a pleasant side effect of their larger business: supplying legacy machines cause we need them to run major parts of infrastructure like trains, air traffic, oil refineries, truck weight stations, the list goes on. A good half of our cyber infrastructure is built on 32-bit systems, and they are mostly in the hands of privatized industry that won't upgrade until the last minute, with the Year 2038 problem approaching like a freight train.
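For anyone who hasn't run into the Year 2038 problem: it affects systems that store time as a signed 32-bit count of seconds since 1970-01-01 UTC. A rough sketch of the arithmetic, assuming that representation (the `wrap_int32` helper is a made-up name that imitates the overflow, not a real API):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold


def wrap_int32(value):
    """Wrap an integer the way a signed 32-bit two's-complement counter does."""
    return (value + 2**31) % 2**32 - 2**31


# Last second a signed 32-bit timestamp can represent
last_good = EPOCH + timedelta(seconds=INT32_MAX)

# One second later the counter wraps around to a large negative number
after_wrap = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_good.isoformat())   # 2038-01-19T03:14:07+00:00
print(after_wrap.isoformat())  # 1901-12-13T20:45:52+00:00
```

A machine that hasn't moved to a wider timestamp by then would see its clock jump from January 2038 back to December 1901.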

Sorry to jump in on a slightly old post, but this is something I’m passionate about and I noticed replies mentioning medical dictation software (bad) and lamenting a lack of transcription jobs (they do exist)!

I’ve been a transcriber for a really long time in the legal sector, so obviously we need super high accuracy. I’ve been involved in multiple trials of new “cutting edge” speech to text - and they are all completely unusable dogshit. I can’t provide specific examples since, you know, my employment contract, but they are astonishingly incoherent. It is straight up faster to type a transcript ourselves than edit those fucking programs’ output. The speaker change detection basically does not work, even when the two speakers have totally different voices, producing giant run-on paragraphs instead of a conversation.

And yet my company is still trying to pivot to AI!! We had to get the union to take them to court to stop compelling us to “edit” STT for less pay instead of just transcribing and they’re still pushing for it!! So not only do secret professional tier programs just not work, bosses will desperately try to push them through anyway.

The point is I genuinely believe functional STT will never happen lmao

Thank you for all this input, that honestly makes me feel better. I figured the expensive professional stuff was trash but I thought it was at least a little bit better. As usual, this is one of those jobs that industry will just never stop trying to automate out of existence even though it's impossible, you simply need to pay somebody a living wage to do it or you will get garbage in garbage out.

For what it's worth, there isn't much of an interface and I haven't used it consistently enough to know how well it "travels" across machines, but Julius has worked surprisingly well, the few times that I tried it.

I think that the terminology used might help to explain why things get worse, though. People used to talk about (as you see in the description for Julius) "continuous speech recognition." Now, the term is "voice recognition," because the Big Money™ is in letting people turn on the lights and other short, constrained commands, not helping people get work done in general.