SnepShark
@SnepShark
Google's "Cyan" TTS character - And I still remember...
Google's "Lime" TTS character - Go is a deply complex straegic plame... 0 0 0 0 0


in reply to @SnepShark's post:

it's always so fascinating to watch this kinda thing break against the input, to me - to watch the front of Natural Expression be stripped away from the generator and reveal the machine beneath, completely unable to cope with a situation it was never designed to handle and yet confidently march on through as though everything is fine because it is fundamentally incapable of recognizing that what it just output is a series of incoherent sounds that would be deeply concerning for anyone to make

it's funny how you can tell it's handled per-sentence because the second it gets to the start of the sentence it mangles even the most basic words like "deeply" and "game", because it's looking ahead to the string of 0's and already getting choked up on them

That's a good point. And it's fucked up somehow - why didn't it recognize it as a huge-ass number and just tell the scientific name for that? And why does an error in one place make it shit up the rest of the sentence?

most older text-to-speech engines used some kind of converter (pronunciation dictionaries plus letter-to-sound rules) to turn text into phonemes, and then a book of audio files to play those phonemes in order. so if you typed "look" it would look at that word, break it up into three phonemes (l, ʊ, and k, per the IPA), then just play those three sounds in sequence quickly.
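to make that concrete, here's a minimal sketch of the lookup-then-play idea in python. everything in it is an assumption made up for illustration: the `PRONUNCIATIONS` table, the `CLIPS` table, the clip file names, and the function names. a real engine would have a much bigger dictionary, fallback rules for unknown words, and actual audio instead of file names.

```python
# toy concatenative tts: word -> phonemes -> clips queued in order.
# PRONUNCIATIONS, CLIPS and the .wav names are hypothetical stand-ins,
# not taken from any real engine.

PRONUNCIATIONS = {
    "look": ["l", "ʊ", "k"],
    "book": ["b", "ʊ", "k"],
}

# stand-in for the "book of audio files": one recorded clip per phoneme
CLIPS = {
    "l": "l.wav",
    "ʊ": "uh.wav",
    "k": "k.wav",
    "b": "b.wav",
}


def to_phonemes(word):
    """Look a word up in the pronunciation table (KeyError if it is unknown)."""
    return PRONUNCIATIONS[word.lower()]


def synthesize(text):
    """Return the clips to play, in order, for a whitespace-separated text."""
    playlist = []
    for word in text.split():
        for phoneme in to_phonemes(word):
            playlist.append(CLIPS[phoneme])
    return playlist


print(synthesize("look"))  # ['l.wav', 'uh.wav', 'k.wav']
```

note that each word is handled on its own here, so a weird word later in the sentence can't change how "look" comes out. that's the key difference from the whole-sentence approach.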

however these text-to-speech engines have a very distinct, "robotic" feeling to them (because they're just playing chopped up audio files of human speakers). in the pursuit of more natural sounding speech, modern text-to-speech engines try to swallow chunks of sentences, or entire sentences at a time and directly convert them to audio data using neural networks ("""ai""", if you will; and i won't)

however, this process relies on well-formed language being put into the system as text. if you put in keysmash garbage, or long strings of repeated characters, the neural network effectively winds up "hallucinating" garbage speech data out of whatever the input is. same phenomenon as chatgpt crapping itself and writing nonsense essays if you give it certain kinds of garbage input.

#1 is either having a stroke, or open-brain surgery and the surgeon is really poking around in there.

#2 is a porn star who burned out years ago, but still shows up for work every single day.