Cyan's is like:
"With HOUSEFLY DIRECTLY IN THE LISTENERS EAR possible configurations"
Nsfw writer & electronic musician, plural system of ~7 (so far), 24 years old
Icon by https://furry.energy/@YemmieArts
Current icon =/= current fronting member/s
Cyan's is like:
"With HOUSEFLY DIRECTLY IN THE LISTENERS EAR possible configurations"
no she's right. i would rather pronounce it like that, than parse it as an actual wordy number, too
it's always so fascinating to watch this kinda thing break against the input, to me - to watch the front of Natural Expression be stripped away from the generator and reveal the machine beneath, completely unable to cope with a situation it was never designed to handle and yet confidently march on through as though everything is fine because it is fundamentally incapable of recognizing that what it just output is a series of incoherent sounds that would be deeply concerning for anyone to make
me when i'm explaining go while walking and don't notice the path is made of burning coals
it's funny how you can tell it's handled per-sentence because the second it gets to the start of the sentence it mangles even the most basic words like "deeply" and "game", because it's looking ahead to the string of 0's and already getting choked up on them
oh shit you're right! I was wondering why it started to get weird before the number
That's a good point. And it's fucked up somehow - why didn't it recognize it as a huge-ass number and just tell the scientific name for that? And why does an error in one place make it shit up the rest of the sentence?
most text-to-speech engines of the past century used some kind of transformer to convert text to phonemes, and then a book of audio files to play those phonemes in order. so if you typed "look" it would look at that word, break it up into three phonemes (l, ʊ, and k, per the IPA), then just play those three sounds in sequence quickly.
however these text-to-speech engines have a very distinct, "robotic" feeling to them (because they're just playing chopped up audio files of human speakers). in the pursuit of more natural sounding speech, modern text-to-speech engines try to swallow chunks of sentences, or entire sentences at a time and directly convert them to audio data using neural networks ("""ai""", if you will; and i won't)
however, this process relies on well-formed language being put into the system as text. if you put in keysmash garbage, or long strings of repeated characters, the neural network effectively winds up "hallucinating" garbage speech data out of whatever the input is. same phenomenon as chatgpt crapping itself and writing nonsense essays if you give it certain kinds of garbage input.
#1 is either having a stroke, or open-brain surgery and the surgeon is really poking around in there.
#2 is a porn star that has burned out years ago, but still shows up for work every single day.
I'm reminded of those early "make a famous actor read your text with AI!" sites which would break almost immediately and turn into the AI voice mumbling and sputtering like they were dying.
they deleted it
Updated March 21, 2024: We've removed the text-to-speech feature on this story due to a bug with how some of the text was being read.