Note: For clarity I will be using the term "AI" here for machine learning models, because that's what the devs themselves are using.
Recently I saw quite a few people heap praise on a game called Vaudeville that supposedly lets you play as a detective interrogating a whole cast of AI-voiced and AI-generated characters. While the AI voices were mostly called primitive and unrealistic sounding, people seemed quite impressed with how the AI would try to converse with the player. I was curious, so I checked out some gameplay footage, and just based on that I have to say I disagree with the notion that "once we get the voices right this will be the future of gaming". In fact, I think the AI generated voices being so terrible is helping to mask an even bigger problem with the game, which is that none of these characters seem to have a "voice".
Now seeing me speak of voices and "voices" might be a bit confusing, so let me elaborate on that a bit. In linguistics and literary analysis, when we talk about a character's "voice" we don't just mean what they sound like, but all the different factors that make their dialogue distinct and uniquely theirs. These include things like age, gender, socioeconomic class, place of birth, education and many, many other things I could name. Simply put, no two characters will ever share the exact same "voice", even if they are both alive during the same period of time. A rich heiress from New York is not going to speak the same way as a circus acrobat, or a retired big game hunter.
"Voice" is also an important part of how characters tell us about themselves without TELLING us about themselves. A character that has a distinct Brooklyn accent doesn't need to keep reminding us they are from Brooklyn. Likewise, a character using a highly specialized and obscure professional vocabulary doesn't need to keep telling us they're a lawyer or a doctor. However, when looking at the game Vaudeville it is very clear from even short snippets of dialogue that because the characters are AI generated, they aren't actually capable of possessing a distinct "voice". This leads to unnatural sounding dialogue where they either feel completely without personality, or resort to clumsy "tell don't show" to remind us about what their character is supposed to be ("As a rich heiress I...").
This extremely obvious lack of "voice" is what I think many people are ignoring when they say the AI generated voices sound unnatural and unemotive. Yes, it's true that the AI voices in the game are bad, but the reason they feel so unnatural is not just because AI can't quite get the voices right. It is also because the AI can't get the "voices" right. If you actually had proper voice actors read the dialogue generated by this game it would still feel artificial, because the characters are literally just reciting their personalities to you from a list of character traits they were given instead of using their "voice" to convey information. Having your heiress occasionally sprinkle an "Oh my!" into her speech can't hide the fact you could swap most of her dialogue with the police chief's and nobody would notice. The only time the characters in Vaudeville feel distinct is when they are very specifically stating things the devs obviously thought were crucial to their character and took pains to make sure they get right, resulting in gripping dialogue in the style of "HELLO I AM THE RICH INVESTOR I LIKE BANKS AND INVESTING MONEY AND I DEFINITELY DON'T LIKE MURDER". Every time the player tries to take the conversation anywhere outside of the default settings of the characters they sound completely identical.
A character written by an actual human being that has an actual "voice" will always feel like that character, even if they only have a simple line of dialogue. Based on what I've seen of the AI Vaudeville uses so far, it disappoints on all levels when it comes to creating authentic and interesting characters. Now granted, my observation is based merely on watching gameplay footage from other people, so perhaps there are amazing parts of this game I simply haven't seen. Somehow I doubt that though, because in my previous experience with AI content creation in video games the focus has always been on "more" instead of "better". Imagine being able to talk to anyone about anything in Skyrim! Sounds pretty awesome at first, until you realize there actually isn't much point to talking to anyone about anything, because outside of very limited character parameters none of the characters actually has a distinct "voice" that would make conversation with them interesting.
Whether AI will ever be able to create characters that have not only a realistic sounding voice but also a "voice" is not something I can predict, not being very well versed in language models myself. However, based on what I've seen so far we are still far, FAR away from automatically generating characters that actually sound authentic. If you're the kind of person who just wants to talk to every NPC to find out what they had for breakfast I suppose even without a "voice" AI characters might be able to scratch that itch for you, but as for creating meaningful narratives with distinct characters you're better off not holding your breath for now.


