kuraine
@kuraine

i finished a really fantastically illustrated & written visual novel called South Scrimshaw Part One. it is oodles of art & text presented and narrated like a nature documentary, investigating the biology of an alien planet while focusing on the life of a single whale.

i was absolutely compelled for the ~2 hours it took me to read through. the quality of it is super high.

getting to the credits, i wanted to know who voiced the documentary--there are two primary voices, a british male voice for the primary narration, and a british female voice for parenthetical asides. voice acting in this game was a pretty wild surprise: the game is free, ostensibly a one-person production, with no publisher or external funding.

the unfortunate realization is that there are no voice actors. the credits extensively list every single sound used from freesound.org and its author. it cites every royalty free music track it uses & its additional composer. credits to all the resources are very open and responsibly cited.

for narration: "In-development voice narration and speech synthesized using tools by ElevenLabs"

does this mean that the voice narration is only using synthesis for development & the final release will hire voice actors? given the type of presentation it is, i would hope so, but it's something i can't put my finger on due to how obfuscated the credit is.

out of ~800 overwhelmingly positive reviews on steam, many refer to it being fully voice acted. only a few mention the generated narration feeling a bit stiff, even fewer mark it as a negative addition.

ElevenLabs is not a great company, ethically-speaking. they're specifically creating ai generated speech tools to replace many actual actors' work, work that unions are openly fighting for right now (even if fraught with their own ethical issues). i hate that they list 'Don't Nod' and 'Paradox Interactive' as clients. i hate what it means for the possibilities of companies ostensibly doing good work with creative teams.

this game could have been released without voice acting. it adds to its appeal & charm, but the conflicted feeling i have about it now far outweighs my positive impressions of its contents. i don't know what to do with those feelings, so i'm writing about them here.

who are the actors whose work was used to train the models that then voiced the narration in this game? how much were they paid? why are their roles in the process that made the narration of this game about alien biology not counted amidst an incredibly detailed list of sources for its material?

where does this cross the threshold from microsoft sam to vocaloid to unidentifiable human-like acting

no answers necessary, really, just staring into the void


DieselBrain
@DieselBrain

But as a visual artist, if I reframe this scenario and think of how I’d feel if an otherwise great game by a small/solo dev used generative AI for its art, then my feelings aren’t very positive or forgiving.

The way I see it is, at THIS size, when it’s just an individual really, how is that not a betrayal of both the trust of fans and consumers of the art and of their peers within the space?

I’m friends with many creatives, across fields. If I released a VN tomorrow, with full AI voice acting, on what planet would that not feel like selling out my peers, my FRIENDS who voice act, to get another selling point for my game? Does that not say to them that I don’t actually value their work? That I value my OWN work so far above theirs that I’m content to use tech DESIGNED to replace them (if not designed USING them?). Does that not say “sorry, but my work is TOO IMPORTANT to stand in solidarity with you”?

At a time when artists of all stripes need to support and advocate for one another, deciding to use generative AI of any sort to fill a creative role on a project is nothing but selfishness. It’s not even the idea that “a VA could have been paid here”; I understand they likely didn’t have the budget for a voice actor atm. But opting for THIS over the plethora of other options sends the message that they don’t actually care about or value their peers.

(Also if this is meant to possibly be a placeholder, is doing the VA work yourself, or even getting friends or family to provide temp audio not a valid placeholder option?)

I’m sorry if this comes across as harsh but I think this is fundamentally selfish on the part of South Scrimshaw’s dev. We need to support our fellow creatives, not undercut them.



in reply to @kuraine's post:

I noticed this myself but I admit I haven't stopped pushing South Scrimshaw to everyone despite it. It's the one big ol' asterisk next to its title for me, I think.

this has weighed on my mind when thinking about how to integrate voice into my game.

i want to have good comprehensive narration support (for the visually impaired) which means straying close to this - sure the current windows speech API i'm using is more classical stuff but when i port to consoles or mac or something I'd probably have to integrate some sort of AI synthesis that probably is as ethically compromised as 11labs

It's also kind of expected for any game with a visual novel component to have voiced dialogue if there's budget for it, which would mean getting placeholder audio in to test everything out and get a feel for how the script is going to sound. And doing that placeholder stuff with TTS is the most efficient way of doing it.

But in both cases it becomes really easy to go 'this is good enough actually' and then never be able to justify paying a bunch of actors to replace it with real work. It sucks

Even if I go "it'll be fine because I'll train it on my own voice", I know the reality is that these speech synth networks only work at all because they were already trained on a ton of speech that was probably collected without paying anyone, and training it on my own voice is just kind of lending it the appearance of fairness

I am continually surprised by how large the segment of gamers is that needs voice acting in a game for it to be a "real" game to them, or for them to even consider a game at all.

I feel like that is part of the current nightmare void of AI gen voices: if you're a small developer, do you stand on your morals and say no to it entirely, or do you dip your toes in, knowing that there's a significant chunk of players who won't even look at your game twice if it doesn't have VA in it?

Yeah. ElevenLabs-derived stuff has popped up in New Vegas modding too, and I know mod creators who are understandably psyched about being (theoretically) able to integrate not just high-quality voiced characters but voiced expansions for existing characters. Many people consider silent mod characters jarring, and even more people consider headset-mic quality VA even more jarring, so I can understand the appeal of relatively easy, high quality VA at your fingertips. And you're right about the blurry line from Sam to Vocaloid to ElevenLabs. From what I've seen, the ElevenLabs stuff for New Vegas still requires quite heavy tuning, but maybe that's "improved"

I have similarly messy thoughts.

this also seems like a game that would have a fairly discerning audience

i typed out "i think as long as people are open it’s not a problem" but I also know if someone did that in a game full of generative art i would still tell them to go screw. although i guess the latter is more because of the rampant culture of entitled art theft. either way, purposely obfuscating is a bit disingenuous.

we used ʜᴇʟʟᴏ ɪ ᴀᴍ ᴀ ʀᴏʙᴏᴛ tts as block outs on fable: the journey and i didn’t think twice about it. saves time for us and doesn’t waste voice actors' time when pacing is adjusted, the story changes or E3 improv happens (although in hindsight maybe you could see this as us minimising costs, too)