• they/them

@daboross-pottery

For future reference: https://daboross.net/. Blog & RSS feed not yet built as of 2024-09-13.


post-cohost newsletter
buttondown.com/daboross/

lokeloski
@lokeloski
This page's posts are visible only to users who are logged in.

amydentata
@amydentata

Machine translation can only really work as a first draft that is then reviewed by at least one human who knows the language. You can tell when translations and captions are done by machine only, because they suck!


lokeloski
@lokeloski
This page's posts are visible only to users who are logged in.

iiotenki
@iiotenki

The ultimate downfall for machine translation when it comes to game loc is that no matter how advanced the rest of the tech gets or how much better it becomes at covering its tracks when it's bullshitting out of ignorance (which DeepL is especially prone to), the context is rarely self-evident within the raw text itself. As someone who's increasingly hired by some agencies to edit more than translate, if somewhat to my chagrin, I'd even go so far as to say that a game's context is never 100 percent self-contained within those files. LLMs inherently fail at this because their majoritarian approach to translation accuracy can never take into account the "local" realities specific to any project; it's a level of hyper-specificity that makes those large data pools a liability.

It's not just a problem of the machines being unable to play the games themselves to discern that context. These are works belonging to genres in a medium that's decades old. They're all but inevitably in conversation with other games and audiences and discourses that you can't pick up just from processing scripts on their own, and you still can't fully discern it even if you play the games themselves. You have to take the material in as an organic response to the circumstances of its creation and the work around it, both within this industry and beyond it.

I know this because when I get hired to proofread and edit other translators' work, I'm so often having to draw upon my fluency in different genres and their histories to be able to confidently fine-tune the translations in the way that they need to be. If, for instance, you want to localize a two-on-two arena fighter, how do you tell an AI to be mindful of the Gundam Vs. arcade games and how those games play when translating tutorials? You can't, because that's contextual knowledge specific to that game that's crucial to parsing its text correctly, but is far too specific for an algorithm to ever weight over the reams and reams of other data that says, "Well, when these words are strung together like this, ordinarily it means this 83 percent of the time." And yet that output would be wrong because it failed to discard that noise and home in on what's actually relevant to that game and informing its text.

At the risk of sounding haughty, this is not a level of knowledge and discretion that you're likely to get from just playing Japanese games casually or even working in the industry as a localizer in a standard capacity. You have to put in the work to research and experience vast sums of history that never got exported in their time to be able to make those calls. And LLMs will never be able to get there with game localization because to get there, they would have to forsake their biggest selling point and if you have to forsake your biggest selling point, then the rest of the model falls apart in execution and the facade is removed, revealing the same fundamental problems underneath that have always made machine translations a liability for these sorts of professional applications to begin with.

Which is why, as I've said before, I don't fear the technology itself actually coming for my job. The technology will never be able to bring that knowledge and decision making and filtering processes to a game, which is ultimately how I'm able to sell myself as a game translator. The real threat, as it's always been, is the cost-cutters who don't truly understand the industry they're in and will throw us and their own clients under the bus to make a quick buck. That's the battle I'm always fighting and the only way I know how to keep fighting is to just go on doing the work the way I always have and let it speak for itself.



in reply to @lokeloski's post:

That explains the weird wording that shows up, but it overall seems too good for that, if a bit boring? A friend has been streaming it and a lot of the team seems to be from English speaking countries according to their site map. Maybe the early game got more attention or something.

in reply to @lokeloski's post:

Backing this up: MTPE is way worse than starting from scratch. Even editing a really janky human-translated script is way more consistent and comprehensible.

Even the best mtl I've seen is really dry and sucks the life out of a script.

Can confirm, the only way machine translation actually helps is when a company has trained it with its own documents to translate similar documents. Even then, you need a couple of people to go over it, because the machine is stubborn and sometimes refuses to use the terms the company prefers.
Just feeding a game to a machine and using whatever it churns out is pure greed and disregard for your customers.

I was thinking the only way a machine translation with a human reviewer would be functional, covering all the constant errors, wrong translations, and inconsistencies machine translations have, would be someone: 1. fluent enough in both languages to just translate it, 2. familiar with the material inside and out, enough to spot and fix issues, 3. willing to trudge through a book's worth of terrible grammar and rewrite all of it.

At that point just hire someone to translate...

This gets at a core problem that I've started talking about: The issue isn't "how good" the AI tools are at all, and arguing about "how close we are" plays right into the idea that this is an ultra-valuable market. The problem is that, even if it's technically perfect, the software is by definition never going to care about the result and its suitability for the purpose.

If you want to be more precise about it, the problem is fundamental to how large language models work. You can't just feed them an arbitrary metric to judge their output by; they have to work with the metric that was programmed into them at the outset, which is always how closely that output resembles the relevant parts of the corpus. That this is a problem should become apparent the moment you apply this tech to anything that diverges significantly from the corpus, but even if it doesn't, the fact that the text you're working with isn't part of the corpus (why bother translating something the machine's already worked with?) means there's always going to be room for error.

in reply to @iiotenki's post:

Obviously I am on the tech side in terms of my knowledge, and even as someone who finds the tech interesting, it's so stupid that any group thinks it can replace a real person doing a proper translation/localization. Like, the tech is obviously an improvement over the very basic systems that existed before, but it doesn't take much use to see the pitfalls and problems that just make it unacceptable for anything that requires any real integrity.

Ironically, I think ChatGPT has actually made it easier to explain to the average layperson the ways in which things like DeepL really aren't suited to anything that needs a real translation.

I think in the past it was harder for someone with no knowledge of either the tech side of it or the language side of it to believe it when people mentioned those tools as not being accurate, because they could go and look up the neat little tools and such people had built to like "auto scanlate" a manga or w/e. The results being convincing enough in a lot of cases I think made it hard for people to understand the ways in which they were off/wrong/or just not up to a standard of quality.

However, with the rise of ChatGPT, a lot more people are now aware of the sorts of pitfalls these systems have: from how convincingly these systems can produce fake output, to the incredibly stilted, low-quality output they produce even when working "correctly".

Not to give any positive credit to ChatGPT lol, I'm sure it also caused a wave of business idiots to believe that the future is MTL as well.

As Cory Doctorow has stated many times,

"we're nowhere near a place where bots can steal your job, but we're certainly at the point where your boss can be suckered into firing you and replacing you with a bot that fails at doing your job"
