well uh ok that might sound like a total downgrade but hear me out. i mean i’ll preface this all with “i am not a linguist at all” (though i do know a basic amount of French, so i’m not totally monolingual at least) and also i don’t know if this already exists and i just don’t know about it. but like.
phrasebooks have been a thing since forever, right? because people who don’t speak the local language need to be able to ask strangers where the bathroom is. phrasebooks are very limited by nature, though: they only contain the phrases their creators thought would be most useful, only as many as they had the resources to write, and only in the languages they could translate between. but as a tourist with a phrasebook, i’d be much more confident in the accuracy of its translations than in an arbitrary text translator’s, because a human specifically crafted those phrases for my use case: basic, common interactions with strangers.
direct arbitrary text translation obviously isn’t nearly as limited, but it has huge tradeoffs. a big one is that so, so many words are ambiguous, and yet these translators do nothing to clarify which sense you mean. what is a bathroom, anyway? does it have a shower or bathtub? a toilet? both?? hopefully a sink in any case, right? probably? and what’s the accepted difference between bathrooms, washrooms, restrooms, lavatories, and water closets across all English cultures? did you know “car” as a noun has 11 distinct English senses on Wiktionary and one of those is “Deliberate misspelling of cat”‽ what a disaster. if you’re trying to communicate with someone across a language barrier, your translator needs to be able to query your intent to have a fighting chance of actually succeeding!
surely there’s a happier middle ground somewhere between those extremes. AFAICT, every language has subjects, verbs, and objects in some form, which means there are some very simple constructions that you should (i’m not a linguist!) be able to translate systematically between any two languages. you’d just need an interface that strictly clarifies your meaning / intent for each part. so given one field each for subject, verb, and object, you’d have to make a very specific choice in each. ideally, i think, you’d be able to start with arbitrary text by typing in “bathroom”, and then be forced to choose between like “area for bathing” and “facility for excreting” and maybe some others. obviously there’s still other stuff that needs people smarter than me to figure out an interface for (adjectives, tense, questions, politeness, noun definiteness…), but i feel like workable solutions are possible here.
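here’s a minimal sketch of what that forced-choice step could look like, in Python. the key idea is that the clause never stores the raw word you typed, only a sense id you explicitly picked. everything here is made up for illustration: the sense ids, the glosses (these are not Wiktionary’s actual entries), and the names `SENSES`, `senses_for`, `choose_sense`, and `Clause`.

```python
from dataclasses import dataclass

# hypothetical sense inventory: each typed word maps to (sense_id, gloss) pairs.
# ids and glosses are invented for this sketch, not taken from any real dictionary.
SENSES: dict[str, list[tuple[str, str]]] = {
    "bathroom": [
        ("bathroom.bathing", "area for bathing"),
        ("bathroom.toilet", "facility for excreting"),
    ],
    "car": [
        ("car.automobile", "wheeled motor vehicle"),
        ("car.railway", "railway carriage"),
    ],
}


def senses_for(word: str) -> list[tuple[str, str]]:
    """return every candidate sense for a typed word (empty list if unknown)."""
    return SENSES.get(word.strip().lower(), [])


def choose_sense(word: str, choice: int) -> str:
    """make the user pick one sense by index; never guess on their behalf."""
    options = senses_for(word)
    if not options:
        raise KeyError(f"no senses known for {word!r}")
    if not 0 <= choice < len(options):
        raise IndexError(f"pick a number from 0 to {len(options) - 1}")
    return options[choice][0]


@dataclass(frozen=True)
class Clause:
    """one subject-verb-object clause, stored as sense ids rather than words."""
    subject: str
    verb: str
    obj: str


if __name__ == "__main__":
    # show the user the options for "bathroom"
    for i, (_, gloss) in enumerate(senses_for("bathroom")):
        print(i, gloss)
    # suppose the user picks option 1, "facility for excreting"
    clause = Clause("pronoun.first_singular", "seek.look_for", choose_sense("bathroom", 1))
    print(clause)
```

raising an error instead of silently defaulting to the first sense is the whole point here: an unknown word or an out-of-range pick stops the process rather than producing a confident-looking wrong translation.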
for a full example of how i envision this working, let’s consider an exchange which a native Canadian English speaker might phrase like this:
Hey, do you know where I can find an accessible washroom? […] Gotcha, thanks!
put through this system from another language, however, you would end up with something like this:
I seek a public‐access toilet. I require wheelchair accommodations. Can you provide directions? […] I understand the directions. I appreciate the assistance.
yeah, it’s quite clunky and verbose, but these very simple constructions are at least far more likely (if not guaranteed) to be directly translatable between any two languages. there’s also none of the ambiguity of arbitrary text translation, where maybe “accessible washroom” would come out sounding like “approachable bathing chamber” or something. it’s also more flexible than a static phrasebook, which might not have any notion of accessible washrooms at all, leaving everyone to uselessly point you towards the inaccessible ones (for the sake of argument, let’s assume you aren’t in a wheelchair yourself, which would make your requirement obvious; maybe you’re walking around asking on behalf of someone else). it’s sort of a middle point between those two things, and that feels very useful to me. would this not be useful?
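and here’s a similarly hedged sketch of why the clunky version might travel better: once a clause is stored as sense ids instead of words, rendering it into a given language is mostly a lookup. the lexicons, sense ids, and `render` function are hypothetical. the verb forms are pre-conjugated for an “I” subject, and things like French elision (je → j’) and agreement are ignored entirely, which a real system definitely couldn’t get away with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """one subject-verb-object clause, stored as sense ids."""
    subject: str
    verb: str
    obj: str


# hypothetical per-language lexicons mapping sense ids to ready-made surface forms.
# verbs are pre-conjugated for a first-person present-tense subject (a big shortcut).
LEXICONS: dict[str, dict[str, str]] = {
    "en": {
        "pronoun.first_singular": "I",
        "seek.look_for": "seek",
        "toilet.public_access": "a public-access toilet",
    },
    "fr": {
        "pronoun.first_singular": "je",
        "seek.look_for": "cherche",
        "toilet.public_access": "des toilettes publiques",
    },
}

# basic word order per language, as a sequence of Clause field names
WORD_ORDER: dict[str, tuple[str, ...]] = {
    "en": ("subject", "verb", "obj"),
    "fr": ("subject", "verb", "obj"),
}


def render(clause: Clause, lang: str) -> str:
    """look up each slot's surface form and join them in the language's word order."""
    lexicon = LEXICONS[lang]
    words = [lexicon[getattr(clause, slot)] for slot in WORD_ORDER[lang]]
    sentence = " ".join(words)
    # capitalise the first letter and end with a full stop
    return sentence[0].upper() + sentence[1:] + "."


if __name__ == "__main__":
    c = Clause("pronoun.first_singular", "seek.look_for", "toilet.public_access")
    print(render(c, "en"))
    print(render(c, "fr"))
```

word order gets its own table because it varies between languages: English and French both default to subject‐verb‐object, but Japanese, for example, puts the verb last. a missing sense id raises a `KeyError` rather than falling back to some guess.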
and like, you can even still use machine learning with this! it might seem like you’d never get any big comptech companies interested in investing in a project like this, but you could use data aggregation and statistical modelling to take something colloquial like the first quote and suggest multiple sentences like the second quote. i think this is at least a better use‐case for that sort of thing, because the user gets to validate that the output matches their intent before using it, as opposed to translating directly between arbitrary texts, where intent is never queried. ok, now actual linguists can tell me in the comments all about how this whole thing is definitely impossible to build, because i’m interested to know