thaliarchus
@thaliarchus

People occasionally ask whether (or assume that) computers can currently read the kind of manuscripts I study.

The answer at present is: not well enough for the purposes of my field.

If you've a big (and I mean big) Latin work in early print or in a steady textura, it can be time-efficient to train a computer on it. This will give you a transcription that's error-free enough for [sniff] historians.

The more chaotic world of the cursive and cursive-derived script models used for many (often short) Middle English works is another matter.

That's not to say computers can't do this and won't one day do it. But—and here's a factor people often forget—who will pay for it? Can formidable pattern-recognition learn to handle a really gnarly anglicana hand? Very possibly. Does anyone want to put cash down for that? Well.

The field awaits the benevolent billionaire willing to ready AI for a tongue with more than 500 attested ways of spelling the word 'through', in order to extract money from a field of study where we struggle to get money for pencils.



in reply to @thaliarchus's post:

I was just talking with a friend about whether their OCR can handle a bunch of archaic names and terms in an old story I want to share, which we're trying to convert to be ereader-friendly. And that's an order of magnitude easier than working with the actual manuscripts themselves like you're describing, of course.

Mm, yeah. I am no authority at all on these things, but part of me suspects we might find there’s a new plateau and equilibrium, eventually, not one of AI capability but one of can-they-be-botheredness. Tracing a highly variable language out of highly variable handwriting might be something computers can do, but not as easily as spotting cancer in scans of people’s bodies, and there’s a hell of a lot more money in the latter.

Of course, I might be wrong!

not the same thing, but speech to text gets brought up a lot for transcription (by our bosses themselves a lot of the time, lmao). and it’s like, it can’t even get youtube captions right! if nothing else, who’s going to train an stt engine on a bunch of australian legal citations? what about accents? it struggles badly with even the most clear-voiced judges, so no way it can handle an interpreted witness.

maybe once the current ai bubble bursts, people will start finally having reasonable expectations…

Oh yeah. I've a little speech-to-text experience and while I'm glad to have it, it's absolutely not 'ready for the big time' or whatever, and I don't see it getting there soon.