AI translation of Japanese texts about nihonto is still very rudimentary.
I’ve been pouring a lot of work into trying to use OpenAI and Claude to properly detect the characters on the page, and then give a semi-decent translation.
I was messing with some Zufu papers yesterday, and for some reason the model was struggling to recognize the numbers in the measurements section. It was auto-converting them to shaku, sun, bu and then converting them back to cm, and the result was completely wrong. No amount of messing with the prompt could fix that, so I figure I just need an even higher-res photo (I was using 1500x1800) to see if that helps.
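For checking the model's unit math by hand, here is a minimal sketch of the shaku/sun/bu to cm conversion. It assumes the standard modern definition of 1 shaku = 10/33 m (about 30.3 cm), with 10 sun to the shaku and 10 bu to the sun. The function names and the 0.2 cm tolerance are made up for illustration and are not from any library.

```python
# Assumption: modern definition, 1 shaku = 10/33 m = 1000/33 cm.
CM_PER_SHAKU = 1000 / 33          # about 30.303 cm
CM_PER_SUN = CM_PER_SHAKU / 10    # 10 sun per shaku
CM_PER_BU = CM_PER_SHAKU / 100    # 10 bu per sun


def shaku_sun_bu_to_cm(shaku=0, sun=0, bu=0):
    """Convert a traditional measurement to centimeters."""
    return shaku * CM_PER_SHAKU + sun * CM_PER_SUN + bu * CM_PER_BU


def readings_agree(cm_value, shaku=0, sun=0, bu=0, tolerance_cm=0.2):
    """Return True if a cm reading matches a shaku/sun/bu reading.

    The tolerance is an arbitrary illustrative choice to absorb rounding.
    """
    return abs(cm_value - shaku_sun_bu_to_cm(shaku, sun, bu)) <= tolerance_cm


if __name__ == "__main__":
    # 7 sun 2 bu works out to roughly 21.8 cm.
    print(round(shaku_sun_bu_to_cm(sun=7, bu=2), 1))
    # A cm value that disagrees with the traditional units gets flagged.
    print(readings_agree(111.8, sun=7, bu=2))
```

If the model returns both a cm value and a shaku/sun/bu value, a check like `readings_agree` is a cheap way to flag a page where the two do not match, instead of trusting either number.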
AI is great at things like whipping up a quick script to do data manipulation, but it’s still a bit off from being able to translate Japanese sword documents. I am working on this a lot as a side project, however.
This image is a great example of where AI, or any optical character recognition, will struggle. We can tell pretty easily that this is 21.8cm. However,
AI, OCR, Google Lens, etc. have given me the following:
111.8cm
3.8cm
3.9cm
22.8cm
1.2.8cm
318cm
Some of those make sense as an easy mistake for AI to make. For others, it makes no sense at all how the LLM came up with the answer it did. The other thing is that the Zufu papers, many Taikans, and many other resources write numbers without using the 十 character. I don't know why this is, and I do not speak Japanese. Maybe this is normal for certain contexts, but when I google "How do I write 22 in Japanese" I get 二十二 as an answer. Maybe this is throwing off the engine as well, and maybe I can tweak the LLM prompts to try to account for this and get better results.
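Another option besides prompt tweaks would be to have the model return the raw kanji digits and do the number conversion in a script. Below is a rough sketch that reads both styles: the textbook form with 十 (二十二) and the digit-by-digit form without it (二二). The function name `kanji_to_int` is hypothetical, it only handles whole numbers up to the thousands, and it assumes the OCR step already gave back the right characters, which is the hard part.

```python
# Kanji digits, including both common ways of writing zero.
DIGITS = {
    "〇": 0, "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
    "五": 5, "六": 6, "七": 7, "八": 8, "九": 9,
}
# Multiplier characters used in the textbook style (e.g. 二十二).
MULTIPLIERS = {"十": 10, "百": 100, "千": 1000}


def kanji_to_int(text):
    """Parse a kanji number written with or without 十/百/千."""
    if not text:
        raise ValueError("empty string")

    if any(ch in MULTIPLIERS for ch in text):
        # Textbook style: a digit scales the multiplier that follows it.
        total = 0
        pending = None  # digit waiting for a multiplier, if any
        for ch in text:
            if ch in DIGITS:
                pending = DIGITS[ch]
            elif ch in MULTIPLIERS:
                # A bare multiplier such as 十 alone counts as 1 x 10.
                total += (1 if pending is None else pending) * MULTIPLIERS[ch]
                pending = None
            else:
                raise ValueError(f"unexpected character: {ch!r}")
        return total + (pending or 0)

    # Positional style: each kanji is one decimal digit (二二 is 22).
    value = 0
    for ch in text:
        if ch not in DIGITS:
            raise ValueError(f"unexpected character: {ch!r}")
        value = value * 10 + DIGITS[ch]
    return value


if __name__ == "__main__":
    print(kanji_to_int("二十二"))  # 22
    print(kanji_to_int("二二"))    # 22
```

Doing it this way keeps the LLM out of the arithmetic entirely, so a wrong answer can only come from a misread character and not from the model guessing at how the number is written.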