Btw, the reason neither version does well at editing images is that they pipe directly to DALL-E, afaik. It is not actually them doing the task. I already knew this and was mostly trying to lead you to either show me how very wrong I am, or admit that GPT can't actually translate manga (though it might be able to do OCR if you cut out each bubble/section manually)... long post for brevity.
You touched on the real issue: it doesn't actually know where anything is. For OCR it asks probably-DALL-E for a summary of the text in the image, but is not told the actual coordinates/regions of anything. That then makes it hard for it to describe which regions of the image to repaint when it is time to swap in the translated text. And the tools it uses appear to lack the training to handle OCR when there is more than one block of text in the image.
So no, if you want to use it for manga you would at the very least want to cut out and label each dialogue bubble by itself.
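To make the "cut out and label" step concrete, here is a rough sketch of what I mean. It assumes you have already marked each bubble's box by hand, and it uses a toy grid of numbers as the "page" instead of a real image, so nothing here is an actual OCR or image library call. The names `Bubble`, `crop` and `cut_bubbles` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bubble:
    """A hand-marked region of the page (hypothetical helper type)."""
    label: str   # name you assign yourself, e.g. "p1_b1"
    left: int    # first column inside the box
    top: int     # first row inside the box
    right: int   # one past the last column (exclusive)
    bottom: int  # one past the last row (exclusive)


def crop(page, bubble):
    """Return only the rows and columns of `page` inside the bubble's box."""
    return [row[bubble.left:bubble.right]
            for row in page[bubble.top:bubble.bottom]]


def cut_bubbles(page, bubbles):
    """Map each label to its own cropped region, one text block per crop."""
    return {b.label: crop(page, b) for b in bubbles}


# Toy 4-row by 6-column "page"; each value stands in for a pixel.
page = [[x + 10 * y for x in range(6)] for y in range(4)]

# Boxes marked manually; the model never has to guess where the text is.
bubbles = [
    Bubble("p1_b1", left=0, top=0, right=2, bottom=2),
    Bubble("p1_b2", left=3, top=1, right=6, bottom=4),
]

crops = cut_bubbles(page, bubbles)
```

The point of keeping the label and box together is that you, not the model, remember where each translated line has to go back in.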
Interesting to hear that 4o actually got worse at translation though. I haven't tried or read anything about 4o so dunno why, but what you described sounded a lot like 3's issues, where it forgot it was meant to translate, and eventually forgot the text itself, because of a too-small token limit. So maybe they actually limited it there too? Though I suppose it could be anything really. I imagine even increasing the limit could have the same result, as the prompt to translate becomes a smaller and smaller portion of the convo until, as far as the AI is concerned, it is flooded out. (I had to start new sections at times with 4 because of what I think were similar issues, when I tried to input novels I was reading and wanted to read ahead with MTL because the translator was worse than raw MTL.)
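For the "prompt gets flooded out" problem, the workaround I would try is to split the text and repeat the instruction with every chunk, each sent as a fresh conversation. A minimal sketch, with loud assumptions: character count is only a crude stand-in for the real token limit, the instruction string is just an example, and `split_into_chunks` / `build_prompts` are made-up helper names. It builds the prompts only and does not call any API.

```python
def split_into_chunks(text, max_chars):
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def build_prompts(instruction, chunks):
    """Prefix every chunk with the instruction so it never gets diluted."""
    return [f"{instruction}\n\n{chunk}" for chunk in chunks]


INSTRUCTION = "Translate the following text to English:"  # example wording only
text = "aaaa\n\nbbbb\n\ncccc"  # placeholder for novel paragraphs

chunks = split_into_chunks(text, max_chars=10)
prompts = build_prompts(INSTRUCTION, chunks)
```

This way the instruction is always the same share of what the model sees, no matter how long the novel is.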