Waiting for User Title Language Rectification Project - Part One

Watermelon Consumer
Staff
Developer
Joined
Jan 18, 2018
Messages
169
MangaDex lets each series have multiple titles/alt-titles. At some point, we also added the ability to mark which language each alt-title is in. The problem is that we had no language data for any title added before that point, so they were all marked as English, which is not very cool.

I did what anyone would do and ran all titles in the database through a Unicode character script classifier, then mapped each script to a list of languages that use it and labeled every title with the list of languages it is likely to belong to, based on the % of characters in each script. There are many languages that use the Latin script, so this method is kinda trash for those (we'll possibly get back to that in part two), but I believe we can fix a significant portion of titles in other scripts with this. I've filtered all titles whose current language does not match their script and put them in a Google doc so that people can verify them and contribute to the effort. If you've verified/edited a set of titles, comment the row numbers here and I will periodically update the doc.
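For anyone curious what that kind of classification can look like, here is a minimal sketch and not the actual script that was run. It assumes a character's script can be approximated by the first word of its Unicode name (Python's standard library has no real script property), and the `SCRIPT_LANGUAGES` table, the `min_share` threshold, and both function names are made up for illustration.

```python
import unicodedata
from collections import Counter

# Hypothetical and deliberately incomplete script -> languages table.
SCRIPT_LANGUAGES = {
    "HIRAGANA": ["ja"],
    "KATAKANA": ["ja"],
    "KATAKANA-HIRAGANA": ["ja"],
    "HANGUL": ["ko"],
    "CJK": ["zh", "ja", "ko"],
    "CYRILLIC": ["ru", "uk", "bg"],
    "LATIN": ["en", "es", "fr", "pt", "id"],
}


def script_shares(title):
    """Return {script: share of letters}, skipping digits, spaces and punctuation."""
    counts = Counter()
    for ch in title:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        if name:
            # Approximation: "HIRAGANA LETTER A" -> "HIRAGANA"
            counts[name.split()[0]] += 1
    total = sum(counts.values())
    return {script: n / total for script, n in counts.items()} if total else {}


def candidate_languages(title, min_share=0.2):
    """List languages whose script makes up at least min_share of the title."""
    langs = []
    shares = script_shares(title)
    for script, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        if share < min_share:
            continue
        for lang in SCRIPT_LANGUAGES.get(script, []):
            if lang not in langs:
                langs.append(lang)
    return langs
```

With this sketch a Hangul-only title yields only Korean, while a title mixing kanji and kana yields Japanese first, followed by the other languages that share the CJK script. The name-prefix trick misclassifies some characters (fullwidth Latin letters come out as `FULLWIDTH`, for instance), so a real run would want a proper script database.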
 
Contributor
Joined
Jan 8, 2023
Messages
971
but the main story's title is: "I'm Doomed if It Can't Be You"

Idk if it is what you want. I hope it helps a little.
 
Contributor
Joined
Jan 8, 2023
Messages
971
I have a question:

Do we need to write the title just once, without the language tag, or do we need to write it again in the alternative titles?

An example:


Here the English title is not in the alternative titles list. Doesn't it need to be marked as an English title?
 
Contributor
Joined
Jan 8, 2023
Messages
971

純情ドッロプ => Dorropu not cider ??!

 
Watermelon Consumer
Staff
Developer
Joined
Jan 18, 2018
Messages
169
I have a question:

Do we need to write the title just once, without the language tag, or do we need to write it again in the alternative titles?

An example:


Here the English title is not in the alternative titles list. Doesn't it need to be marked as an English title?
If it's already the main title, there's no need to repeat it in the alt titles.

Thanks for the contributions, I have edited the doc to filter out the verified entries.
 
Watermelon Consumer
Staff
Developer
Joined
Jan 18, 2018
Messages
169
I don't understand this entry; there was just a main title...
I included both main titles and alt titles in the doc since main titles also have a language attribute, though I now realize the front-end does not have support for editing the lang on those. Keep mentioning them and I'll edit them through the mod tools.

 
Contributor
Joined
Jan 8, 2023
Messages
971
I now realize the front-end does not have support for editing the lang on those
Ooooooh. I need to review all the main titles to make sure...wait I can't review the language tag. :pepehmm:
 
Watermelon Consumer
Staff
Developer
Joined
Jan 18, 2018
Messages
169
you can check them in the API but I don't really expect people to do that so let's ignore that for now :nyoron:
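For those who do want to look, here is a rough sketch of reading the language keys out of a manga response. It assumes the public `GET /manga/{id}` endpoint at `https://api.mangadex.org`, where `title` is an object keyed by language code and `altTitles` is a list of such objects. The helper names are made up, and `fetch_manga` needs network access and has not been exercised here.

```python
import json
import urllib.request

API_BASE = "https://api.mangadex.org"  # assumed public API base URL


def fetch_manga(manga_id):
    """Fetch one manga entity as a dict (requires network access)."""
    with urllib.request.urlopen(f"{API_BASE}/manga/{manga_id}") as resp:
        return json.load(resp)


def title_languages(payload):
    """Return (main title language codes, [(lang, alt title), ...])."""
    attrs = payload["data"]["attributes"]
    main_langs = sorted(attrs.get("title", {}))
    alt_titles = [
        (lang, text)
        for entry in attrs.get("altTitles", [])
        for lang, text in entry.items()
    ]
    return main_langs, alt_titles
```

Usage would be something like `title_languages(fetch_manga("<series uuid>"))`, then comparing the reported codes against what the title actually looks like.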
 
Contributor
Joined
Jan 8, 2023
Messages
971
216 -> 246 are all good now.

id 223 (in the Google sheet):
01432726-c13b-4f6f-a53d-98e911faec6a
No title at this URL; maybe this one?

id 242: 015d7710-3924-465a-b302-6731773e9ed2 Uzbek language required

I think it's easier for me to do it like this. If that doesn't work for you, I can change it.
 
Contributor
Joined
Jan 8, 2023
Messages
971
Btw, I have a little suggestion:

Maybe you can code something like this.

If you have:
  • Counter({'KATAKANA': 1..* AND/OR 'HIRAGANA': 1..* AND/OR 'KATAKANA-HIRAGANA': 1..* AND 'LATIN': 1..*}) => It's Japanese
  • Counter({'KATAKANA': 1..* AND/OR 'HIRAGANA': 1..* AND/OR 'KATAKANA-HIRAGANA': 1..*}) => It's Japanese
  • Counter({'HANGUL': 1..*}) => It's Korean
  • Counter({'HANGUL': 1..* AND 'LATIN': 1..*}) => It's Korean
  • Counter({'KATAKANA': 1..* AND/OR 'HIRAGANA': 1..* AND/OR 'KATAKANA-HIRAGANA': 1..* AND 'HANGUL': 1..*}) => ERROR
I think it can help a lot with the work.
 
Watermelon Consumer
Staff
Developer
Joined
Jan 18, 2018
Messages
169
Btw, I have a little suggestion:

Maybe you can code something like this.

If you have:
  • Counter({'KATAKANA': 1..* AND/OR 'HIRAGANA': 1..* AND/OR 'KATAKANA-HIRAGANA': 1..* AND 'LATIN': 1..*}) => It's Japanese
  • Counter({'KATAKANA': 1..* AND/OR 'HIRAGANA': 1..* AND/OR 'KATAKANA-HIRAGANA': 1..*}) => It's Japanese
  • Counter({'HANGUL': 1..*}) => It's Korean
  • Counter({'HANGUL': 1..* AND 'LATIN': 1..*}) => It's Korean
  • Counter({'KATAKANA': 1..* AND/OR 'HIRAGANA': 1..* AND/OR 'KATAKANA-HIRAGANA': 1..* AND 'HANGUL': 1..*}) => ERROR
I think it can help a lot with the work.
Yeah, I guess I could count all the Japanese scripts together, for example, to try and make it more accurate. I'd also like to re-run an updated version to get rid of some of the titles which have already been changed in the meantime and maybe make this more digestible, but for now I've gotta wait until database access is available again.
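The suggested rules, with the Japanese scripts counted together, could be sketched roughly like this. It is only an illustration: `guess_language` and the return values are made up, and it assumes any script the rules don't mention (Latin, CJK and so on) is simply ignored.

```python
from collections import Counter

# The three kana-related script names are treated as one group.
JAPANESE_SCRIPTS = {"HIRAGANA", "KATAKANA", "KATAKANA-HIRAGANA"}


def guess_language(script_counts):
    """Return 'ja', 'ko', 'error', or None for a Counter of script names."""
    present = {script for script, n in script_counts.items() if n > 0}
    has_kana = bool(present & JAPANESE_SCRIPTS)
    has_hangul = "HANGUL" in present
    if has_kana and has_hangul:
        return "error"  # kana and Hangul together: flag for manual review
    if has_kana:
        return "ja"     # kana, with or without Latin
    if has_hangul:
        return "ko"     # Hangul, with or without Latin
    return None         # the rules say nothing about this title
```

For example, `guess_language(Counter({"KATAKANA": 4, "LATIN": 2}))` gives `'ja'`, and a Latin-only counter gives `None`, so those titles would still need another method.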
 
Contributor
Joined
Jan 8, 2023
Messages
971
Any update?

I think you can correct a lot of errors.

Also, it would be good to separate out the entries that don't have a title in their original language.
 
Watermelon Consumer
Staff
Developer
Joined
Jan 18, 2018
Messages
169
No change on this front for now unfortunately.
 
Joined
May 24, 2023
Messages
4
Let's get this started up again.
12235 is locked.

Alt title: Japanese
Publication date: 2016
Demographic: none
Forgot something: The MAL link is wrong. The correct one is https://myanimelist.net/manga/113710/BanG_Dream_Yonkoma__Bandori

How is this format? I think it's good for locked entries, especially for those that need additional fixing.


Another one

7958 + romanized Japanese
A really weird entry. This is some kind of BG version and there is a YG version? Besides that I could only find out it's been serialized in Big Gangan. Maybe Lymus knows something, he's the only uploader.

And another one

36074

And another one

3687 + Japanese, I'd say. The "x" from the cover can be written as x, X, × etc., unless you want to be exact. Then it should be ×, the Japanese cross symbol that is, for whatever reason, silent (i.e. unspoken). Or whatever the stores use.

The updated editing timeframe is ridiculous compared to the measly minutes we had before.

And another one

19938 + romanized Japanese

And another one

48165-48167

And another one

The main title of https://mangadex.org/title/b775a6cc-3f91-46da-ba69-41018004b14e is romanized Japanese
 