Would a webscraper be welcomed?

Joined
Jul 2, 2023
Messages
2
Hey, I've recently joined this site and I enjoy reading manga here. I've noticed that not all manga are up to date and some appear to be missing. I haven't fully read through the rules yet, so I thought I'd just ask here. Would there be a need for an automatic scraping tool that keeps manga up to date by scanning other sites and adding new manga? If so, I would be up for programming a small tool that automatically scrapes the manga on this site, compares them with other manga sites, finds missing manga and chapters, and prepares a package that could be submitted and approved after human quality control. Is there a need for a tool like that?
 
Dex-chan lover
Joined
Jan 25, 2020
Messages
226
I've noticed that not all manga are up to date and some appear to be missing. I haven't fully read through the rules yet, so I thought I'd just ask here.
Be careful. Usually chapters aren't available on MangaDex because the team that translates the title doesn't want them uploaded to the site, or because they are rips from official sources. So get familiar with the scanlation groups, their restrictions, and the title itself, or you could be banned.
 
Joined
Jul 2, 2023
Messages
2
Be careful. Usually chapters aren't available on MangaDex because the team that translates the title doesn't want them uploaded to the site, or because they are rips from official sources. So get familiar with the scanlation groups, their restrictions, and the title itself, or you could be banned.
Ahh, I see. What is the current communication channel? I'm new to this, and since I'm a developer I wanted to help out. Are there ways I could use my skills?
 
Power Uploader
Joined
Jan 22, 2018
Messages
592
since I'm a developer I wanted to help out. Are there ways I could use my skills?
https://forums.mangadex.org/threads/third-party-external-chapter-links-we-want-more.1075416/

I think you should look at this thread. As far as I'm aware, they were looking for people to contribute to the bot for the third-party side of things...

or if it is related to full uploads...

https://forums.mangadex.org/threads/blackhole-scans.1363711/

maybe a script for this one if Roler doesn't end up doing it...
 
Yuri Enjoyer
Staff
Developer
Joined
Feb 16, 2020
Messages
443
Ahh, I see. What is the current communication channel? I'm new to this, and since I'm a developer I wanted to help out. Are there ways I could use my skills?
For anything like this, the Discord server is where we generally discuss most things.

We're not entirely against automation. However, we have a strong negative bias because users in the past meant well and yet almost invariably fucked it up, leaving us to manually clean up the mess on many occasions.

So if you plan to automate uploads on MD (which I don't recommend):
1. Please keep a very detailed log, on your end [1], of what you changed (what from, and what to). If you think this is unnecessary because there's no way your program could ever fuck up (or annoying, because yes, it's extra tedious work), then just don't automate.
2. Please actually test your thing. If you plan to upload 10k chapters, then testing 3 of them is not enough. Your test should cover, say, a representative 10% of them at least. We have a public dev API for exactly that reason, alongside a frontend plugged into it. And please, even though it's dev, it's infra we maintain, so no shitting on it with purposefully garbage data.
3. Ask yourself (and have an answer) where the files came from and why you believe they don't violate the rules ([2] and [3] both).
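
Points 1 and 2 above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the `changes.jsonl` file, and the `series` key are hypothetical, and nothing here calls the MangaDex API. It appends a before/after record for every change to a local JSON Lines file, and picks a test sample of about 10% that covers every series instead of just the first few chapters.

```python
import json
import math
import random
from collections import defaultdict
from datetime import datetime, timezone


def log_change(log_path, entity_id, field, old, new):
    """Append one before/after record to a local JSON Lines file."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "entity_id": entity_id,  # hypothetical identifier of the edited entity
        "field": field,
        "old": old,              # value before the change, needed to undo it
        "new": new,              # value after the change
    }
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log_newest_first(log_path):
    """Load all records, newest first, which is the order an undo would need."""
    with open(log_path, encoding="utf-8") as log_file:
        records = [json.loads(line) for line in log_file if line.strip()]
    return list(reversed(records))


def representative_sample(chapters, fraction=0.10, seed=0):
    """Pick about `fraction` of the chapters, with at least one per series."""
    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    by_series = defaultdict(list)
    for chapter in chapters:
        by_series[chapter["series"]].append(chapter)
    sample = []
    for series in sorted(by_series):
        group = by_series[series]
        count = max(1, math.ceil(len(group) * fraction))
        sample.extend(rng.sample(group, count))
    return sample
```

Calling `log_change` right before each write means the log still records the attempt if the write fails halfway, and the `old` values are what make a manual or scripted rollback possible.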

Now there are other (and better) ways to do automation to help us out. The biggest one is automating not fixes or changes but the search for bad data: for example, inaccurate chapter volume numbers, duplicate authors, duplicate titles, ... Automating this is quite difficult, but creativity is a powerful tool. From OCR-ing ToCs on Amazon/Bookwalker to diffing volume numbers between languages, ... Sky's the limit.
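
As one small example of looking for bad data, here is a minimal sketch that flags possible duplicate authors or titles by comparing normalized names. The `(id, name)` input shape and both function names are assumptions for illustration, not anything MangaDex provides, and the output is a list of candidates for a human to check, not a list of confirmed duplicates.

```python
import unicodedata
from collections import defaultdict


def normalize(name):
    """Fold width variants, case, punctuation and extra spaces."""
    folded = unicodedata.normalize("NFKC", name)
    kept = [c for c in folded if c.isalnum() or c.isspace()]
    return " ".join("".join(kept).casefold().split())


def find_duplicate_candidates(entries):
    """Group (id, name) pairs whose normalized names are identical."""
    groups = defaultdict(list)
    for entry_id, name in entries:
        groups[normalize(name)].append(entry_id)
    # Only names shared by more than one entry are worth reporting.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


entries = [
    ("a1", "ODA Eiichiro"),
    ("a2", "Oda, Eiichiro"),
    ("a3", "Toriyama Akira"),
]
print(find_duplicate_candidates(entries))
# {'oda eiichiro': ['a1', 'a2']}
```

Exact matching after normalization will miss reordered or romanized variants of the same name, so it is a starting point for a report, not a basis for automatic merges.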

If you're interested in doing that ^ then gather up your findings in well-described reports and report the affected entities on-site so mods can check and fix them. If you do it enough times, mods will get bored of handling your reports and make you a contributor, so you can also be the one applying the fixes. At that point we can more realistically discuss automated changes.

Sorry if this is a bit long/ranty, but it's unfortunately something that has bitten us in the past, and I dislike the 5am pings about half the site's covers having been incorrectly changed (or whatever similar stories).

---
1. And I cannot stress this bit enough. We don't keep an infinite amount of changelogs on our end, which is why we have rate limits for changes, roles to limit what random users can do, and so on. It's not economically sustainable (nor do we have the technical means right now) for us to keep git-like changelogs for all data the way something like Wikipedia does. A lot of changes are hard to auto-undo.

2. Hint: The vast majority of the content you'll find "missing on MD but available on other sites" is rule-breaking, because nearly all of it is officials or comes from groups that don't want to be on MD (scanlations don't grow on trees and there are only so many places they can come from; and most groups know about us already).

3. And how you ensured they remained non-rule-violating. Quite a few groups will shamelessly steal officials and pass them off as their own work. So you need to be sure that if you upload 60 chapters of theirs for a series, all 60 are their work and it's not a case of "they did the first 3 chapters, then got too lazy to catch up to the raws so passed off the next 40 officials as their own, then actually did the work".
 