Implement anti-web-scraping protections on the site :rejected:

Joined
May 19, 2019
Messages
4
To avoid aggregator sites scraping the site's content, consider implementing web-scraping protections.

https://stackoverflow.com/questions/3161548/how-do-i-prevent-site-scraping#34828465

https://github.com/JonasCz/How-To-Prevent-Scraping/blob/master/README.md
 

rdn

Forum Admin
Staff
Developer
Joined
Jan 18, 2018
Messages
281
Most of the things listed, like rate limiting, we already do. The other stuff makes no sense because it's an arms race we cannot win, or it negatively impacts regular users more than it helps.

Thanks for the suggestions, but I think we're fine.
 
Dex-chan lover
Joined
Jan 20, 2018
Messages
1,006
LOL, CAPTCHAs... that's actually the best thing to do if you want to cut your userbase. It's much more annoying for a normal user than a block is for a scraper.
 
Joined
May 5, 2018
Messages
305
"Use google's captcha (which by the way, only detects people that don't let google track them, not bots)
When someone suggests using recaptcha, it's obvious you shouldn't take any advice from them.
Serve your text content as an image
You can frequently change the id's and classes of elements in your HTML, perhaps even automatically. So, if your div.article-content becomes something like div.a4c36dda13eaf0
If you use Ajax and JavaScript to load your data, obfuscate the data which is transferred. As an example, you could encode your data on the server (with something as simple as base64 or more complex with multiple layers of obfuscation, bit-shifting, and maybe even encryption), and then decode and display it on the client, after fetching via Ajax.
🤢 Absolute garbage.
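To spell out why that last suggestion is garbage, here's a minimal Python sketch (hypothetical payload, nothing to do with MD's actual API) of base64-style "obfuscation": a scraper only has to mirror the one decode step the site's own JavaScript already performs, so the protection adds nothing.

```python
import base64
import json

# Hypothetical illustration of the quoted suggestion: the server "obfuscates"
# a chapter listing by base64-encoding the JSON before sending it over Ajax.
def server_response(chapters: list[dict]) -> str:
    return base64.b64encode(json.dumps(chapters).encode()).decode()

# A scraper simply repeats the decode step the client-side JavaScript must
# perform anyway, so the obfuscation costs the site effort and gains nothing.
def scraper_decode(payload: str) -> list[dict]:
    return json.loads(base64.b64decode(payload))

payload = server_response([{"chapter": 1, "pages": 20}])
print(scraper_decode(payload))  # [{'chapter': 1, 'pages': 20}]
```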
 
Group Leader
Joined
Dec 25, 2019
Messages
91
@programmerx Why on Earth would MD need anti-scraping protection? Aren't MD's servers overloaded enough for us to be glad that some people read on Manganelo?

Fighting the potential of the Internet to share information is a weird idea. Taking advantage of it is much smarter.
You may notice that big movie producers in Hollywood have failed to prevent movie sharing on the Internet despite massive lawsuits and lots of DRM, whereas websites like YouTube make heaps of cash by offering video content for free with ads.
Some manga publishers also offer raws to read for free on their websites, because it may be an incentive to buy the paperback versions. They don't seem to go bankrupt because of that.

Moreover, let's face technical reality: downloading chapters to read offline is convenient for some users, especially since MD sometimes has downtime for maintenance and is sometimes overloaded. Why willingly make things harder?
 
is a Reindeer
VIP
Joined
Jan 24, 2018
Messages
3,231
@Halo If user registration is anything to go off of, Batoto had 463k registered users while we just passed 1 million. I think we're well past whatever Batoto was experiencing, and it isn't due to bots.
 
Contributor
Joined
Dec 31, 2019
Messages
17,888
Implementing CAPTCHAs would lead to a bigger exodus than that one time a dude parted the Red Sea with a Beyblade.

[image: beybladeredsea.png]
 
Fed-Kun's army
Joined
Mar 12, 2018
Messages
933
Grumpy says the registered user count was 481,400 (not counting banned/spam accounts). And you'll have to excuse him, because another raccoon has entered his territory, so now he has to fight.
 
Member
Joined
Aug 8, 2018
Messages
1,125
Why are there so many people dead set on making it more difficult for MangaDex readers to actually READ manga in order to screw over some web aggregators that you are probably never going to kill anyway?


Like, even CAPTCHAs are guaranteed to get people to leave MangaDex for other sites, not to mention the problems MangaDex itself has. And it's not like you can ever defeat Python scripters or coders who actually know what they're doing.
 
Dex-chan lover
Joined
Jan 17, 2018
Messages
3,198
@Plykiya a desperate change forced by bots. But, from what I heard, not many people are reading old chapters anyway.
My point is that there is a precedent of scraping actually killing a manga site, which is why there's a need for scraping mitigation. Which you're already doing.
 
Fed-Kun's army
Joined
Mar 12, 2018
Messages
933
@Halo just asked him,

[6:37 PM] Grumpy: also a lot of "bots" aren't really nefarious. It's actually people who don't care.
[6:37 PM] Grumpy: tachiyomi, one of the apps still very popular
[6:37 PM] Grumpy: was one i hated the most probably
[6:38 PM] Grumpy: but basically when a call to a page fails (like due to rate limiting), it would request again right away
[6:38 PM] Grumpy: and it would request every page, everything the user wanted, at the same time
[6:38 PM] Grumpy: a single user of tachiyomi making like 1000 simultaneous calls was common.
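For context, here's a rough Python sketch (not Tachiyomi's actual code, which is Kotlin) of the difference Grumpy is describing: "request again right away" versus backing off when the server answers with HTTP 429.

```python
import time
import requests

# Rough sketch of a polite client. The behaviour described above is the
# opposite: retry immediately on any failure, for every page at once.
def fetch_with_backoff(url: str, max_retries: int = 5) -> bytes:
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.get(url)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.content
        # Respect Retry-After if the server sends it, otherwise back off exponentially.
        wait = float(resp.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError(f"gave up on {url} after {max_retries} rate-limited attempts")
```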
 
Dex-chan lover
Joined
Jan 17, 2018
Messages
3,198
@blackyawgdom Well yeah, from his announcements it was apparent that "bots" is just a catch-all term for unreasonable scripting, whether it's done by manga aggregators or apps.
Funny thing: Tachiyomi had to implement a rate limit when MangaDex/Cloudflare began issuing IP bans for their DDoSing.
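A client-side rate limit doesn't amount to much more than a toy pacer like the Python sketch below (Tachiyomi itself is Kotlin on top of OkHttp, so this is only an assumed illustration, not their implementation): space requests out so the app never exceeds the server's budget, no matter how many pages the user queues.

```python
import threading
import time

# Toy client-side request pacer: enforce a minimum interval between outgoing
# requests across all threads, instead of firing everything simultaneously.
class RequestPacer:
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait_turn(self) -> None:
        with self._lock:
            now = time.monotonic()
            wait = max(0.0, self._next_allowed - now)
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if wait > 0:
            time.sleep(wait)  # call this before each request to stay under the limit
```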
 
is a Reindeer
VIP
Joined
Jan 24, 2018
Messages
3,231
@blackyawgdom Tachiyomi was among the first to have access to our API, and we still contact them every now and again when an update to the site would break their app.

@Halo Our rate limit isn't anything specific to Tachiyomi; it applies to anyone accessing the site or using our API. You obviously don't want to let anyone, valid or not, make 100 requests in a few seconds. The way they had coded the app, it would refresh/download manga faster than our limit allows (roughly 1 request per second, slightly higher for the API), and their users would simply get banned. Tachiyomi users aren't an issue for us, and their devs are quite flexible.
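For anyone curious what such a policy boils down to, here's a back-of-the-envelope Python sketch with made-up thresholds (the real limiting sits in front of the site, e.g. in Cloudflare, not in application code like this): roughly one request per second per IP, with an outright ban for clients that blow far past the limit.

```python
import time
from collections import defaultdict, deque

# Sketch of a per-IP sliding-window rate limiter with a ban for repeat offenders.
# All numbers are illustrative, not MangaDex's actual configuration.
class IpRateLimiter:
    def __init__(self, limit: int = 1, window: float = 1.0, ban_threshold: int = 100):
        self.limit = limit                   # allowed requests per window
        self.window = window                 # window length in seconds
        self.ban_threshold = ban_threshold   # strikes before an outright ban
        self.hits: dict[str, deque] = defaultdict(deque)
        self.strikes: dict[str, int] = defaultdict(int)
        self.banned: set[str] = set()

    def allow(self, ip: str) -> bool:
        if ip in self.banned:
            return False
        now = time.monotonic()
        q = self.hits[ip]
        while q and now - q[0] > self.window:
            q.popleft()                      # drop requests outside the window
        if len(q) >= self.limit:
            self.strikes[ip] += 1
            if self.strikes[ip] >= self.ban_threshold:
                self.banned.add(ip)          # persistent offenders get banned
            return False                     # caller should respond with HTTP 429
        q.append(now)
        return True
```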
 
