Implement anti-web-scraping protections on the site :rejected:

Joined
May 19, 2019
Messages
4
To avoid aggregator sites scraping the site's content, consider implementing web-scraping protections.

https://stackoverflow.com/questions/3161548/how-do-i-prevent-site-scraping#34828465

https://github.com/JonasCz/How-To-Prevent-Scraping/blob/master/README.md
 

rdn

Forum Admin
Staff
Developer
Joined
Jan 18, 2018
Messages
281
Most of the things listed, like rate limiting, we already do. The other stuff makes no sense because it's an arms race we cannot win, or it negatively impacts regular users more than it helps.

Thanks for the suggestions, but I think we're fine.
 
Dex-chan lover
Joined
Jan 20, 2018
Messages
1,006
LOL, CAPTCHAs... that's actually the best thing to do if you want to cut your userbase. It's much more annoying for a normal user than a block is for a scraper.
 
Joined
May 5, 2018
Messages
305
"Use google's captcha (which by the way, only detects people that don't let google track them, not bots)
When someone suggests using recaptcha, it's obvious you shouldn't take any advice from them.
Serve your text content as an image
You can frequently change the id's and classes of elements in your HTML, perhaps even automatically. So, if your div.article-content becomes something like div.a4c36dda13eaf0
If you use Ajax and JavaScript to load your data, obfuscate the data which is transferred. As an example, you could encode your data on the server (with something as simple as base64 or more complex with multiple layers of obfuscation, bit-shifting, and maybe even encryption), and then decode and display it on the client, after fetching via Ajax.
🤢 Absolute garbage.
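To spell out why that last suggestion is garbage, here's a minimal Python sketch (hypothetical payload, nothing to do with MD's actual API) of base64-style "obfuscation": a scraper only has to mirror the one decode step the site's own JavaScript already performs, so the protection adds nothing.

```python
import base64
import json

# Hypothetical illustration of the quoted suggestion: the server "obfuscates"
# a chapter listing by base64-encoding the JSON before sending it over Ajax.
def server_response(chapters: list[dict]) -> str:
    return base64.b64encode(json.dumps(chapters).encode()).decode()

# A scraper simply repeats the decode step the client-side JavaScript must
# perform anyway, so the obfuscation costs the site effort and gains nothing.
def scraper_decode(payload: str) -> list[dict]:
    return json.loads(base64.b64decode(payload))

payload = server_response([{"chapter": 1, "pages": 20}])
print(scraper_decode(payload))  # [{'chapter': 1, 'pages': 20}]
```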
 
Group Leader
Joined
Dec 25, 2019
Messages
91
@programmerx Why on Earth would MD need anti-scraping protection? Aren't MD's servers overloaded enough for us to be glad that some people read on Manganelo?

Fighting the potential of the Internet to share information is a weird idea. Taking advantage of it is much smarter.
You may notice that big movie producers in Hollywood have failed to prevent movie sharing on the Internet despite massive lawsuits and lots of DRM, whereas websites like YouTube make heaps of cash by offering video content for free with ads.
Some manga publishers also offer raws to read for free on their websites, because it may be an incentive to buy the paperback versions. They don't seem to go bankrupt because of that.

Moreover, let's face technical reality: downloading chapters to read offline is convenient for some users, especially since MD sometimes has downtime for maintenance and is sometimes overloaded. Why willingly make things harder?
 
is a Reindeer
VIP
Joined
Jan 24, 2018
Messages
3,231
@Halo If user registration is anything to go off of, Batoto had 463k registered users while we just passed 1 million. I think we're well past whatever Batoto was experiencing, and it isn't due to bots.
 
Contributor
Joined
Dec 31, 2019
Messages
17,888
Implementing CAPTCHAs would lead to a bigger exodus than that one time a dude parted the Red Sea with a Beyblade.

[image: beybladeredsea.png]
 
Fed-Kun's army
Joined
Mar 12, 2018
Messages
933
Grumpy says the registered user count was 481,400 (not counting banned/spam accounts). And you'll have to excuse him, because another raccoon has entered his territory, so now he has to fight.
 
Member
Joined
Aug 8, 2018
Messages
1,125
Why are there so many people dead set on making it more difficult for MangaDex readers to actually READ manga in order to screw over some web aggregators that you are probably never going to kill anyway?


Like, even CAPTCHAs are guaranteed to get people to leave MangaDex for other sites, not to mention the problems MangaDex itself has. And it's not like you can ever defeat Python scripters or coders who actually know what they're doing.
 
Dex-chan lover
Joined
Jan 17, 2018
Messages
3,198
@Plykiya a desperate change forced by bots. But, from what I heard, not many people are reading old chapters anyway.
My point is that there is a precedent of scraping actually killing a manga site, which is why there's a need for scraping mitigation. Which you're already doing.
 
Fed-Kun's army
Joined
Mar 12, 2018
Messages
933
@Halo just asked him,

[6:37 PM] Grumpy: also a lot of "bots" aren't really nefarious. It's actually people who don't care.
[6:37 PM] Grumpy: tachiyomi, one of the apps still very popular
[6:37 PM] Grumpy: was one i hated the most probably
[6:38 PM] Grumpy: but basically when a call to a page fails (like due to rate limiting), it would request again right away
[6:38 PM] Grumpy: and it would request every page, everything the user wanted, at the same time
[6:38 PM] Grumpy: a single user of tachiyomi making like 1000 simultaneous calls was common.
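For context, here's a rough Python sketch (not Tachiyomi's actual code, which is Kotlin) of the difference Grumpy is describing: "request again right away" versus backing off when the server answers with HTTP 429.

```python
import time
import requests

# Rough sketch of a polite client. The behaviour described above is the
# opposite: retry immediately on any failure, for every page at once.
def fetch_with_backoff(url: str, max_retries: int = 5) -> bytes:
    delay = 1.0
    for _ in range(max_retries):
        resp = requests.get(url)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.content
        # Respect Retry-After if the server sends it, otherwise back off exponentially.
        wait = float(resp.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError(f"gave up on {url} after {max_retries} rate-limited attempts")
```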
 
Dex-chan lover
Joined
Jan 17, 2018
Messages
3,198
@blackyawgdom Well yeah, from his announcements it was apparent that "bots" is just a catch-all term for unreasonable scripting, whether it's done by manga aggregators or apps.
Funny thing: Tachiyomi had to implement a rate limit when MangaDex/Cloudflare began issuing IP bans for their DDoSing.
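A client-side rate limit doesn't amount to much more than a toy pacer like the Python sketch below (Tachiyomi itself is Kotlin on top of OkHttp, so this is only an assumed illustration, not their implementation): space requests out so the app never exceeds the server's budget, no matter how many pages the user queues.

```python
import threading
import time

# Toy client-side request pacer: enforce a minimum interval between outgoing
# requests across all threads, instead of firing everything simultaneously.
class RequestPacer:
    def __init__(self, min_interval: float = 1.0):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait_turn(self) -> None:
        with self._lock:
            now = time.monotonic()
            wait = max(0.0, self._next_allowed - now)
            self._next_allowed = max(now, self._next_allowed) + self.min_interval
        if wait > 0:
            time.sleep(wait)  # call this before each request to stay under the limit
```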
 
is a Reindeer
VIP
Joined
Jan 24, 2018
Messages
3,231
@blackyawgdom Tachiyomi was among the first to have access to our API, and we still contact them every now and again when an update to the site would break their app.

@Halo Our rate limit isn't anything specific to Tachiyomi; it applies to anyone accessing the site or using our API. You obviously don't want to let anyone, valid or not, make 100 requests in a few seconds. The way they had coded the app, it would refresh/download manga faster than our limit allows (roughly 1 request per second, slightly higher for the API), and their users would simply get banned. Tachiyomi users aren't an issue for us, and their devs are quite flexible.
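For anyone curious what such a policy boils down to, here's a back-of-the-envelope Python sketch with made-up thresholds (the real limiting sits in front of the site, e.g. in Cloudflare, not in application code like this): roughly one request per second per IP, with an outright ban for clients that blow far past the limit.

```python
import time
from collections import defaultdict, deque

# Sketch of a per-IP sliding-window rate limiter with a ban for repeat offenders.
# All numbers are illustrative, not MangaDex's actual configuration.
class IpRateLimiter:
    def __init__(self, limit: int = 1, window: float = 1.0, ban_threshold: int = 100):
        self.limit = limit                   # allowed requests per window
        self.window = window                 # window length in seconds
        self.ban_threshold = ban_threshold   # strikes before an outright ban
        self.hits: dict[str, deque] = defaultdict(deque)
        self.strikes: dict[str, int] = defaultdict(int)
        self.banned: set[str] = set()

    def allow(self, ip: str) -> bool:
        if ip in self.banned:
            return False
        now = time.monotonic()
        q = self.hits[ip]
        while q and now - q[0] > self.window:
            q.popleft()                      # drop requests outside the window
        if len(q) >= self.limit:
            self.strikes[ip] += 1
            if self.strikes[ip] >= self.ban_threshold:
                self.banned.add(ip)          # persistent offenders get banned
            return False                     # caller should respond with HTTP 429
        q.append(now)
        return True
```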
 
