TMO Scrapper

Contributor
Joined
Jul 12, 2020
Messages
239
Scraper for the Spanish aggregator TMO (currently visortmo.com), written in shitty C# WinForms.

Repo: https://github.com/Ylevo/TMOScrapper

Worth mentioning: it uses Puppeteer Sharp and therefore requires and downloads Chromium (around 300 MB), which goes into AppData/Local/Puppeteer-Sharp. If the software crashes unexpectedly (which should never happen), it might leave a Chrome process or two hanging around. You can kill those manually or leave them alone; they'll be gone after the next reboot.

Usage is quite straightforward: enter a URL such as https://visortmo.com/library/manga/43131/soredemo-ayumu-wa-yosetekuru in the mango URL box, scan the scannies, select the scannies whose chapters you want to download, then click the download button.
The delay is in milliseconds. I currently recommend a delay of ~2000-3000 ms to avoid hitting the request limit. Hitting it makes you wait 3 to 5 seconds, if not more, since the tool keeps retrying until it succeeds or you stop it.
Folder naming follows this format: https://github.com/ArdaxHz/mupl#fil...g---cxxx-vyy-chapter_title-publish_date-group
There is never any volume number, however, since I've never seen one indicated on their website. Let me know if I'm blind.
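For anyone curious about the delay-and-retry behaviour described above, here is a minimal Python sketch of the idea (the tool itself is C#, and this is not its code). `fetch_with_retry`, `is_rate_limited` and the 4-second back-off are hypothetical: the back-off just sits inside the 3-5 s window mentioned, and how a rate-limited response is actually detected depends on the site.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    is_rate_limited: Callable[[T], bool],
    delay_ms: int = 2500,
    backoff_ms: int = 4000,
    max_attempts: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: wait the configured delay, fetch, and retry while rate-limited.

    max_attempts=None mirrors "keeps retrying until it succeeds or you stop it";
    stopping is then up to the caller (e.g. cancelling the whole run).
    """
    sleep(delay_ms / 1000)  # user-configured delay between requests, in ms
    attempts = 0
    while True:
        attempts += 1
        result = fetch()
        if not is_rate_limited(result):
            return result
        if max_attempts is not None and attempts >= max_attempts:
            raise RuntimeError(f"still rate-limited after {attempts} attempts")
        # Assumption: a fixed back-off inside the 3-5 s window mentioned above.
        sleep(backoff_ms / 1000)


if __name__ == "__main__":
    responses = iter([429, 429, 200])  # assumption: 429 means "rate limited"
    waits = []
    status = fetch_with_retry(
        lambda: next(responses), lambda s: s == 429, delay_ms=2000, sleep=waits.append
    )
    print(status, waits)  # 200 [2.0, 4.0, 4.0]
```

Passing `waits.append` as `sleep` records the waits instead of actually sleeping, which makes the retry sequence easy to inspect.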

Todo:
  • Single chapter download.
  • Chapter range download.
  • Group chapters download.
  • More options, fewer hardcoded things.
  • Refactoring so I can glance at myself in the mirror.
  • Maybe ditch HtmlAgilityPack and switch to Puppeteer completely, but I'm lazy.
  • Use an actual logger.
Note that I put it on GitHub solely for the sake of transparency. This makes no claim to being well coded or well structured.
 
Contributor
Joined
Jul 12, 2020
Messages
239
Single/Range/Group downloads are mostly done, but I'm missing volume numbers for proper uploading to MD. I can't seem to find any website with an accurate, exhaustive chapter list that includes volume numbers. Any ideas?

@Roler what did you use for Blackhole Scans? What's your secret? 🫃
 
Contributor
Joined
Apr 28, 2020
Messages
130
I auto-matched the manga with MD (of course, there were some mismatches that I had to fix), then matched the chapter numbers too and used the volume numbers from MD.
But this can still be inaccurate, and you'll have to fix some stuff manually.

I suggest you just let the user add the volume numbers themselves.
 
Contributor
Joined
Jul 12, 2020
Messages
239
That's the only solution I had in mind; I was hoping you were a mad genius. Thanks.

Yeah, it's more for personal use and to see whether the scraping itself is usable. I wasn't planning on adding the script for it.
 
Contributor
Joined
Jul 12, 2020
Messages
239
Updated: https://github.com/Ylevo/SpanishScrapper/releases/tag/1.0.14

Added:
  • Single chapter downloading. Accepted URLs: "https://visortmo.com/view_uploads/...", "https://visortmo.com/viewer/...".
  • Chapter range downloading. Check the box and enter the chapter range using TMO's numbering format, except for chapter numbers such as 8.02, which become 8.20. Both the starting and ending chapters are included.
  • Group mangos downloading. Accepted URL: "https://visortmo.com/groups/.../.../proyects". It does not create a subfolder named after the group. Joint group releases are included.
  • Mango skipping for group downloading. Check the "Skip" box and select how many mangos to skip. This is mainly to avoid going through dozens of already-downloaded mangos after having stopped a previous run midway.
  • Oneshot chapter handling.
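The inclusive range selection in the list above boils down to a simple filter. A minimal Python sketch, assuming chapter numbers are available as strings like "8.5"; `chapters_in_range` is a hypothetical name, and it deliberately ignores the 8.02 → 8.20 special case, whose exact rule isn't spelled out here.

```python
from decimal import Decimal
from typing import Iterable, List


def chapters_in_range(chapters: Iterable[str], start: str, end: str) -> List[str]:
    """Hypothetical helper: keep chapters whose number is in [start, end], both ends included.

    Decimal compares the numbers exactly as written, avoiding float rounding.
    The 8.02 -> 8.20 special case is deliberately NOT handled here.
    """
    low, high = Decimal(start), Decimal(end)
    return [c for c in chapters if low <= Decimal(c) <= high]


if __name__ == "__main__":
    print(chapters_in_range(["7", "8", "8.5", "9", "10"], "8", "9"))  # ['8', '8.5', '9']
```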
Fixed several edge cases and one particular infinite loop (you can tell I'm good at this) when getting the chapter page.
Every page fetch now checks whether the rate limit was hit and, if so, waits a few seconds before retrying.

Hopefully I didn't miss any loop-breaking bug/edge case this time.

EDIT: Obviously I did. Fixed mango title cleaning, which wasn't removing invalid characters like it was supposed to. 🫃

EDIT 2: Fixed trailing periods in mango titles not being removed, which created broken folders.
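For reference, the kind of title cleaning these two fixes are about can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: `clean_title` is hypothetical, and it only covers the characters Windows forbids in file names (`<>:"/\|?*` and control characters) plus the trailing periods and spaces that Windows silently drops, a classic cause of folders that can't be found again.

```python
import re

# Characters Windows forbids in file and folder names, plus ASCII control characters.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def clean_title(title: str) -> str:
    """Hypothetical helper: turn a mango title into a usable Windows folder name."""
    cleaned = _FORBIDDEN.sub("", title).strip()
    # Windows silently drops trailing periods and spaces, so "Title..." would not
    # match the folder actually created; strip them up front instead.
    return cleaned.rstrip(". ")


if __name__ == "__main__":
    print(clean_title("What?!. "))  # What!
```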
 
Contributor
Joined
Jul 12, 2020
Messages
239
Updated: https://github.com/Ylevo/TMOScrapper/releases/tag/1.0.16

Changed the name, fixed the logic. Shamefully realized today that I had been playing their silly little game and had overengineered the algo when a very simple solution was at hand. I could even ditch Puppeteer at this point, but if they change their stupid setup again I might really need it this time, so I'm not sure.

Still haven't refactored. Adding some real logging would be a great idea too. 🫃
 
୧⍢⃝୨
Staff
Super Moderator
Joined
Jan 7, 2023
Messages
188
I've been wondering if you could add automatic conversion from WebP images to PNG. Maybe using ImageMagick?

Also, for some reason the folders created by the program can't be opened in Windows Terminal through the context menu option on Windows 11, although typing CMD in the address bar opens the folder correctly in the terminal.
 
Contributor
Joined
Jul 12, 2020
Messages
239
Sure. .NET classes should suffice.
What do you mean by "can't be opened"? The terminal doesn't start at all? I'm on Windows 10 but I could check it out in a VM.
 
୧⍢⃝୨
Staff
Super Moderator
Joined
Jan 7, 2023
Messages
188
Sorry, what I meant is that the terminal doesn't open in the directory.
 
Contributor
Joined
Jul 12, 2020
Messages
239
Windows Terminal doesn't seem to like brackets: https://github.com/microsoft/terminal/issues/6504 https://github.com/lextm/windowsterminal-shell/issues/35 https://github.com/microsoft/terminal/issues/16024

It works fine with the "native" PowerShell accessible through Shift+right-click. Upgrading to PowerShell 7 fixes the issue (you'll have to change the terminal's default shell).
 
Contributor
Joined
Jul 12, 2020
Messages
239
Update 1.1: https://github.com/Ylevo/TMOScrapper/releases/tag/1.1

Refactored a bit; behaviour should hopefully be more consistent.

Added:
  • Backup implementation using Puppeteer (the default implementation uses HtmlAgilityPack without JS). I highly recommend not using it by default, as it's currently way slower than the no-JS one and less reliable due to TMO's fuckeries. It's mainly here as a brute-force fallback, and in case the main implementation stops working for some TMO reason.
  • Image conversion to JPEG/PNG/PNG 8 bpp.
  • Splitting of images over 10k pixels in height. The maximum slice height is 5k (when the image is exactly 10k tall).
  • New logger with some colours. Nothing fancy: you still can't scroll up while it's logging. I might try to do something about that later.
  • Logging to text files in a "Logs" subfolder: 50 MB maximum size, one log per day (rolling over to a new file when the size limit is hit), logs expire after a week. Can be disabled.
  • More options in a separate window, such as the TMO domain, enabling/disabling conversion/splitting, and various delays.
  • Windows tooltips (mostly in the options).
  • Fixed several bugs, such as forbidden path characters not being removed from group names.
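A minimal Python sketch of how the split points for tall images could be computed. The exact rule the tool uses isn't stated beyond the 10k threshold and the 5k maximum, so the one here is an assumption: images at least 10k tall (matching the remark that slices top out at 5k for an exactly-10k image) are cut into the smallest number of near-equal slices that keeps each at or under 5k. `slice_bounds` is a hypothetical name; it only returns row ranges, which could then be passed to an image library's crop (e.g. Pillow's `Image.crop((0, top, width, bottom))`).

```python
from typing import List, Tuple


def slice_bounds(
    height: int, split_threshold: int = 10_000, max_slice: int = 5_000
) -> List[Tuple[int, int]]:
    """Hypothetical helper: (top, bottom) pixel rows for each slice of a tall image.

    Assumed rule: images at least split_threshold tall are cut into the smallest
    number of near-equal slices that keeps every slice <= max_slice pixels.
    """
    if height < split_threshold:
        return [(0, height)]  # short enough: keep the image whole
    count = -(-height // max_slice)  # integer ceiling division
    base, extra = divmod(height, count)
    bounds: List[Tuple[int, int]] = []
    top = 0
    for i in range(count):
        # Spread the remainder one pixel at a time over the first `extra` slices.
        bottom = top + base + (1 if i < extra else 0)
        bounds.append((top, bottom))
        top = bottom
    return bounds


if __name__ == "__main__":
    print(slice_bounds(12_000))  # [(0, 4000), (4000, 8000), (8000, 12000)]
```

Near-equal slices avoid ending up with a tiny leftover strip at the bottom, which a naive "cut every 5k pixels" approach would produce.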
If you're wondering why the .exe's size jumped from a few MB to 30 MB: native .NET can't do anything with the garbage format that is WebP, so I had to choose one among the many overkill image manipulation libs out there, and I picked Magick.NET. Yes, it's just fucking ImageMagick.

Also, I never mentioned it, but this software uses .NET 6.0.

Let me know if you find any bugs.
 
Contributor
Joined
Jul 12, 2020
Messages
239
Quick fix: https://github.com/Ylevo/TMOScrapper/releases/tag/1.1.1

Fixed/Added:
  • Image conversion not using the format settings.
  • Image conversion/splitting blocking the GUI thread.
  • Verbose logging option.
  • User settings not persisting across releases. The app is now signed.
  • Chromium should also no longer be downloaded with every single release. The logger will still say at launch that it downloaded it, but if you don't see any download progress, it was just a check.
 
Contributor
Joined
Jul 12, 2020
Messages
239
Another quick fix: https://github.com/Ylevo/TMOScrapper/releases/tag/1.1.2
This mostly matters for bulk uploader users with a large name_group_id file: I forgot to replace spaces with hyphens in titles as I used to. I also found out that the bulk uploader doesn't even need mango titles to be free of spaces; the examples are a bit misleading.

Fixed settings and Chromium not really persisting across releases (for sure this time (I fucking hope)).
 
