Manga Reverse Image Lookup Service

Group Leader
Joined
Feb 1, 2023
Messages
20
I wrote a service that allows for reverse image searching of manga pages: https://cba.index-0.com/manga/search
You can either enter a direct link to an image or upload the image directly. You can also search via the title.
It works a lot better than saucenao and other reverse image search systems for manga pages.

Everything is open source and can be found on my github.

It uses Elasticsearch for the vector search database and a C# web API to keep track of everything.
There is also a bot that handles reverse image searches in a Discord server. You can get the bot via the website above. If you want the lookup feature enabled, DM me on Discord and I'll enable it.


How it works:

The system uses two different reverse image search services.
The first is Google Lens. This allows looking up images in Google's vast index of images and removes the need for the host of this service to load all of the manga images into their own database.
Lens covers about 90% of the requests. Unfortunately, MangaDex isn't scrape-able by bots, so this method relies on third-party sites that have the manga and are scrape-able. Once it finds the title of the manga from one of those sites, it looks the title up via the MangaDex API and returns any results that match.

The second source for the reverse image search is my own custom vector search database. This is handled via an Elasticsearch database and a vector matching algorithm.

For this, I fetch any recently updated "en" manga and index their pages.

As of Feb 1st 2023, there are 647,681 images in the database (572GB of images).
There is an API available for the reverse image search system:

GET: https://cba-api.index-0.com/manga/image-search?path={image-url}
POST: https://cba-api.index-0.com/manga/image-search - The body is a general file post with the image data

These endpoints might change in the future, as this technically isn't a public API.
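For anyone wanting to call the GET endpoint from a script, a minimal Python sketch could look like the one below. It only builds the request URL and percent-encodes the image link so it survives being passed as the `path` parameter. The helper name `build_search_url` is made up for illustration. The response format isn't documented in this post, so nothing here assumes it, and the form field name for the POST upload isn't stated either, so that variant is left out.

```python
from urllib.parse import urlencode

# Endpoint as given in the post above; it may change since it isn't a public API.
API_BASE = "https://cba-api.index-0.com/manga/image-search"


def build_search_url(image_url: str) -> str:
    """Return the GET URL with the image link percent-encoded in `path`."""
    return f"{API_BASE}?{urlencode({'path': image_url})}"


if __name__ == "__main__":
    # Prints the full request URL; fetch it with any HTTP client.
    print(build_search_url("https://example.com/page.png"))
```

Encoding the link matters because an unencoded image URL with its own `?` or `&` would otherwise be split into separate query parameters.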


There are a bunch of other features on the same site (let me know if this isn't allowed here and I'll remove this section):
  • You can read manga on it too. I made that portion of it because I have old eyes and I wanted to add a blue-light filter option to the manga pages. I also have other analytics stuff that I like doing as well.
  • It has an AI Image generator built into it (it's not as good as novel-AI tho), but it works and really likes generating hentai :kek:
  • You can read light novels on it too. Most of them are fan translated, and it can generate EPUBs that can be exported to Kindle or other e-readers.
  • PWA support for mobile and desktop apps, and "native" mobile apps coming soon ('cause Apple likes to fuck up iOS PWAs by not allowing push notifications and breaking "localStorage" every update).

Yeah, that's about it.
 
Contributor
Joined
Apr 28, 2020
Messages
134
You can read manga on it too. I made that portion of it because I have old eyes and I wanted to add a blue-light filter option to the manga pages. I also have other analytics stuff that I like doing as well.
The only problem there I see is not crediting scan groups properly (like displaying them on the chapter anchors)

and deleted chapters are still being shown like: https://cba.index-0.com/manga/mangadex-we-started-a-so-sweet-newlywed-life/136898/1 -> https://mangadex.org/chapter/1c419ee1-25f9-4e99-a33c-0039cc65b0f6
 
Group Leader
Joined
Feb 1, 2023
Messages
20
I've fixed the issue of the scanlation groups not being shown. I was under the assumption that the inserts the groups put into the chapters were enough credit, but I've added the groups to the chapter list and to the page sidebar; hopefully that is enough. They will slowly populate for any already-loaded manga whenever someone clicks the "refresh" button on the site, or during the periodic automatic update I perform. Any newly loaded manga will automatically have their scanlation groups included.

As for the deleted chapters... I have a soft delete system in place so as to not violate any of the database constraints. When I have time over the weekend, I'll see about implementing something to check for them and soft delete them so they won't be seen on the site.
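As a rough illustration of the soft-delete idea (not the site's actual code), here is a small Python sketch. It assumes a `chapters` table with a nullable `deleted_at` column and uses an in-memory SQLite database purely so the example is self-contained; the table layout and the function names are invented for this example. Rows are never removed, so foreign keys pointing at them stay valid, and the site would simply filter on `deleted_at IS NULL`.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical chapters table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE chapters ("
        "id TEXT PRIMARY KEY, title TEXT NOT NULL, deleted_at TEXT)"
    )
    return db


def soft_delete_missing(db: sqlite3.Connection, live_ids: set) -> int:
    """Mark chapters that are no longer in live_ids as deleted; return the count."""
    rows = db.execute("SELECT id FROM chapters WHERE deleted_at IS NULL").fetchall()
    gone = [(chapter_id,) for (chapter_id,) in rows if chapter_id not in live_ids]
    db.executemany(
        "UPDATE chapters SET deleted_at = datetime('now') WHERE id = ?", gone
    )
    db.commit()
    return len(gone)


def visible_chapters(db: sqlite3.Connection) -> list:
    """Return the ids of chapters that should still be shown."""
    query = "SELECT id FROM chapters WHERE deleted_at IS NULL ORDER BY id"
    return [row[0] for row in db.execute(query)]
```

Here `live_ids` stands for whatever set of chapter IDs the upstream source still reports; how that set is obtained is outside this sketch.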
 
Contributor
Joined
Apr 28, 2020
Messages
134
@cardboard_mf
I've fixed the issue of the scanlation groups not being shown. I was under the assumption that the inserts the groups put into the chapters were enough credit, but I've added the groups to the chapter list and to the page sidebar; hopefully that is enough. They will slowly populate for any already-loaded manga whenever someone clicks the "refresh" button on the site, or during the periodic automatic update I perform. Any newly loaded manga will automatically have their scanlation groups included.
Yup. That's def enough.
As for the deleted chapters... I have a soft delete system in place so as to not violate any of the database constraints. When I have time over the weekend, I'll see about implementing something to check for them and soft delete them so they won't be seen on the site.
That's fine. I just expressed my concern for what might become a problem in the future, so take your time.
 
Group Leader
Joined
Feb 1, 2023
Messages
20
I recently released an update for this service that also adds in saucenao results if it cannot find anything in either my database or Google Lens. Saucenao tends to be pretty unreliable, but it is better than nothing.

As of today there are 1,292,845 English manga pages in my reverse image search database, totaling 1.2TB of raw images, and the API has serviced over 3k requests. Over 90% of those requests had a relevant manga found with a 70%+ positive match before the saucenao update went into place.
 
Dex-chan lover
Joined
Jan 20, 2018
Messages
1,034
The system uses 2 different reverse image search services.
The first is google lens, blah-blah-blah
The lens request covers about 90% of the requests.

So my only question is... why would I use some middleman for the work I can request from google lens directly? 🤔
 
Group Leader
Joined
Feb 1, 2023
Messages
20
Because Google Lens doesn't have any recent releases, since MangaDex isn't scrape-able by Google's robots, whereas my service has any English release indexed within 30 seconds* of the manga being released. Also, some of the less popular manga don't show up on Lens at all.

Also, Lens requests take longer and cost money: Lens is about $0.50 per 1k requests, while my service is free and I swallow the cost of the Lens requests for you. (If you search manually on images.google.com using the reverse image lookup, it's free, but it takes at least 4 clicks to resolve.) Google Lens also doesn't have a Discord bot available for it.

The metrics in the original post aren't valid anymore. Since Lens changed their terms and started charging, I'm now using my service as the primary, with Lens and saucenao only used if it cannot find a match on my service.

* Depending on if my service is up (self hosted, so it's subject to my internet and power availability)

Edit: I feel like I should expand on this a bit more... This was originally developed for a specific Discord server where we have a channel where people post manga they've recently read (similar to #spoilers in the MD Discord). We had a rule that people were required to post the source of the manga, to avoid a bunch of pings from people asking for the source. This was not followed, so I took it into my own hands and made a bot for it. Originally it just used Google Lens, and it didn't get many of the new releases because they were only on MD, so I added a secondary "fallback" service that indexed recently released EN MD manga, and it resolved 99% of the cases where Lens didn't have the results. Said fallback service ended up being more reliable for new manga than Lens was, and Lens changed the pricing structure for their free tier, so I've switched to my service being the primary and saucenao + Lens being the fallbacks, all while swallowing the cost of using the Lens and saucenao APIs.
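The primary-then-fallback ordering described here can be sketched in a few lines of Python. This is only an illustration of the pattern, not the bot's real code: the backends are passed in as plain functions, the name `search_with_fallbacks` is made up, and the order of the paid fallbacks is left to the caller since the post doesn't pin it down.

```python
from typing import Callable, Iterable, List, Optional, Tuple

# A backend takes raw image bytes and returns a (possibly empty) list of matches.
Search = Callable[[bytes], List[dict]]


def search_with_fallbacks(
    image: bytes, backends: Iterable[Tuple[str, Search]]
) -> Tuple[Optional[str], List[dict]]:
    """Try each backend in order and return the first non-empty result set."""
    for name, search in backends:
        try:
            results = search(image)
        except Exception:
            # Assumption: a failing backend is treated the same as "no match".
            continue
        if results:
            return name, results
    return None, []
```

Because the loop stops at the first backend that returns something, the paid services are only called when the free primary index comes up empty.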

I figured other people might like this, so I put it on my site, and made the code FOSS on github for other people to utilize if they wanted it.
 
Dex-chan lover
Joined
Jan 20, 2018
Messages
1,034
Ok. I was just curious because of "I wrote a service but 90% of work is some 3rd party service" wording.
 
Dex-chan lover
Joined
May 18, 2019
Messages
3,747
I'm a bit lazy to read the sauce so if you don't mind, can you tl;dr your comparison algo for me? Kinda curious but too lazy
 
Group Leader
Joined
Feb 1, 2023
Messages
20
Extreme TL;DR: I have a database of "image signatures" that I compare uploaded images against. The comparison returns a "normalized distance" between the two signatures and anything below a certain threshold is considered a duplicate. I scrape mangadex to generate these image signatures.
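To make the "normalized distance below a threshold" idea concrete, here is a small Python sketch. It assumes signatures are plain lists of numbers and uses one common normalization, the L2 distance between the two vectors divided by the sum of their lengths, which always lands between 0 and 1. The exact formula and cutoff used by the service aren't stated in this post, so both the formula choice and the `THRESHOLD` value below are assumptions for illustration only.

```python
import math
from typing import Sequence

THRESHOLD = 0.4  # hypothetical cutoff; the real value isn't given in the post


def normalized_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return ||a - b|| / (||a|| + ||b||), a value in the range [0, 1]."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scale = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return 0.0 if scale == 0 else diff / scale


def is_duplicate(a: Sequence[float], b: Sequence[float],
                 threshold: float = THRESHOLD) -> bool:
    """Treat two signatures as the same image when they are close enough."""
    return normalized_distance(a, b) < threshold
```

Identical signatures give a distance of 0 and exactly opposite ones give 1, so a lower threshold means stricter matching.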

Not so extreme TL;DR:
There are a few different things going on here:
I have 2 databases running. The first (Postgres) stores a list of every image URL I've processed and an MD5 hash of the file contents; these are then related back to the manga and chapter the image came from on MangaDex or the other aggregator sites I use. This is used as a first pass to find identical files (it does not take compression or resizing into account).
The second stores the image signatures as vectors in an Elasticsearch database. You can read more about how signatures are generated here.
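The exact-match first pass can be illustrated with a short Python sketch. It hashes the raw file bytes with MD5 and looks the digest up in an in-memory dict that stands in for the Postgres table; the class name `ExactMatchIndex` and the metadata fields are invented for the example. As noted above, this only catches byte-for-byte identical files, so a re-compressed or resized copy hashes differently and has to go through the signature search instead.

```python
import hashlib
from typing import Dict, List


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of the file contents as a hex string."""
    return hashlib.md5(data).hexdigest()


class ExactMatchIndex:
    """In-memory stand-in for a table mapping file hashes to page metadata."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, List[dict]] = {}

    def add(self, data: bytes, page: dict) -> None:
        """Record which page a file came from, keyed by its hash."""
        self._by_hash.setdefault(md5_hex(data), []).append(page)

    def lookup(self, data: bytes) -> List[dict]:
        """Return every known page whose file is byte-identical to data."""
        return self._by_hash.get(md5_hex(data), [])
```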

How do I get the image signatures, you ask?
To get these file hashes and signatures, I have a service that runs every 30 seconds or so and fetches any English chapters (without external URLs) from MangaDex that have been uploaded since the last time I checked (so it can resume in case of a service blackout). I then download all of the pages, generate the file hashes, generate the image signatures, and store them in their respective databases. Both databases retain metadata about the image they're associated with (page #, chapter ID, source ID, originating URL), and that metadata is returned as part of the payload when searching for images.
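The "since the last time I checked" part is what makes the poller resumable, and a rough Python sketch of that idea follows. It stores the last successful check time in a small JSON file and only advances it after a batch has been indexed, so a crash or outage means the next run picks up from the old timestamp. The file layout, the function names, and the `fetch_since` / `index_chapter` callbacks are all placeholders for this example; no real MangaDex calls are made.

```python
import json
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Callable, List


def load_checkpoint(path: Path, default: datetime) -> datetime:
    """Read the last successful check time, or use default on the first run."""
    if not path.exists():
        return default
    return datetime.fromisoformat(json.loads(path.read_text())["last_checked"])


def save_checkpoint(path: Path, when: datetime) -> None:
    """Persist the time of the last successful check."""
    path.write_text(json.dumps({"last_checked": when.isoformat()}))


def poll_once(path: Path,
              fetch_since: Callable[[datetime], List[str]],
              index_chapter: Callable[[str], None],
              now: datetime) -> int:
    """Index everything uploaded since the checkpoint, then advance it."""
    since = load_checkpoint(path, default=now)
    chapters = fetch_since(since)
    for chapter in chapters:
        index_chapter(chapter)
    # Only reached when indexing succeeded, so a failed run is retried later.
    save_checkpoint(path, now)
    return len(chapters)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        checkpoint = Path(tmp) / "checkpoint.json"
        start = datetime(2023, 1, 1, tzinfo=timezone.utc)
        poll_once(checkpoint, lambda since: [], print, start)
        later = start + timedelta(seconds=30)
        poll_once(checkpoint, lambda since: [f"uploaded after {since.isoformat()}"],
                  print, later)
```

A real service would call `poll_once` on a timer (every 30 seconds or so, per the post) with `now` set to the current UTC time.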

All of the above is just for my own vector search database. I also fall back on Google Vision and saucenao in case my service returns nothing. But these days, about 80% of the matches are found using my database first. I plan on releasing metrics regarding these searches in the future. I've just been too busy messing around with ChatGPT to work on it.
 
Member
Joined
Jan 12, 2021
Messages
51

Can this be more detailed?
I mean like we send the image and the system ensures which chapter direction the image is in
 
Group Leader
Joined
Feb 1, 2023
Messages
20
I'm not sure what you're asking to be more detailed. I also don't know what you mean by "which chapter direction the image is in"?
If you want to see what chapter a page belongs to on either of the websites you can click the "Link:" section:
md-cba-link-2.png
md-cba-link.png

Or if you're talking about the bot, you can click the "Page" field in the embed if it's a direct MangaDex page link, otherwise, you'll have to search the returned page yourself if it's from a 3rd party site.
md-cba-link-3.png
 
Member
Joined
Jan 12, 2021
Messages
51

Nvm bro, I'm using a phone. It turns out that when I pressed the "link", even though it was correct, I was actually bounced to the manga information. I had to keep pressing it until the pop-up appeared.

And when I search, most of them don't tell me which chapter exactly, only the website


btw there is a bug: when you try to return to Home / best guess from the manga information/detail page, it can't go back there and returns to the manga information instead
 
Group Leader
Joined
Feb 1, 2023
Messages
20
Ah, yeah, I've mostly stopped updating the manga portion of https://cba.index-0.com/manga
You should probably use the new site: https://manga.index-0.com/reverse

I think the new site still has the same bug as it's to do with how the routing for importing and then going to the manga's page works. I'll see about fixing it shortly.
 
Contributor
Joined
Jan 22, 2018
Messages
703
Ah, yeah, I've mostly stopped updating the manga portion of https://cba.index-0.com/manga
You should probably use the new site: https://manga.index-0.com/reverse
I wrote a service that allows for reverse image searching of manga pages: https://cba.index-0.com/manga/search
You should prolly report your first post in this thread and ask the mods to edit in your new website in place of the old one at the top, since some people might just read the first post and start using that one instead (like me)

or maybe put up a redirect?
 
Contributor
Joined
Jan 22, 2018
Messages
703
Another fallback could be Yandex Image Search. Usually if I don't find stuff on Google or anywhere else, I find it through Yandex...
 
Group Leader
Joined
Feb 1, 2023
Messages
20
I've added an alert on the old version of the search that redirects to the new site (kinda half assed it though):
new-page-alert.png


I'll look into the Yandex API as another fallback as long as it doesn't cost too much. Thanks for the suggestions!

I'm also in the process of rewriting the backend of the whole manga portion, including the search, to make it more robust. Once I do, I'm going to see about processing the back catalog of manga, since I only have early December until now loaded in the reverse image search database. I'm just trying to figure out a good way to stay within the rate limits without it taking months to execute.
 
Group Leader
Joined
Feb 1, 2023
Messages
20
In other news, I just checked the database... There are 3,040,128 unique manga pages indexed so far, equating to 2.6TB of data processed by the indexing service, with roughly 98% uptime since December 28th 2022.
That's 9,846 manga processed with 111,176 chapters.
A further breakdown of the manga: 776 erotica, 1308 pornographic, 6183 safe, and 1573 suggestive.
 
Group Leader
Joined
Feb 1, 2023
Messages
20
Aye, we hit a year of operation for the reverse image search database. It went live (in beta) 2022-12-11 00:40:04 EST, so it's a year and a day old at this point. There are currently 4,277,044 images indexed, with over 3.7TB of image metadata, signatures, and files stored.
My next major update to it will be to index all of the historical manga from prior to Dec 28th 2022 so I don't need to rely on saucenao and Vision so much, but that's a rather large undertaking, so it's been a slow grind.
 
