"1. The server filters it out"
I'm probably going to suggest something incredibly ignorant here... apologies.
In the search page filters, when the new checkbox marked "Exclude Titles in My Library" is enabled, the client sends a request to the server for the ID numbers of all titles in the user's library. This is stored on the client; then, when the search is applied, it sends all those IDs to the server saying, "Hey, exclude these from your SQL query."
Now I shall demonstrate an example with horrible pseudocode from what I remember of my SQL class 20 years ago.
SQL:
SELECT * FROM MangaTitles
WHERE ContentRating IN ('Safe', 'Suggestive')
  AND 'Fantasy' IN Genres          -- pseudocode: the title must have this genre
  AND 'Psychological' IN Genres
  AND TitleID NOT IN ( /* long-ass list of title IDs */ )
ORDER BY UserRating;               -- ORDER BY belongs after WHERE
Basically, the query is more complicated, but the server only has to run it once for each page of search results.
edit: When I say "stored on the client"... I mean in something like a timestamped site cookie with a 12-hour expiration. If the cookie already exists, don't request it again.
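For illustration, the client-side half of that suggestion might look roughly like the sketch below; the endpoint paths and cookie name are made up, and only the 12-hour cookie idea comes from the post itself.
// Rough sketch of the client-side caching described above.
// The /api/me/library-ids and /api/search endpoints and the cookie name are hypothetical.
async function getLibraryTitleIds(): Promise<string[]> {
  const cached = document.cookie
    .split('; ')
    .find((c) => c.startsWith('libraryTitleIds='));
  if (cached) {
    // Cookie already exists: don't request the list again.
    return JSON.parse(decodeURIComponent(cached.slice('libraryTitleIds='.length)));
  }
  const res = await fetch('/api/me/library-ids'); // at most one request per 12 hours
  const ids: string[] = await res.json();
  document.cookie =
    'libraryTitleIds=' + encodeURIComponent(JSON.stringify(ids)) +
    '; max-age=' + 12 * 60 * 60 + '; path=/';
  return ids;
}

// The search request then carries the IDs so the server can append its NOT IN clause.
async function searchExcludingLibrary(filters: Record<string, unknown>): Promise<Response> {
  const excludeIds = await getLibraryTitleIds();
  return fetch('/api/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...filters, excludeIds }),
  });
}
One practical wrinkle with the cookie approach: cookies are limited to roughly 4 KB, so a large library would more realistically live in localStorage or a server-side session.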
The problem is that you have potentially hundreds of thousands of users that would be performing that action every second. Caching some IDs client side isn't going to help.
I understood the rejection reason being that you can't do join-y queries in/with your current search engine, Elasticsearch. I suggested an alternative that could do join-y queries. Then suddenly 60,000 manga and 3,000,000 users became billions of index records (it's not like an N-to-N table in an RDBMS). The only reason I see to index user records in the search engine is as a caching technique, which would be 3M records at most, one for each user - however, it would not be my first choice of [user] cache location.
Fair, though it doesn't really make any difference.
This is the problem though... That, thousands of times per second, is in fact not cheap at all. And we lose any caching ability for those searches too.
But yeah, I see the approach you're suggesting, and it would work, yes. Though as I said before, it's not so much complexity as performance that is the problem here.
Maybe we'll experiment with it eventually, but realistically it's quite unlikely still.
Also, we wouldn't really need Typesense in any meaningful way for that, afaik? A bunch of must/must-not clauses by doc ID on ES should do the exact same.
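For reference, the must/must-not version might look roughly like the Elasticsearch query body below; the field names and sample values are assumptions, with only the exclude-by-doc-ID idea taken from the post.
// Sketch of the must/must-not approach on Elasticsearch.
// Field names (contentRating, genres, userRating) and the example IDs are illustrative only.
const searchBody = {
  query: {
    bool: {
      must: [
        { terms: { contentRating: ['safe', 'suggestive'] } }, // any of these ratings
        { term: { genres: 'Fantasy' } },                      // must contain each genre
        { term: { genres: 'Psychological' } },
      ],
      must_not: [
        // Exclude the user's library by document ID.
        { ids: { values: ['manga-id-1', 'manga-id-2' /* ... */] } },
      ],
    },
  },
  sort: [{ userRating: 'desc' }],
};
The query itself isn't the hard part; as noted above, the cost is assembling and shipping a potentially huge ID list on every search and losing the ability to cache those results.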
I'm not sure what part of your argument fixes the 10k$ bandwidth costs we'd incur with their cloud though...? Even assuming that manga searches are half of the total weight of searches (I doubt it, but I don't know, to be fair), that's still 5k$ per month 🤔
"The problem is that you have potentially hundreds of thousands of users that would be performing that action every second. Caching some IDs client side isn't going to help."
I looked up traffic statistics for Mangadex to do some math. Supposedly they get 41.4 million visits a month. If we assume every single visitor is performing searches at the exact instant their 12-hour cookie expires, this comes to an average of 32 requests per second.
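Spelling that napkin math out (the assumption that a 12-hour expiry means two library fetches per visitor per day is mine, but it reproduces the 32/s figure):
// Napkin math from the post above.
const visitsPerMonth = 41_400_000;
const visitsPerDay = visitsPerMonth / 30;                  // = 1,380,000
const fetchesPerVisitorPerDay = 2;                         // 12-hour cookie expiry
const secondsPerDay = 24 * 60 * 60;                        // = 86,400
const requestsPerSecond =
  (visitsPerDay * fetchesPerVisitorPerDay) / secondsPerDay;
console.log(requestsPerSecond.toFixed(1));                 // ≈ 32 (prints "31.9")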
"Then suddenly 60,000 manga and 3,000,000 users became billions of index records (it's not like an N-to-N table in an RDBMS)."
It didn't, fwiw. I specifically mentioned that my napkin math earlier was excluding user x manga follow statuses. And cloud costs were very high mainly due to egress costs, not live dataset costs.
"Next issue is fetching the user library - now, I assumed this was a non-issue since I figured there would already be a performance-optimised solution."
It is and it also isn't. Of course we do have an index on user x manga already; otherwise it'd be impossible to make users' "updates" pages load. However, we do know that it's still one of the most expensive fetches on the website, by far.
"41.4m visitors / 30 days = 1.38 million visits a day"
41.4m visitors != 41.4m visits though. A single visitor makes at least one visit. That said, your figure is still much closer to reality. We're nowhere close to hundreds of thousands of searches per second; our stats suggest something more on the order of 50-60 searches per second. But that is also a very small part of everything the website has to do, so even if it were "cheap" in general, it has to be cheap as one feature of the whole as well.
If it were me, I'd just cache the user's library, list info, ratings, etc., and the last query results in their session data, so you can filter on any user data without hitting the database with a complex query on each search pageview. That's not computationally or storage expensive, and it lets you cheaply compute the result set once (hit the DB and then filter against user data in code). It might have some user-facing issues if a user keeps a ton of tabs open and their tab data gets out of sync with the most recently cached session data, and it only lets them do one advanced search at a time per session, but that's hardly the end of the world. I don't know how much DB abuse is due to advanced search, but this would probably drop it quite a bit.
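A rough sketch of that session-caching idea; the session store, the field names, and the runSearch/loadLibraryIds callbacks are all hypothetical stand-ins.
// Sketch of "cache the user's library in session data and filter in code".
// The in-memory Map stands in for a real session store; all names are made up.
interface SearchHit {
  titleId: string;
  userRating: number;
}

interface SessionData {
  libraryTitleIds: Set<string>;   // cached once per session, refreshed on library changes
  lastResults?: SearchHit[];
}

const sessions = new Map<string, SessionData>();

async function searchFilteredBySession(
  sessionId: string,
  runSearch: () => Promise<SearchHit[]>,     // hits the DB / search engine once per page
  loadLibraryIds: () => Promise<string[]>,   // hits the DB only on a session-cache miss
): Promise<SearchHit[]> {
  let session = sessions.get(sessionId);
  if (!session) {
    session = { libraryTitleIds: new Set(await loadLibraryIds()) };
    sessions.set(sessionId, session);
  }
  const libraryIds = session.libraryTitleIds;
  // Compute the result set once, then filter against the cached user data in code.
  const hits = await runSearch();
  session.lastResults = hits.filter((hit) => !libraryIds.has(hit.titleId));
  return session.lastResults;
}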
I have to clear up a misunderstanding. With "indexed content" I meant just the 60,000 mangas.
Assuming each manga has a unique identifier "manga_id", and there is a MySQL table named "user_manga" with a "user_id, manga_id" combination, then you just query for the list of manga_ids for a given user (this you can cache per session or whatever to reduce repeated querying of MySQL, and of course invalidate/refresh it on changes). For the search query's NOT IN() you just feed in that query result, matching against the field containing "manga_id".
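As a sketch of that two-step flow (the db.query helper is hypothetical; the table and column names come from the post above):
// Step 1: fetch the user's library IDs from the user_manga table.
// `db.query` is a hypothetical MySQL client method, shown only for illustration.
type Db = {
  query: (sql: string, params: unknown[]) => Promise<Array<{ manga_id: string }>>;
};

async function getExcludedMangaIds(db: Db, userId: string): Promise<string[]> {
  // Cache this per session and invalidate/refresh it when the library changes.
  const rows = await db.query(
    'SELECT manga_id FROM user_manga WHERE user_id = ?',
    [userId],
  );
  return rows.map((row) => row.manga_id);
}

// Step 2: feed the resulting list into the search query's exclusion, e.g.
//   ... AND manga_id NOT IN (?, ?, ...)
// or into the search engine's equivalent must_not / exclusion filter.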
Typesense has built-in query cache support, but I would assume you would control it yourself.
Therefore, their cloud could still be feasible. But you could also self-host on your existing servers to make up a cloud solution of your own.
"I don't know how much DB abuse is due to advanced search, but this would probably drop it quite a bit."
None actually; this was the whole point of using Elasticsearch for us.
"Another option: adding a filter in advanced search to hide titles given their reading status."
While this is also #1 on my list of "things I wish MD would do", literally this entire thread is a dev explaining why MD won't do it.
"I think it would be very helpful if you could see the reading status of titles while scrolling through the search function."
This was also suggested earlier in this thread. The dev said it was a possibility, but gave no indication of any timeline for implementation, and I haven't seen anything further on it since. It's not as good as being able to exclude followed manga from a search, but it'd still be a massive improvement over what we have now.
Also, I wasn't able to find their number of accounts (as opposed to guest users), so I can't comment on that, but everything seems to point to it being at least quite a bit lower.
Forum Statistics
Members: 510,813
"Since it's the one option it sounds like the devs said would likely be possible, just wanna throw my hat in and say that an indicator (for whether or not it's in your library) at least would be very helpful, if actually removing the entries would be too much DB strain."
Already suggested in this thread.
Yeah, it was mentioned in this thread too (msg #6 / dev response #10); I was just chiming in as an additional person saying it'd be appreciated.