"1. The server filters it out"
I'm probably going to suggest something incredibly ignorant here... apologies.
In the search page filters, when the new checkbox marked "Exclude Titles in My Library" is enabled, the client sends a request to the server for the ID numbers of all titles in the user's library. This is stored on the client; then, when the search is applied, it sends all those IDs to the server saying, "Hey, exclude these from your SQL query."
Now I shall demonstrate an example with horrible pseudocode from what I remember of my SQL class 20 years ago.
SQL:
SELECT * FROM MangaTitles
WHERE ContentRating IN ('Safe', 'Suggestive')
  AND 'Fantasy' IN Genres          -- pseudocode: the title must have this genre
  AND 'Psychological' IN Genres
  AND TitleID NOT IN ( /* long-ass list of title IDs */ )
ORDER BY UserRating;               -- ORDER BY belongs after WHERE
Basically, the query is more complicated, but the server only has to run it once for each page of search results.
edit: When I say "stored on the client"... I mean in something like a timestamped site cookie with a 12-hour expiration. If the cookie already exists, don't request it again.
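For illustration, the client-side half of that suggestion might look roughly like the sketch below; the endpoint paths and cookie name are made up, and only the 12-hour cookie idea comes from the post itself.
// Rough sketch of the client-side caching described above.
// The /api/me/library-ids and /api/search endpoints and the cookie name are hypothetical.
async function getLibraryTitleIds(): Promise<string[]> {
  const cached = document.cookie
    .split('; ')
    .find((c) => c.startsWith('libraryTitleIds='));
  if (cached) {
    // Cookie already exists: don't request the list again.
    return JSON.parse(decodeURIComponent(cached.slice('libraryTitleIds='.length)));
  }
  const res = await fetch('/api/me/library-ids'); // at most one request per 12 hours
  const ids: string[] = await res.json();
  document.cookie =
    'libraryTitleIds=' + encodeURIComponent(JSON.stringify(ids)) +
    '; max-age=' + 12 * 60 * 60 + '; path=/';
  return ids;
}

// The search request then carries the IDs so the server can append its NOT IN clause.
async function searchExcludingLibrary(filters: Record<string, unknown>): Promise<Response> {
  const excludeIds = await getLibraryTitleIds();
  return fetch('/api/search', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...filters, excludeIds }),
  });
}
One practical wrinkle with the cookie approach: cookies are limited to roughly 4 KB, so a large library would more realistically live in localStorage or a server-side session.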
The problem is that you have potentially hundreds of thousands of users that would be performing that action every second. Caching some IDs client side isn't going to help.
I understood the rejection reason being that you can't do join-y queries in/with your current search engine, Elasticsearch. I suggested an alternative that could do join-y queries. Then suddenly 60,000 manga and 3,000,000 users became billions of index records (it's not like an N-to-N table in an RDBMS). The only reason I see to index user records in the search engine is as a caching technique, which would be 3M records at most, one for each user - however, it would not be my first choice of [user] cache location.
Fair, though it doesn't really make any difference.
This is the problem though... That, thousands of times per second, is in fact not cheap at all. And we lose any caching ability for those searches too.
But yeah, I see the approach you're suggesting, and it would work, yes. Though as I said before, it's not so much complexity as performance that is the problem here.
Maybe we'll experiment with it eventually, but realistically it's quite unlikely still.
Also, we wouldn't really need Typesense in any meaningful way for that, afaik? A bunch of must/must-not clauses by doc ID on ES should do the exact same.
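For reference, the must/must-not version might look roughly like the Elasticsearch query body below; the field names and sample values are assumptions, with only the exclude-by-doc-ID idea taken from the post.
// Sketch of the must/must-not approach on Elasticsearch.
// Field names (contentRating, genres, userRating) and the example IDs are illustrative only.
const searchBody = {
  query: {
    bool: {
      must: [
        { terms: { contentRating: ['safe', 'suggestive'] } }, // any of these ratings
        { term: { genres: 'Fantasy' } },                      // must contain each genre
        { term: { genres: 'Psychological' } },
      ],
      must_not: [
        // Exclude the user's library by document ID.
        { ids: { values: ['manga-id-1', 'manga-id-2' /* ... */] } },
      ],
    },
  },
  sort: [{ userRating: 'desc' }],
};
The query itself isn't the hard part; as noted above, the cost is assembling and shipping a potentially huge ID list on every search and losing the ability to cache those results.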
I'm not sure what part of your argument fixes the 10k$ bandwidth costs we'd incur with their cloud though...? Even assuming that manga searches are half of the total weight of searches (I doubt it, but I don't know, to be fair), that's still 5k$ per month 🤔
"The problem is that you have potentially hundreds of thousands of users that would be performing that action every second. Caching some IDs client side isn't going to help."
I looked up traffic statistics for Mangadex to do some math. Supposedly they get 41.4 million visits a month. If we assume every single visitor is performing searches at the exact instant their 12-hour cookie expires, this comes to an average of 32 requests per second.
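Spelling that napkin math out (the assumption that a 12-hour expiry means two library fetches per visitor per day is mine, but it reproduces the 32/s figure):
// Napkin math from the post above.
const visitsPerMonth = 41_400_000;
const visitsPerDay = visitsPerMonth / 30;                  // = 1,380,000
const fetchesPerVisitorPerDay = 2;                         // 12-hour cookie expiry
const secondsPerDay = 24 * 60 * 60;                        // = 86,400
const requestsPerSecond =
  (visitsPerDay * fetchesPerVisitorPerDay) / secondsPerDay;
console.log(requestsPerSecond.toFixed(1));                 // ≈ 32 (prints "31.9")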
"Then suddenly 60,000 manga and 3,000,000 users became billions of index records (it's not like an N-to-N table in an RDBMS)."
It didn't, fwiw. I specifically mentioned that my napkin math earlier was excluding user x manga follow statuses. And cloud costs were very high mainly due to egress costs, not live dataset costs.
"Next issue is fetching the user library - now, I assumed this was a non-issue since I figured there would already be a performance-optimised solution."
It is and it also isn't. Of course we do have an index on user x manga already; otherwise it'd be impossible to make users' "updates" pages load. However, we do know that it's still one of the most expensive fetches on the website, by far.
"41.4m visitors / 30 days = 1.38 million visits a day"
41.4m visitors != 41.4m visits though. A single visitor makes at least one visit. That said, your figure is still much closer to reality. We're nowhere close to hundreds of thousands of searches per second; our stats suggest something more on the order of 50-60 searches per second. But that is also a very small part of everything the website has to do, so even if it were "cheap" in general, it has to be cheap as one feature of the whole as well.
If it were me, I'd just cache the user's library, list info, ratings, etc., and the last query results in their session data, so you can filter on any user data without hitting the database with a complex query on each search pageview. That's not computationally or storage expensive, and it lets you cheaply compute the result set once (hit the DB and then filter against user data in code). It might have some user-facing issues if a user keeps a ton of tabs open and their tab data gets out of sync with the most recently cached session data, and it only lets them do one advanced search at a time per session, but that's hardly the end of the world. I don't know how much DB abuse is due to advanced search, but this would probably drop it quite a bit.
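A rough sketch of that session-caching idea; the session store, the field names, and the runSearch/loadLibraryIds callbacks are all hypothetical stand-ins.
// Sketch of "cache the user's library in session data and filter in code".
// The in-memory Map stands in for a real session store; all names are made up.
interface SearchHit {
  titleId: string;
  userRating: number;
}

interface SessionData {
  libraryTitleIds: Set<string>;   // cached once per session, refreshed on library changes
  lastResults?: SearchHit[];
}

const sessions = new Map<string, SessionData>();

async function searchFilteredBySession(
  sessionId: string,
  runSearch: () => Promise<SearchHit[]>,     // hits the DB / search engine once per page
  loadLibraryIds: () => Promise<string[]>,   // hits the DB only on a session-cache miss
): Promise<SearchHit[]> {
  let session = sessions.get(sessionId);
  if (!session) {
    session = { libraryTitleIds: new Set(await loadLibraryIds()) };
    sessions.set(sessionId, session);
  }
  const libraryIds = session.libraryTitleIds;
  // Compute the result set once, then filter against the cached user data in code.
  const hits = await runSearch();
  session.lastResults = hits.filter((hit) => !libraryIds.has(hit.titleId));
  return session.lastResults;
}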
I have to clear up a misunderstanding. With "indexed content" I meant just the 60,000 mangas.
Assuming each manga has a unique identifier "manga_id", and there is a MySQL table named "user_manga" with a "user_id, manga_id" combination, then you just query for the list of manga_ids for a given user (this you can cache per session or whatever to reduce repeated querying of MySQL, and of course invalidate/refresh it on changes). For the search query's NOT IN() you just feed in that query result, matching against the field containing "manga_id".
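As a sketch of that two-step flow (the db.query helper is hypothetical; the table and column names come from the post above):
// Step 1: fetch the user's library IDs from the user_manga table.
// `db.query` is a hypothetical MySQL client method, shown only for illustration.
type Db = {
  query: (sql: string, params: unknown[]) => Promise<Array<{ manga_id: string }>>;
};

async function getExcludedMangaIds(db: Db, userId: string): Promise<string[]> {
  // Cache this per session and invalidate/refresh it when the library changes.
  const rows = await db.query(
    'SELECT manga_id FROM user_manga WHERE user_id = ?',
    [userId],
  );
  return rows.map((row) => row.manga_id);
}

// Step 2: feed the resulting list into the search query's exclusion, e.g.
//   ... AND manga_id NOT IN (?, ?, ...)
// or into the search engine's equivalent must_not / exclusion filter.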
Typesense has built-in query cache support, but I would assume you would control it yourself.
Therefore, their cloud could still be feasible. But you could also self-host on your existing servers to make up a cloud solution of your own.
"I don't know how much DB abuse is due to advanced search, but this would probably drop it quite a bit."
None actually; this was the whole point of using Elasticsearch for us.
"Another option: adding a filter in advanced search to hide titles given their reading status."
While this is also #1 on my list of "things I wish MD would do", literally this entire thread is a dev explaining why MD won't do it.
"I think it would be very helpful if you could see the reading status of titles while scrolling through the search function."
This was also suggested earlier in this thread. The dev said it was a possibility, but gave no indication of any timeline for implementation, and I haven't seen anything further on it since. It's not as good as being able to exclude followed manga from a search, but it'd still be a massive improvement over what we have now.
Also, I wasn't able to find their number of accounts (as opposed to guest users), so I can't comment on that, but everything seems to point to it being at least quite a bit lower.
Forum Statistics
Members: 510,813
"Since it's the one option it sounds like the devs said would likely be possible, just wanna throw my hat in and say that an indicator (for whether or not it's in your library) at least would be very helpful, if actually removing the entries would be too much DB strain."
Already suggested in this thread.
Yeah, it was mentioned in this thread too (msg #6 / dev response #10); I was just chiming in as an additional person saying it'd be appreciated.