Recently, the behaviour for marking series as read was changed such that series aren't marked as read if they aren't followed. This has caused several annoyances. Read tracking no longer works for completed series (where there is no point in following them). It also fails when you read some or all chapters of a series, decide you like it and follow it, because none of the chapters you've already read have been marked as such.
The suggested change in behaviour is to track read chapters even if they aren't being followed (in other words, revert the change in functionality).
My understanding is that the change was made because the number of database rows for tracking follows was growing out of control. This, to me, seems like an implementation issue, so I thought I'd offer some suggestions on how to mitigate it.
I don't know what MangaDex's database schema looks like, but I assume that the table tracking read chapters has something like the user ID, the series ID, the chapter ID, and presumably a timestamp. While that's fine from a normalization standpoint, it does lead to a rather large number of rows, which is the problem at hand. The simplest suggestion would be to stop tracking each chapter's status as a separate row, and instead store only one row per user per series, with the chapter numbers or IDs stored in a comma-delimited string. Generally, string.split() and string.join() operations are quite fast in most languages, and if you're worried about a scenario like the user viewing screens that cover many series (like the follows screen), that can be resolved by caching the parsed results. Just put the results of parsing that string in an in-memory cache or memcached and expire the older cached results as required.
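To make the comma-delimited idea a bit more concrete, here's a rough sketch in Python. Everything in it is my own assumption rather than anything from MangaDex's code: the function names (encode_read, decode_read, mark_read) are made up, chapter IDs are assumed to be integers, and functools.lru_cache stands in for whatever in-memory cache or memcached setup would actually be used.

```python
from functools import lru_cache


def encode_read(chapter_ids):
    """Serialize chapter IDs into one comma-delimited string (sorted, de-duplicated)."""
    return ",".join(str(c) for c in sorted(set(chapter_ids)))


@lru_cache(maxsize=10_000)  # stand-in for an in-memory cache or memcached
def decode_read(packed):
    """Parse the stored string back into a frozenset of integer chapter IDs.

    Results are cached by string value, so rendering a screen that touches
    many series does not re-parse strings that have not changed.
    """
    if not packed:
        return frozenset()
    return frozenset(int(part) for part in packed.split(","))


def mark_read(packed, chapter_id):
    """Return the new string to store after marking one more chapter as read."""
    return encode_read(decode_read(packed) | {chapter_id})
```

With this, the row for one user and one series holds a single string such as "1,2,5", and mark_read("1,2", 5) produces exactly that value to write back.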
A far less drastic alternative: keep everything as it is now, but purge read rows for non-followed and non-complete series once the records are older than a certain age. That covers the scenario where I read a few chapters and then decide I like a series and hit follow, but still gives you the opportunity to keep database size under control.
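As an illustration of the purge, here's a minimal sketch using Python's built-in sqlite3 module. The schema is purely hypothetical (the series, follows and read_marks tables and their columns are my guesses, not MangaDex's actual layout), and timestamps are assumed to be plain integer seconds.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE series (id INTEGER PRIMARY KEY, completed INTEGER NOT NULL);
CREATE TABLE follows (user_id INTEGER, series_id INTEGER,
                      PRIMARY KEY (user_id, series_id));
CREATE TABLE read_marks (user_id INTEGER, series_id INTEGER,
                         chapter_id INTEGER, read_at INTEGER);
"""


def purge_stale_reads(conn, now, max_age_seconds):
    """Delete old read rows for series that are neither followed nor completed.

    Returns the number of rows removed.
    """
    cutoff = now - max_age_seconds
    cur = conn.execute(
        """
        DELETE FROM read_marks
        WHERE read_at < ?
          AND series_id NOT IN (SELECT id FROM series WHERE completed = 1)
          AND NOT EXISTS (
              SELECT 1 FROM follows f
              WHERE f.user_id = read_marks.user_id
                AND f.series_id = read_marks.series_id
          )
        """,
        (cutoff,),
    )
    conn.commit()
    return cur.rowcount
```

Run on a schedule, something like this keeps recent reads around long enough for a late follow to pick them up, while rows for followed or completed series are never touched.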
Another possible idea: it won't help on every series (sometimes there are multiple versions of each chapter, like an LQ and an HQ version), but you can store the "read" record as ranges instead of individual chapters. Give each record a start and an end, and then you know the user has read all chapters between those two IDs. If they remove a read chapter inside a range, you can split the record. If they add a non-contiguous chapter, you just add a new record where the start and end are the same ID. Any time a user adds a new read chapter, first check whether that ID is contiguous with any existing range, and if so update that range instead of adding a new record.
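The range bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under a big assumption: that chapters within a series can be identified by consecutive integers (for example a per-series ordinal), which may not be true of real chapter IDs. The function names are made up, and each (start, end) tuple stands for one database record with both ends inclusive.

```python
def add_chapter(ranges, chapter):
    """Return a sorted list of inclusive (start, end) ranges with `chapter` marked read.

    Any existing range that contains or touches the chapter is absorbed into
    one merged range, so no new record is needed in the contiguous case.
    """
    start = end = chapter
    kept = []
    for lo, hi in ranges:
        if hi < start - 1 or lo > end + 1:
            kept.append((lo, hi))  # not adjacent to the growing range
        else:
            start, end = min(lo, start), max(hi, end)  # absorb neighbour
    kept.append((start, end))
    kept.sort()
    return kept


def remove_chapter(ranges, chapter):
    """Return the ranges with `chapter` marked unread, splitting a range if needed."""
    result = []
    for lo, hi in ranges:
        if not (lo <= chapter <= hi):
            result.append((lo, hi))
            continue
        if lo < chapter:
            result.append((lo, chapter - 1))  # left part of the split
        if chapter < hi:
            result.append((chapter + 1, hi))  # right part of the split
    return result


def is_read(ranges, chapter):
    """True if `chapter` falls inside any stored range."""
    return any(lo <= chapter <= hi for lo, hi in ranges)
```

For example, starting from the records (1, 3) and (5, 6), marking chapter 4 as read collapses everything into the single record (1, 6), and un-marking chapter 3 afterwards splits it into (1, 2) and (4, 6).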