Please change the way ratings are averaged

Member
Joined
Jun 27, 2018
Messages
265
I think the site managers wanted to create a balanced rating system, but it is flawed in the way it averages the ratings.

Example:

If 3 people's average vote is 8.0 and I vote 1, the average becomes 6.25.

But why? Did you forget that you can't vote 0 stars?
To get to 6.25 stars, the algorithm has to add together the votes (e.g.
Code:
8 + 8 + 8 + 1 = 25
) and then divide it by the number of votes (i.e.
Code:
25 / 4 = 6.25
).

However, it totally neglects that every vote in this system is only worth
Code:
vote number -1
since the lowest number possible is 1, not 0.

So the correct way to average the votes should be
Code:
[(vote a -1) + (vote b -1) + ... +(vote n-1)] / total number of votes
This number would average the 4 votes
Code:
 [(8-1)+(8-1)+(8-1)+(1-1)] /4 = 5.25
to 5.25.
 
Joined
Mar 24, 2018
Messages
1,258
And it changes what exactly?

It's still a ten point scale with nine possible spaces for final ratings to fall in between natural numbers (closed intervals). Just with results above "9.00" replaced with results below "1.00".
 
Joined
Apr 23, 2018
Messages
1,071
@Asriel To reiterate what @mikegnesium said, with an example

First, let's go over your function
Code:
[(vote a -1) + (vote b -1) + ... +(vote n-1)] / total number of votes
Simplified, with V = the tuple of votes and n = the number of votes:
Code:
((V_1 + V_2 + ... + V_n) - n) / n
Since (a - b) / b = a/b - 1 (that is, (a/b) - 1, NOT a/(b - 1)), we can change this into
Code:
(V_1 + V_2 + ... + V_n) / n - 1
which is the same function you claim they already use, minus one. So if C = the current function, then your function is
Code:
C(V) - 1

Now a quick test
1 person votes 1. Average should be 1.
Current function:
Code:
(1) / 1 = 1
Your function:
Code:
C({1})-1 = 0
(1)/1 - 1 = 0

Range of C(V): [1,10]
Range of C(V) - 1: [0,9]
Note the system currently yields 0.0 for null tuples. We both excluded that from the formula.
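
If anyone wants to check this numerically, here is a minimal Python sketch. The function names `current_average` and `proposed_average` are made up for illustration, and the 0.0 result for an empty tuple just mirrors the behavior described above; this is not the site's actual code.

```python
def current_average(votes):
    """Plain arithmetic mean, as the current system is assumed to work."""
    if not votes:
        return 0.0  # assumption: no votes is displayed as 0.0
    return sum(votes) / len(votes)


def proposed_average(votes):
    """The suggested formula: subtract 1 from every vote before averaging."""
    if not votes:
        return 0.0
    return sum(v - 1 for v in votes) / len(votes)


votes = [8, 8, 8, 1]
print(current_average(votes))   # 6.25
print(proposed_average(votes))  # 5.25, i.e. current_average(votes) - 1
```

For any non-empty tuple the two results differ by exactly 1, so the ordering of titles never changes; only the displayed range shifts from [1,10] to [0,9].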
 
Miku best girl
Admin
Joined
May 29, 2012
Messages
1,441
Maybe you guys should research Bayesian averages and teach me lol
 
Dex-chan lover
Joined
Mar 14, 2018
Messages
215
@Holo maybe you will be interested in this http://www.evanmiller.org/how-not-to-sort-by-average-rating.html
 
Joined
Apr 23, 2018
Messages
1,071
Slightly off subject, but it would be interesting if there were follow recommendations using something like a nearest neighbor algorithm. Steps are, roughly
1) Find other users who have similar follows and ratings to userX.
2) Find common titles they follow/have rated
3) Calculate a weight based off of how similar their follows/ratings are to userX
4) Take their rating for each series and do a weighted average.

You could do this off of follows alone, but taking the ratings into consideration as well allows you to better filter/prioritize series based on how likely userX is to like them, judging from similar users.
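
A rough Python sketch of the four steps, under some loud assumptions: ratings live in plain dicts of title to score, similarity is a made-up measure (1 / (1 + mean absolute rating difference) on shared titles), and the names `similarity`, `predict_ratings`, and `k` are hypothetical. A real system would likely use a more careful similarity measure and a database rather than in-memory dicts.

```python
def similarity(a, b):
    """Weight in (0, 1]: higher when two users rate shared titles alike."""
    common = a.keys() & b.keys()  # step 2: titles both users have rated
    if not common:
        return 0.0
    mean_diff = sum(abs(a[t] - b[t]) for t in common) / len(common)
    return 1.0 / (1.0 + mean_diff)  # step 3: turn distance into a weight


def predict_ratings(target, others, k=2):
    """Predict scores for titles the target user has not rated yet."""
    # Step 1: rank the other users by similarity and keep the k nearest.
    neighbors = sorted(
        ((similarity(target, ratings), ratings) for ratings in others.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]

    totals, weights = {}, {}
    for weight, ratings in neighbors:
        if weight == 0.0:
            continue  # nothing in common, so this user tells us nothing
        for title, score in ratings.items():
            if title in target:
                continue  # only recommend titles the target has not rated
            totals[title] = totals.get(title, 0.0) + weight * score
            weights[title] = weights.get(title, 0.0) + weight

    # Step 4: weighted average of the neighbors' ratings per title.
    return {title: totals[title] / weights[title] for title in totals}


user_x = {"A": 9, "B": 7}
others = {
    "u1": {"A": 9, "B": 7, "C": 10},
    "u2": {"A": 8, "B": 8, "C": 6, "D": 8},
    "u3": {"A": 1, "B": 2, "D": 1},
}
predictions = predict_ratings(user_x, others, k=2)
print(predictions)
```

Here u3 rates the shared titles very differently from user_x, so with k=2 it is dropped and its low score for D does not drag the prediction down.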

The more accurate you try to make this, the more complex this gets. Naturally, this doesn't work well/at all for anon users. For them, I would probably find the most common "also following" titles for whatever title they last read. For speed improvements, we usually limit it to the nearest n neighbors of a given or random subset of users.
This can be further expanded to also make recommendations for similar titles (without regard to user X).
I believe there was a decent algorithm originally developed for Netflix that was like this. There may be better alternatives, though I am not sure if you're interested in this :p.

A similar thing could be used to give personalized weighted ratings, which reflect the rating we expect you to give it should you read it. That is not what we are trained to expect from the word rating, but it would be pretty cool IMHO. I believe it isn't actually that hard to implement the functionality, but I never did it large scale myself. Not sure if it would be too much load on the cpu for consideration. Or if anyone would be interested in it to begin with.
A search along the lines of "Nearest neighbor filter" or "Nearest neighbor recommendations" should pull it up.
 
Group Leader
Joined
Jan 29, 2018
Messages
613
Uh, it makes no difference. You'd still have to add back the 1, since it's on a 10 point scale from 1-10. In the end, you're still using an arithmetic mean.
so: [(8-1)+(8-1)+(8-1)+(1-1)] /4 +1 = 6.25
 
Joined
Apr 23, 2018
Messages
1,071
Huh, I typed all of that, and still failed to explicitly mention the problem in normal words.

So yes, as @Deathglass said. The problem with the suggested function is that it yields ratings from 0 to 9, while our actual ratings are from 1 to 10 (with an additional 0.0 reserved for null/no rating). There was no change other than erroneously subtracting 1 from the result of the current/in use function.

Another example of your function. 1 user gives a rating of 10. Average should = 10.
Current function:
Code:
10/1 = 10
Your function:
Code:
(10/1) -1 = 9
 
Dex-chan lover
Joined
Jul 4, 2018
Messages
5,172
@firefish5000 your second suggestion sounds like you're trying to turn MangaDex into Google with all the algorithms.

Also, totally unrelated but this is still kinda funny: I googled PET plastic repeatedly over the past few days to research a college assignment, and I've been getting nonstop ads about plastic production on YouTube even though I have zero interest in plastic. Talk about software gore.
 
Joined
Jan 29, 2018
Messages
7
@Holo

Bayesian average (for 1..10 score systems) in a nutshell is
* You add an amount of "fake" ratings to every manga. Basically, it's as if a bunch of bots mass voted on everything.
* The amount (x) and score (y) should be the same for all entries.
* The score (y) defines what final scores (the ones visible to users) will be close to when they have a low number of human voters. It could be the average of *all* votes on the database.
* The amount (x) defines how much the fake votes will influence the final score.

Depending on how the database is, you could have a trigger run every few hours to
* Calculate the global average
* Update the Bayesian average for each manga entry (store on another column).

Updating it in real time probably isn't a good idea.
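
As a minimal sketch of the idea in Python (not a drop-in implementation): `prior_count` is the amount of fake votes, `prior_score` is their score, and the `catalog` dict is a hypothetical stand-in for the ratings table. Taking the global average as the prior score and 2 as the amount are assumptions made for the example.

```python
def global_average(catalog):
    """Average of every vote in the database, used here as the prior score."""
    all_votes = [v for votes in catalog.values() for v in votes]
    return sum(all_votes) / len(all_votes) if all_votes else 0.0


def bayesian_average(votes, prior_score, prior_count):
    """Mean of the real votes plus prior_count fake votes of prior_score."""
    return (prior_count * prior_score + sum(votes)) / (prior_count + len(votes))


# Hypothetical data: one title with two perfect votes, one with six 7s.
catalog = {
    "title_a": [10, 10],
    "title_b": [7, 7, 7, 7, 7, 7],
}

prior_score = global_average(catalog)  # 7.75
for title, votes in catalog.items():
    # A periodic job would store this value in its own column.
    print(title, bayesian_average(votes, prior_score, prior_count=2))
# title_a 8.875
# title_b 7.1875
```

With only two votes, title_a is pulled most of the way from 10 toward the global average, while title_b barely moves. A larger `prior_count` makes the pull stronger for everything.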
 
