MangaDex Forums

ArellaVant
I'm pretty sure it works
Remocracy
I ain't doing dumb shit anymore here, I've moved on to professional data stores (Wikidata)
Tristan__
I've moved on to professional data stores (Wikidata)
a-as like a database that stores actual data in production for real world applications?
Remocracy
Yes, I actually wrote a bot that pulls data from MangaDex and the other anidbs into it, so I've amassed real power 😀
Tristan__
no way....
I honestly don't even know how to reply
Remocracy
I actually use code to write actual valuable information now. Turns out having a built-in undo function helps a lot
Remocracy
The nice thing about MediaWiki is that when you fuck up there are lots of ways to unfuckup. One time my bot added 40 wrong statements in a row. All I had to do was scroll the revision history and then go back to the last working revision and restore it. Another time I deleted statements from 10 thousand items that I should not have deleted, but there was a tool that undid all 10k edits.
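A toy sketch of why a full revision history makes this kind of mistake cheap to undo. This is not the MediaWiki or Wikidata API; the `RevisionedItem` class, its methods, and the sample values are all hypothetical and only mimic the idea of restoring the last working revision.

```python
class RevisionedItem:
    """Toy item that keeps every revision of its statements (hypothetical)."""

    def __init__(self, statements: dict[str, str]) -> None:
        # Revision 0 is the initial state.
        self.history: list[dict[str, str]] = [dict(statements)]

    @property
    def current(self) -> dict[str, str]:
        # Return a copy so callers cannot mutate stored revisions.
        return dict(self.history[-1])

    def edit(self, **changes: str) -> int:
        """Apply changes as a new revision and return its id."""
        updated = self.current
        updated.update(changes)
        self.history.append(updated)
        return len(self.history) - 1

    def restore(self, revision_id: int) -> int:
        """Restore an old revision by appending a copy of it.

        Nothing is erased: the bad revisions stay in the history.
        """
        self.history.append(dict(self.history[revision_id]))
        return len(self.history) - 1


item = RevisionedItem({"title": "Example Manga"})
good = item.edit(romanized="Example Manga")  # last working revision
item.edit(romanized="wrong value 1")         # bot misbehaves
item.edit(romanized="wrong value 2")
item.restore(good)                           # back to the good state
print(item.current)
```

Because the restore is itself just another revision, the bad edits remain visible for debugging the bot afterwards.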
Proxymiity
eh. genuinely stupid code.
Remocracy
I mean the thing I was trying to do was incredibly complex and a lot of it was data issues. How was I supposed to know the romanized forms of manga titles differ between manga databases, and that "revised hepburn romanization" is not the same thing as "romanization"?
Remocracy
there shouldn't be a fuckup in the first place
I can try my best to make code work the first time or use a test environment, but the problem with something like Wikidata is that I need to hardcode a lot of things, so it's usually much easier to run the code in production and then if it breaks something to just undo it, since undos are cheap.
Proxymiity
Well, first, you get samples of the same data entry from different data providers. Then, you compare the data. It should reveal data issues.
and that "revised hepburn romanization" is not the same thing as "romanization"?
Yes, there are multiple ways to romanize.
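The sample-and-compare idea above could look roughly like the sketch below. The provider names, the sample titles, and the `normalize` and `find_mismatches` helpers are made up for illustration; real data would need more careful normalization than this.

```python
import unicodedata
from collections import Counter


def normalize(title: str) -> str:
    """Strip diacritics (such as macrons), case, spaces, and punctuation."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return "".join(ch for ch in no_marks.casefold() if ch.isalnum())


def find_mismatches(samples: dict[str, str]) -> dict[str, str]:
    """Return providers whose normalized title differs from the most common one."""
    normalized = {provider: normalize(t) for provider, t in samples.items()}
    majority, _ = Counter(normalized.values()).most_common(1)[0]
    return {p: samples[p] for p, n in normalized.items() if n != majority}


# Hypothetical samples of one entry as three providers might romanize it.
samples = {
    "provider_a": "Shōjo no Sora",   # macron for the long vowel
    "provider_b": "Shojo no Sora",   # long vowel dropped
    "provider_c": "Shoujo no Sora",  # long vowel spelled out as "ou"
}

print(find_mismatches(samples))
```

Stripping diacritics reconciles the first two spellings but not the third, which is the kind of data issue a side-by-side comparison can surface before a bot writes anything.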
Remocracy
Well, first, you get samples of the same data entry from different data providers. Then, you compare the data. It should reveal data issues.
This isn't as easy as it seems because I'd have to normalize my data.
Remocracy
Obviously my code normalized the data, but I'd compare the data by letting the bot make edits and then undoing whatever was problematic. If too many things were bad, I'd revert to the version prior to the bot's edits.
Proxymiity
it's usually much easier to run the code in production and then if it breaks something to just undo it, since undos are cheap.
remind me to refuse your job application if my company ever needs data scientists.
Remocracy
remind me to refuse your job application if my company ever needs data scientists.
The thing is that it's not super critical data, so I don't take it super seriously. Also, because there are undos available everywhere, it's really hard to footgun myself.
Tristan__
the problem with something like Wikidata is that I need to hardcode a lot of things
t-then don't use it. Use an actual database
so it's usually much easier to run the code in production and then if it breaks something to just undo it, since undos are cheap.
the v3 way
I agree that a deployment undo is cheap, but not if you work with databases.
Remocracy
t-then don't use it. Use an actual database
The reason I'm using Wikidata is that you can edit data manually relatively easily, and their property system makes it so I don't have to define a schema ahead of time.