MangaDex Forums

ArellaVant

I'm pretty sure it works

Tristan__

Remocracy

I ain't doing dumb shit anymore here, I've moved on to professional data stores (Wikidata)

Tristan__

I've moved on to professional data stores (Wikidata)

a-as like a database that stores actual data in production for real world applications?

Remocracy

Yes, I actually wrote a bot that pulls data from MangaDex and the other anidbs to there, so I've amassed real power 😀

Tristan__

no way....
I honestly don't even know how to reply

Remocracy

I actually use code to write actual valuable information now. Turns out having a built in undo function helps a lot

Tristan__

code to write actual valuable information

It's zeroes and ones, you know

Turns out having a built in undo function helps a lot

Jesus christ

Remocracy

The nice thing about MediaWiki is that when you fuck up there are lots of ways to unfuckup. One time my bot added 40 wrong statements in a row. All I had to do was scroll through the revision history, go back to the last working revision, and restore it. Another time I deleted statements from 10 thousand items that I should not have deleted, but there was a tool that undid all 10k edits.
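
For readers who have not used a revision-based store, the recovery described above can be sketched with a toy model. This is not MediaWiki's API; the `Item` class and its `edit` and `restore` methods are hypothetical names made up for illustration. The only idea it shows is that edits append snapshots, so a restore is just one more revision copied from an earlier one.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Toy stand-in for a wiki page: every edit appends a full snapshot."""

    revisions: list = field(default_factory=lambda: [{}])

    @property
    def current(self) -> dict:
        return self.revisions[-1]

    def edit(self, **changes) -> None:
        # An edit never overwrites history; it adds a new revision on top.
        self.revisions.append({**self.current, **changes})

    def restore(self, index: int) -> None:
        # Restoring is itself a new revision, so the bad edits stay in history.
        self.revisions.append(dict(self.revisions[index]))


item = Item()
item.edit(title="Example Title")      # last known-good state
good = len(item.revisions) - 1        # remember its position in the history
for n in range(40):                   # 40 wrong statements in a row
    item.edit(**{f"wrong_{n}": "oops"})
item.restore(good)                    # one step back to the good revision
```

After the restore, `item.current` holds only the good title again, while all 40 bad revisions are still in `item.revisions` for inspection.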

Proxymiity

eh. genuinely stupid code.

Tristan__

when you fuck up

there shouldn't be a fuckup in the first place

Remocracy

I mean, the thing I was trying to do was incredibly complex, and a lot of it was data issues. How was I supposed to know the romanized forms of manga titles are different for each manga database, and that "revised hepburn romanization" is not the same thing as "romanization"?

Remocracy

there shouldn't be a fuckup in the first place

I can try my best to make the code work the first time or use a test environment, but the problem with something like Wikidata is that I need to hardcode a lot of things, so it's usually much easier to run the code in production and then if it breaks something to just undo it, since undos are cheap.

Proxymiity

Well, first, you get samples of the same data entry from different data providers. Then, you compare the data. It should reveal data issues.

and that "revised hepburn romanization" is not the same thing as "romanization"?

Yes, there are multiple ways to romanize.
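
The sample-and-compare step suggested here can be sketched roughly as follows. Everything in it is an assumption for illustration: the provider names, the sample titles, and the helpers `normalize_title` and `find_disagreements` are made up, and the normalization (fold case, drop diacritics and punctuation) is just one possible choice.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Fold case and strip diacritics (e.g. macrons) and punctuation."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_marks.casefold()).strip()


def find_disagreements(samples: dict) -> dict:
    """Return the entries whose normalized titles differ between providers.

    `samples` maps an entry id to {provider name: romanized title}.
    """
    conflicts = {}
    for entry_id, by_provider in samples.items():
        normalized = {normalize_title(t) for t in by_provider.values()}
        if len(normalized) > 1:
            conflicts[entry_id] = by_provider
    return conflicts


# Invented sample data: the same two entries as seen by two providers.
samples = {
    "entry-1": {"provider_a": "Tōkyō Ghoul", "provider_b": "Tokyo Ghoul"},
    "entry-2": {"provider_a": "Shingeki no Kyojin", "provider_b": "Singeki no Kyozin"},
}
conflicts = find_disagreements(samples)
```

Here `entry-1` agrees once the macrons are dropped, while `entry-2` is flagged because the two spellings come from different romanization systems. A handful of samples like this is enough to surface that kind of mismatch before any bot edit is made.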

Remocracy

Well, first, you get samples of the same data entry from different data providers. Then, you compare the data. It should reveal data issues.

This isn't as easy as it seems because I'd have to normalize my data.

Remocracy

Obviously my code normalized the data, but I'd compare the data by letting the bot make edits and then undoing whatever was problematic, or if too many things were bad I'd revert to the version prior to the bot's edits.

Proxymiity

it's usually much easier to run the code in production and then if it breaks something to just undo it, since undos are cheap.

remind me to refuse your job application if my company ever needs data scientists.

Remocracy

remind me to refuse your job application if my company ever needs data scientists.

The thing is that it's not super critical data, so I don't take it super seriously. Also, because undos are available everywhere, it's really hard to footgun myself.

Tristan__

the problem with something like Wikidata is that I need to hardcode a lot of things

t-then don't use it. Use an actual database

so it's usually much easier to run the code in production and then if it breaks something to just undo it, since undos are cheap.

the v3 way
I agree that a deployment undo is cheap, but not if you work with databases.

Remocracy

t-then don't use it. Use an actual database

The reason I'm using Wikidata is that you can edit data manually relatively easily, and their property system makes it so I don't have to define a schema ahead of time.
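
The "no schema ahead of time" point can be pictured with a toy model. This is not Wikidata's data model or API; the `Item` class, its `add` method, and the property names are hypothetical, and real Wikidata properties are identified by IDs rather than strings like these. The sketch only shows data stored as open-ended property/value statements instead of fixed columns.

```python
from collections import defaultdict


class Item:
    """Toy item: a bag of (property, value) statements with no fixed columns."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.statements = defaultdict(list)

    def add(self, prop: str, value: str) -> None:
        # A property may hold several values; a new property needs no migration.
        if value not in self.statements[prop]:
            self.statements[prop].append(value)


manga = Item("Example Manga")
manga.add("romanized title", "Toukyou")
manga.add("romanized title", "Tōkyō")   # a second value for the same property
manga.add("external id", "abc-123")     # a property nobody declared up front
manga.add("romanized title", "Tōkyō")   # duplicate statement, ignored
```

In a relational table, the second romanization or the new external id would normally mean changing the schema first. Here each one is just another statement on the item.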