Pronunciation is a bit more complicated than just the position of the tongue and mouth. The extremely common /n/ (as in "no") is pronounced by releasing air out of the nose, for example. If all you're focusing on is the position of the tongue, you could end up saying /l/ (as in "low") instead, as that has the tongue in the same position against the alveolar ridge, but is instead pronounced by blocking the air through the center of your mouth so that it flows around the sides of the tongue. /t/, /d/, /s/, /z/, and /ɾ/ (the Japanese R or the English "flapped T") are also all pronounced with the tongue at this same spot. (Try saying all of these and you should be able to feel it if you pay attention.)
So, ideally, you'd have to explain the process of pronouncing these consonants as well, but in this scenario neither knows enough of the other's language to do that. And in the first place, most people don't consciously know what they're doing to pronounce these sounds. Like, can you, off the top of your head, explain how to pronounce /s/? (You put your tongue up to the alveolar ridge, just barely not touching it, and then blow air through the gap.)
Also, as you get older, your brain gets used to not needing to hear a distinction between similar consonants or vowels, which can make it harder to hear the difference. That makes things even more difficult, as you can't tell for yourself whether you're saying it correctly until you learn to hear the difference. (You can see this here, where she interprets the /ɾ/ in "Ryouta" as a /d/, as they're both short, voiced alveolar consonants.)