Thanks for the update!
For sound effects, there's a 'lazy' method: type the translation/interpretation of the sound effect in a small font at the border of the frame, as something like "SFX: [insert translation]". This lets you skip cleaning those sections without it looking too awkward that the cleaning was skipped. It's probably not the most professional method, but I see it used reasonably often in scanlations, and I prefer it over awkwardly chosen typefaces or translations of sound effects that may or may not be typeset convincingly.
For multiple sound effects in the same frame... well, you can just let the readers guess 🤣 Actually, there are a few ways to help the reader tell which note goes with which effect: font/size choice if the effects look different, proximity to where the SFX are placed, or actually going in and including the SFX in-line with the translations.
[You can also just skip cleaning and translating sound effects completely.]
[Words in small lettering that are annoying to clean and typeset can be treated the same way -- you may or may not need to add a small asterisk in the frame, if it fits, to note which text belongs where.]