How to fix mojibake: when UTF-8 text displays as gibberish and how to recover it
Mojibake (Japanese for 'character transformation') is the garbled text you see when bytes written in one encoding are decoded as another. The classic example: UTF-8 bytes interpreted as Latin-1 produce strings like Ã© instead of é, because the two bytes that encode é in UTF-8 (0xC3 0xA9) are each read as a separate Latin-1 character. The data isn't corrupt. It's being misinterpreted. Fixing it means knowing what encoding it was written in and what encoding it's being read as.
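The mismatch is easy to reproduce. The following is a minimal Python sketch, using only the built-in str.encode and bytes.decode methods; the variable names are illustrative.

```python
# Encode a character as UTF-8, then decode the same bytes as Latin-1.
original = "\u00e9"                  # é (U+00E9)
raw = original.encode("utf-8")       # two bytes: 0xC3 0xA9
garbled = raw.decode("latin-1")      # each byte becomes its own character

print(raw)       # the underlying bytes are unchanged throughout
print(garbled)   # the mojibake form: Ã followed by ©
```

Nothing was lost along the way: the bytes in raw are identical before and after, which is why this kind of mojibake is recoverable.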
How this is calculated
Common mojibake scenarios: a MySQL database column was created as latin1 but the application writes UTF-8 bytes into it; a UTF-8 CSV file has no BOM and is opened in Excel using a legacy code page instead of UTF-8; an API response declares charset=ISO-8859-1 but actually returns UTF-8. Recovery depends on whether the bytes were only mislabeled or were also transcoded. If they were only mislabeled, the stored bytes are still correct: decode them again with the encoding they were actually written in. If they were double-encoded (UTF-8 bytes decoded as Latin-1, and the resulting characters then encoded to UTF-8 again), the stored bytes themselves have changed, so you need to reverse each step in order: decode as UTF-8, encode back to Latin-1, then decode as UTF-8 once more. Prevention is simpler than recovery: always declare UTF-8 explicitly in HTTP headers, HTML meta tags, database schemas, and file formats.
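The two recovery paths can be sketched in Python as follows. The function names reinterpret and undo_double_encoding are hypothetical helpers, not part of any library, and the sketch assumes the wrong encoding was Latin-1. In practice it is often Windows-1252, in which case passing "cp1252" may be the better guess.

```python
def reinterpret(raw: bytes, actual_encoding: str = "utf-8") -> str:
    """Mislabeled case: the bytes are intact, so decode them
    with the encoding they were really written in."""
    return raw.decode(actual_encoding)


def undo_double_encoding(text: str, wrong_encoding: str = "latin-1") -> str:
    """Double-encoded case: reverse one round of
    UTF-8 -> wrong_encoding -> UTF-8 on an already decoded string."""
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not fit this pattern; return it unchanged
        # rather than risk damaging it further.
        return text


# Mislabeled: UTF-8 bytes that were tagged as Latin-1.
fixed_mislabeled = reinterpret(b"caf\xc3\xa9")

# Double-encoded: the stored bytes decode cleanly as UTF-8,
# but the result is still mojibake.
stored = b"caf\xc3\x83\xc2\xa9"
fixed_double = undo_double_encoding(stored.decode("utf-8"))
```

Returning the input unchanged on failure is a deliberate choice in this sketch: text that was never double-encoded usually fails one of the two steps, so clean strings pass through untouched.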
Verdict
Mojibake is always caused by a mismatch between the encoding used to write bytes and the encoding used to read them. Fix it by identifying both encodings and reinterpreting correctly. Prevent it by using UTF-8 everywhere and declaring it explicitly.
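For file I/O, declaring the encoding explicitly might look like the sketch below. The helper names write_utf8 and read_utf8 and the file name sample.txt are illustrative assumptions; the point is only that the encoding argument is passed on both the write and the read instead of relying on a platform default.

```python
import tempfile
from pathlib import Path


def write_utf8(path: Path, text: str) -> None:
    # Name the encoding on write so the bytes on disk are always UTF-8.
    path.write_text(text, encoding="utf-8")


def read_utf8(path: Path) -> str:
    # Name the same encoding on read so the bytes are decoded as written.
    return path.read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    write_utf8(sample, "caf\u00e9")
    on_disk = sample.read_bytes()
    round_trip = read_utf8(sample)
```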
