How to fix mojibake: when UTF-8 text displays as gibberish and how to recover it

Mojibake (Japanese for 'character transformation') is the garbled text you see when bytes written in one encoding are read as another. The classic example: UTF-8 bytes interpreted as Latin-1 produce strings like Ã© instead of é. The data isn't corrupt. It's being misinterpreted. Fixing it means knowing what encoding it was written in and what encoding it's being read as.
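
To make the mismatch concrete, here is a minimal Python sketch that reproduces the classic case. The variable names are illustrative only; the point is that the bytes never change, only the decoder applied to them.

```python
# Encode "é" as UTF-8, then (incorrectly) decode those bytes as Latin-1.
original = "é"
utf8_bytes = original.encode("utf-8")    # two bytes: 0xC3 0xA9
misread = utf8_bytes.decode("latin-1")   # each byte becomes its own character

print(utf8_bytes)  # b'\xc3\xa9'
print(misread)     # Ã©

# The bytes are intact: decoding them with the right encoding recovers the text.
assert utf8_bytes.decode("utf-8") == original
```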


How it happens and how to recover

Common mojibake scenarios: a MySQL database column was created as latin1 but the application writes UTF-8 bytes into it; a UTF-8 CSV file has no BOM, so Excel opens it in a legacy code page such as Windows-1252; an API response declares charset=ISO-8859-1 but actually returns UTF-8.

Recovery depends on whether the bytes were transcoded or just mislabeled. If they were only mislabeled, the bytes themselves are still correct, so reinterpret them with the correct encoding. If they were double-encoded (UTF-8 bytes treated as Latin-1, then encoded to UTF-8 again), reverse the double-encoding step by step: decode as UTF-8, encode back to Latin-1, then decode as UTF-8 once more.

Prevention is simpler than recovery: always declare UTF-8 explicitly in HTTP headers, HTML meta tags, database schemas, and file formats.
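
The two recovery paths can be sketched in Python as follows. The function names `fix_mislabeled` and `recover_double_encoded` are hypothetical helpers written for this illustration, and the sketch assumes the wrong encoding really was Latin-1; real data may have gone through a different code page.

```python
def fix_mislabeled(raw: bytes) -> str:
    """Bytes are really UTF-8 but were labeled otherwise: decode them correctly."""
    return raw.decode("utf-8")


def recover_double_encoded(raw: bytes, wrong_encoding: str = "latin-1") -> str:
    """Undo 'UTF-8 bytes read as wrong_encoding, then encoded to UTF-8 again'.

    Raises UnicodeError if the data does not fit that pattern.
    """
    garbled_text = raw.decode("utf-8")                   # e.g. 'cafÃ©'
    original_bytes = garbled_text.encode(wrong_encoding)  # back to the first UTF-8 bytes
    return original_bytes.decode("utf-8")                # e.g. 'café'


# Case 1: correct bytes, wrong label.
mislabeled = "café".encode("utf-8")
assert fix_mislabeled(mislabeled) == "café"

# Case 2: simulate double-encoding, then reverse it.
double_encoded = "café".encode("utf-8").decode("latin-1").encode("utf-8")
assert recover_double_encoded(double_encoded) == "café"
```

If the misreading was done with Windows-1252 rather than Latin-1, passing `"cp1252"` as `wrong_encoding` may work instead, but a few byte values have no mapping in Python's cp1252 codec, so the reverse step can raise an error on some inputs.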

Verdict

Mojibake is always caused by a mismatch between the encoding used to write bytes and the encoding used to read them. Fix it by identifying both encodings and reinterpreting correctly. Prevent it by using UTF-8 everywhere and declaring it explicitly.
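
As one small example of declaring the encoding explicitly, the Python sketch below writes and reads a file with `encoding="utf-8"` instead of relying on the platform default. The file names and sample text are placeholders. It also shows the `utf-8-sig` codec, which prepends a BOM and may help spreadsheet tools that otherwise guess a legacy code page.

```python
import os
import tempfile

text = "naïve café"
folder = tempfile.mkdtemp()

# Always name the encoding; the platform default may not be UTF-8.
plain_path = os.path.join(folder, "example.txt")
with open(plain_path, "w", encoding="utf-8") as f:
    f.write(text)
with open(plain_path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
assert round_tripped == text

# "utf-8-sig" writes a UTF-8 BOM (EF BB BF) at the start of the file.
bom_path = os.path.join(folder, "example.csv")
with open(bom_path, "w", encoding="utf-8-sig") as f:
    f.write(text)
with open(bom_path, "rb") as f:
    first_bytes = f.read(3)
assert first_bytes == b"\xef\xbb\xbf"
```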


Frequently asked questions

How do I convert text to Base64?
Paste your string into the Text field and the Base64 output appears instantly. The tool uses standard Base64 (RFC 4648), so the output is identical to Linux's base64 command and every major language's built-in Base64 encoder.
What's the difference between Base64 and hex encoding?
Both represent binary data as text, but with different alphabets. Base64 uses 64 characters and needs roughly 4 chars per 3 bytes (33% overhead). Hex uses 16 characters and needs exactly 2 chars per byte (100% overhead). Base64 is denser, while hex is easier to read byte by byte.
Why does my UTF-8 text break when converted to binary?
UTF-8 encodes non-ASCII characters as multibyte sequences, so a single emoji or accented letter becomes 2-4 bytes. The binary output will be longer than the character count suggests. That's correct behavior, not a bug.
Is it safe to paste sensitive data into the converter?
Yes. The encoding conversion runs entirely in your browser with JavaScript; nothing is sent to our servers, logged, or stored. You can verify this with your browser's Network tab: no requests fire when you type.
What is URL-safe Base64?
A variant that replaces `+` with `-` and `/` with `_` so the result can be safely placed in URLs without percent-encoding. JWTs use URL-safe Base64 (with the `=` padding removed). Standard Base64 is fine for most other uses.
Can I decode Base64 back to the original text?
Yes, the converter is bidirectional. Paste Base64 into the Base64 field and you'll get the original UTF-8 string back. If decoding fails silently, the input isn't valid Base64 (wrong characters, bad padding, or it was double-encoded).
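
The answers above can be checked outside the tool with Python's standard library. This is a rough sketch using the `base64` and `binascii` modules, not a description of how the converter itself is implemented; the sample strings are arbitrary.

```python
import base64
import binascii

text = "héllo"
raw = text.encode("utf-8")

# UTF-8 is multibyte: 5 characters become 6 bytes because "é" takes two.
assert len(text) == 5 and len(raw) == 6

# Base64 needs 4 characters per 3 bytes; hex needs 2 characters per byte.
b64 = base64.b64encode(raw).decode("ascii")
hex_form = raw.hex()
assert len(b64) == 8 and len(hex_form) == 12

# Round trip back to the original text.
assert base64.b64decode(b64, validate=True).decode("utf-8") == text

# Standard vs URL-safe alphabets differ only in two characters.
tricky = b"\xfb\xff"
assert base64.b64encode(tricky) == b"+/8="
assert base64.urlsafe_b64encode(tricky) == b"-_8="

# With validate=True, characters outside the alphabet raise instead of being skipped.
try:
    base64.b64decode("not base64!", validate=True)
    rejected = False
except binascii.Error:
    rejected = True
assert rejected
```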