UTF-8 vs ASCII: why UTF-8 won and when ASCII still matters
ASCII defines 128 code points as 7-bit values: 95 printable characters (unaccented Latin letters, digits, punctuation, and the space) plus 33 control codes. UTF-8 is a variable-width encoding that is backwards-compatible with ASCII: every valid ASCII string is also a valid UTF-8 string, byte for byte. UTF-8 can represent every character in Unicode (over 150,000 assigned and growing) using 1 to 4 bytes per character. ASCII-only systems are vanishingly rare in 2026.
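To make the "1 to 4 bytes" claim concrete, here is a minimal Python sketch. The helper name `utf8_width` and the sample characters are illustrative choices, not part of any standard API; only `str.encode` comes from the language itself.

```python
def utf8_width(ch: str) -> int:
    """Return how many bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


# One sample per width class (illustrative picks):
# ASCII letter, accented Latin letter, euro sign, emoji.
samples = ["A", "é", "€", "😀"]

for ch in samples:
    encoded = ch.encode("utf-8")
    # Show the code point, the byte count, and the raw bytes in hex.
    print(f"U+{ord(ch):04X}: {utf8_width(ch)} byte(s) -> {encoded.hex()}")
```

Only the first sample falls inside ASCII's range, which is why it is the only one that fits in a single byte.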
How the two compare in practice
UTF-8 is the dominant text encoding on the web (over 98% of all websites), in programming languages (Python 3, Go, and Rust all expect UTF-8 source files by default), and in data formats (RFC 8259 requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem). ASCII survives in niches: legacy protocols that only understand 7-bit data, certain embedded systems that assume one byte per character, and CSV files consumed by ancient mainframe systems. The practical rule: always use UTF-8 unless you have a specific, documented reason not to. Because UTF-8 is ASCII-compatible, text made up only of ASCII characters produces exactly the same bytes in both encodings. Anything outside that range (accented letters, emoji, CJK characters) cannot be represented in ASCII at all, whereas UTF-8 handles it with multi-byte sequences.
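The compatibility claim can be checked directly. The sketch below assumes a hypothetical helper, `encodes_identically`, and a handful of made-up sample strings; it relies only on Python's built-in `str.encode` and the `UnicodeEncodeError` it raises for characters outside ASCII.

```python
def encodes_identically(text: str) -> bool:
    """Return True if text yields the same bytes in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # At least one character lies outside ASCII's 128 code points,
        # so there is no ASCII encoding to compare against.
        return False


# Illustrative samples: pure ASCII, accented Latin, CJK, emoji.
for sample in ["plain English", "naïve café", "日本語", "thumbs up 👍"]:
    print(f"{sample!r}: {encodes_identically(sample)}")
```

Note that ordinary English prose is not always pure ASCII: a word like "café" or a typographic quote mark is enough to take a string outside the 7-bit range.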
Verdict
Default to UTF-8 for everything. ASCII is a historical subset that UTF-8 handles transparently. The only time to restrict yourself to ASCII is when you're interfacing with a system that genuinely cannot handle bytes above 0x7F.
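When such a 7-bit-only system is involved, one option is to validate data before sending it. The following is a rough sketch under that assumption; `first_non_ascii` and the sample payload are hypothetical names for illustration, not an established API.

```python
from typing import Optional


def first_non_ascii(data: bytes) -> Optional[int]:
    """Return the index of the first byte above 0x7F, or None if 7-bit clean."""
    for index, byte in enumerate(data):
        if byte > 0x7F:
            return index
    return None


# Hypothetical payload bound for a legacy 7-bit system.
payload = "Order #42: café".encode("utf-8")

offset = first_non_ascii(payload)
if offset is None:
    print("payload is 7-bit clean")
else:
    print(f"byte 0x{payload[offset]:02X} at offset {offset} exceeds 0x7F")
```

If only a yes/no answer is needed, Python's built-in `bytes.isascii()` and `str.isascii()` (available since 3.7) do the same check without reporting the offset.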
