HTML entity encoding: when to use &amp;, &lt;, and Unicode escapes

HTML entity encoding replaces characters that have special meaning in HTML with named or numeric entities: < becomes &lt;, > becomes &gt;, & becomes &amp;, and double-quote becomes &quot;. For untrusted input this is a security requirement, not a style choice. Without it, user-supplied text containing <script> tags can execute in the browser.
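
As a minimal sketch of the replacement described above, Python's standard-library html.escape performs the same substitutions. Python is used here only for illustration, and the comment value is a hypothetical piece of user input.

```python
import html

# Hypothetical user-supplied text containing markup.
comment = '<script>alert("hi")</script> & more'

# html.escape replaces &, < and >, plus both quote characters by default.
safe = html.escape(comment)

print(safe)
# &lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt; &amp; more
```

The escaped string renders as visible text in the browser instead of being parsed as a tag.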


How it works

The five characters to entity-encode when inserting untrusted text into HTML are ampersand, less-than, greater-than, single-quote, and double-quote. Strictly, text content only requires & and < to be encoded, while the quote characters matter inside attribute values; encoding all five covers both contexts. Modern frameworks (React, Vue, Svelte) do this automatically when you use their templating syntax. The risk surfaces when you use dangerouslySetInnerHTML, innerHTML in vanilla JS, or server-side template engines that don't auto-escape. Non-ASCII characters can be written directly in the HTML source (no entity needed) as long as the file is saved as UTF-8 and the page declares <meta charset='utf-8'>. Numeric entities like &#x1F600; (😀) are a fallback for environments where the source file encoding is uncertain.
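
The quote handling and the numeric-entity fallback can be sketched as follows, again in Python for illustration. The numeric_entity helper is a hypothetical name introduced for this example, not part of any library.

```python
import html

# With quote=True (the default), html.escape also encodes ' and ",
# which is what makes the result safe inside attribute values.
attr_value = html.escape("it's \"quoted\"", quote=True)

def numeric_entity(ch: str) -> str:
    """Return the hexadecimal numeric entity for a single character."""
    return f"&#x{ord(ch):X};"

entity = numeric_entity("\U0001F600")   # the 😀 emoji, U+1F600
decoded = html.unescape(entity)         # back to the literal character

print(attr_value)  # it&#x27;s &quot;quoted&quot;
print(entity)      # &#x1F600;
```

With a UTF-8 source file and a declared charset, the literal character and the numeric entity produce the same result in the browser, so the entity form is only needed when the file encoding cannot be trusted.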

Verdict

Always entity-encode user-supplied text in HTML. Let your framework do it automatically. Only use named/numeric entities for literal characters you're writing yourself in static HTML. For emoji and non-ASCII text, use UTF-8 directly rather than numeric entities.


Frequently asked questions

How do I convert text to Base64?
Paste your string into the Text field and the Base64 output appears instantly. The tool uses standard Base64 (RFC 4648), so the output matches Linux's base64 command (apart from that command's default line wrapping at 76 characters) and every major language's built-in Base64 encoder.
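
For comparison, the same conversion can be sketched with Python's standard library. This is an illustration of standard Base64, not the tool's own code, and it assumes the text is encoded as UTF-8 before conversion.

```python
import base64

text = "hello"

# Base64 operates on bytes, so the text is encoded as UTF-8 first.
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")

print(encoded)  # aGVsbG8=
```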
What's the difference between Base64 and hex encoding?
Both represent binary data as text, but with different alphabets. Base64 uses 64 characters and needs roughly 4 chars per 3 bytes (33% overhead). Hex uses 16 characters and needs exactly 2 chars per byte (100% overhead). Base64 is denser, while hex is easier to read byte by byte.
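
The size difference is easy to check. The sketch below, in Python for illustration, encodes 30 arbitrary bytes both ways and compares the lengths.

```python
import base64

data = bytes(range(30))             # 30 arbitrary bytes

b64_text = base64.b64encode(data).decode("ascii")
hex_text = data.hex()

print(len(data), len(b64_text), len(hex_text))  # 30 40 60
```

Thirty bytes become 40 Base64 characters (4 per 3 bytes) and 60 hex characters (2 per byte).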
Why does my UTF-8 text break when converted to binary?
UTF-8 encodes non-ASCII characters as multibyte sequences, so a single emoji or accented letter becomes 2-4 bytes. The binary output will be longer than the character count suggests. That is correct behavior, not a bug.
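
A short Python sketch makes the byte counts visible. The to_bits helper is a hypothetical name used only for this example.

```python
def to_bits(text: str) -> str:
    """Render the UTF-8 bytes of text as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in text.encode("utf-8"))

ascii_bits = to_bits("A")             # 1 byte
accent_bits = to_bits("\u00e9")       # é, 2 bytes
emoji_bits = to_bits("\U0001F600")    # 😀, 4 bytes

print(ascii_bits)   # 01000001
print(accent_bits)  # 11000011 10101001
print(len(emoji_bits.split()))  # 4
```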
Is it safe to paste sensitive data into the converter?
Yes. The encoding conversion runs entirely in your browser with JavaScript. Nothing is sent to our servers, logged, or stored. You can verify this with your browser's Network tab: no requests fire when you type.
What is URL-safe Base64?
A variant that replaces `+` with `-` and `/` with `_` so the result can be safely placed in URLs without percent-encoding. JWTs use URL-safe Base64. Standard Base64 is fine for most other uses.
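
The two alphabets can be compared side by side in Python, used here for illustration. The input bytes are chosen so that the standard encoding contains both `+` and `/`.

```python
import base64

data = b"\xfb\xff"   # chosen to produce both special characters

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```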
Can I decode Base64 back to the original text?
Yes, the converter is bidirectional. Paste Base64 into the Base64 field and you'll get the original UTF-8 string back. If decoding fails silently, the input isn't valid Base64 (wrong characters, bad padding, or it was double-encoded).
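
In code, a failed decode can be made explicit instead of silent. The following is a minimal Python sketch, not the tool's implementation; decode_base64_text is a hypothetical helper that returns None when the input is not valid Base64 or does not decode to UTF-8 text.

```python
import base64
import binascii
from typing import Optional

def decode_base64_text(value: str) -> Optional[str]:
    """Decode standard Base64 to a UTF-8 string, or return None on failure."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently discarding them.
        raw = base64.b64decode(value, validate=True)
        return raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None

ok = decode_base64_text("aGVsbG8=")           # valid input
bad_padding = decode_base64_text("aGVsbG8")   # missing "=" padding
bad_chars = decode_base64_text("not base64!") # characters outside the alphabet

print(ok, bad_padding, bad_chars)  # hello None None
```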