What Is HTML Entity Encoding?
Understand what HTML entities are, which characters must be escaped, when entities are still necessary in a UTF-8 world, and how to avoid XSS.
Quick Answer
HTML entities let you display characters the browser would otherwise treat as markup, like
<, >, and &. The entity &lt;
renders as <; &amp; renders as &.
To HTML-encode text, replace reserved characters with their entity equivalents. To HTML-decode
(unescape) it, do the reverse. Escape user content before inserting it into
a page to prevent XSS. Entities come in two forms: named entity references like
&lt;, and numeric references like &#60;.
If you need to encode or decode HTML entities right now, use the tool directly.
Try the HTML Entity Encoder & Decoder →
What Are HTML Entities?
HTML has structure. The < character opens a tag. The > closes it.
The & starts a character reference. If you want to display any of those
characters as content on the page (not as syntax) you need a way to say "I mean the literal
character, not the markup."
HTML entities solve this. They're escape sequences: &lt; for <,
&gt; for >, &amp; for &.
The browser reads the entity, recognizes it as a character reference, and renders the actual
character on screen.
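As a minimal sketch of that round trip, the snippet below uses Python's standard `html` module. The sample string is made up for illustration, and other languages and frameworks provide their own escaping helpers.

```python
import html

source = "<b>Hello</b> & goodbye"

# Encode: reserved characters become entity references.
# quote=False leaves quotation marks alone, which is enough for element content.
encoded = html.escape(source, quote=False)
print(encoded)  # &lt;b&gt;Hello&lt;/b&gt; &amp; goodbye

# Decode: entity references become the literal characters again.
decoded = html.unescape(encoded)
print(decoded == source)  # True
```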
Entities come in two forms. Named references use a human-readable name:
&copy; for ©, &mdash; for the em dash (—), &nbsp; for a
non-breaking space. Numeric references use the Unicode code point: either decimal
(&#169;) or hexadecimal (&#xA9;), and work for any character,
including ones that have no named form.
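To make the two forms concrete, here is a short sketch in Python. The helper `numeric_refs` is a hypothetical name introduced for this example; it builds the decimal and hexadecimal references from a character's code point.

```python
import html

# Named, decimal, and hexadecimal references all decode to the same character.
decoded = [html.unescape(ref) for ref in ("&copy;", "&#169;", "&#xA9;")]
print(decoded)  # ['©', '©', '©']


def numeric_refs(ch: str) -> tuple[str, str]:
    """Return the (decimal, hexadecimal) references for a single character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


print(numeric_refs("©"))  # ('&#169;', '&#xA9;')
print(numeric_refs("—"))  # ('&#8212;', '&#x2014;')
```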
Why They Exist
HTML parsers treat angle brackets and ampersands as syntax. Write <b>Hello</b>
and you get the word Hello in bold. If you actually want the text <b>Hello</b>
displayed on screen, the parser needs a different signal.
That's the core problem entities solve, and it predates the modern web. In the early days,
pages were often authored in ASCII or Latin-1. Characters such as copyright signs,
accented letters, mathematical symbols, and currency signs couldn't always be typed or reliably
stored in source files. Named entities like &copy;, &eacute;, and
&pound; gave authors a portable way to include them.
Today UTF-8 is nearly universal, so most of those named entities are unnecessary: you can paste
the character directly into your HTML. But the four reserved characters (<,
>, &, and " inside attribute values) still require
escaping regardless of encoding, because they carry structural meaning in the parser.
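The sketch below illustrates this split using Python's standard `html.escape`. Non-ASCII characters pass through untouched, while the reserved characters are replaced. The sample strings and the `title` attribute are assumptions made for the example.

```python
import html

# Non-ASCII characters are left as-is; only reserved characters are escaped.
text = html.escape("café © <draft>")
print(text)  # café © &lt;draft&gt;

# Inside a double-quoted attribute value, the quote must be escaped as well.
title = 'Say "hi" & wave'
tag = f'<p title="{html.escape(title, quote=True)}">text</p>'
print(tag)  # <p title="Say &quot;hi&quot; &amp; wave">text</p>
```

With `quote=True` (the default), `html.escape` also replaces the single quote, so the result is safe in single-quoted attribute values too.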
When to Use HTML Entities
HTML entities show up in a handful of specific situations. For example:
- Showing code examples: To display <div class="container"> on a web page, write it as &lt;div class="container"&gt;. Without escaping, the browser renders the div, not its source.
- User-generated content: Comments, usernames, forum posts, and search terms must be HTML-escaped before being placed in the page. A username like <script>alert(1)</script> should render as text, not execute.
- Typography: &mdash; (—) and &ndash; (–) for proper dashes instead of double hyphens. &ldquo; and &rdquo; for curly quotes.
- Non-breaking spaces: &nbsp; between a number and its unit keeps them on the same line: 10&nbsp;kg, §&nbsp;4.2.
- Symbols: &copy;, &reg;, &trade; for © ® ™, &euro; for €, &deg; for °.
- Math notation: &ne; for ≠, &asymp; for ≈, &le; and &ge; for ≤ and ≥.
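The user-generated content case can be sketched as follows. `render_comment` and its markup are hypothetical, not part of any particular framework. The sketch assumes the values land in element content; other contexts, such as URLs or inline scripts, need their own escaping rules.

```python
import html


def render_comment(username: str, body: str) -> str:
    """Hypothetical helper: build a comment snippet from escaped user input."""
    safe_user = html.escape(username)
    safe_body = html.escape(body)
    return f'<div class="comment"><strong>{safe_user}</strong>: {safe_body}</div>'


snippet = render_comment("<script>alert(1)</script>", "10 kg < 20 kg")
print(snippet)
# <div class="comment"><strong>&lt;script&gt;alert(1)&lt;/script&gt;</strong>: 10 kg &lt; 20 kg</div>
```

Because the angle brackets arrive in the page as &lt; and &gt;, the browser shows the username as text instead of running it.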
Common Mistakes
Related Tools
Alternatives
- URL Encoder & Decoder: Encode for URL transport instead
- Base64 Encoder & Decoder: Encode binary data for transport instead