Question 1

What is an HTML entity?

Accepted Answer

A character reference that lets you display characters the HTML parser would otherwise treat as markup. `&lt;` renders as `<` without the browser interpreting it as an opening tag. Entities start with `&` and end with `;`, with either a name (`&lt;`) or a numeric code point (`&#60;` or `&#x3C;`) in between.
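
As a quick check of the three notations, here is a minimal Python sketch using the standard library's `html.unescape`, which decodes character references. The variable names are illustrative only.

```python
import html

# The same character written as a named, decimal, and hexadecimal reference.
references = ["&lt;", "&#60;", "&#x3C;"]

# html.unescape decodes each reference to the character it stands for.
decoded = [html.unescape(ref) for ref in references]

print(decoded)  # ['<', '<', '<']
```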

Question 2

Which characters must always be escaped in HTML?

Accepted Answer

Four: `<` as `&lt;`, `>` as `&gt;`, `&` as `&amp;`, and `"` inside attribute values as `&quot;`. These have structural meaning in HTML: they open and close tags, start entity references, and delimit attribute strings. Everything else is optional if your page is UTF-8, though numeric references always work as alternatives.
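
A short sketch of escaping these characters, assuming Python and its standard `html.escape` function. The sample string is made up for illustration. Note that with `quote=True` Python also escapes the single quote, which goes slightly beyond the four characters listed here.

```python
import html

# Hypothetical input containing all four reserved characters.
raw = '<a href="/search?q=a&b">Tom & Jerry</a>'

# quote=True (the default) also escapes double and single quotes,
# which matters when the text ends up inside an attribute value.
escaped = html.escape(raw, quote=True)

print(escaped)
# &lt;a href=&quot;/search?q=a&amp;b&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
```

Decoding with `html.unescape(escaped)` returns the original string.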

Question 3

Do I still need HTML entities if my page uses UTF-8?

Accepted Answer

Only for the four reserved characters above. UTF-8 covers virtually all of Unicode, so you can include an em dash, a copyright symbol, or Japanese text directly in your HTML source without escaping. The reason you still escape `<`, `>`, and `&` has nothing to do with encoding: it's because they carry structural meaning in HTML syntax.

Question 4

What's the difference between &amp; and &#38;?

Accepted Answer

Same character, different notation. `&amp;` is the named character reference. `&#38;` is the decimal numeric reference. `&#x26;` is the hexadecimal numeric reference. All three render as `&`. Named references are more readable; numeric ones work for any Unicode code point, including characters that have no named form.
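
To illustrate how the numeric forms are derived from a code point, here is a small Python sketch. `numeric_references` is a hypothetical helper written for this example, not a library function.

```python
import html

def numeric_references(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"

print(numeric_references("&"))   # ('&#38;', '&#x26;')

# Works for characters that have no named reference, such as an emoji.
print(numeric_references("😀"))  # ('&#128512;', '&#x1F600;')

# All three notations decode to the same character.
same = {html.unescape(ref) for ref in ("&amp;", "&#38;", "&#x26;")}
print(same)                      # {'&'}
```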

Question 5

Why does &nbsp; behave differently from a regular space?

Accepted Answer

A regular space is a line-break opportunity: the browser can wrap text at it, and it collapses multiple adjacent spaces into one. A non-breaking space (`&nbsp;`) prevents the line break and is not collapsed. Use it to keep values together on one line, like `10&nbsp;kg` or `Vol.&nbsp;IV`. Don't use it to add visual padding or indentation: that's a job for CSS margins and padding.
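
The non-breaking space is a distinct character (U+00A0), not a styled ordinary space. A minimal Python sketch to confirm that, with `NBSP` as an illustrative constant name:

```python
import html

NBSP = "\u00a0"  # the character that &nbsp; decodes to

decoded = html.unescape("10&nbsp;kg")

# The decoded text contains U+00A0, not an ordinary space (U+0020).
print(decoded == f"10{NBSP}kg")  # True
print(" " in decoded)            # False
```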

Question 6

What's the difference between HTML encoding and URL encoding?

Accepted Answer

Different problems, different formats. HTML encoding escapes characters so they render safely inside HTML markup. URL encoding escapes characters so they transmit safely inside a URL. A `<` in HTML becomes `&lt;`. The same `<` in a URL becomes `%3C`. If you're building a URL inside an HTML attribute, both apply: the URL components get percent-encoded, and any `&` separating query parameters gets written as `&amp;` in the HTML source.
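
The two layers can be sketched in Python with the standard `urllib.parse` and `html` modules. The path `/search` and the parameter names are assumptions made up for this example.

```python
import html
from urllib.parse import quote, urlencode

# URL encoding: percent-encode a character for use inside a URL.
print(quote("<"))  # %3C

# Hypothetical query parameters for illustration.
params = {"q": "a<b", "lang": "en"}
url = "/search?" + urlencode(params)
print(url)         # /search?q=a%3Cb&lang=en

# HTML encoding: escape the finished URL before placing it in an attribute.
href = html.escape(url, quote=True)
print(href)        # /search?q=a%3Cb&amp;lang=en

link = f'<a href="{href}">Search</a>'
```

The order matters: percent-encode the URL components first, then HTML-escape the whole URL for the attribute value.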

Question 7

What happens if I forget to escape < or > in user content?

Accepted Answer

The browser parses it as HTML. A stray `<script>` in a username, comment, or form field can execute arbitrary JavaScript if the page renders it unescaped: that's cross-site scripting (XSS). Any content from user input, a database, or an external API must be HTML-escaped before it's inserted into the page. Most templating engines do this automatically, but raw DOM manipulation or `innerHTML` assignments don't.
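
As a rough illustration of escaping before insertion, here is a Python sketch. `render_comment` is a hypothetical helper and the markup is invented for this example. In practice a templating engine with auto-escaping usually does this for you.

```python
import html

def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment, escaping both untrusted values first."""
    return (
        f'<p class="comment"><strong>{html.escape(author)}</strong>: '
        f"{html.escape(body)}</p>"
    )

fragment = render_comment("mallory", "<script>alert(1)</script>")
print(fragment)
# <p class="comment"><strong>mallory</strong>: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The browser now displays the script tag as text instead of executing it.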

Question 8

Does HTML entity encoding prevent XSS?

Accepted Answer

It blocks the most common XSS vectors in HTML text and attribute contexts. But the protection is context-dependent. The same content placed inside a `<script>` block, a CSS `style` attribute, or a URL `href` needs different escaping rules. HTML entity encoding covers HTML contexts: for other contexts, use the escaping rules specific to that context.
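
To make the context dependence concrete, the sketch below encodes one untrusted string three ways in Python. It is an illustration under simplifying assumptions, not a complete defense: the `\u003c` replacement for inline scripts is one common technique, and a context-aware templating engine or a vetted sanitizing library is the safer choice in real code.

```python
import html
import json
from urllib.parse import quote

untrusted = '</script><script>alert("x")</script>'

# HTML text or attribute context: entity-encode.
as_html = html.escape(untrusted)

# URL component context: percent-encode (safe="" also encodes "/").
as_url_component = quote(untrusted, safe="")

# Inline <script> context: serialize as a JSON string literal, then
# replace "<" with its \u003c escape so "</script>" cannot close the block.
as_js_literal = json.dumps(untrusted).replace("<", "\\u003c")

print(as_html)
print(as_url_component)
print(as_js_literal)
```

Each output is safe only in its own context. The entity-encoded form, for example, gives no protection when pasted into a `<script>` block.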

Question 9

Why doesn't &apos; work in older HTML?

Accepted Answer

`&apos;` was defined in XML and XHTML, but it wasn't added to the HTML spec until HTML5 (2008). Older HTML4 parsers don't recognize it and render the literal text `&apos;` instead of an apostrophe. If you need a single quote in an attribute value and want to support older parsers, use the numeric reference `&#39;` instead.
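
Python's standard library happens to take the same approach: `html.escape` emits a numeric reference for the single quote rather than the named form. A minimal sketch:

```python
import html

# html.escape writes the single quote as a hexadecimal numeric reference.
escaped = html.escape("it's", quote=True)
print(escaped)  # it&#x27;s

# An HTML5-aware decoder accepts the named and numeric forms alike.
decoded = [html.unescape(ref) for ref in ("&apos;", "&#39;", "&#x27;")]
print(decoded)  # ["'", "'", "'"]
```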

HTML Entity Encoder & Decoder
