HTML Tag Validation & Extraction Regular Expression
HTML tag validation and extraction are common needs in text processing, sanitization, and parsing pipelines. Whether you need to verify that a string is a well-formed HTML tag or pull all tags out of an HTML document, a carefully crafted regular expression can handle both tasks efficiently. This article provides two patterns: one for validating a complete tag string and one for extracting tags from arbitrary HTML content.
Validation: Is This a Valid HTML Tag?
Use this pattern with anchors (^ and $) to test whether an entire string is a single, well-formed HTML tag — opening, closing, or self-closing.
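Assembled from the components listed under Explanation, the anchored pattern can be written as a JavaScript regex literal. This is a sketch; the constant name `HTML_TAG_VALIDATION` is an illustrative choice rather than something the article defines.

```javascript
// Anchored pattern: the whole string must be exactly one tag.
// The constant name is illustrative.
const HTML_TAG_VALIDATION =
  /^<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>$/;

console.log(HTML_TAG_VALIDATION.test('<p class="text">')); // true
console.log(HTML_TAG_VALIDATION.test("< div>"));           // false
```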
Extraction: Find All Tags in a String
The same pattern without anchors, combined with the global flag (g), extracts every HTML tag found within a larger block of HTML content.
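As a sketch, the unanchored form with the `g` flag might look like this in JavaScript. The constant and variable names are illustrative assumptions.

```javascript
// Unanchored pattern with the global flag: finds every tag in a larger string.
// The constant name is illustrative.
const HTML_TAG_EXTRACTION =
  /<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>/g;

// With the g flag, match() returns an array of all matches, or null if there are none.
const foundTags = 'Click <a href="#">here</a>'.match(HTML_TAG_EXTRACTION);
console.log(foundTags); // [ '<a href="#">', '</a>' ]
```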
Explanation
- `^` / `$`: Start and end anchors (validation only). Omit these when extracting tags from a larger string.
- `<`: Literal opening angle bracket.
- `\/?`: Optional forward slash, matching closing tags like `</div>`.
- `[a-zA-Z][a-zA-Z0-9-]*`: Tag name. It must start with a letter, followed by letters, digits, or hyphens. Hyphens are required for custom elements such as `<my-component>`.
- `(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*`: Zero or more attributes:
  - `\s+`: Required whitespace before each attribute.
  - `[a-zA-Z_:][a-zA-Z0-9_.:-]*`: Attribute name (covers standard and namespaced attributes like `data-value` or `xml:lang`).
  - `(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?`: Optional attribute value: double-quoted, single-quoted, or unquoted.
- `\s*\/?>$`: Optional whitespace, optional self-closing slash, then the closing `>`.
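One way to see how the pieces above fit together is to build the pattern from named fragments and add the anchors or the `g` flag only where needed. This is a minimal sketch; every name in it is an illustrative assumption.

```javascript
// Each fragment mirrors one item in the breakdown above.
// All names here are illustrative.
const TAG_NAME = String.raw`[a-zA-Z][a-zA-Z0-9-]*`;
const ATTR_NAME = String.raw`[a-zA-Z_:][a-zA-Z0-9_.:-]*`;
const ATTR_VALUE = String.raw`(?:"[^"]*"|'[^']*'|[^\s>"']*)`;
const ATTRIBUTE = String.raw`\s+${ATTR_NAME}(?:\s*=\s*${ATTR_VALUE})?`;
const TAG = String.raw`<\/?${TAG_NAME}(?:${ATTRIBUTE})*\s*\/?>`;

// Anchors are added only for whole-string validation.
const validateRe = new RegExp(`^${TAG}$`);
// The global flag makes match() return every tag in the input.
const extractRe = new RegExp(TAG, "g");

console.log(validateRe.test("<my-component>"));                // true
console.log("<ul><li>item</li></ul>".match(extractRe).length); // 4
```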
Supported Tag Syntax
- Opening tags: `<div>`, `<p class="text">`
- Closing tags: `</div>`, `</span>`
- Self-closing tags: `<br/>`, `<img src="x.png" />`
- Boolean attributes: `<input required>`, `<details open>`
- Quoted attribute values: double-quoted (`"value"`) and single-quoted (`'value'`)
- Namespaced & data attributes: `data-id="1"`, `xml:lang="en"`
- Custom elements: `<my-component>`, `<app-header>`
Validation Implementation
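A minimal validation helper in JavaScript could look like the sketch below. The names `TAG_PATTERN` and `isValidHtmlTag` are hypothetical, and the pattern is the anchored one assembled from the Explanation section.

```javascript
// Anchored pattern: matches only when the entire string is a single tag.
const TAG_PATTERN =
  /^<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>$/;

/**
 * Returns true when `input` is a string consisting of exactly one
 * opening, closing, or self-closing tag. (Hypothetical helper name.)
 */
function isValidHtmlTag(input) {
  // Non-string values are rejected rather than coerced to strings.
  return typeof input === "string" && TAG_PATTERN.test(input);
}

console.log(isValidHtmlTag("<br />"));           // true
console.log(isValidHtmlTag("<!-- comment -->")); // false
console.log(isValidHtmlTag("<div"));             // false
```

The pattern deliberately has no `g` flag here: a global regex keeps `lastIndex` state between `test()` calls, which can produce inconsistent results when one regex object is reused.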
Extraction Implementation
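For extraction, one possible JavaScript sketch uses `String.prototype.matchAll` with the unanchored, global pattern. The names `TAG_PATTERN_GLOBAL` and `extractHtmlTags` are hypothetical.

```javascript
// Unanchored pattern with the g flag (matchAll requires a global regex).
const TAG_PATTERN_GLOBAL =
  /<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>/g;

/**
 * Returns every tag found in `html`, in document order.
 * Returns an empty array when nothing matches. (Hypothetical helper name.)
 */
function extractHtmlTags(html) {
  // matchAll yields one match object per tag; m[0] is the full matched text.
  return Array.from(html.matchAll(TAG_PATTERN_GLOBAL), (m) => m[0]);
}

console.log(extractHtmlTags("<ul><li>item</li></ul>")); // [ '<ul>', '<li>', '</li>', '</ul>' ]
console.log(extractHtmlTags("3 < 5 > 2"));              // []
```

Returning an empty array instead of `null` keeps callers free of null checks. Note that the regex has no notion of document context, so tag-like text inside comments or script blocks is matched as well.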
Validation Test Cases
Each input is tested as a complete string to determine whether it is a valid HTML tag.
| HTML Tag | Valid |
|---|---|
| `<div>` | Yes |
| `<p>` | Yes |
| `<h1>` | Yes |
| `<span>` | Yes |
| `</div>` | Yes |
| `</p>` | Yes |
| `<br/>` | Yes |
| `<hr/>` | Yes |
| `<br />` | Yes |
| `<img />` | Yes |
| `<img src="image.png">` | Yes |
| `<a href="https://example.com" class="link">` | Yes |
| `<input type="text" required>` | Yes |
| `<div class='container'>` | Yes |
| `<INPUT TYPE='TEXT'>` | Yes |
| `<my-component>` | Yes |
| `<meta charset="UTF-8" />` | Yes |
| `<span data-value="123">` | Yes |
| (empty string) | No |
| `<>` | No |
| `<123>` | No |
| `< div>` | No |
| `<div` | No |
| `div>` | No |
| `hello world` | No |
| `<!-- comment -->` | No |
Extraction Test Cases
Each input is a string of HTML content. The result indicates whether the string contains at least one valid HTML tag.
| HTML Content | Valid |
|---|---|
| `<div>hello</div>` | Yes |
| `<p class="text">paragraph</p>` | Yes |
| `Click <a href="#">here</a>` | Yes |
| `<br/>` | Yes |
| `<ul><li>item</li></ul>` | Yes |
| `<img src="photo.jpg" alt="photo" />` | Yes |
| `hello world` | No |
| (empty string) | No |
| `<>` | No |
| `3 < 5 > 2` | No |