HTML Tag Validation & Extraction Regular Expression

HTML tag validation and extraction are common needs in text processing, sanitization, and parsing pipelines. Whether you need to verify that a string is a well-formed HTML tag or pull all tags out of an HTML document, a carefully crafted regular expression can handle both tasks efficiently. This article provides two patterns: one for validating a complete tag string and one for extracting tags from arbitrary HTML content.

Validation: Is This a Valid HTML Tag?

^<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>$

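Use this pattern with anchors (^ and $) to test whether an entire string is a single tag in opening, closing, or self-closing form. The check is purely syntactic: it does not verify that the tag name is a standard HTML element, and it also accepts a closing tag that carries attributes, such as </div class="x">.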

Extraction: Find All Tags in a String

<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>

The same pattern without anchors, combined with the global flag (g), extracts every HTML tag found within a larger block of HTML content.

Explanation

  • ^ / $ — Start and end anchors (validation only). Omit these when extracting tags from a larger string.
  • < — Literal opening angle bracket.
  • \/? — Optional forward slash, matching closing tags like </div>.
  • [a-zA-Z][a-zA-Z0-9-]* — Tag name: must start with a letter, followed by letters, digits, or hyphens. Hyphens are required for custom elements such as <my-component>.
  • (?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)* — Zero or more attributes:
    • \s+ — Required whitespace before each attribute.
    • [a-zA-Z_:][a-zA-Z0-9_.:-]* — Attribute name (covers standard and namespaced attributes like data-value or xml:lang).
    • (?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))? — Optional attribute value: double-quoted, single-quoted, or unquoted.
  • \s*\/?> — Optional whitespace, optional self-closing slash, then the closing >. In the validation pattern, the $ anchor follows to mark the end of the string.
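
The breakdown above can also be read as a recipe for assembling the pattern from named parts. The following is a minimal sketch, assuming the pieces are kept as strings and joined with the RegExp constructor; the names tagName, attrName, attrValue, attribute, tagSource, validateRegex, and extractRegex are illustrative and do not appear elsewhere in this article.

```javascript
// Named building blocks (illustrative names, not part of the article's API).
const tagName = "[a-zA-Z][a-zA-Z0-9-]*";
const attrName = "[a-zA-Z_:][a-zA-Z0-9_.:-]*";

// Attribute value: double-quoted, single-quoted, or unquoted.
const attrValue = `(?:"[^"]*"|'[^']*'|[^\\s>"']*)`;

// One attribute: required whitespace, a name, and an optional "= value" part.
const attribute = `(?:\\s+${attrName}(?:\\s*=\\s*${attrValue})?)`;

// Unanchored source: "<", optional "/", tag name, zero or more attributes,
// optional whitespace, optional self-closing slash, then ">".
const tagSource = `<\\/?${tagName}${attribute}*\\s*\\/?>`;

// Anchored for validation, global for extraction.
const validateRegex = new RegExp(`^${tagSource}$`);
const extractRegex = new RegExp(tagSource, "g");

console.log(validateRegex.test('<img src="x.png" />'));
console.log('Click <a href="#">here</a>'.match(extractRegex));
```

Keeping the unanchored source in one place means the validation and extraction patterns cannot drift apart when one of the parts is changed.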

Supported Tag Syntax

  • Opening tags: <div>, <p class="text">
  • Closing tags: </div>, </span>
  • Self-closing tags: <br/>, <img src="x.png" />
  • Boolean attributes: <input required>, <details open>
  • Attribute values: double-quoted ("value"), single-quoted ('value'), and unquoted (<input type=text>)
  • Namespaced & data attributes: data-id="1", xml:lang="en"
  • Custom elements: <my-component>, <app-header>

Validation Implementation

const htmlTagRegex = /^<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>$/;
const isValidHtmlTag = (tag) => htmlTagRegex.test(tag);

Extraction Implementation

const htmlTagExtractRegex = /<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>/g;
const extractHtmlTags = (html) => html.match(htmlTagExtractRegex) ?? [];

Validation Test Cases

Each input is tested as a complete string to determine whether it is a valid HTML tag.

HTML Tag                                      Valid
<div>                                         Yes
<p>                                           Yes
<h1>                                          Yes
<span>                                        Yes
</div>                                        Yes
</p>                                          Yes
<br/>                                         Yes
<hr/>                                         Yes
<br />                                        Yes
<img />                                       Yes
<img src="image.png">                         Yes
<a href="https://example.com" class="link">   Yes
<input type="text" required>                  Yes
<div class='container'>                       Yes
<INPUT TYPE='TEXT'>                           Yes
<my-component>                                Yes
<meta charset="UTF-8" />                      Yes
<span data-value="123">                       Yes
(empty string)                                No
<>                                            No
<123>                                         No
< div>                                        No
<div                                          No
div>                                          No
hello world                                   No
<!-- comment -->                              No
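
A subset of these inputs can be turned into a small table-driven check. This is a sketch only: the split into matching and non-matching samples reflects how the validation pattern behaves on them, and the names validSamples, invalidSamples, and failures are illustrative.

```javascript
// Validation pattern, redeclared so the example is self-contained.
const htmlTagRegex = /^<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>$/;

// Inputs the pattern is expected to accept.
const validSamples = [
  '<div>',
  '</p>',
  '<br />',
  '<input type="text" required>',
  "<INPUT TYPE='TEXT'>",
  '<span data-value="123">',
];

// Inputs the pattern is expected to reject.
const invalidSamples = ['', '<>', '<123>', '< div>', '<div', 'div>', 'hello world', '<!-- comment -->'];

// Collect every sample whose result differs from the expectation.
const failures = [
  ...validSamples.filter((s) => !htmlTagRegex.test(s)),
  ...invalidSamples.filter((s) => htmlTagRegex.test(s)),
];

console.log(failures.length === 0 ? 'all samples behaved as expected' : failures);
```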

Extraction Test Cases

Each input is a string of HTML content. The result indicates whether the string contains at least one valid HTML tag.

HTML Content                          Valid
<div>hello</div>                      Yes
<p class="text">paragraph</p>         Yes
Click <a href="#">here</a>            Yes
<br/>                                 Yes
<ul><li>item</li></ul>                Yes
<img src="photo.jpg" alt="photo" />   Yes
hello world                           No
(empty string)                        No
<>                                    No
3 < 5 > 2                             No
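
The yes/no question posed in this section (does the string contain at least one tag?) can be answered with test() instead of match(). The sketch below assumes a separate, non-global copy of the extraction pattern; htmlTagSearchRegex and containsHtmlTag are hypothetical names used only here.

```javascript
// Same extraction pattern, but without the g flag. A global regex stores its
// position in lastIndex between test() calls, so a later call can start
// searching partway through the string and miss a tag.
const htmlTagSearchRegex = /<\/?[a-zA-Z][a-zA-Z0-9-]*(?:\s+[a-zA-Z_:][a-zA-Z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s>"']*))?)*\s*\/?>/;

// containsHtmlTag is a hypothetical helper: true when at least one tag is found.
const containsHtmlTag = (html) => htmlTagSearchRegex.test(html);

console.log(containsHtmlTag('Click <a href="#">here</a>')); // true
console.log(containsHtmlTag('3 < 5 > 2'));                  // false
```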