Tooleras

Unicode Converter


Convert text between Unicode codepoints (U+XXXX), UTF-8 byte sequences, UTF-16 code units, HTML entities, and escaped forms. Free and private: all processing happens in your browser.

Sample input: Hello, World! 🌍
Characters: 15
UTF-8 bytes: 18
UTF-16 code units: 16
Unicode Code Points
U+0048 U+0065 U+006C U+006C U+006F U+002C U+0020 U+0057 U+006F U+0072 U+006C U+0064 U+0021 U+0020 U+1F30D
Decimal Code Points
72 101 108 108 111 44 32 87 111 114 108 100 33 32 127757
Hex Code Points
48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 20 1F30D
JavaScript Escape (\u)
\u0048\u0065\u006c\u006c\u006f\u002c\u0020\u0057\u006f\u0072\u006c\u0064\u0021\u0020\u{1f30d}
CSS Escape
\48 \65 \6C \6C \6F \2C \20 \57 \6F \72 \6C \64 \21 \20 \1F30D
UTF-8 Hex Bytes
48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 20 F0 9F 8C 8D
UTF-8 Decimal Bytes
72 101 108 108 111 44 32 87 111 114 108 100 33 32 240 159 140 141
URL Encoded
Hello%2C%20World!%20%F0%9F%8C%8D
HTML Entities
Hello, World! &#127757;

Character Breakdown

Char     Code Point  Hex      Decimal  UTF-8 Bytes
H        U+0048      0x48     72       1
e        U+0065      0x65     101      1
l        U+006C      0x6C     108      1
l        U+006C      0x6C     108      1
o        U+006F      0x6F     111      1
,        U+002C      0x2C     44       1
(space)  U+0020      0x20     32       1
W        U+0057      0x57     87       1
o        U+006F      0x6F     111      1
r        U+0072      0x72     114      1
l        U+006C      0x6C     108      1
d        U+0064      0x64     100      1
!        U+0021      0x21     33       1
(space)  U+0020      0x20     32       1
🌍       U+1F30D     0x1F30D  127757   4

The Unicode Converter translates text between various Unicode representations: codepoints (U+1F600), UTF-8 byte sequences (F0 9F 98 80), UTF-16 code units (D83D DE00 surrogate pair), HTML entities (&#128512;), JavaScript escape (\uD83D\uDE00), Python escape (\U0001F600), and CSS escape (\1F600). Each representation serves a different use case: codepoints for specifications, UTF-8 for storage and transmission, UTF-16 for JavaScript strings, HTML entities for web content, and escape forms for embedding in code.

Enter any text and see all representations simultaneously. It works for plain ASCII (where each codepoint is a single UTF-8 byte of the same value), multi-byte UTF-8 sequences, surrogate-pair emoji in UTF-16, and everything in between. Reverse conversion works too: paste any form and get back the original text. This is the Swiss Army knife for Unicode debugging: when you need to know exactly which bytes or code units "Hello 👋" produces in each encoding, one paste gives you the full breakdown.
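A minimal Python sketch of the counters shown for the sample "Hello, World! 🌍" (15 characters, 18 UTF-8 bytes, 16 UTF-16 code units). It assumes "characters" means codepoints, which is what Python's len() counts; `count_units` is a hypothetical helper, not the tool's own code.

```python
def count_units(text: str) -> dict:
    """Count code points, UTF-8 bytes, and UTF-16 code units in a string."""
    return {
        "characters": len(text),  # a Python str is indexed by code point
        "utf8_bytes": len(text.encode("utf-8")),
        # "utf-16-le" adds no byte-order mark, so every 2 bytes is one code unit
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


sample = "Hello, World! 🌍"
print(count_units(sample))
# {'characters': 15, 'utf8_bytes': 18, 'utf16_units': 16}
print(sample.encode("utf-8").hex(" ").upper())
# 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 20 F0 9F 8C 8D
```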

Unicode Converter — key features

Every common representation

Codepoints, UTF-8, UTF-16, HTML entities, JavaScript escape, Python escape, CSS escape — all shown for any input.

Bidirectional

Convert text to any form or any form back to text — simultaneously.

Full Unicode range

Handles the Basic Multilingual Plane (BMP, most everyday characters) and the supplementary planes (emoji, rare CJK) correctly.

Surrogate pair handling

UTF-16 surrogate pairs for emoji are shown clearly and convert back to single codepoints correctly.

Normalization forms

Normalize text to NFC, NFD, NFKC, or NFKD for consistent processing.

Character inspection

See each character’s name, category, and block alongside its codepoint.

Ready-to-paste escape forms

Generate JavaScript, Python, or CSS escape sequences ready to use in code.

Client-side only

Text stays in your browser.

How to use the Unicode Converter

  1. Paste text

     Drop any Unicode text into the input: English, CJK, emoji, anything.

  2. See all representations

     Codepoints, UTF-8, UTF-16, HTML entities, and escape forms appear simultaneously.

  3. Or reverse-convert

     Paste an escape form or codepoint list and get back the text.

  4. Normalize if needed

     Apply NFC, NFD, NFKC, or NFKD normalization before display for consistent output.

  5. Copy the form you need

     One-click copy for any representation: JavaScript escape, HTML entity, UTF-8 hex, and more.

Common use cases for the Unicode Converter

Development

  • JavaScript string escapes: Generate \uXXXX sequences for embedding non-ASCII characters in JS string literals.
  • Python code preparation: Produce \U00XXXXXX or \N{NAME} escapes for Python strings.
  • CSS content property: Generate CSS \XXXX escapes for content: values with special characters.

Data processing

  • Debugging encoding issues: See the exact byte sequence produced by a string to diagnose encoding mismatches.
  • Normalization for comparison: Normalize text to NFC before comparing or storing to avoid visually-identical but bit-different strings.
  • Fixing garbled text: Recover text whose UTF-8 bytes were mis-decoded as Latin-1 or Windows-1252 (mojibake such as "Ã©" for "é") by reinterpreting the bytes with the correct encoding.
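Mojibake repair can be sketched in Python, assuming the damage came from UTF-8 bytes being decoded with a single-byte codec such as Latin-1 (pass "cp1252" for Windows-1252 damage). The helper name `fix_mojibake` is illustrative, and the repair fails if the garbled text was further altered after the bad decode.

```python
def fix_mojibake(garbled: str, wrong_codec: str = "latin-1") -> str:
    """Reverse UTF-8 text that was mis-decoded with `wrong_codec`."""
    # Re-encoding with the wrong codec recovers the original bytes;
    # decoding those bytes as UTF-8 recovers the original text.
    return garbled.encode(wrong_codec).decode("utf-8")


broken = "café".encode("utf-8").decode("latin-1")
print(broken)                # cafÃ©
print(fix_mojibake(broken))  # café
```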

Content and education

  • Understanding character encoding: See how a single emoji like 😀 maps to different representations in different contexts.
  • HTML entity generation: Create HTML entities for any Unicode character when you can’t paste directly.
  • Academic or technical writing: Reference characters by canonical codepoint (U+XXXX) rather than pasting glyphs that may not render.

Unicode Converter — examples

Simple ASCII

All representations for a single letter.

Input
A
Output
codepoint: U+0041
UTF-8: 41
UTF-16: 0041
HTML: &#65; / &#x41;
JS: \u0041
Python: \u0041
CSS: \41

Accented

Precomposed é.

Input
é
Output
codepoint: U+00E9
UTF-8: C3 A9 (2 bytes)
UTF-16: 00E9 (1 code unit)
HTML: &#233; / &#xE9;
JS: \u00E9

Emoji (supplementary plane)

Grinning face with surrogate pair.

Input
😀
Output
codepoint: U+1F600
UTF-8: F0 9F 98 80 (4 bytes)
UTF-16: D83D DE00 (surrogate pair, 2 code units)
HTML: &#128512;
JS: \u{1F600} or \uD83D\uDE00

CJK ideograph

Chinese character.

Input
漢
Output
codepoint: U+6F22
UTF-8: E6 BC A2 (3 bytes)
UTF-16: 6F22 (1 code unit)
name: CJK UNIFIED IDEOGRAPH-6F22
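The same breakdown for 漢 can be reproduced with Python's standard `unicodedata` module. `inspect_char` is a hypothetical helper; `unicodedata.name()` reads the Unicode Character Database version bundled with your Python, and block names are not available from the standard library.

```python
import unicodedata


def inspect_char(ch: str) -> dict:
    """Summarize one code point: U+ notation, UTF-8 hex, UTF-16 units, name."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "utf8": ch.encode("utf-8").hex(" ").upper(),
        "utf16_units": len(ch.encode("utf-16-le")) // 2,
        "name": unicodedata.name(ch, "<unnamed>"),  # default avoids ValueError
    }


print(inspect_char("漢"))
# {'codepoint': 'U+6F22', 'utf8': 'E6 BC A2', 'utf16_units': 1, 'name': 'CJK UNIFIED IDEOGRAPH-6F22'}
```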

Reverse conversion

From escape form back to text.

Input
\uD83D\uDE00
Output
😀 (grinning face, U+1F600)
UTF-16 surrogate pair decoded to single codepoint
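Reverse conversion of a \uXXXX escape run can be sketched in Python by rebuilding the UTF-16 code units and decoding them, which joins a surrogate pair into one codepoint. This assumes the input contains only \uXXXX escapes; `decode_u_escapes` is an illustrative name, and an unpaired surrogate raises UnicodeDecodeError.

```python
import re
import struct


def decode_u_escapes(escaped: str) -> str:
    """Decode a run of JavaScript-style \\uXXXX escapes into text."""
    units = [int(h, 16) for h in re.findall(r"\\u([0-9A-Fa-f]{4})", escaped)]
    # Pack the 16-bit code units little-endian; the UTF-16 decoder then
    # combines each high/low surrogate pair into a single code point.
    return struct.pack(f"<{len(units)}H", *units).decode("utf-16-le")


result = decode_u_escapes(r"\uD83D\uDE00")
print(result, f"U+{ord(result):X}")  # 😀 U+1F600
```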

Technical details

Unicode representations:

Codepoint (U+XXXX): the abstract identifier. Every assigned character has a unique codepoint in the range U+0000 to U+10FFFF. "A" is U+0041, "é" is U+00E9, "😀" is U+1F600.

UTF-8: variable-length byte encoding. 1-4 bytes per codepoint. Byte patterns:
- U+0000-U+007F: 1 byte (0xxxxxxx)
- U+0080-U+07FF: 2 bytes (110xxxxx 10xxxxxx)
- U+0800-U+FFFF: 3 bytes (1110xxxx 10xxxxxx 10xxxxxx)
- U+10000-U+10FFFF: 4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
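The four byte patterns translate directly into shifts and masks. A minimal Python sketch (hypothetical `utf8_encode`, one codepoint at a time, and it does not reject the surrogate range U+D800-U+DFFF as a strict encoder would), checked against Python's built-in encoder:

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point with the standard UTF-8 bit patterns."""
    if cp <= 0x7F:                                   # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                                  # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                                 # 1110xxxx + 2 continuation bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                               # 11110xxx + 3 continuation bytes
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError(f"U+{cp:X} is outside the Unicode range")


for cp in (0x41, 0xE9, 0x6F22, 0x1F600):
    print(f"U+{cp:04X}", utf8_encode(cp).hex(" ").upper())
# U+0041 41
# U+00E9 C3 A9
# U+6F22 E6 BC A2
# U+1F600 F0 9F 98 80
```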

UTF-16: variable-length 16-bit code unit encoding. 1-2 code units per codepoint.
- BMP (U+0000-U+FFFF): 1 code unit whose value is the codepoint itself (U+D800-U+DFFF is reserved for surrogates and never encodes a character on its own)
- Supplementary (U+10000+): 2 code units (surrogate pair: 0xD800-0xDBFF high, 0xDC00-0xDFFF low)
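The surrogate arithmetic is short enough to show in full: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves added to 0xD800 and 0xDC00. A Python sketch with illustrative helper names:

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF needs a surrogate pair")
    v = cp - 0x10000                          # a 20-bit value
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high: int, low: int) -> int:
    """Combine a high/low surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([f"{u:04X}" for u in to_surrogates(0x1F600)])  # ['D83D', 'DE00']
print(f"U+{from_surrogates(0xD83D, 0xDE00):X}")       # U+1F600
```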

JavaScript strings are UTF-16 internally. "😀".length returns 2 because the emoji takes 2 code units. For a codepoint count, use Array.from("😀").length, which returns 1 because it iterates by codepoint.

HTML entities:
- Decimal numeric: &#128512;
- Hex numeric: &#x1F600;
- Named (if available): no name for emoji, but &copy; for ©
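Numeric forms need only the codepoint; named forms need a lookup table. This Python sketch uses the standard library's `html.entities.codepoint2name`, which covers the HTML 4 names only (so &copy; is found, but newer HTML5-only names are not); `html_entities` is a hypothetical helper.

```python
import html
from html.entities import codepoint2name


def html_entities(ch: str) -> list[str]:
    """Return decimal, hex, and (if one exists) named entities for a character."""
    cp = ord(ch)
    forms = [f"&#{cp};", f"&#x{cp:X};"]
    if cp in codepoint2name:  # HTML 4 names only, e.g. 169 -> "copy"
        forms.append(f"&{codepoint2name[cp]};")
    return forms


print(html_entities("😀"))  # ['&#128512;', '&#x1F600;']
print(html_entities("©"))   # ['&#169;', '&#xA9;', '&copy;']
print(html.unescape("&#128512;&#x1F600;&copy;"))  # 😀😀©
```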

JavaScript escape forms:
- \uXXXX for BMP (\u0041 = A)
- \uXXXX\uXXXX for surrogate pairs (\uD83D\uDE00 = 😀)
- \u{XXXX} for any codepoint in modern JS, ES2015+ (\u{1F600} = 😀)
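A sketch of generating both JavaScript escape styles from Python, assuming every character should be escaped. `js_escape` is a hypothetical helper; the pre-ES2015 branch takes the code units straight from Python's UTF-16 encoder, so supplementary characters come out as a surrogate pair.

```python
def js_escape(text: str, es2015: bool = True) -> str:
    """Escape every character of `text` as a JavaScript string escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF and es2015:
            out.append("\\u{" + format(cp, "X") + "}")  # e.g. \u{1F600}
            continue
        # UTF-16BE yields one 2-byte unit for BMP characters and two
        # (a surrogate pair) for supplementary ones.
        data = ch.encode("utf-16-be")
        for i in range(0, len(data), 2):
            unit = int.from_bytes(data[i:i + 2], "big")
            out.append("\\u" + format(unit, "04X"))
    return "".join(out)


print(js_escape("A😀"))                # \u0041\u{1F600}
print(js_escape("A😀", es2015=False))  # \u0041\uD83D\uDE00
```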

Python escape forms:
- \uXXXX for BMP
- \U00XXXXXX for any codepoint, always 8 hex digits (\U0001F600 = 😀)
- \N{NAME} for named characters (\N{GRINNING FACE} = 😀)
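In Python all three spellings produce the same string, and an escape can be generated from either the codepoint or the character name. `python_escape` is an illustrative helper built on the standard `unicodedata` module; `unicodedata.name()` raises ValueError for characters that have no name.

```python
import unicodedata

# Three spellings of the same one-character string.
print("\U0001F600" == "\N{GRINNING FACE}" == chr(0x1F600))  # True


def python_escape(ch: str, named: bool = False) -> str:
    """Build a \\uXXXX, \\UXXXXXXXX, or \\N{NAME} escape for one character."""
    if named:
        return "\\N{" + unicodedata.name(ch) + "}"
    cp = ord(ch)
    # \u takes exactly 4 hex digits, \U exactly 8
    return "\\u%04X" % cp if cp <= 0xFFFF else "\\U%08X" % cp


print(python_escape("é"))               # \u00E9
print(python_escape("😀"))              # \U0001F600
print(python_escape("😀", named=True))  # \N{GRINNING FACE}
```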

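CSS escape: a backslash followed by 1 to 6 hex digits (\1F600 = 😀). A single space after the escape ends it and is consumed; it is required when the next character is a hex digit or whitespace, and optional otherwise.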
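A Python sketch of CSS escaping for a content: value, under the assumption that only non-ASCII characters need escaping (quotes and backslashes inside the CSS string are not handled). It adds the terminating space only when the next character would otherwise be misread; `css_escape` is a hypothetical helper.

```python
HEX_OR_SPACE = set("0123456789abcdefABCDEF \t\n")


def css_escape(text: str) -> str:
    """Escape non-ASCII characters as CSS hex escapes (e.g. 😀 -> \\1F600)."""
    out = []
    for i, ch in enumerate(text):
        if ord(ch) < 0x80:
            out.append(ch)  # assumption: plain ASCII is left as-is
            continue
        out.append("\\%X" % ord(ch))
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # A following hex digit would be read as part of the escape, and a
        # following space would be swallowed as its terminator, so emit the
        # single terminating space the syntax allows.
        if nxt in HEX_OR_SPACE:
            out.append(" ")
    return "".join(out)


print(css_escape("😀"))     # \1F600
print(css_escape("©2024"))  # \A9 2024
print(css_escape("é!"))     # \E9!
```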

Normalization: the same visual character can be encoded multiple ways. "é" can be U+00E9 (precomposed) or U+0065 U+0301 (e + combining acute). NFC normalizes to the precomposed form, NFD to the decomposed form. NFKC and NFKD are "compatibility" forms that also fold variants such as the "ﬁ" ligature (U+FB01) into plain "fi".
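Python's standard `unicodedata.normalize()` implements all four forms, which makes the difference easy to check. A small illustration, not the tool's implementation:

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # e + U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", precomposed)])
# ['U+0065', 'U+0301']

# Compatibility forms go further and fold look-alikes such as the ligature ﬁ.
print(unicodedata.normalize("NFC", "\ufb01") == "\ufb01")  # True: NFC keeps it
print(unicodedata.normalize("NFKC", "\ufb01"))             # fi
```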

Common problems and solutions

Surrogate pair miscounted

"😀".length in JavaScript returns 2, not 1, because UTF-16 uses a surrogate pair. For real character count, use Array.from(str).length which iterates by codepoint.

Combining characters

"é" can be one codepoint (U+00E9 precomposed) or two (U+0065 + U+0301 combining acute). Visually identical but bit-different. Normalize to NFC for consistent storage and comparison.

Mixing escape syntaxes

\uXXXX works in JS for BMP only. Supplementary characters need \u{XXXX} (ES2015+) or surrogate pair \uXXXX\uXXXX. Python uses \U00XXXXXX (8 hex digits) for supplementary planes.

BMP vs supplementary

BMP characters (U+0000-U+FFFF) are simpler — one UTF-16 code unit each. Supplementary plane (U+10000-U+10FFFF) requires surrogate pairs in UTF-16. Code that handles only BMP breaks on emoji.

Named HTML entities limited

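HTML 4 defined only about 250 named entities; HTML5 expands the list to over 2,000, but most characters, including emoji, still have no name. For anything else, use numeric entities: &#128512; (or &#x1F600;) for 😀 is correct; there’s no &grinning; entity.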

Normalization not applied

Without normalization, two visually identical strings may compare unequal because of combining-character differences. Always normalize before comparison or storage.

Private use and unassigned

Codepoints in the Private Use Areas (U+E000-U+F8FF in the BMP, plus planes 15 and 16) have no standard meaning, and unassigned codepoints have no character at all. Most fonts render both as a placeholder box. Don’t assume arbitrary codepoints render correctly everywhere.
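Python's `unicodedata.category()` distinguishes the two cases: private-use codepoints report "Co" and unassigned ones "Cn". Results follow the Unicode version bundled with the interpreter (`unicodedata.unidata_version`), so a codepoint that is unassigned today may be assigned later; the helper name `general_category` is illustrative.

```python
import unicodedata


def general_category(cp: int) -> str:
    """Return the Unicode general category: Co = private use, Cn = unassigned."""
    return unicodedata.category(chr(cp))


print(general_category(0xE000))   # Co  (Private Use Area)
print(general_category(0x0378))   # Cn  (a gap in the Greek block)
print(general_category(0x1F600))  # So  (emoji: Symbol, other)
```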

Unicode Converter — comparisons and alternatives

Compared to programming language conversion methods (Python’s ord(), JavaScript’s codePointAt()), this tool shows every representation simultaneously for easy comparison. For automation, use language built-ins.

Compared to other online converters, this tool handles the full Unicode range including supplementary plane characters (emoji) and surrogate pairs correctly. Many simpler converters fail on emoji.

Compared to the Character Map tool, this tool focuses on encoding representations rather than character browsing. Use Character Map to find characters; use this tool to see how they encode.

Frequently asked questions about the Unicode Converter

What is a Unicode codepoint?

An integer identifying a single character in the Unicode standard, written in hex as U+XXXX. U+0041 is "A", U+1F600 is 😀. Each assigned codepoint identifies exactly one character; encodings (UTF-8, UTF-16) are different ways to represent these codepoints as bytes.

What is the difference between UTF-8 and UTF-16?

Both encode all of Unicode. UTF-8 is variable-length (1-4 bytes per codepoint) and ASCII-compatible (ASCII text is byte-for-byte identical). UTF-16 is also variable-length (2 or 4 bytes per codepoint). UTF-8 is the web standard; UTF-16 is used internally by JavaScript, Java, and some Windows APIs.

What is a surrogate pair?

UTF-16 cannot fit supplementary-plane characters (U+10000+) in one 16-bit code unit. It uses a pair: a high surrogate (U+D800-U+DBFF) followed by a low surrogate (U+DC00-U+DFFF). Together they encode one codepoint. Most emoji live in the supplementary planes, so they use surrogate pairs in UTF-16; a few older ones, such as ☺ (U+263A), are in the BMP.

How do I escape an emoji in JavaScript?

Modern JS (ES2015+) supports \u{1F600} for any codepoint. For pre-ES2015 compatibility, use the surrogate pair: \uD83D\uDE00. Both represent the grinning face emoji.

What is Unicode normalization?

The process of converting different representations of the same character to a canonical form. "é" can be U+00E9 (precomposed) or U+0065 U+0301 (base e + combining acute). NFC form uses precomposed; NFD uses decomposed. Normalize before comparing or storing text to ensure consistency.

Why does JavaScript count emoji as 2 characters?

JavaScript strings are UTF-16. "😀".length returns 2 because the emoji takes 2 UTF-16 code units (a surrogate pair). Array.from(str).length counts codepoints; for user-perceived characters (grapheme clusters), such as emoji with skin-tone modifiers or flags, use Intl.Segmenter or a library like grapheme-splitter.

What is the difference between characters and bytes?

A character is a Unicode codepoint: one "thing" like A, é, or 😀. Bytes are the low-level encoding of that character, which varies by encoding. "A" is 1 byte in UTF-8 and 2 bytes in UTF-16. "😀" is 4 bytes in UTF-8 and also 4 bytes in UTF-16 (a surrogate pair of two 2-byte code units).

Is my input private?

Yes. All conversion runs in your browser with no server involvement. Text, codepoints, and escape sequences stay local.
