Unicode Converter

Convert text between Unicode codepoints (U+XXXX), UTF-8 byte sequences, UTF-16 code units, HTML entities, and escaped forms. Free and private: all processing happens in your browser.

Enter text to convert

Unicode Code Points

U+0048 U+0065 U+006C U+006C U+006F U+002C U+0020 U+0057 U+006F U+0072 U+006C U+0064 U+0021 U+0020 U+1F30D

Decimal Code Points

&#72;&#101;&#108;&#108;&#111;&#44;&#32;&#87;&#111;&#114;&#108;&#100;&#33;&#32;&#127757;

Hex Code Points

&#x48;&#x65;&#x6C;&#x6C;&#x6F;&#x2C;&#x20;&#x57;&#x6F;&#x72;&#x6C;&#x64;&#x21;&#x20;&#x1F30D;

JavaScript Escape (\u)

\u0048\u0065\u006c\u006c\u006f\u002c\u0020\u0057\u006f\u0072\u006c\u0064\u0021\u0020\u{1f30d}

CSS Escape

\48 \65 \6C \6C \6F \2C \20 \57 \6F \72 \6C \64 \21 \20 \1F30D

UTF-8 Hex Bytes

48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 20 F0 9F 8C 8D

UTF-8 Decimal Bytes

72 101 108 108 111 44 32 87 111 114 108 100 33 32 240 159 140 141

URL Encoded

Hello%2C%20World!%20%F0%9F%8C%8D

HTML Entities

Hello, World! &#127757;

Character Breakdown

Char	Code Point	Hex	Decimal	UTF-8 Byte Count
H	U+0048	0x48	72	1
e	U+0065	0x65	101	1
l	U+006C	0x6C	108	1
l	U+006C	0x6C	108	1
o	U+006F	0x6F	111	1
,	U+002C	0x2C	44	1
(space)	U+0020	0x20	32	1
W	U+0057	0x57	87	1
o	U+006F	0x6F	111	1
r	U+0072	0x72	114	1
l	U+006C	0x6C	108	1
d	U+0064	0x64	100	1
!	U+0021	0x21	33	1
(space)	U+0020	0x20	32	1
🌍	U+1F30D	0x1F30D	127757	4

The Unicode Converter translates text between Unicode representations: codepoints (U+1F600), UTF-8 byte sequences (F0 9F 98 80), UTF-16 code units (D83D DE00, a surrogate pair), HTML entities (&#128512;), JavaScript escapes (\uD83D\uDE00), Python escapes (\U0001F600), and CSS escapes (\1F600). Each representation serves a different purpose: codepoints for specifications, UTF-8 for storage and transmission, UTF-16 for JavaScript strings, HTML entities for web content, and escape forms for embedding in code.

Enter any text to see all representations at once. It works for plain ASCII (where the codepoint value equals the single UTF-8 byte), multi-byte UTF-8 sequences, surrogate-pair emoji in UTF-16, and everything in between. Reverse conversion works too: paste any form and get back the original text. This is the Swiss Army knife for Unicode debugging: when you need to know exactly which bytes or code units "Hello 👋" produces in each encoding, one paste gets you the full breakdown.

Using the Unicode Converter

1. Paste text: Drop any Unicode text into the input (English, CJK, emoji, anything).
2. See all representations: Codepoints, UTF-8, UTF-16, HTML entities, and escape forms appear simultaneously.
3. Or reverse-convert: Paste an escape form or codepoint list and get back the text.
4. Normalize if needed: Apply NFC, NFD, NFKC, or NFKD normalization before display for consistent output.
5. Copy the form you need: One-click copy for any representation, such as JavaScript escape, HTML entity, or UTF-8 hex.

Worked examples

Simple ASCII

All representations for a single letter.

Input

A

Output

codepoint: U+0041
UTF-8: 41
UTF-16: 0041
HTML: &#65; / &#x41;
JS: \u0041
Python: \u0041
CSS: \41

Accented

Precomposed é.

Input

é

Output

codepoint: U+00E9
UTF-8: C3 A9 (2 bytes)
UTF-16: 00E9 (1 code unit)
HTML: &#233; / &eacute;
JS: \u00E9

Emoji (supplementary plane)

Grinning face with surrogate pair.

Input

😀

Output

codepoint: U+1F600
UTF-8: F0 9F 98 80 (4 bytes)
UTF-16: D83D DE00 (surrogate pair, 2 code units)
HTML: &#128512;
JS: \u{1F600} or \uD83D\uDE00

CJK ideograph

Chinese character.

Input

漢

Output

codepoint: U+6F22
UTF-8: E6 BC A2 (3 bytes)
UTF-16: 6F22 (1 code unit)
name: CJK UNIFIED IDEOGRAPH-6F22

Reverse conversion

From escape form back to text.

Input

\uD83D\uDE00

Output

😀 (grinning face, U+1F600)
UTF-16 surrogate pair decoded to single codepoint
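
A minimal Python sketch of this reverse step, assuming the input contains only \uXXXX and \u{...} escapes. The helper name decode_js_escapes is illustrative and is not the tool's actual implementation.

```python
import re

# Matches \u{1F600} (ES2015 form) or \uD83D (four-digit form).
_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})")

def decode_js_escapes(text):
    """Turn JavaScript-style \\u escapes back into text (hypothetical helper)."""
    def replace(match):
        return chr(int(match.group(1) or match.group(2), 16))
    raw = _ESCAPE.sub(replace, text)
    # Escaped surrogate halves arrive as two separate code points; a UTF-16
    # round trip joins each valid high/low pair into one code point.
    return raw.encode("utf-16-le", "surrogatepass").decode("utf-16-le")

print(decode_js_escapes(r"\uD83D\uDE00"))  # 😀
```

An unpaired surrogate in the input makes the final decode raise UnicodeDecodeError, which is a reasonable signal that the escape sequence was truncated.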

Features at a glance

Every common representation

Codepoints, UTF-8, UTF-16, HTML entities, JavaScript escape, Python escape, CSS escape — all shown for any input.

Bidirectional

Convert text to any form, or any form back to text.

Full Unicode range

Handles BMP (basic characters) and supplementary planes (emoji, rare CJK) correctly.

Surrogate pair handling

UTF-16 surrogate pairs for emoji are shown clearly and convert back to single codepoints correctly.

Normalization forms

Normalize text to NFC, NFD, NFKC, or NFKD for consistent processing.

Character inspection

See each character’s name, category, and block alongside its codepoint.

Ready-to-paste escape forms

Generate JavaScript, Python, or CSS escape sequences ready to use in code.

Client-side only

Text stays in your browser.

Common use cases for the Unicode Converter

Development

→JavaScript string escapes: Generate \uXXXX sequences for embedding non-ASCII characters in JS string literals.
→Python code preparation: Produce \U00XXXXXX or \N{NAME} escape for Python strings.
→CSS content property: Generate CSS \XXXX escapes for content: values with special characters.

Data processing

→Debugging encoding issues: See the exact byte sequence produced by a string to diagnose encoding mismatches.
→Normalization for comparison: Normalize text to NFC before comparing or storing to avoid visually-identical but bit-different strings.
→Fixing garbled text: Recover text whose UTF-8 bytes were decoded with the wrong encoding (for example Latin-1 or Windows-1252) by re-encoding it and decoding the bytes as UTF-8.
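
The garbled-text case can be sketched in Python as follows, assuming the damage came from reading UTF-8 bytes as Latin-1. The helper name repair_mojibake is hypothetical.

```python
def repair_mojibake(garbled):
    """Undo UTF-8 bytes that were mis-decoded as Latin-1 (hypothetical helper)."""
    return garbled.encode("latin-1").decode("utf-8")

original = "café 😀"
garbled = original.encode("utf-8").decode("latin-1")  # simulate the wrong decode
print(repr(garbled))
print(repair_mojibake(garbled))  # café 😀
```

If the wrong decoder was Windows-1252 rather than Latin-1, the same idea applies with "cp1252", though a few byte values are undefined in that code page and may not survive the round trip.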

Content and education

→Understanding character encoding: See how a single emoji like 😀 maps to different representations in different contexts.
→HTML entity generation: Create HTML entities for any Unicode character when you can’t paste directly.
→Academic or technical writing: Reference characters by canonical codepoint (U+XXXX) rather than pasting glyphs that may not render.

Under the hood

Unicode representations:

Codepoint (U+XXXX): the abstract identifier. Every assigned character has a unique codepoint in the range U+0000 to U+10FFFF. "A" is U+0041, "é" is U+00E9, "😀" is U+1F600.

UTF-8: variable-length byte encoding. 1-4 bytes per codepoint. Byte patterns:
- U+0000-U+007F: 1 byte (0xxxxxxx)
- U+0080-U+07FF: 2 bytes (110xxxxx 10xxxxxx)
- U+0800-U+FFFF: 3 bytes (1110xxxx 10xxxxxx 10xxxxxx)
- U+10000-U+10FFFF: 4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
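
The bit patterns above can be checked with a small hand-written encoder. This is a Python sketch for illustration only; the function name utf8_encode_codepoint is an assumption, and real code should simply call str.encode("utf-8").

```python
def utf8_encode_codepoint(cp):
    """Encode one code point to UTF-8 by hand, following the bit patterns."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if cp <= 0x7F:        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:      # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

for ch in "Aé漢😀":
    manual = utf8_encode_codepoint(ord(ch))
    assert manual == ch.encode("utf-8")  # agrees with the built-in codec
    print(f"U+{ord(ch):04X}", manual.hex(" ").upper())
```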

UTF-16: variable-length 16-bit code unit encoding. 1-2 code units per codepoint.
- BMP (U+0000-U+FFFF): 1 code unit, directly the codepoint
- Supplementary (U+10000+): 2 code units (surrogate pair: 0xD800-0xDBFF high, 0xDC00-0xDFFF low)
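
The surrogate-pair arithmetic is short enough to sketch in Python. The helper names to_utf16_units and from_surrogate_pair are illustrative assumptions, not part of any library.

```python
def to_utf16_units(cp):
    """Return the UTF-16 code units for one code point (illustrative helper)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0xFFFF:
        return [cp]                      # BMP: the code unit is the code point
    offset = cp - 0x10000                # 20 bits split into two 10-bit halves
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]

def from_surrogate_pair(high, low):
    """Combine a high and low surrogate back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

units = to_utf16_units(0x1F600)
print(" ".join(f"{u:04X}" for u in units))   # D83D DE00
print(f"U+{from_surrogate_pair(*units):X}")  # U+1F600
```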

JavaScript strings are UTF-16 internally. "😀".length returns 2 because the emoji takes 2 code units. To count codepoints instead, use Array.from("😀").length, which returns 1 because it iterates by codepoint.

HTML entities:
- Decimal numeric: &#128512;
- Hex numeric: &#x1F600;
- Named (if available): no name for emoji, but &copy; for ©
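
A short Python sketch of generating and decoding numeric entities, using the standard html module. The helper to_numeric_entities is a hypothetical name, and leaving ASCII unescaped apart from markup characters is an assumption about what output is wanted.

```python
import html

def to_numeric_entities(text, hex_form=False):
    """Encode every non-ASCII code point as a numeric character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(html.escape(ch))   # still escapes &, <, > and quotes
        elif hex_form:
            out.append(f"&#x{cp:X};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)

print(to_numeric_entities("😀"))                    # &#128512;
print(to_numeric_entities("😀", hex_form=True))     # &#x1F600;
print(html.unescape("&copy; &#128512; &#x1F600;"))  # © 😀 😀
```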

JavaScript escape forms:
- \uXXXX for BMP (\u0041 = A)
- \uXXXX\uXXXX for surrogate pairs (\uD83D\uDE00 = 😀)
- \u{XXXX} for any codepoint in modern JS (\u{1F600} = 😀), ES2015+
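
As an illustration, the three JavaScript forms can be generated from Python. The helper js_escape and its es2015 flag are hypothetical names, and escaping every character (including ASCII) is a simplification.

```python
def js_escape(text, es2015=True):
    """Produce JavaScript \\u escapes for every character (hypothetical helper)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")       # BMP: four hex digits
        elif es2015:
            parts.append(f"\\u{{{cp:X}}}")     # code point escape
        else:
            offset = cp - 0x10000              # fall back to a surrogate pair
            high = 0xD800 + (offset >> 10)
            low = 0xDC00 + (offset & 0x3FF)
            parts.append(f"\\u{high:04X}\\u{low:04X}")
    return "".join(parts)

print(js_escape("A😀"))                # \u0041\u{1F600}
print(js_escape("😀", es2015=False))   # \uD83D\uDE00
```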

Python escape forms:
- \uXXXX for BMP
- \U00XXXXXX for any codepoint (\U0001F600 = 😀), 8 hex digits
- \N{NAME} for named characters (\N{GRINNING FACE} = 😀)
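
The Python forms can be verified directly in an interpreter. The sketch below also includes a hypothetical python_escape helper that emits the \u and \U forms; it leaves ASCII untouched, so a literal backslash in the input would need extra handling.

```python
import unicodedata

# All three spellings denote the same one-character string.
assert "\U0001F600" == "\N{GRINNING FACE}" == chr(0x1F600)
assert "\u0041" == "A"

def python_escape(text):
    """Escape non-ASCII characters as \\uXXXX or \\UXXXXXXXX (hypothetical helper)."""
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)
        elif cp <= 0xFFFF:
            parts.append(f"\\u{cp:04X}")
        else:
            parts.append(f"\\U{cp:08X}")
    return "".join(parts)

print(python_escape("é😀"))       # \u00E9\U0001F600
print(unicodedata.name("😀"))     # GRINNING FACE
```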

CSS escape: a backslash followed by 1 to 6 hex digits, with an optional trailing space to separate the escape from a following hex digit (\1F600 = 😀)
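
A minimal Python sketch of CSS escape generation, assuming every character is escaped and a single space separates consecutive escapes. The helper name css_escape is illustrative.

```python
def css_escape(text):
    """Escape each character as a backslash plus its hex code point.

    The space between escapes terminates each one, so a following hex
    digit is never absorbed into the previous escape.
    """
    return " ".join(f"\\{ord(ch):X}" for ch in text)

print(css_escape("é😀"))  # \E9 \1F600
```

When an escape is followed by ordinary text rather than another escape, a terminating space is still needed if that text starts with a hex digit or a space.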

Normalization: the same visual character can be encoded multiple ways. "é" can be U+00E9 (precomposed) or U+0065 U+0301 (e + combining acute). NFC normalizes to precomposed form, NFD to decomposed. NFKC and NFKD are "compatibility" forms that may collapse additional variants.
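
The four forms can be tried with Python's standard unicodedata module. This is a small sketch; the "ﬁ" ligature (U+FB01) is chosen here as one example of a compatibility variant.

```python
import unicodedata

precomposed = "\u00E9"   # é as one code point
decomposed = "e\u0301"   # e followed by a combining acute accent

print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Compatibility forms also fold variants such as the "fi" ligature.
ligature = "\uFB01"
print(unicodedata.normalize("NFC", ligature) == ligature)  # True (unchanged)
print(unicodedata.normalize("NFKC", ligature))             # fi
```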

Common problems and solutions

⚠Surrogate pair miscounted

"😀".length in JavaScript returns 2, not 1, because UTF-16 uses a surrogate pair. For real character count, use Array.from(str).length which iterates by codepoint.

⚠Combining characters

"é" can be one codepoint (U+00E9 precomposed) or two (U+0065 + U+0301 combining acute). Visually identical but bit-different. Normalize to NFC for consistent storage and comparison.

⚠Mixing escape syntaxes

\uXXXX works in JS for BMP only. Supplementary characters need \u{XXXX} (ES2015+) or surrogate pair \uXXXX\uXXXX. Python uses \U00XXXXXX (8 hex digits) for supplementary planes.

⚠BMP vs supplementary

BMP characters (U+0000-U+FFFF) are simpler — one UTF-16 code unit each. Supplementary plane (U+10000-U+10FFFF) requires surrogate pairs in UTF-16. Code that handles only BMP breaks on emoji.

⚠Named HTML entities limited

HTML 4 defined 252 named entities, and HTML5 extends the list to more than 2,000, but most characters still have no name. For anything else, use numeric entities. &#128512; for 😀 is correct; there’s no &grinning; entity.

⚠Normalization not applied

Without normalization, two visually identical strings may compare unequal because of combining-character differences. Always normalize before comparison or storage.

⚠Private use and unassigned

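Codepoints in the private use areas (U+E000-U+F8FF in the BMP, plus planes 15 and 16) have no standard meaning; what they display depends entirely on the font or application. Unassigned codepoints have no character at all. Both usually render as a placeholder glyph, so don’t assume an arbitrary codepoint renders correctly everywhere.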
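
One way to spot private use codepoints before relying on them is to check the general category, sketched here with Python's unicodedata module. The describe helper is a hypothetical name.

```python
import unicodedata

def describe(cp):
    """Summarize a code point's general category and name (hypothetical helper)."""
    ch = chr(cp)
    category = unicodedata.category(ch)          # "Co" marks private use
    name = unicodedata.name(ch, "<no name>")     # private use has no name
    return f"U+{cp:04X} category={category} name={name}"

print(describe(0x0041))   # U+0041 category=Lu name=LATIN CAPITAL LETTER A
print(describe(0xE000))   # U+E000 category=Co name=<no name>
print(describe(0xF0000))  # U+F0000 category=Co name=<no name>
```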

Unicode Converter — comparisons and alternatives

Compared to programming language conversion methods (Python’s ord(), JavaScript’s codePointAt()), this tool shows every representation simultaneously for easy comparison. For automation, use language built-ins.

Compared to other online converters, this tool handles the full Unicode range including supplementary plane characters (emoji) and surrogate pairs correctly. Many simpler converters fail on emoji.

Compared to the Character Map tool, this tool focuses on encoding representations rather than character browsing. Use Character Map to find characters; use this tool to see how they encode.

Unicode Converter — FAQ

▶What is a Unicode codepoint?

An integer identifying a single character in the Unicode standard. Written as U+XXXX in hex. U+0041 is "A", U+1F600 is 😀. Every character has exactly one codepoint; encodings (UTF-8, UTF-16) are different ways to represent these codepoints as bytes.

▶What is the difference between UTF-8 and UTF-16?

Both encode Unicode. UTF-8 is variable-length (1-4 bytes per codepoint) and ASCII-compatible (English text is identical). UTF-16 is variable-length (2 or 4 bytes per codepoint). UTF-8 is the web standard; UTF-16 is used internally by JavaScript, Java, and some Windows APIs.

▶What is a surrogate pair?

UTF-16 cannot fit supplementary-plane characters (U+10000+) in one 16-bit code unit. It uses a pair: a high surrogate (U+D800-U+DBFF) followed by a low surrogate (U+DC00-U+DFFF). Together they encode one codepoint. Most emoji are in the supplementary planes, so they use surrogate pairs in UTF-16.

▶How do I escape an emoji in JavaScript?

Modern JS (ES2015+) supports \u{1F600} for any codepoint. For pre-ES2015 compatibility, use the surrogate pair: \uD83D\uDE00. Both represent the grinning face emoji.

▶What is Unicode normalization?

The process of converting different representations of the same character to a canonical form. "é" can be U+00E9 (precomposed) or U+0065 U+0301 (base e + combining acute). NFC form uses precomposed; NFD uses decomposed. Normalize before comparing or storing text to ensure consistency.

▶Why does JavaScript count emoji as 2 characters?

JavaScript strings are UTF-16. "😀".length returns 2 because the emoji takes 2 UTF-16 code units (a surrogate pair). For character count (grapheme cluster count), use Intl.Segmenter or a library like grapheme-splitter.

▶What is the difference between characters and bytes?

A character is a Unicode codepoint — one "thing" like A, é, 😀. Bytes are the low-level encoding of that character, which varies by encoding. "A" is 1 byte in UTF-8 and 2 bytes in UTF-16. "😀" is 4 bytes in UTF-8 and 4 bytes in UTF-16 (as a surrogate pair of 2 bytes each).
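
These byte counts can be confirmed with a few lines of Python. The helper name encoded_sizes is an assumption for illustration.

```python
def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-le")  # "-le" avoids the 2-byte BOM "utf-16" adds
    return len(utf8), len(utf16)

for ch in ["A", "é", "😀"]:
    print(ch, *encoded_sizes(ch))
# A 1 2
# é 2 2
# 😀 4 4
```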

▶Is my input private?

Yes. All conversion runs in your browser with no server involvement. Text, codepoints, and escape sequences stay local.

Additional resources

Unicode Standard — Official Unicode Consortium reference for codepoints and encoding.
RFC 3629 UTF-8 — Definitive UTF-8 encoding specification.
Unicode Normalization Forms — NFC, NFD, NFKC, NFKD normalization reference.
MDN String encoding — JavaScript string handling and Unicode methods.
Python Unicode HOWTO — Official Python Unicode reference.

