Unicode Converter
Convert text between Unicode codepoints (U+XXXX), UTF-8 byte sequences, UTF-16 code units, HTML entities, and escaped forms. Free and private: all processing happens in your browser.
Input: Hello, World! 🌍
Code points: U+0048 U+0065 U+006C U+006C U+006F U+002C U+0020 U+0057 U+006F U+0072 U+006C U+0064 U+0021 U+0020 U+1F30D
JavaScript escape: \u0048\u0065\u006c\u006c\u006f\u002c\u0020\u0057\u006f\u0072\u006c\u0064\u0021\u0020\u{1f30d}
CSS escape: \48 \65 \6C \6C \6F \2C \20 \57 \6F \72 \6C \64 \21 \20 \1F30D
UTF-8 bytes (hex): 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 21 20 F0 9F 8C 8D
UTF-8 bytes (decimal): 72 101 108 108 111 44 32 87 111 114 108 100 33 32 240 159 140 141
URL-encoded: Hello%2C%20World!%20%F0%9F%8C%8D
Character Breakdown
| Char | Code Point | Hex | Decimal | UTF-8 Bytes |
|---|---|---|---|---|
| H | U+0048 | 0x48 | 72 | 1 |
| e | U+0065 | 0x65 | 101 | 1 |
| l | U+006C | 0x6C | 108 | 1 |
| l | U+006C | 0x6C | 108 | 1 |
| o | U+006F | 0x6F | 111 | 1 |
| , | U+002C | 0x2C | 44 | 1 |
| (space) | U+0020 | 0x20 | 32 | 1 |
| W | U+0057 | 0x57 | 87 | 1 |
| o | U+006F | 0x6F | 111 | 1 |
| r | U+0072 | 0x72 | 114 | 1 |
| l | U+006C | 0x6C | 108 | 1 |
| d | U+0064 | 0x64 | 100 | 1 |
| ! | U+0021 | 0x21 | 33 | 1 |
| (space) | U+0020 | 0x20 | 32 | 1 |
| 🌍 | U+1F30D | 0x1F30D | 127757 | 4 |
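As a rough illustration of how rows like these can be derived, the Python sketch below computes the same columns using built-ins only. The function name `breakdown` is hypothetical and is not part of the tool.

```python
def breakdown(text: str) -> list[tuple[str, str, str, int, int]]:
    """Return (char, code point, hex, decimal, UTF-8 byte count) per code point."""
    rows = []
    for ch in text:  # iterating a Python str yields one code point at a time
        cp = ord(ch)
        rows.append((
            ch,
            f"U+{cp:04X}",            # at least 4 hex digits, as in U+0048
            f"0x{cp:X}",
            cp,
            len(ch.encode("utf-8")),  # number of bytes in the UTF-8 encoding
        ))
    return rows


for row in breakdown("Hi 🌍"):
    print(row)
```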
The Unicode Converter translates text between Unicode representations: codepoints (U+1F600), UTF-8 byte sequences (F0 9F 98 80), UTF-16 code units (D83D DE00, a surrogate pair), HTML entities (&#x1F600;), JavaScript escapes (\uD83D\uDE00), Python escapes (\U0001F600), and CSS escapes (\1F600). Each representation serves a different use case: codepoints for specifications, UTF-8 for storage and transmission, UTF-16 for JavaScript strings, HTML entities for web content, and escape forms for embedding in code.
Enter any text and see all representations simultaneously. The tool works for plain ASCII (where each codepoint equals its single UTF-8 byte value), multi-byte UTF-8 sequences, surrogate-pair emoji in UTF-16, and everything in between. Conversion also runs in reverse: paste any form and get back the original text. This is the Swiss Army knife for Unicode debugging: when you need to know exactly which bytes or code units "Hello 👋" produces in each encoding, one paste gives you the full breakdown.
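For automation, the same set of representations can be approximated in a few lines of Python. This is a minimal sketch, not the tool's implementation: the function name `representations` and the dictionary keys are hypothetical, and it assumes hex numeric entities for every character and a trailing space after each CSS escape.

```python
def representations(text: str) -> dict[str, str]:
    """Build several textual representations of the same string."""
    utf16 = text.encode("utf-16-be")  # big-endian so hex reads like D83D DE00
    units = [utf16[i:i + 2].hex().upper() for i in range(0, len(utf16), 2)]
    return {
        "codepoints": " ".join(f"U+{ord(c):04X}" for c in text),
        "utf8": text.encode("utf-8").hex(" ").upper(),
        "utf16": " ".join(units),
        "html": "".join(f"&#x{ord(c):X};" for c in text),
        # JavaScript \uXXXX escapes are written per UTF-16 code unit
        "js": "".join("\\u" + u for u in units),
        # Python \UXXXXXXXX escapes are written per code point, 8 hex digits
        "python": "".join("\\U" + format(ord(c), "08X") for c in text),
        "css": "".join("\\" + format(ord(c), "X") + " " for c in text),
    }


for name, value in representations("😀").items():
    print(f"{name}: {value}")
```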
Unicode Converter — key features
Every common representation
Codepoints, UTF-8, UTF-16, HTML entities, JavaScript escape, Python escape, CSS escape — all shown for any input.
Bidirectional
Convert text to any form or any form back to text — simultaneously.
Full Unicode range
Handles BMP (basic characters) and supplementary planes (emoji, rare CJK) correctly.
Surrogate pair handling
UTF-16 surrogate pairs for emoji are shown clearly and convert back to single codepoints correctly.
Normalization forms
Normalize text to NFC, NFD, NFKC, or NFKD for consistent processing.
Character inspection
See each character’s name, category, and block alongside its codepoint.
Ready-to-paste escape forms
Generate JavaScript, Python, or CSS escape sequences ready to use in code.
Client-side only
Text stays in your browser.
How to use the Unicode Converter
1. Paste text: Drop any Unicode text into the input (English, CJK, emoji, anything).
2. See all representations: Codepoints, UTF-8, UTF-16, HTML entities, and escape forms appear simultaneously.
3. Or reverse-convert: Paste an escape form or codepoint list and get back the text.
4. Normalize if needed: Apply NFC, NFD, NFKC, or NFKD normalization before display for consistent output.
5. Copy the form you need: One-click copy for any representation, such as JavaScript escape, HTML entity, or UTF-8 hex.
Common use cases for the Unicode Converter
Development
- →JavaScript string escapes: Generate \uXXXX sequences for embedding non-ASCII characters in JS string literals.
- →Python code preparation: Produce \U00XXXXXX or \N{NAME} escape for Python strings.
- →CSS content property: Generate CSS \XXXX escapes for content: values with special characters.
Data processing
- →Debugging encoding issues: See the exact byte sequence produced by a string to diagnose encoding mismatches.
- →Normalization for comparison: Normalize text to NFC before comparing or storing to avoid visually-identical but bit-different strings.
- →Fixing garbled text: Recover text whose UTF-8 bytes were decoded with the wrong encoding (for example, "Ã©" shown instead of "é") by reversing the wrong decoding and reading the bytes as UTF-8.
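The garbled-text case can be reproduced and reversed in Python. The sketch below assumes the bytes were wrongly decoded as Latin-1; other wrong encodings (Windows-1252 is a common one) would need a different `wrong_encoding` value, and the helper name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(garbled: str, wrong_encoding: str = "latin-1") -> str:
    """Undo a wrong decode: recover the original bytes, then read them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


original = "café"
# Simulate the bug: UTF-8 bytes interpreted as Latin-1
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)
print(repair_mojibake(garbled))
```

If the text was damaged further (for example, bytes were dropped or replaced with "?"), the final decode raises `UnicodeDecodeError` and the original cannot be recovered this way.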
Content and education
- →Understanding character encoding: See how a single emoji like 😀 maps to different representations in different contexts.
- →HTML entity generation: Create HTML entities for any Unicode character when you can’t paste directly.
- →Academic or technical writing: Reference characters by canonical codepoint (U+XXXX) rather than pasting glyphs that may not render.
Unicode Converter — examples
Simple ASCII
All representations for a single letter.
A
codepoint: U+0041
UTF-8: 41
UTF-16: 0041
HTML: &#65; / &#x41;
JS: \u0041
Python: \u0041
CSS: \41
Accented
Precomposed é.
é
codepoint: U+00E9
UTF-8: C3 A9 (2 bytes)
UTF-16: 00E9 (1 code unit)
HTML: &#233; / &#xE9;
JS: \u00E9
Emoji (supplementary plane)
Grinning face with surrogate pair.
😀
codepoint: U+1F600
UTF-8: F0 9F 98 80 (4 bytes)
UTF-16: D83D DE00 (surrogate pair, 2 code units)
HTML: &#x1F600;
JS: \u{1F600} or \uD83D\uDE00
CJK ideograph
Chinese character.
漢
codepoint: U+6F22
UTF-8: E6 BC A2 (3 bytes)
UTF-16: 6F22 (1 code unit)
name: CJK UNIFIED IDEOGRAPH-6F22
Reverse conversion
From escape form back to text.
\uD83D\uDE00
😀 (grinning face, U+1F600)
UTF-16 surrogate pair decoded to a single codepoint
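A reverse conversion like this one can be sketched in Python with a regular expression. The helper `decode_js_escapes` is hypothetical; it assumes the input uses only the `\uXXXX` and `\u{...}` forms and that every surrogate is correctly paired.

```python
import re

# Matches \u{1F600} (1 to 6 hex digits) or \uD83D (exactly 4 hex digits)
_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}|\\u([0-9A-Fa-f]{4})")


def decode_js_escapes(source: str) -> str:
    """Turn JavaScript-style Unicode escapes into the text they name."""
    # Step 1: replace each escape with the code unit or code point it names.
    raw = _ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), source)
    # Step 2: a UTF-16 round trip merges high/low surrogates into one code point.
    # An unpaired surrogate makes the decode step raise UnicodeDecodeError.
    return raw.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(decode_js_escapes("\\uD83D\\uDE00"))
```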
Technical details
Unicode representations:
Codepoint (U+XXXX): the abstract identifier. Every assigned character has a unique codepoint in the range U+0000 to U+10FFFF. "A" is U+0041, "é" is U+00E9, "😀" is U+1F600.
UTF-8: variable-length byte encoding. 1-4 bytes per codepoint. Byte patterns:
- U+0000-U+007F: 1 byte (0xxxxxxx)
- U+0080-U+07FF: 2 bytes (110xxxxx 10xxxxxx)
- U+0800-U+FFFF: 3 bytes (1110xxxx 10xxxxxx 10xxxxxx)
- U+10000-U+10FFFF: 4 bytes (11110xxx 10xxxxxx 10xxxxxx 10xxxxxx)
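The byte patterns above can be applied by hand with shifts and masks. The following Python sketch is for illustration only (in practice `str.encode("utf-8")` does this); the function name `utf8_encode` is hypothetical.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point to UTF-8 using the four bit patterns."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable Unicode scalar value")
    if cp <= 0x7F:       # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:      # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    return bytes([       # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


print(utf8_encode(0x1F600).hex(" ").upper())
```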
UTF-16: variable-length 16-bit code unit encoding. 1-2 code units per codepoint.
- BMP (U+0000-U+FFFF): 1 code unit, directly the codepoint
- Supplementary (U+10000+): 2 code units (surrogate pair: 0xD800-0xDBFF high, 0xDC00-0xDFFF low)
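The surrogate-pair rule is simple arithmetic: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A minimal Python sketch, with hypothetical function names:

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


def from_surrogates(high: int, low: int) -> int:
    """Combine a (high, low) surrogate pair back into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print(f"{high:04X} {low:04X}")
```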
JavaScript strings are UTF-16 internally. "😀".length returns 2 because the emoji takes 2 code units. To count codepoints instead, use Array.from("😀").length, which returns 1 because the conversion iterates by codepoint.
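Python behaves differently: `len()` on a `str` counts codepoints, not UTF-16 code units. To predict what JavaScript's `.length` would report, one option is to count 16-bit units in the UTF-16 encoding, as in this sketch (the name `utf16_length` is hypothetical):

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units, i.e. what JavaScript's .length reports."""
    # utf-16-le is used so no byte order mark is added to the output
    return len(text.encode("utf-16-le")) // 2


sample = "Hello 👋"
print(len(sample))           # code points
print(utf16_length(sample))  # UTF-16 code units
```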
HTML entities:
- Decimal numeric: &#128512;
- Hex numeric: &#x1F600;
- Named (if available): no name for emoji, but &copy; for ©
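The three entity forms can be produced and decoded with Python's standard `html` package. In this sketch the helper `to_entity` is hypothetical, and it relies on `html.entities.codepoint2name`, which covers only the HTML 4 entity names, so some HTML5-only names are not used.

```python
import html
from html.entities import codepoint2name


def to_entity(ch: str) -> str:
    """Prefer a named entity when one exists, else fall back to hex numeric."""
    name = codepoint2name.get(ord(ch))
    if name is not None:
        return f"&{name};"
    return f"&#x{ord(ch):X};"


print(to_entity("©"))
print(to_entity("😀"))
# html.unescape accepts decimal, hex, and named forms
print(html.unescape("&#128512; &#x1F600; &copy;"))
```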
JavaScript escape forms:
- \uXXXX for BMP (\u0041 = A)
- \uXXXX\uXXXX for surrogate pairs (\uD83D\uDE00 = 😀)
- \u{XXXX} for any codepoint in modern JS (\u{1F600} = 😀), ES2015+
Python escape forms:
- \uXXXX for BMP
- \U00XXXXXX for any codepoint (\U0001F600 = 😀), always 8 hex digits
- \N{NAME} for named characters (\N{GRINNING FACE} = 😀)
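The Python forms listed above are interchangeable in source code, and the escape text itself can be generated from a character. A small sketch with hypothetical helper names:

```python
import unicodedata


def python_escape(ch: str) -> str:
    """Shortest numeric escape: \\uXXXX for BMP, \\UXXXXXXXX otherwise."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        return "\\u" + format(cp, "04X")
    return "\\U" + format(cp, "08X")


def python_named_escape(ch: str) -> str:
    """\\N{NAME} form; unicodedata.name raises ValueError if there is no name."""
    return "\\N{" + unicodedata.name(ch) + "}"


print(python_escape("é"))
print(python_escape("😀"))
print(python_named_escape("😀"))
# Both literal forms denote the same one-character string
print("\U0001F600" == "\N{GRINNING FACE}")
```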
CSS escape: \XXXX (1 to 6 hex digits) with an optional trailing space to separate it from the text that follows (\1F600 = 😀)
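Two common ways to keep a CSS escape from absorbing the characters after it are to append a space or to pad the escape to the full 6 hex digits. Both are sketched below in Python; the function names are hypothetical.

```python
def css_escape(ch: str) -> str:
    """Short form, followed by a space that terminates the escape."""
    return "\\" + format(ord(ch), "X") + " "


def css_escape_padded(ch: str) -> str:
    """Fixed 6-digit form, which needs no terminating space."""
    return "\\" + format(ord(ch), "06X")


print(repr(css_escape("😀")))
print(css_escape_padded("😀"))
```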
Normalization: the same visual character can be encoded multiple ways. "é" can be U+00E9 (precomposed) or U+0065 U+0301 (e + combining acute). NFC normalizes to precomposed form, NFD to decomposed. NFKC and NFKD are "compatibility" forms that may collapse additional variants.
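Python's standard `unicodedata` module exposes all four forms, which makes the difference easy to check. A minimal sketch (the helper `same_text` is hypothetical):

```python
import unicodedata

composed = "\u00e9"      # é as one precomposed code point
decomposed = "e\u0301"   # e followed by a combining acute accent


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(composed == decomposed)           # raw comparison of code points
print(same_text(composed, decomposed))  # comparison after NFC
# Compatibility forms also fold variants such as the "fi" ligature (U+FB01)
print(unicodedata.normalize("NFKC", "\ufb01"))
```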
Common problems and solutions
⚠Surrogate pair miscounted
"😀".length in JavaScript returns 2, not 1, because UTF-16 uses a surrogate pair. For real character count, use Array.from(str).length which iterates by codepoint.
⚠Combining characters
"é" can be one codepoint (U+00E9 precomposed) or two (U+0065 + U+0301 combining acute). Visually identical but bit-different. Normalize to NFC for consistent storage and comparison.
⚠Mixing escape syntaxes
\uXXXX works in JS for BMP only. Supplementary characters need \u{XXXX} (ES2015+) or surrogate pair \uXXXX\uXXXX. Python uses \U00XXXXXX (8 hex digits) for supplementary planes.
⚠BMP vs supplementary
BMP characters (U+0000-U+FFFF) are simpler — one UTF-16 code unit each. Supplementary plane (U+10000-U+10FFFF) requires surrogate pairs in UTF-16. Code that handles only BMP breaks on emoji.
⚠Named HTML entities limited
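HTML 4 defined about 250 named entities; HTML5 expands the list to more than 2,000 named character references, but most characters, including emoji, still have no name. For anything else, use numeric entities. &#x1F600; for 😀 is correct; there’s no &grinning; entity.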
⚠Normalization not applied
Without normalization, two visually identical strings may compare unequal because of combining-character differences. Always normalize before comparison or storage.
⚠Private use and unassigned
Codepoints in private use areas (U+E000-U+F8FF, etc.) have no assigned character. They show as placeholder in most fonts. Don’t assume arbitrary codepoints render correctly everywhere.
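Whether a codepoint falls into one of these groups can be checked from its general category. The Python sketch below uses `unicodedata.category`; the labels and the function name `classify` are hypothetical, and results for unassigned codepoints depend on the Unicode version bundled with the Python build.

```python
import unicodedata


def classify(ch: str) -> str:
    """Coarse label based on the Unicode general category."""
    category = unicodedata.category(ch)
    if category == "Co":
        return "private use"
    if category == "Cn":
        return "unassigned or noncharacter"
    if category == "Cs":
        return "surrogate"
    return "assigned"


for cp in (0x41, 0xE000, 0xFFFF):
    print(f"U+{cp:04X}: {classify(chr(cp))}")
```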
Unicode Converter — comparisons and alternatives
Compared to programming language conversion methods (Python’s ord(), JavaScript’s codePointAt()), this tool shows every representation simultaneously for easy comparison. For automation, use language built-ins.
Compared to other online converters, this tool handles the full Unicode range including supplementary plane characters (emoji) and surrogate pairs correctly. Many simpler converters fail on emoji.
Compared to the Character Map tool, this tool focuses on encoding representations rather than character browsing. Use Character Map to find characters; use this tool to see how they encode.
Frequently asked questions about the Unicode Converter
▶What is a Unicode codepoint?
An integer identifying a single character in the Unicode standard. Written as U+XXXX in hex. U+0041 is "A", U+1F600 is 😀. Every character has exactly one codepoint; encodings (UTF-8, UTF-16) are different ways to represent these codepoints as bytes.
▶What is the difference between UTF-8 and UTF-16?
Both encode Unicode. UTF-8 is variable-length (1-4 bytes per codepoint) and ASCII-compatible (ASCII text is byte-for-byte identical). UTF-16 is also variable-length (2 or 4 bytes per codepoint). UTF-8 is the web standard; UTF-16 is used internally by JavaScript, Java, and some Windows APIs.
▶What is a surrogate pair?
UTF-16 cannot fit supplementary-plane characters (U+10000+) in one 16-bit code unit. It uses a pair: a high surrogate (U+D800-U+DBFF) followed by a low surrogate (U+DC00-U+DFFF). Together they encode one codepoint. Emoji are almost all supplementary plane, so they use surrogate pairs in UTF-16.
▶How do I escape an emoji in JavaScript?
Modern JS (ES2015+) supports \u{1F600} for any codepoint. For pre-ES2015 compatibility, use the surrogate pair: \uD83D\uDE00. Both represent the grinning face emoji.
▶What is Unicode normalization?
The process of converting different representations of the same character to a canonical form. "é" can be U+00E9 (precomposed) or U+0065 U+0301 (base e + combining acute). NFC form uses precomposed; NFD uses decomposed. Normalize before comparing or storing text to ensure consistency.
▶Why does JavaScript count emoji as 2 characters?
JavaScript strings are UTF-16. "😀".length returns 2 because the emoji takes 2 UTF-16 code units (a surrogate pair). For character count (grapheme cluster count), use Intl.Segmenter or a library like grapheme-splitter.
▶What is the difference between characters and bytes?
A character is a Unicode codepoint — one "thing" like A, é, 😀. Bytes are the low-level encoding of that character, which varies by encoding. "A" is 1 byte in UTF-8 and 2 bytes in UTF-16. "😀" is 4 bytes in UTF-8 and 4 bytes in UTF-16 (as a surrogate pair of 2 bytes each).
▶Is my input private?
Yes. All conversion runs in your browser with no server involvement. Text, codepoints, and escape sequences stay local.
Additional resources
- Unicode Standard — Official Unicode Consortium reference for codepoints and encoding.
- RFC 3629 UTF-8 — Definitive UTF-8 encoding specification.
- Unicode Normalization Forms — NFC, NFD, NFKC, NFKD normalization reference.
- MDN String encoding — JavaScript string handling and Unicode methods.
- Python Unicode HOWTO — Official Python Unicode reference.
Related tools
Base64 Encoder/Decoder
Encode and decode Base64 strings, files, and images instantly
Binary to Text Converter
Convert binary code (0s and 1s) to readable text in ASCII or Unicode, with configurable grouping and separator options.
Character Map
Browse and copy any Unicode character including emoji, symbols, arrows, mathematical signs, and non-Latin scripts.
Hex to Text Converter
Convert hexadecimal byte sequences to readable ASCII or UTF-8 text with flexible input formatting.
HTML Entity Encoder/Decoder
Encode special characters to HTML entities (&amp;, &lt;, &quot;, &copy;) or decode entities back to their literal characters.
Morse Code Translator
Translate between text and Morse code with support for letters, numbers, punctuation, and audio playback of the encoded signal.