Polyglot

How to use this tokenizer:

1. Encoding: Enter any text in the left panel and click "Encode" to see how it's broken into tokens. You'll see both token IDs and their text representations.

2. Decoding: Enter comma-separated token IDs in the right panel and click "Decode" to convert them back to text.

3. Analyze: Compare the token count to the character count to gauge the compression ratio. Use "Copy to Decode" to send your encoded tokens to the decoder and check that they convert back to the original text.

This tokenizer is optimized for multilingual content: it supports English and other Latin-script text as well as a range of other writing systems.
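The encode, decode, and analyze steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page does not describe Polyglot's vocabulary or algorithm, so the sketch uses a stand-in tokenizer that emits one token per UTF-8 byte. The function names (`encode`, `token_texts`, `decode`, `compression_ratio`) are hypothetical, and the ratio is assumed to be characters divided by tokens, which the page does not confirm.

```python
# Stand-in tokenizer: one token per UTF-8 byte.
# ASSUMPTION: this is NOT Polyglot's real vocabulary, only a toy
# that reproduces the encode / decode / analyze workflow.

def encode(text: str) -> list[int]:
    """Step 1: turn text into a list of token IDs (here, byte values 0-255)."""
    return list(text.encode("utf-8"))


def token_texts(ids: list[int]) -> list[str]:
    """Text form of each token; partial multi-byte sequences show as U+FFFD."""
    return [bytes([i]).decode("utf-8", errors="replace") for i in ids]


def decode(id_string: str) -> str:
    """Step 2: parse comma-separated token IDs and convert them back to text."""
    ids = [int(part) for part in id_string.split(",") if part.strip()]
    return bytes(ids).decode("utf-8", errors="replace")


def compression_ratio(text: str, ids: list[int]) -> float:
    """Step 3: characters per token (ASSUMED definition); 0.0 for empty input."""
    return len(text) / len(ids) if ids else 0.0


if __name__ == "__main__":
    text = "héllo"
    ids = encode(text)
    print("Token IDs:", ids)
    print("Token texts:", token_texts(ids))
    print("Tokens:", len(ids), "Characters:", len(text))
    print(f"Compression Ratio: {compression_ratio(text, ids):.2f}")
    # "Copy to Decode": feed the IDs back in as a comma-separated string.
    print("Round trip:", decode(", ".join(str(i) for i in ids)))
```

With a byte-level stand-in the ratio can never exceed 1, because every character costs at least one token. A tokenizer with a learned multilingual vocabulary would typically merge frequent byte sequences into single tokens and so yield more characters per token.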

Encode Text to Tokens

Tokens: 0
Characters: 0
Compression Ratio: 0.00
Computation Time: 0.00s
Token Texts | Token IDs

Decode Tokens to Text

Decoded Text

Computation Time: 0.00s