Compression
5. Compression Algorithms
5.1 Overview
M2M Protocol supports multiple compression algorithms optimized for different content types.
| Algorithm | Tag | Best For | Typical Savings |
|---|---|---|---|
| TokenNative | TK | Small-medium LLM API (<1KB) | 50-60% |
| Token | T1 | LLM API payloads (fallback) | 25-40% |
| Brotli | BR | Large content (>1KB) | 60-80% |
| Dictionary | DI | Repetitive patterns | 20-30% |
| None | - | Small content (<100B) | 0% |
5.2 Algorithm Selection
5.2.1 Automatic Selection
Implementations SHOULD select algorithms based on:
- Content size: Small content (<100 bytes) → None
- Content type: JSON with LLM keys → TokenNative (preferred) or Token
- Content size threshold: Large content (>1KB) → Brotli
- Token sensitivity: API-bound, small-medium → TokenNative preferred
5.2.2 Selection Heuristics
```
if content_size < 100:
    return None
elif is_llm_api_payload(content):
    if content_size < 1024:
        return TokenNative  # Best for M2M token efficiency
    elif content_size > 1024 and repetition_ratio > 0.3:
        return Brotli
    else:
        return TokenNative
else:
    if content_size > 1024:
        return Brotli
    else:
        return TokenNative
```
5.3 Token Compression (Algorithm T1)
5.3.1 Overview
Token compression applies semantic abbreviation to JSON keys, values, and model names. The goal is to reduce token count, not just byte count.
5.3.2 Key Abbreviation
Keys are abbreviated to single characters or short sequences; a mapping sketch follows the tables below.
Request Keys:
| Original | Abbreviated | Tokens Saved |
|---|---|---|
| messages | m | 1 |
| content | c | 1 |
| role | r | 1 |
| model | M | 1 |
| temperature | T | 2 |
| max_tokens | x | 2 |
| top_p | p | 2 |
| stream | s | 1 |
| stop | S | 1 |
| frequency_penalty | f | 3 |
| presence_penalty | P | 3 |
| logit_bias | lb | 2 |
| user | u | 1 |
| n | n | 0 |
| seed | se | 1 |
| tools | ts | 1 |
| tool_choice | tc | 2 |
| function_call | fc | 3 |
| functions | fs | 2 |
| response_format | rf | 3 |
Response Keys:
| Original | Abbreviated | Tokens Saved |
|---|---|---|
| choices | C | 1 |
| index | i | 1 |
| message | m | 1 |
| finish_reason | fr | 3 |
| usage | U | 1 |
| prompt_tokens | pt | 2 |
| completion_tokens | ct | 3 |
| total_tokens | tt | 2 |
| delta | d | 1 |
| logprobs | lp | 2 |
Tool Keys:
| Original | Abbreviated |
|---|---|
| tool_calls | tc |
| function | fn |
| name | n |
| arguments | a |
| type | t |
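The tables above translate directly into lookup maps. A minimal sketch in Python, assuming the maps are plain dicts applied recursively (the function names match those used in 5.3.6; the subset of entries is illustrative):

```python
# Illustrative subset of the request-key table above.
REQUEST_KEY_MAP = {
    "messages": "m", "content": "c", "role": "r", "model": "M",
    "temperature": "T", "max_tokens": "x", "stream": "s",
}

def abbreviate_keys(obj, key_map):
    """Recursively rename dict keys; unknown keys and all values pass through."""
    if isinstance(obj, dict):
        return {key_map.get(k, k): abbreviate_keys(v, key_map)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [abbreviate_keys(v, key_map) for v in obj]
    return obj

def expand_keys(obj, key_map):
    """Inverse mapping, used during decompression."""
    return abbreviate_keys(obj, {v: k for k, v in key_map.items()})
```

Note that the request and response maps must be kept separate: `messages` and `message` both abbreviate to `m`, so expansion has to know the message direction.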
5.3.3 Value Abbreviation
Common string values are abbreviated.
Role Values:
| Original | Abbreviated |
|---|---|
| system | s |
| user | u |
| assistant | a |
| function | f |
| tool | t |
Finish Reason Values:
| Original | Abbreviated |
|---|---|
| stop | s |
| length | l |
| tool_calls | tc |
| content_filter | cf |
| function_call | fc |
5.3.4 Model Abbreviation
Model identifiers are abbreviated by provider.
OpenAI Models:
| Original | Abbreviated |
|---|---|
| gpt-4o | 4o |
| gpt-4o-mini | 4om |
| gpt-4-turbo | 4t |
| gpt-4 | 4 |
| gpt-3.5-turbo | 35t |
| o1 | o1 |
| o1-mini | o1m |
| o1-preview | o1p |
| o3 | o3 |
| o3-mini | o3m |
Meta Llama Models:
| Original | Abbreviated |
|---|---|
| meta-llama/llama-3.3-70b | ml3370 |
| meta-llama/llama-3.1-405b | ml31405 |
| meta-llama/llama-3.1-70b | ml3170 |
| meta-llama/llama-3.1-8b | ml318 |
Mistral Models:
| Original | Abbreviated |
|---|---|
| mistralai/mistral-large | mim-l |
| mistralai/mistral-small | mim-s |
| mistralai/mixtral-8x7b | mimx87 |
5.3.5 Default Value Omission
Parameters matching default values MAY be omitted.
| Parameter | Default | Omit When |
|---|---|---|
| temperature | 1.0 | Equal to 1.0 |
| top_p | 1.0 | Equal to 1.0 |
| n | 1 | Equal to 1 |
| stream | false | Equal to false |
| frequency_penalty | 0 | Equal to 0 |
| presence_penalty | 0 | Equal to 0 |
| logit_bias | {} | Empty object |
| stop | null | Null |
Implementations:
- MUST restore omitted parameters during decompression
- MUST preserve non-default values exactly
- SHOULD NOT omit if value differs from default
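A minimal sketch of omission and restoration under the rules above, assuming the defaults table is stored as a flat dict (the helper names are illustrative):

```python
DEFAULTS = {
    "temperature": 1.0, "top_p": 1.0, "n": 1, "stream": False,
    "frequency_penalty": 0, "presence_penalty": 0,
    "logit_bias": {}, "stop": None,
}

def omit_defaults(request: dict) -> dict:
    """Drop parameters equal to their documented default; keep everything else."""
    return {k: v for k, v in request.items()
            if k not in DEFAULTS or v != DEFAULTS[k]}

def restore_defaults(request: dict) -> dict:
    """Reinstate omitted parameters so the decompressed request is complete."""
    return {**DEFAULTS, **request}
```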
5.3.6 Compression Algorithm
```
function compress_token(json):
    obj = parse_json(json)
    obj = abbreviate_keys(obj, KEY_MAP)
    obj = abbreviate_values(obj, VALUE_MAP)
    obj = abbreviate_model(obj)
    obj = omit_defaults(obj)
    return "#T1|" + serialize_json(obj)

function decompress_token(wire):
    payload = strip_prefix(wire, "#T1|")
    obj = parse_json(payload)
    obj = expand_keys(obj, KEY_MAP)
    obj = expand_values(obj, VALUE_MAP)
    obj = expand_model(obj)
    obj = restore_defaults(obj)
    return serialize_json(obj)
```
5.3.7 Example
Original (98 bytes, ~42 tokens):
```
{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}],"temperature":1.0,"stream":false}
```
Compressed (42 bytes, ~29 tokens):
```
#T1|{"M":"4o","m":[{"r":"u","c":"Hello"}]}
```
Savings: 57% bytes, 31% tokens
5.4 TokenNative Compression (Algorithm TK)
5.4.1 Overview
TokenNative compression transmits BPE token IDs directly instead of text. The tokenizer vocabulary serves as the compression dictionary, achieving 50-60% compression on raw bytes.
Unlike Token compression (T1) which abbreviates keys, TokenNative converts the entire content to token IDs, providing maximum compression for M2M communication where both endpoints share the same tokenizer.
5.4.2 Wire Format
```
#TK|<tokenizer_id>|<base64_varint_tokens>
```
Components:
- `#TK|` - Algorithm prefix (4 bytes)
- `<tokenizer_id>` - Single character: `C` (cl100k), `O` (o200k), `L` (llama)
- `|` - Separator
- `<base64_varint_tokens>` - Base64-encoded VarInt token IDs
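A minimal parsing sketch for this layout; splitting on the first two `|` bytes is safe because standard Base64 output never contains `|`:

```python
def parse_tk_wire(wire: str) -> tuple[str, str]:
    """Split '#TK|<tokenizer_id>|<base64_varint_tokens>' into its components."""
    prefix, tokenizer_id, payload = wire.split("|", 2)
    if prefix != "#TK" or tokenizer_id not in ("C", "O", "L"):
        raise ValueError("not a TokenNative wire string")
    return tokenizer_id, payload
```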
5.4.3 Supported Tokenizers
| ID | Tokenizer | Vocabulary Size | Models |
|---|---|---|---|
| C | cl100k_base | 100,256 | GPT-3.5, GPT-4 (canonical fallback) |
| O | o200k_base | 200,019 | GPT-4o, o1, o3 |
| L | Llama BPE | 128,256 | Llama 3, Mistral |
5.4.4 VarInt Encoding
Token IDs are encoded using variable-length integers:
| Value Range | Bytes | Encoding |
|---|---|---|
| 0-127 | 1 | 0xxxxxxx |
| 128-16383 | 2 | 1xxxxxxx 0xxxxxxx |
| 16384+ | 3+ | Continuation bits |
Average encoding: ~1.5 bytes per token for typical vocabularies.
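A minimal sketch of this coding (LEB128-style; the least-significant-group-first byte order is an assumption the table leaves open):

```python
def varint_encode(values: list[int]) -> bytes:
    """Encode token IDs as varints: 7 data bits per byte, high bit = continuation."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # more bytes follow
            v >>= 7
        out.append(v)                      # final byte, high bit clear
    return bytes(out)

def varint_decode(data: bytes) -> list[int]:
    values, current, shift = [], 0, 0
    for b in data:
        current |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7                     # continuation: keep accumulating
        else:
            values.append(current)         # high bit clear: value complete
            current, shift = 0, 0
    return values
```

For example, token ID 300 encodes as the two bytes 0xAC 0x02.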
5.4.5 Binary Format
For binary-safe channels (WebSocket binary, QUIC), skip Base64:
```
<tokenizer_byte><varint_tokens>
```
- Byte 0: Tokenizer ID (0=cl100k, 1=o200k, 2=llama)
- Bytes 1+: Raw VarInt-encoded token IDs
This reduces the payload to roughly 50% of the original size, compared with roughly 75% when Base64 overhead is included.
5.4.6 Compression Algorithm
```
function compress_token_native(text, encoding):
    tokens = tokenize(text, encoding)
    varint_bytes = varint_encode(tokens)
    base64_data = base64_encode(varint_bytes)
    return "#TK|" + encoding_id(encoding) + "|" + base64_data

function decompress_token_native(wire):
    parts = parse_wire(wire)  # ["#TK", tokenizer_id, base64_data]
    encoding = encoding_from_id(parts[1])
    varint_bytes = base64_decode(parts[2])
    tokens = varint_decode(varint_bytes)
    return detokenize(tokens, encoding)
```
5.4.7 Compression Ratios
| Content Type | Original | Compressed | Savings |
|---|---|---|---|
| Small JSON (<200B) | 200 bytes | 80 bytes | 60% |
| Medium JSON (~1KB) | 1,024 bytes | 450 bytes | 56% |
| Large JSON (~10KB) | 10,240 bytes | 4,600 bytes | 55% |
5.4.8 When to Use
Prefer TokenNative for:
- Small-to-medium LLM API payloads (<1KB)
- M2M communication where both endpoints support it
- Maximum token efficiency

Prefer Brotli for:
- Large content (>1KB)
- Highly repetitive content
- Non-LLM content
5.4.9 Example
Original (65 bytes):
```
{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}
```
TokenNative Wire (~40 bytes):
```
#TK|C|W3sib29kZWwiOiJncHQ...
```
Savings: ~38% bytes, same semantic content
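A round-trip sketch of 5.4.6, assuming the `tiktoken` library and the `varint_encode`/`varint_decode` helpers sketched in 5.4.4 (the `ENCODING_IDS` map mirrors the table in 5.4.3):

```python
import base64
import tiktoken

ENCODING_IDS = {"cl100k_base": "C", "o200k_base": "O"}

def compress_token_native(text: str, encoding_name: str = "cl100k_base") -> str:
    tokens = tiktoken.get_encoding(encoding_name).encode(text)
    payload = base64.b64encode(varint_encode(tokens)).decode("ascii")
    return f"#TK|{ENCODING_IDS[encoding_name]}|{payload}"

def decompress_token_native(wire: str) -> str:
    _, tokenizer_id, payload = wire.split("|", 2)
    name = {v: k for k, v in ENCODING_IDS.items()}[tokenizer_id]
    tokens = varint_decode(base64.b64decode(payload))
    return tiktoken.get_encoding(name).decode(tokens)
```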
5.5 Brotli Compression (Algorithm BR)
5.5.1 Overview
Brotli compression is used for large content where byte reduction outweighs Base64 token overhead.
5.5.2 Encoding
- Compress content using Brotli (quality level 4-6)
- Encode compressed bytes as Base64
- Prepend the `#BR|` prefix
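A minimal sketch of these three steps, assuming the `brotli` Python package (quality 5, from the 4-6 range above):

```python
import base64
import brotli

def compress_brotli(content: str) -> str:
    compressed = brotli.compress(content.encode("utf-8"), quality=5)
    return "#BR|" + base64.b64encode(compressed).decode("ascii")

def decompress_brotli(wire: str) -> str:
    payload = wire[len("#BR|"):]
    return brotli.decompress(base64.b64decode(payload)).decode("utf-8")
```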
5.5.3 When to Use
- Content size > 4096 bytes
- High repetition (>30% duplicate substrings; one possible measure is sketched below)
- Non-LLM API content
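The repetition test presupposes some measure of duplicate substrings. One plausible sketch (the 8-byte window is an arbitrary illustrative choice, not part of the spec):

```python
def repetition_ratio(content: bytes, window: int = 8) -> float:
    """Fraction of fixed-size windows whose byte sequence occurs more than once."""
    total = len(content) - window + 1
    if total <= 0:
        return 0.0
    counts: dict[bytes, int] = {}
    for i in range(total):
        chunk = content[i:i + window]
        counts[chunk] = counts.get(chunk, 0) + 1
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / total
```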
5.5.4 Example
Original (large JSON):
{"messages":[{"role":"user","content":"...10KB of text..."}]}Compressed:
#BR|G6kEABwHcNP2Yk9N...base64...5.6 Dictionary Compression (Algorithm DI)
5.6.1 Overview
Dictionary compression encodes common JSON patterns as single bytes.
5.6.2 Pattern Table
| Pattern | Code |
|---|---|
{"role":"user","content":" | 0x80 |
{"role":"assistant","content":" | 0x81 |
{"role":"system","content":" | 0x82 |
"} | 0x83 |
"}, | 0x84 |
"}] | 0x85 |
{"messages":[ | 0x86 |
{"model":" | 0x87 |
5.6.3 Encoding
Dictionary codes occupy the 0x80-0xFF byte range, which is reserved for this purpose.
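A minimal sketch of the substitution, assuming ASCII JSON content (real implementations must escape any raw 0x80-0xFF bytes, which this sketch omits along with the `#DI|` framing):

```python
# Pattern table from 5.6.2, ordered so longer patterns match before their prefixes.
PATTERNS = [
    (b'{"role":"assistant","content":"', 0x81),
    (b'{"role":"system","content":"', 0x82),
    (b'{"role":"user","content":"', 0x80),
    (b'{"messages":[', 0x86),
    (b'{"model":"', 0x87),
    (b'"}]', 0x85),
    (b'"},', 0x84),
    (b'"}', 0x83),
]

def dict_compress(content: bytes) -> bytes:
    out, i = bytearray(), 0
    while i < len(content):
        for pattern, code in PATTERNS:
            if content.startswith(pattern, i):
                out.append(code)
                i += len(pattern)
                break
        else:
            out.append(content[i])  # literal byte, assumed < 0x80
            i += 1
    return bytes(out)

def dict_decompress(data: bytes) -> bytes:
    table = {code: pattern for pattern, code in PATTERNS}
    return b"".join(table.get(b, bytes([b])) for b in data)
```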
5.7 No Compression
5.7.1 When to Use
- Content size < 100 bytes
- Already compressed content
- Binary content that cannot be JSON-encoded
5.7.2 Wire Format
Content is passed through without prefix modification.
5.8 Algorithm Negotiation
During session establishment, endpoints negotiate supported algorithms and encodings:
- Client sends list of supported algorithms and encodings in HELLO
- Server responds with intersection in ACCEPT
- Subsequent DATA messages use any negotiated algorithm
For TokenNative, encoding negotiation ensures both endpoints use the same tokenizer:
```
Client capabilities:
  algorithms: [TOKEN_NATIVE, TOKEN, BROTLI]
  encodings: [CL100K_BASE, O200K_BASE]
  preferred_encoding: O200K_BASE

Server capabilities:
  algorithms: [TOKEN_NATIVE, BROTLI]
  encodings: [CL100K_BASE]
  preferred_encoding: CL100K_BASE

Negotiated:
  algorithms: [TOKEN_NATIVE, BROTLI]
  encoding: CL100K_BASE (intersection, fallback to canonical)
```
For stateless mode, the prefix indicates the algorithm and encoding used.
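A minimal sketch of the intersection logic, assuming capability lists are ordered by preference and that cl100k is the canonical fallback (per 5.4.3):

```python
CANONICAL_ENCODING = "CL100K_BASE"

def negotiate(client: dict, server: dict) -> dict:
    algorithms = [a for a in client["algorithms"] if a in server["algorithms"]]
    encodings = [e for e in client["encodings"] if e in server["encodings"]]
    if client["preferred_encoding"] in encodings:
        encoding = client["preferred_encoding"]
    elif encodings:
        encoding = encodings[0]
    else:
        encoding = CANONICAL_ENCODING  # no overlap: fall back to canonical
    return {"algorithms": algorithms, "encoding": encoding}
```

Applied to the capabilities in the example above, this yields [TOKEN_NATIVE, BROTLI] and CL100K_BASE.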
5.9 Decompression
5.9.1 Algorithm Detection
Implementations MUST detect algorithm from prefix:
if starts_with("#TK|"): return decompress_token_native(content)elif starts_with("#T1|"): return decompress_token(content)elif starts_with("#BR|"): return decompress_brotli(content)elif starts_with("#DI|"): return decompress_dictionary(content)else: return content # No compression5.9.2 Error Handling
- Invalid prefix → return error
- Decompression failure → return error
- Invalid JSON after decompression → return error
Implementations MUST NOT return partially decompressed content.
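A sketch of this all-or-nothing rule, assuming a `DecompressionError` type and per-algorithm functions supplied by the caller (the names are illustrative, not normative):

```python
import json

class DecompressionError(Exception):
    """Raised on any failure; partially decompressed content is never returned."""

PREFIXES = {"#TK|": "token_native", "#T1|": "token",
            "#BR|": "brotli", "#DI|": "dictionary"}

def safe_decompress(wire: str, decompressors: dict) -> str:
    prefix = wire[:4]
    if prefix not in PREFIXES:
        return wire  # no compression: pass through unchanged
    try:
        result = decompressors[PREFIXES[prefix]](wire)
        json.loads(result)  # invalid JSON after decompression -> error
        return result
    except Exception as exc:
        raise DecompressionError("decompression failed") from exc
```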