
5. Compression Algorithms

5.1 Overview

M2M Protocol supports multiple compression algorithms optimized for different content types.

Algorithm    Tag  Best For                     Typical Savings
TokenNative  TK   Small-medium LLM API (<1KB)  50-60%
Token        T1   LLM API payloads (fallback)  25-40%
Brotli       BR   Large content (>1KB)         60-80%
Dictionary   DI   Repetitive patterns          20-30%
None         -    Small content (<100B)        0%

5.2 Algorithm Selection

5.2.1 Automatic Selection

Implementations SHOULD select algorithms based on:

  1. Content size: Small content (<100 bytes) → None
  2. Content type: JSON with LLM keys → TokenNative (preferred) or Token
  3. Content size threshold: Large content (>1KB) → Brotli
  4. Token sensitivity: API-bound, small-medium → TokenNative preferred

5.2.2 Selection Heuristics

if content_size < 100:
    return None
elif is_llm_api_payload(content):
    if content_size < 1024:
        return TokenNative  # Best for M2M token efficiency
    elif content_size > 1024 and repetition_ratio > 0.3:
        return Brotli
    else:
        return TokenNative
else:
    if content_size > 1024:
        return Brotli
    else:
        return TokenNative

5.3 Token Compression (Algorithm T1)

5.3.1 Overview

Token compression applies semantic abbreviation to JSON keys, values, and model names. The goal is to reduce token count, not just byte count.

5.3.2 Key Abbreviation

Keys are abbreviated to single characters or short sequences.

Request Keys:

Original           Abbreviated  Tokens Saved
messages           m            1
content            c            1
role               r            1
model              M            1
temperature        T            2
max_tokens         x            2
top_p              p            2
stream             s            1
stop               S            1
frequency_penalty  f            3
presence_penalty   P            3
logit_bias         lb           2
user               u            1
n                  n            0
seed               se           1
tools              ts           1
tool_choice        tc           2
function_call      fc           3
functions          fs           2
response_format    rf           3

Response Keys:

Original           Abbreviated  Tokens Saved
choices            C            1
index              i            1
message            m            1
finish_reason      fr           3
usage              U            1
prompt_tokens      pt           2
completion_tokens  ct           3
total_tokens       tt           2
delta              d            1
logprobs           lp           2

Tool Keys:

Original    Abbreviated
tool_calls  tc
function    fn
name        n
arguments   a
type        t

5.3.3 Value Abbreviation

Common string values are abbreviated.

Role Values:

Original   Abbreviated
system     s
user       u
assistant  a
function   f
tool       t

Finish Reason Values:

Original        Abbreviated
stop            s
length          l
tool_calls      tc
content_filter  cf
function_call   fc

5.3.4 Model Abbreviation

Model identifiers are abbreviated by provider.

OpenAI Models:

Original       Abbreviated
gpt-4o         4o
gpt-4o-mini    4om
gpt-4-turbo    4t
gpt-4          4
gpt-3.5-turbo  35t
o1             o1
o1-mini        o1m
o1-preview     o1p
o3             o3
o3-mini        o3m

Meta Llama Models:

Original                   Abbreviated
meta-llama/llama-3.3-70b   ml3370
meta-llama/llama-3.1-405b  ml31405
meta-llama/llama-3.1-70b   ml3170
meta-llama/llama-3.1-8b    ml318

Mistral Models:

Original                 Abbreviated
mistralai/mistral-large  mim-l
mistralai/mistral-small  mim-s
mistralai/mixtral-8x7b   mimx87

5.3.5 Default Value Omission

Parameters matching default values MAY be omitted.

Parameter          Default  Omit When
temperature        1.0      Equal to 1.0
top_p              1.0      Equal to 1.0
n                  1        Equal to 1
stream             false    Equal to false
frequency_penalty  0        Equal to 0
presence_penalty   0        Equal to 0
logit_bias         {}       Empty object
stop               null     Null

Implementations:

  • MUST restore omitted parameters during decompression
  • MUST preserve non-default values exactly
  • MUST NOT omit a parameter whose value differs from its default
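The omission rules above can be sketched as follows; the `DEFAULTS` map mirrors the table, and the function names are illustrative rather than mandated by the spec:

```python
# Parameter defaults from section 5.3.5.
DEFAULTS = {
    "temperature": 1.0,
    "top_p": 1.0,
    "n": 1,
    "stream": False,
    "frequency_penalty": 0,
    "presence_penalty": 0,
    "logit_bias": {},
    "stop": None,
}

def omit_defaults(request: dict) -> dict:
    """Drop any parameter whose value equals its documented default."""
    return {k: v for k, v in request.items()
            if not (k in DEFAULTS and v == DEFAULTS[k])}

def restore_defaults(request: dict) -> dict:
    """Re-insert omitted defaults on decompression; explicit values win."""
    restored = dict(request)
    for key, value in DEFAULTS.items():
        restored.setdefault(key, value)
    return restored
```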

5.3.6 Compression Algorithm

function compress_token(json):
    obj = parse_json(json)
    obj = abbreviate_keys(obj, KEY_MAP)
    obj = abbreviate_values(obj, VALUE_MAP)
    obj = abbreviate_model(obj)
    obj = omit_defaults(obj)
    return "#T1|" + serialize_json(obj)

function decompress_token(wire):
    payload = strip_prefix(wire, "#T1|")
    obj = parse_json(payload)
    obj = expand_keys(obj, KEY_MAP)
    obj = expand_values(obj, VALUE_MAP)
    obj = expand_model(obj)
    obj = restore_defaults(obj)
    return serialize_json(obj)
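The pseudocode above can be made concrete with a small subset of the maps from sections 5.3.2-5.3.4; a real implementation carries the full tables and handles every key, so treat this only as a round-trip sketch:

```python
import json

# Subset of the abbreviation tables, for illustration only.
ROLE_MAP = {"system": "s", "user": "u", "assistant": "a"}
MODEL_MAP = {"gpt-4o": "4o"}
INV_ROLE = {v: k for k, v in ROLE_MAP.items()}
INV_MODEL = {v: k for k, v in MODEL_MAP.items()}

def compress_token(text: str) -> str:
    """Abbreviate keys, role values, and the model name, then prefix."""
    obj = json.loads(text)
    slim = {"M": MODEL_MAP.get(obj["model"], obj["model"]),
            "m": [{"r": ROLE_MAP.get(m["role"], m["role"]),
                   "c": m["content"]} for m in obj["messages"]]}
    return "#T1|" + json.dumps(slim, separators=(",", ":"))

def decompress_token(wire: str) -> str:
    """Invert the maps to restore the original key and value names."""
    obj = json.loads(wire[len("#T1|"):])
    full = {"model": INV_MODEL.get(obj["M"], obj["M"]),
            "messages": [{"role": INV_ROLE.get(m["r"], m["r"]),
                          "content": m["c"]} for m in obj["m"]]}
    return json.dumps(full, separators=(",", ":"))
```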

5.3.7 Example

Original (98 bytes):

{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}],"temperature":1.0,"stream":false}

Compressed (42 bytes):

#T1|{"M":"4o","m":[{"r":"u","c":"Hello"}]}

Savings: ~57% bytes, with a comparable token reduction from the shortened keys and omitted defaults

5.4 TokenNative Compression (Algorithm TK)

5.4.1 Overview

TokenNative compression transmits BPE token IDs directly instead of text. The tokenizer vocabulary serves as the compression dictionary, achieving 50-60% compression on raw bytes.

Unlike Token compression (T1) which abbreviates keys, TokenNative converts the entire content to token IDs, providing maximum compression for M2M communication where both endpoints share the same tokenizer.

5.4.2 Wire Format

#TK|<tokenizer_id>|<base64_varint_tokens>

Components:

  • #TK| - Algorithm prefix (4 bytes)
  • <tokenizer_id> - Single character: C (cl100k), O (o200k), L (llama)
  • | - Separator
  • <base64_varint_tokens> - Base64-encoded VarInt token IDs

5.4.3 Supported Tokenizers

ID  Tokenizer    Vocabulary Size  Models
C   cl100k_base  100,256          GPT-3.5, GPT-4 (canonical fallback)
O   o200k_base   200,019          GPT-4o, o1, o3
L   Llama BPE    128,256          Llama 3, Mistral

5.4.4 VarInt Encoding

Token IDs are encoded using variable-length integers:

Value Range  Bytes  Encoding
0-127        1      0xxxxxxx
128-16383    2      1xxxxxxx 0xxxxxxx
16384+       3+     Continuation bits

Average encoding: ~1.5 bytes per token for typical vocabularies.
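A minimal codec matching the bit patterns in the table (7 payload bits per byte, MSB as continuation flag). The spec shows only the bit layout, so the LEB128-style little-endian group order used here is an assumption:

```python
def varint_encode(values):
    """Encode token IDs 7 bits per byte, MSB set on all but the final
    byte (so 0-127 -> 1 byte, 128-16383 -> 2 bytes, as tabulated)."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # continuation byte, MSB set
            v >>= 7
        out.append(v)                      # terminating byte, MSB clear
    return bytes(out)

def varint_decode(data):
    """Inverse: accumulate 7-bit groups until a byte with MSB clear."""
    values, current, shift = [], 0, 0
    for b in data:
        current |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values
```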

5.4.5 Binary Format

For binary-safe channels (WebSocket binary, QUIC), skip Base64:

<tokenizer_byte><varint_tokens>
  • Byte 0: Tokenizer ID (0=cl100k, 1=o200k, 2=llama)
  • Bytes 1+: Raw VarInt-encoded token IDs

This yields roughly 50% compression on typical payloads; the equivalent Base64 text frame is about a third larger, since Base64 expands every 3 bytes into 4 characters.
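The overhead difference is easy to see with a stand-in payload; `frame_text` and `frame_binary` are illustrative helpers, not spec-defined names:

```python
import base64

def frame_text(tokenizer_char: str, payload: bytes) -> str:
    """Text channel: '#TK|' prefix, tokenizer ID, '|', Base64 payload."""
    return "#TK|" + tokenizer_char + "|" + base64.b64encode(payload).decode("ascii")

def frame_binary(tokenizer_byte: int, payload: bytes) -> bytes:
    """Binary channel: one tokenizer-ID byte, then the raw VarInt bytes."""
    return bytes([tokenizer_byte]) + payload
```

For a 60-byte VarInt payload, the binary frame is 61 bytes while the text frame is 86 bytes (80 Base64 characters plus the 6-byte prefix), which is where the roughly one-third Base64 overhead comes from.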

5.4.6 Compression Algorithm

function compress_token_native(text, encoding):
    tokens = tokenize(text, encoding)
    varint_bytes = varint_encode(tokens)
    base64_data = base64_encode(varint_bytes)
    return "#TK|" + encoding_id(encoding) + "|" + base64_data

function decompress_token_native(wire):
    parts = parse_wire(wire)  # ["#TK", tokenizer_id, base64_data]
    encoding = encoding_from_id(parts[1])
    varint_bytes = base64_decode(parts[2])
    tokens = varint_decode(varint_bytes)
    return detokenize(tokens, encoding)

5.4.7 Compression Ratios

Content Type        Original      Compressed   Ratio
Small JSON (<200B)  200 bytes     80 bytes     60%
Medium JSON (~1KB)  1,024 bytes   450 bytes    56%
Large JSON (~10KB)  10,240 bytes  4,600 bytes  55%

5.4.8 When to Use

  • Prefer TokenNative for:

    • Small-to-medium LLM API payloads (<1KB)
    • M2M communication where both endpoints support it
    • Maximum token efficiency
  • Prefer Brotli for:

    • Large content (>1KB)
    • Highly repetitive content
    • Non-LLM content

5.4.9 Example

Original (65 bytes):

{"model":"gpt-4o","messages":[{"role":"user","content":"Hello"}]}

TokenNative Wire (~40 bytes):

#TK|C|W3sib29kZWwiOiJncHQ...

Savings: ~38% bytes, same semantic content

5.5 Brotli Compression (Algorithm BR)

5.5.1 Overview

Brotli compression is used for large content where byte reduction outweighs Base64 token overhead.

5.5.2 Encoding

  1. Compress content using Brotli (quality level 4-6)
  2. Encode compressed bytes as Base64
  3. Prepend #BR| prefix
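The three steps can be sketched as framing helpers. Python's standard library has no Brotli codec, so the compressor is injected; a real implementation would pass brotli.compress at quality 4-6, and the test substitutes zlib purely to keep the sketch self-contained:

```python
import base64
import zlib  # stand-in codec for demonstration; not part of the spec

def frame_br(data: bytes, compress) -> str:
    """Steps 1-3: compress, Base64-encode, prepend the #BR| prefix."""
    return "#BR|" + base64.b64encode(compress(data)).decode("ascii")

def unframe_br(wire: str, decompress) -> bytes:
    """Strip the prefix, Base64-decode, and decompress."""
    if not wire.startswith("#BR|"):
        raise ValueError("not a #BR| frame")
    return decompress(base64.b64decode(wire[4:]))
```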

5.5.3 When to Use

  • Content size > 4096 bytes
  • High repetition (>30% duplicate substrings)
  • Non-LLM API content

5.5.4 Example

Original (large JSON):

{"messages":[{"role":"user","content":"...10KB of text..."}]}

Compressed:

#BR|G6kEABwHcNP2Yk9N...base64...

5.6 Dictionary Compression (Algorithm DI)

5.6.1 Overview

Dictionary compression encodes common JSON patterns as single bytes.

5.6.2 Pattern Table

Pattern                          Code
{"role":"user","content":"       0x80
{"role":"assistant","content":"  0x81
{"role":"system","content":"     0x82
"}                               0x83
"},                              0x84
"}]                              0x85
{"messages":[                    0x86
{"model":"                       0x87

5.6.3 Encoding

Byte values in the 0x80-0xFF range are reserved for dictionary codes. Because these bytes never occur in ASCII JSON, the encoded output remains unambiguous.
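A hypothetical codec over the pattern table above; longest-match-first replacement keeps overlapping patterns such as `"}` and `"}]` unambiguous:

```python
# Pattern table from section 5.6.2.
PATTERNS = {
    '{"role":"user","content":"': 0x80,
    '{"role":"assistant","content":"': 0x81,
    '{"role":"system","content":"': 0x82,
    '"}': 0x83,
    '"},': 0x84,
    '"}]': 0x85,
    '{"messages":[': 0x86,
    '{"model":"': 0x87,
}
# Longest-first so '"},' and '"}]' win over the shorter '"}'.
ORDERED = sorted(PATTERNS.items(), key=lambda kv: -len(kv[0]))

def compress_dict(text: str) -> bytes:
    data = text.encode("ascii")
    for pattern, code in ORDERED:
        data = data.replace(pattern.encode("ascii"), bytes([code]))
    return data

def decompress_dict(data: bytes) -> str:
    for pattern, code in ORDERED:
        data = data.replace(bytes([code]), pattern.encode("ascii"))
    return data.decode("ascii")
```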

5.7 No Compression

5.7.1 When to Use

  • Content size < 100 bytes
  • Already compressed content
  • Binary content that cannot be JSON-encoded

5.7.2 Wire Format

Content is passed through without prefix modification.

5.8 Algorithm Negotiation

During session establishment, endpoints negotiate supported algorithms and encodings:

  1. Client sends list of supported algorithms and encodings in HELLO
  2. Server responds with intersection in ACCEPT
  3. Subsequent DATA messages use any negotiated algorithm

For TokenNative, encoding negotiation ensures both endpoints use the same tokenizer:

Client capabilities:
    algorithms: [TOKEN_NATIVE, TOKEN, BROTLI]
    encodings: [CL100K_BASE, O200K_BASE]
    preferred_encoding: O200K_BASE

Server capabilities:
    algorithms: [TOKEN_NATIVE, BROTLI]
    encodings: [CL100K_BASE]
    preferred_encoding: CL100K_BASE

Negotiated:
    algorithms: [TOKEN_NATIVE, BROTLI]
    encoding: CL100K_BASE (intersection, fallback to canonical)

For stateless mode, the prefix indicates the algorithm and encoding used.
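The negotiation walk-through above can be sketched as an intersection with a canonical fallback; the dictionary shapes and field names are illustrative, modeled on the example rather than defined by the spec:

```python
def negotiate(client: dict, server: dict) -> dict:
    """Intersect capabilities in client preference order; fall back to
    the canonical CL100K_BASE encoding when the preferred one is lost."""
    algorithms = [a for a in client["algorithms"] if a in server["algorithms"]]
    encodings = [e for e in client["encodings"] if e in server["encodings"]]
    if client["preferred_encoding"] in encodings:
        encoding = client["preferred_encoding"]
    elif "CL100K_BASE" in encodings:
        encoding = "CL100K_BASE"  # canonical fallback
    else:
        encoding = encodings[0] if encodings else None
    return {"algorithms": algorithms, "encoding": encoding}
```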

5.9 Decompression

5.9.1 Algorithm Detection

Implementations MUST detect algorithm from prefix:

if starts_with(content, "#TK|"):
    return decompress_token_native(content)
elif starts_with(content, "#T1|"):
    return decompress_token(content)
elif starts_with(content, "#BR|"):
    return decompress_brotli(content)
elif starts_with(content, "#DI|"):
    return decompress_dictionary(content)
else:
    return content  # No compression

5.9.2 Error Handling

  • Invalid prefix → return error
  • Decompression failure → return error
  • Invalid JSON after decompression → return error

Implementations MUST NOT return partially decompressed content.
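The all-or-nothing rule can be enforced with a wrapper that validates before returning anything; `safe_decompress` and `DecompressionError` are illustrative names, and the algorithm-specific routine is injected:

```python
import json

class DecompressionError(Exception):
    """Raised when any stage of decompression or validation fails."""

def safe_decompress(content: str, decompress) -> str:
    """Run the algorithm-specific `decompress` routine and validate the
    result; partially decompressed content is never exposed."""
    try:
        result = decompress(content)
        json.loads(result)  # invalid JSON after decompression -> error
    except Exception as exc:
        raise DecompressionError(str(exc)) from None
    return result
```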