Description
There have been numerous discussions on how to implement a new, more compact binary protocol. These discussions have become hard to follow, so this page is intended as an easy-to-use summary that can later be formalized into different options and finally become a specification. Help is needed to fill this page with further details, suggestions, and pros/cons for each suggestion.
Implementation suggestions
Encode i32 and i64 types as variable-length integers
Suggestion | Pros | Cons |
ZIP encoding (variable-length encoding) for positive values only | Saves up to 3 bytes for small ints | User has to specify the new type |
Base 128 + zigzag, borrowed from Protocol Buffers? | | User has to specify whether zigzag should be used for efficiency |
Since users know their data best, they can choose whichever encoding saves the most bytes. This means we need additional type modifiers for these types.
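The varint and zigzag suggestions can be illustrated with a minimal Python sketch of the Protocol Buffers-style base-128 varint and zigzag mapping. The function names are hypothetical and this is not a proposed Thrift API; it only shows the byte counts being traded off.

```python
from __future__ import annotations


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last."""
    if n < 0:
        raise ValueError("varint encodes non-negative integers only; zigzag-encode first")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, position just past the varint)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag_encode(n: int, bits: int = 64) -> int:
    """Map signed to unsigned so small magnitudes stay small: 0->0, -1->1, 1->2, -2->3."""
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) ^ -(z & 1)
```

With this sketch, `encode_varint(1)` is a single byte instead of a fixed 4-byte i32, while a negative i32 such as -1, reinterpreted as the unsigned value 0xFFFFFFFF, needs 5 bytes. Zigzag-mapping it first (`zigzag_encode(-1, 32)` gives 1) brings it back to one byte. That asymmetry is why the user would have to pick the encoding per field.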
Remove / reduce the size of field prefix tags
Suggestion | Pros | Cons |
Reduce from 3 bytes per field to 1 byte, see mail | Retains versioning support | Only good for dense structs |
1-byte type-and-modifier, variable-length int for field id | | |
Drop the field prefix altogether | Saves a lot of space | No versioning is possible |
Use a per-struct variable-length bitset to specify which fields are present; preserve type info | Saves 1 bit per field and adds 1 byte per 7 fields | Bad for sparse objects |
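The bitset suggestion can be sketched as follows. The layout is an assumption made for illustration: bit `id - 1` flags field `id` as present, each byte holds 7 flags, and the high bit marks that another bitset byte follows, which is where the 1 byte per 7 fields comes from. Per the suggestion, the type info for each present field would still be written separately; that part is not shown. Function names are hypothetical.

```python
from __future__ import annotations


def encode_presence(present_ids: set[int], max_id: int) -> bytes:
    """Hypothetical layout: bit (id - 1) marks field `id` present, 7 flags per byte,
    high bit of a byte set when another bitset byte follows."""
    flags = 0
    for fid in present_ids:
        if not 1 <= fid <= max_id:
            raise ValueError(f"field id {fid} outside 1..{max_id}")
        flags |= 1 << (fid - 1)
    nbytes = max(1, (max_id + 6) // 7)  # ceil(max_id / 7): 1 byte per 7 declared fields
    out = bytearray()
    for i in range(nbytes):
        chunk = (flags >> (7 * i)) & 0x7F
        if i < nbytes - 1:
            chunk |= 0x80  # continuation bit: more bitset bytes follow
        out.append(chunk)
    return bytes(out)


def decode_presence(data: bytes, pos: int = 0) -> tuple[list[int], int]:
    """Return (sorted list of present field ids, position just past the bitset)."""
    ids = []
    base = 1
    while True:
        byte = data[pos]
        pos += 1
        for bit in range(7):
            if byte & (1 << bit):
                ids.append(base + bit)
        base += 7
        if not byte & 0x80:
            return ids, pos
```

A struct with 10 declared fields always pays 2 bitset bytes, however many fields are set. A struct with 100 declared fields pays 15 bytes even if only one field is set, which is why the approach suffers on sparse objects.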
Type changes
Suggestion | Pros | Cons |
ZIP encoding (variable-length encoding) for positive values only | Saves up to 3 bytes for small ints | User has to specify the new type |
Unsigned integers | Would remove the need for a separate zigzag type | Unsigned ints don't exist in all languages |
Type annotations | Allow us to specify encoding details about fields/types that protocols may or may not use | |
Variable-length ints for string, binary, and collection sizes | Size fields will often shrink to one or two bytes | |
Two types, BOOLEAN_TRUE and BOOLEAN_FALSE, instead of a type plus a value | Saves a byte on every boolean | |
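The variable-length size and boolean-in-type suggestions can be sketched together. The numeric type codes below are hypothetical (the page does not assign values), and the comparison is against a fixed 4-byte i32 length prefix.

```python
# Hypothetical type codes, for illustration only; the page does not assign values.
BOOLEAN_TRUE = 0x01
BOOLEAN_FALSE = 0x02
STRING = 0x03


def encode_varint(n: int) -> bytes:
    """Base-128 varint: 7 payload bits per byte, high bit set while more bytes follow."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_bool(value: bool) -> bytes:
    """The value is carried by the type code itself, so no separate value byte is written."""
    return bytes([BOOLEAN_TRUE if value else BOOLEAN_FALSE])


def encode_string(s: str) -> bytes:
    """Type byte, varint length, then the UTF-8 bytes (instead of a fixed 4-byte length)."""
    data = s.encode("utf-8")
    return bytes([STRING]) + encode_varint(len(data)) + data
```

Here `encode_string("hello")` takes 7 bytes (type, 1-byte length, 5 data bytes) instead of 10 with a 4-byte length. Any length below 128 fits in one varint byte and any length below 16384 fits in two; the same varint would apply to binary and collection sizes.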
Better usage of type byte
Spending one whole byte on the type is quite wasteful considering we have only ~15 types, which need just 4 bits: almost 4 bits are wasted on EACH field. Instead, let us have two kinds of types: those that carry extra information and those that do not. The 5 least significant bits (LSB) represent the types without extra information, and the 3 most significant bits (MSB) represent the types with extra information.
Bit: 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
Bits 7-5 are the 3 MSB (shown in red); bits 4-0 are the 5 LSB (shown in green).
The 5 LSB (green) could be used for these types
- VOID
- STOP
- BOOLEAN_TRUE
- BOOLEAN_FALSE
- DOUBLE
- I16
- I32
- I64
The 3 MSB (red) can encode at most 7 types (3 bits give 8 values, and one is needed to signal that the byte holds one of the types above instead). For these types, the 5 LSB carry the length, value, etc. (depending on the type):
- STRING
- SET
- LIST
- MAP
- POSITIVE_I32
- STRUCT
- EXTERN_STRING
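The type-byte layout can be sketched as bit packing. All numeric code assignments are hypothetical, and the sketch assumes the 3 MSB value 0 is the one reserved for the plain types. The page does not say what happens when the extra information exceeds 31, so this sketch simply rejects such values.

```python
from __future__ import annotations

# Hypothetical code assignments; the page only says which types go where.
PLAIN_TYPES = ["STOP", "VOID", "BOOLEAN_TRUE", "BOOLEAN_FALSE", "DOUBLE", "I16", "I32", "I64"]
# Assumption: 3 MSB value 0 means "plain type in the 5 LSB", leaving 1..7 for these.
EXTENDED_TYPES = ["STRING", "SET", "LIST", "MAP", "POSITIVE_I32", "STRUCT", "EXTERN_STRING"]


def pack_plain(type_name: str) -> int:
    """Top 3 bits are zero; the type code occupies the 5 LSB."""
    return PLAIN_TYPES.index(type_name)


def pack_extended(type_name: str, extra: int) -> int:
    """3 MSB hold the type (1..7); the 5 LSB hold a length/value in 0..31."""
    if not 0 <= extra <= 0x1F:
        raise ValueError("extra information must fit in 5 bits (0..31)")
    return ((EXTENDED_TYPES.index(type_name) + 1) << 5) | extra


def unpack(byte: int) -> tuple[str, int | None]:
    """Return (type name, extra information), with None for plain types."""
    msb = byte >> 5
    lsb = byte & 0x1F
    if msb == 0:
        return PLAIN_TYPES[lsb], None  # IndexError for unassigned plain codes
    return EXTENDED_TYPES[msb - 1], lsb
```

For example, a 5-byte STRING packs its type and length into the single byte 0x25, followed directly by the string data.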