The syntax of the schema language (aka IDL, Interface Definition Language) should look quite familiar to users of any of the C family of languages, and also to users of other IDLs. Let's look at an example first:
// example IDL file
namespace MyGame;
attribute "priority";
enum Color : byte { Red = 1, Green, Blue }
union Any { Monster, Weapon, Pickup }
struct Vec3 {
x:float;
y:float;
z:float;
}
table Monster {
pos:Vec3;
mana:short = 150;
hp:short = 100;
name:string;
friendly:bool = false (deprecated, priority: 1);
inventory:[ubyte];
color:Color = Blue;
test:Any;
}
root_type Monster;
(Weapon and Pickup are not defined as part of this example.)
Tables are the main way of defining objects in FlatBuffers, and consist of a name (here Monster) and a list of fields. Each field has a name, a type, and optionally a default value. If the default value is not specified in the schema, it will be 0 for scalar types, or null for other types. Some languages support setting a scalar's default to null. This makes the scalar optional.
Fields do not have to appear in the wire representation, and you can choose to omit fields when constructing an object. You have the flexibility to add fields without fear of bloating your data. This design is also FlatBuffers' mechanism for forward and backwards compatibility. Note that:
You can add new fields in the schema ONLY at the end of a table definition. Older data will still read correctly, and give you the default value when read. Older code will simply ignore the new field. If you want to have flexibility to use any order for fields in your schema, you can manually assign ids (much like Protocol Buffers); see the id attribute below.
You cannot delete fields you don't use anymore from the schema, but you can simply stop writing them into your data for almost the same effect. Additionally, you can mark them as deprecated as in the example above, which will prevent the generation of accessors in the generated C++, as a way to enforce the field not being used any more. (Careful: this may break code!)
You may change field names and table names, if you're ok with your code breaking until you've renamed them there too.
See "Schema evolution examples" below for more on this topic.
Structs are similar to a table, only now none of the fields are optional (so no defaults either), and fields may not be added or be deprecated. Structs may only contain scalars or other structs. Use this for simple objects where you are very sure no changes will ever be made (as is quite clear in the example Vec3). Structs use less memory than tables and are even faster to access (they are always stored in-line in their parent object, and use no virtual table).
Built-in scalar types are:
8 bit: byte (int8), ubyte (uint8), bool
16 bit: short (int16), ushort (uint16)
32 bit: int (int32), uint (uint32), float (float32)
64 bit: long (int64), ulong (uint64), double (float64)
The type names in parentheses are alias names such that, for example, uint8 can be used in place of ubyte, and int32 can be used in place of int without affecting code generation.
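The table of names, aliases and widths above can be restated as a small lookup. This is only an illustrative sketch in Python: the `SCALARS` table and the `canonical` and `bit_width` helpers are hypothetical names, not part of FlatBuffers, and the `struct` format codes are Python's, used here just to confirm the widths.

```python
import struct

# Schema type name -> (alias, Python struct format code).
# Assumption: the format codes are Python's, chosen to match the listed widths.
SCALARS = {
    "byte": ("int8", "b"), "ubyte": ("uint8", "B"), "bool": (None, "?"),
    "short": ("int16", "h"), "ushort": ("uint16", "H"),
    "int": ("int32", "i"), "uint": ("uint32", "I"), "float": ("float32", "f"),
    "long": ("int64", "q"), "ulong": ("uint64", "Q"), "double": ("float64", "d"),
}

def canonical(name):
    """Map an alias such as 'uint8' back to its primary name ('ubyte')."""
    for primary, (alias, _) in SCALARS.items():
        if name in (primary, alias):
            return primary
    raise KeyError(name)

def bit_width(name):
    """Width in bits; '<' selects little-endian standard sizes (no padding)."""
    return 8 * struct.calcsize("<" + SCALARS[canonical(name)][1])
```

Because an alias resolves to the same entry as its primary name, `bit_width("int32")` and `bit_width("int")` agree by construction.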
Built-in non-scalar types:
Vector of any other type (denoted with [type]). Nesting vectors is not supported; instead you can wrap the inner vector in a table.
string, which may only hold UTF-8 or 7-bit ASCII. For other text encodings or general binary data use vectors ([byte] or [ubyte]) instead.
References to other tables or structs, enums or unions (see below).
You can't change types of fields once they're used, with the exception of same-size data where a reinterpret_cast would give you a desirable result, e.g. you could change a uint to an int if no values in current data use the high bit yet.
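The uint-to-int case can be checked by reinterpreting the same four bytes. The sketch below uses Python's `struct` module as a stand-in for a reinterpret_cast; the helper name `reinterpret_uint_as_int` is made up for illustration.

```python
import struct

def reinterpret_uint_as_int(value):
    """Read the 4 bytes of a 32-bit unsigned value as a signed 32-bit int."""
    return struct.unpack("<i", struct.pack("<I", value))[0]

small = reinterpret_uint_as_int(0x7FFFFFFF)  # high bit clear: value unchanged
large = reinterpret_uint_as_int(0x80000000)  # high bit set: value turns negative
```

Values that never set the high bit survive the type change; anything above that range silently changes meaning.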
Arrays are a convenience short-hand for a fixed-length collection of elements. Arrays can be used to replace the following schema:
struct Vec3 {
x:float;
y:float;
z:float;
}
with the following schema:
struct Vec3 {
v:[float:3];
}
Both representations are binary equivalent.
Arrays are currently only supported in a struct.
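The binary equivalence of the two Vec3 layouts can be illustrated by packing the same three floats both ways. This is a sketch under the assumption that the struct is stored as three consecutive little-endian 32-bit floats with no padding; it uses Python's `struct` module rather than FlatBuffers itself.

```python
import struct

# Three separate float fields (x, y, z), packed one after another.
as_fields = struct.pack("<f", 1.0) + struct.pack("<f", 2.0) + struct.pack("<f", 3.0)

# A fixed-length array of three floats, as in v:[float:3].
as_array = struct.pack("<3f", 1.0, 2.0, 3.0)
```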
There are three mutually exclusive reactions to the non-presence of a table's field in the binary data:
Default-valued fields return the default value (as defined in the schema).
Optional fields return some form of null, depending on the local language. (In a sense, null is the default value.)
Required fields cause an error. See the required tag below.
When writing a schema, values are a sequence of digits. Values may be optionally followed by a decimal point (.) and more digits, for float constants, or optionally prefixed by a -. Floats may also be in scientific notation; optionally ending with an e or E, followed by a + or - and more digits. Values can also be the keyword null.
Only scalar values can have defaults; non-scalar (string/vector/table) fields default to null when not present.
You generally do not want to change default values after they're initially defined. Fields that have the default value are not actually stored in the serialized data (see also Gotchas below). Values explicitly written by code generated by the old version of the schema, if they happen to be the default, will be read as a different value by code generated with the new schema. This is slightly less bad when converting an optional scalar into a default valued scalar since non-presence would not be overloaded with a previous default value. There are situations, however, where this may be desirable, especially if you can ensure a simultaneous rebuild of all code.
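The hazard of changing a default can be modeled with a toy writer that omits default-valued fields and a reader that fills them back in. The sketch below is a simplified model in Python, not the FlatBuffers API; `write_table` and `read_field` are hypothetical helpers and the dictionary stands in for the serialized data.

```python
def write_table(values, defaults):
    """Store only the fields whose value differs from the schema default."""
    return {name: v for name, v in values.items() if v != defaults[name]}

def read_field(stored, name, defaults):
    """Absent fields read back as the reader's own schema default."""
    return stored.get(name, defaults[name])

old_defaults = {"hp": 100}
new_defaults = {"hp": 50}   # the default was changed in a later schema

stored = write_table({"hp": 100}, old_defaults)   # hp equals default: omitted
seen_by_old = read_field(stored, "hp", old_defaults)
seen_by_new = read_field(stored, "hp", new_defaults)
```

The writer explicitly set hp to 100, yet a reader built from the new schema sees 50, because nothing was stored.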
Enums define a sequence of named constants, each with a given value, or increasing by one from the previous one. The default first value is 0. As you can see in the enum declaration, you specify the underlying integral type of the enum with : (in this case byte), which then determines the type of any fields declared with this enum type.
Only integer types are allowed, i.e. byte, ubyte, short, ushort, int, uint, long and ulong.
Typically, enum values should only ever be added, never removed (there is no deprecation for enums). This requires code to handle forwards compatibility itself, by handling unknown enum values.
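The value assignment rule and the need to tolerate unknown values can be sketched as follows. This is an illustrative model in Python; `assign_enum_values` and `color_name` are hypothetical helpers, not generated FlatBuffers code.

```python
def assign_enum_values(members):
    """members: list of (name, explicit_value_or_None) in declaration order."""
    values, next_value = {}, 0   # the first value defaults to 0
    for name, explicit in members:
        value = next_value if explicit is None else explicit
        values[name] = value
        next_value = value + 1   # later members increase by one
    return values

# enum Color : byte { Red = 1, Green, Blue }
color = assign_enum_values([("Red", 1), ("Green", None), ("Blue", None)])

def color_name(value):
    """Reverse lookup that tolerates values added by a newer schema."""
    names = {v: k for k, v in color.items()}
    return names.get(value, "Unknown({})".format(value))
```

A reader written this way keeps working when a later schema appends a new enum value it has never seen.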
Unions share a lot of properties with enums, but instead of new names for constants, you use names of tables. You can then declare a union field, which can hold a reference to any of those types, and additionally a field with the suffix _type is generated that holds the corresponding enum value, allowing you to know which type to cast to at runtime.
It's possible to give an alias name to a type in a union. This way a type can even be used to mean different things depending on the name used:
table PointPosition { x:uint; y:uint; }
table MarkerPosition {}
union Position {
Start:MarkerPosition,
Point:PointPosition,
Finish:MarkerPosition
}
Unions contain a special NONE marker to denote that no value is stored, so that name cannot be used as an alias.
Unions are a good way to be able to send multiple message types as a FlatBuffer. Note that because a union field is really two fields, it must always be part of a table; it cannot be the root of a FlatBuffer by itself.
If you have a need to distinguish between different FlatBuffers in a more open-ended way, for example for use as files, see the file identification feature below.
There is experimental support, in C++ only, for a vector of unions (and types). In the example IDL file above, use [Any] to add a vector of Any to the Monster table. There is also experimental support for other types besides tables in unions, in particular structs and strings. There's no direct support for scalars in unions, but they can be wrapped in a struct at no space cost.
Namespace declarations generate the corresponding namespace in C++ for all helper code, and packages in Java. You can use . to specify nested namespaces / packages.
You can include other schema files in your current one, e.g.:
include "mydefinitions.fbs";
This makes it easier to refer to types defined elsewhere.
include automatically ensures each file is parsed just once, even when referred to more than once.
When using the flatc compiler to generate code for schema definitions, only definitions in the current file will be generated, not those from the included files (those you still generate separately).
The root_type declaration names what you consider to be the root table of the serialized data. This is particularly important for parsing JSON data, which doesn't include object type information.
Typically, a FlatBuffer binary buffer is not self-describing, i.e. it needs you to know its schema to parse it correctly. But if you want to use a FlatBuffer as a file format, it would be convenient to be able to have a "magic number" in there, like most file formats have, to be able to do a sanity check to see if you're reading the kind of file you're expecting.
Now, you can always prefix a FlatBuffer with your own file header, but FlatBuffers has a built-in way to add an identifier to a FlatBuffer that takes up minimal space, and keeps the buffer compatible with buffers that don't have such an identifier.
You can specify in a schema, similar to root_type, that you intend for this type of FlatBuffer to be used as a file format:
file_identifier "MYFI";
Identifiers must always be exactly 4 characters long. These 4 characters will end up as bytes at offsets 4-7 (inclusive) in the buffer.
For any schema that has such an identifier, flatc will automatically add the identifier to any binaries it generates (with -b), and generated calls like FinishMonsterBuffer also add the identifier. If you have specified an identifier and wish to generate a buffer without one, you can always still do so by calling FlatBufferBuilder::Finish explicitly.
After loading a buffer, you can use a call like MonsterBufferHasIdentifier to check if the identifier is present.
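What such a check amounts to can be sketched directly from the byte offsets given above. This is a minimal illustration in Python, not the generated API: `has_identifier` is a hypothetical helper, and the buffer is a hand-made stand-in (assumed to begin with a 4-byte little-endian root offset) rather than real FlatBuffers output.

```python
import struct

def has_identifier(buf, ident):
    """Check the 4 identifier bytes at offsets 4-7 (inclusive) of a buffer."""
    if len(ident) != 4:
        raise ValueError("identifiers must be exactly 4 characters long")
    return len(buf) >= 8 and buf[4:8] == ident

# Hypothetical buffer: 4-byte root offset, the identifier, then placeholder
# bytes standing in for the rest of the data.
fake_buffer = struct.pack("<I", 16) + b"MYFI" + bytes(8)
```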
Note that this is best for open-ended uses such as files. If you simply wanted to send one of a set of possible messages over a network for example, you'd be better off with a union.
Additionally, by default flatc will output binary files as .bin. This declaration in the schema will change that to whatever you want:
file_extension "ext";
You can declare RPC calls in a schema, that define a set of functions that take a FlatBuffer as an argument (the request) and return a FlatBuffer as the response (both of which must be table types):
rpc_service MonsterStorage {
Store(Monster):StoreResponse;
Retrieve(MonsterId):Monster;
}
What code this produces and how it is used depends on the language and RPC system used. There is preliminary support for gRPC through the --grpc code generator; see grpc/tests for an example.
Comments may be written as in most C-based languages. Additionally, a triple comment (///) on a line by itself signals that a comment is documentation for whatever is declared on the line after it (table/struct/field/enum/union/element), and the comment is output in the corresponding C++ code. Multiple such lines per item are allowed.
Attributes may be attached to a declaration, behind a field/enum value, or after the name of a table/struct/enum/union. These may either have a value or not. Some attributes like deprecated are understood by the compiler; user defined ones need to be declared with the attribute declaration (like priority in the example above), and are available to query if you parse the schema at runtime. This is useful if you write your own code generators/editors etc., and you wish to add additional information specific to your tool (such as a help text).
Currently understood attributes:
id: n (on a table field): manually set the field identifier to n. If you use this attribute, you must use it on ALL fields of this table, and the numbers must be a contiguous range from 0 onwards. Additionally, since a union type effectively adds two fields, its id must be that of the second field (the first field is the type field and not explicitly declared in the schema). For example, if the last field before the union field had id 6, the union field should have id 8, and the union's type field will implicitly be 7. IDs allow the fields to be placed in any order in the schema. When a new field is added to the schema it must use the next available ID.
deprecated (on a field): do not generate accessors for this field anymore; code should stop using this data. Old data may still contain this field, but it won't be accessible anymore by newer code. Note that if you deprecate a field that was previously required, old code may fail to validate new data (when using the optional verifier).
required (on a non-scalar table field): this field must always be set. By default, fields do not need to be present in the binary. This is desirable, as it helps with forwards/backwards compatibility, and flexibility of data structures. By specifying this attribute, you make non-presence an error for both reader and writer. The reading code may access the field directly, without checking for null. If the constructing code does not initialize this field, it will get an assert, and the verifier will also fail on buffers that have missing required fields. Both adding and removing this attribute may be forwards/backwards incompatible, as readers will be unable to read old or new data, respectively, unless the data happens to always have the field set.
force_align: size (on a struct): force the alignment of this struct to be something higher than what it is naturally aligned to. Causes these structs to be aligned to that amount inside a buffer, IF that buffer is allocated with that alignment (which is not necessarily the case for buffers accessed directly inside a FlatBufferBuilder). Note: currently not guaranteed to have an effect when used with --object-api, since that may allocate objects at alignments less than what you specify with force_align.
force_align: size (on a vector): force the alignment of this vector to be something different than what the element size would normally dictate. Note: this currently only works for generated C++ code.
bit_flags (on an unsigned enum): the values of this field indicate bits, meaning that any unsigned value N specified in the schema will end up representing 1<<N, or if you don't specify values at all, you'll get the sequence 1, 2, 4, 8, ...
nested_flatbuffer: "table_name" (on a field): this indicates that the field (which must be a vector of ubyte) contains flatbuffer data, for which the root type is given by table_name. The generated code will then produce a convenient accessor for the nested FlatBuffer.
flexbuffer (on a field): this indicates that the field (which must be a vector of ubyte) contains flexbuffer data. The generated code will then produce a convenient accessor for the FlexBuffer root.
key (on a field): this field is meant to be used as a key when sorting a vector of the type of table it sits in. Can be used for in-place binary search.
hash (on a field): this is an (un)signed 32/64 bit integer field, whose value during JSON parsing is allowed to be a string, which will then be stored as its hash. The value of the attribute is the hashing algorithm to use, one of fnv1_32, fnv1_64, fnv1a_32, fnv1a_64.
original_order (on a table): since elements in a table do not need to be stored in any particular order, they are often optimized for space by sorting them by size. This attribute stops that from happening. There should generally not be any reason to use this flag.
The same parser that parses the schema declarations above is also able to parse JSON objects that conform to this schema. So, unlike other JSON parsers, this parser is strongly typed, and parses directly into a FlatBuffer (see the compiler documentation on how to do this from the command line, or the C++ documentation on how to do this at runtime).
Besides needing a schema, there are a few other changes to how it parses JSON:
Field names are accepted with or without quotes. Output can be made to quote them with the strict_json flag.
For a field of enum type, symbolic enum values can be used instead of numbers, e.g. field: EnumVal. If a field is of integral type, you can still use symbolic names, but values need to be prefixed with their type and need to be quoted, e.g. field: "Enum.EnumVal". For enums representing flags, you may place multiple inside a string separated by spaces to OR them, e.g. field: "EnumVal1 EnumVal2" or field: "Enum.EnumVal1 Enum.EnumVal2".
For a union field foo, you must add a field foo_type: FooOne right before the foo field, where FooOne would be the table out of the union you want to use.
A field with the value null (e.g. field: null) is intended to have the default value for that field (thus has the same effect as if that field wasn't specified at all).
Built-in conversion functions are supported, so you can write for example rad(180) wherever you'd normally write 3.14159. Currently supports the following functions: rad, deg, cos, sin, tan, acos, asin, atan.
When parsing JSON, it recognizes the following escape codes in strings:
\n - linefeed.
\t - tab.
\r - carriage return.
\b - backspace.
\f - form feed.
\" - double quote.
\\ - backslash.
\/ - forward slash.
\uXXXX - 16-bit unicode code point, converted to the equivalent UTF-8 representation.
\xXX - 8-bit binary hexadecimal number XX. This is the only one that is not in the JSON spec (see http://json.org/), but is needed to be able to encode arbitrary binary in strings to text and back without losing information (e.g. the byte 0xFF can't be represented in standard JSON).
It also generates these escape codes back again when generating JSON from a binary representation.
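Why the non-standard \xXX escape is needed can be seen by comparing it with \uXXXX for the byte 0xFF. The sketch below uses only Python's built-in string and bytes handling; it does not call the FlatBuffers parser, and the variable names are illustrative.

```python
# \u00FF names the code point U+00FF, which UTF-8 encodes as two bytes,
# so a unicode escape cannot yield the single byte 0xFF.
via_unicode_escape = "\u00ff".encode("utf-8")

# A lone 0xFF byte is not valid UTF-8 at all.
try:
    b"\xff".decode("utf-8")
    lone_ff_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_ff_is_valid_utf8 = False
```

Since no \uXXXX escape produces that byte, a byte-level escape is the only way to round-trip arbitrary binary through a string.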
When parsing numbers, the parser is more flexible than JSON. The format of numeric literals is closer to that of C/C++. According to the [grammar](@ref flatbuffers_grammar), it accepts the following numerical literals:
An integer literal can have any number of leading zero 0 digits. Unlike C/C++, the parser ignores a leading zero, not interpreting it as the beginning of an octal number. The numbers [081, -00094] are equal to the [81, -94] decimal integers.
The parser accepts unsigned and signed hexadecimal integer numbers. For example: [0x123, +0x45, -0x67] are equal to the [291, 69, -103] decimals.
The format of floating-point numbers is fully compatible with the C/C++ format. If a modern C++ compiler is used, the parser accepts hexadecimal and special floating-point literals as well: [-1.0, 2., .3e0, 3.e4, 0x21.34p-5, -inf, nan].
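The literal values quoted above can be cross-checked with Python's own number parsing, which happens to accept the same forms. This is only a sanity check of the arithmetic using Python built-ins; it is not the FlatBuffers parser and makes no claim about its implementation.

```python
import math

# Leading zeros are ignored rather than treated as an octal prefix.
decimals = [int("081"), int("-00094")]

# Signed and unsigned hexadecimal integers.
hexadecimals = [int(s, 16) for s in ("0x123", "+0x45", "-0x67")]

# Hexadecimal float: 0x21.34 is 33 + 0x34/256, scaled by 2**-5.
hex_float = float.fromhex("0x21.34p-5")

# Special floating-point values.
negative_infinity = float("-inf")
not_a_number = float("nan")
specials_ok = math.isinf(negative_infinity) and math.isnan(not_a_number)
```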
The following conventions for floating-point numbers are used:
NaN is converted to an unsigned IEEE-754 quiet-NaN value.
Extended floating-point support was tested with:
MSVC2015 and higher.
LLVM 6.0, GCC 4.9 and higher.
For details, see the [Use in C++](@ref flatbuffers_guide_use_cpp) section.
For compatibility with JSON lint tools, all numeric literals of scalar fields can be wrapped in quoted strings: "1", "2.0", "0x48A", "0x0C.0Ep-1", "-inf", "true".
FlatBuffers is all about efficiency, but to realize that efficiency you require an efficient schema. There are usually multiple choices on how to represent data that have vastly different size characteristics.
It is very common nowadays to represent any kind of data as dictionaries (as in e.g. JSON), because of its flexibility and extensibility. While it is possible to emulate this in FlatBuffers (as a vector of tables with key and value(s)), this is a bad match for a strongly typed system like FlatBuffers, leading to relatively large binaries. FlatBuffer tables are more flexible than classes/structs in most systems, since having a large number of fields only few of which are actually used is still efficient. You should thus try to organize your data as much as possible such that you can use tables where you might be tempted to use a dictionary.
Similarly, strings as values should only be used when they are truly open-ended. If you can, always use an enum instead.
FlatBuffers doesn't have inheritance, so the way to represent a set of related data structures is a union. Unions do have a cost however, so an alternative to a union is to have a single table that has all the fields of all the data structures you are trying to represent, if they are relatively similar / share many fields. Again, this is efficient because non-present fields are cheap.
FlatBuffers supports the full range of integer sizes, so try to pick the smallest size needed, rather than defaulting to int/long.
Remember that you can share data (refer to the same string/table within a buffer), so factoring out repeating data into its own data structure may be worth it.
Identifiers in a schema are meant to translate to many different programming languages, so using the style of your "main" language is generally a bad idea.
For this reason, below is a suggested style guide to adhere to, to keep schemas consistent for interoperation regardless of the target language.
Where possible, the code generators for specific languages will generate identifiers that adhere to the language style, based on the schema identifiers.
Formatting (this is less important, but still worth adhering to):
Spacing: none around : for types, on both sides for =.
For an example, see the schema at the top of this file.
FlatBuffers relies on new field declarations being added at the end, and earlier declarations to not be removed, but be marked deprecated when needed. We think this is an improvement over the manual number assignment that happens in Protocol Buffers (and which is still an option using the id attribute mentioned above).
One place where this is possibly problematic however is source control. If user A adds a field, generates new binary data with this new schema, then tries to commit both to source control after user B already committed a new field also, and just auto-merges the schema, the binary files are now invalid compared to the new schema.
The solution of course is that you should not be generating binary data before your schema changes have been committed, ensuring consistency with the rest of the world. If this is not practical for you, use explicit field ids, which should always generate a merge conflict if two people try to allocate the same id.
Some examples to clarify what happens as you change a schema:
If we have the following original schema:
table { a:int; b:int; }
And we extend it:
table { a:int; b:int; c:int; }
This is ok. Code compiled with the old schema reading data generated with the new one will simply ignore the presence of the new field. Code compiled with the new schema reading old data will get the default value for c (which is 0 in this case, since it is not specified).
table { a:int (deprecated); b:int; }
This is also ok. Code compiled with the old schema reading newer data will now always get the default value for a since it is not present. Code compiled with the new schema now cannot read nor write a anymore (any existing code that tries to do so will result in compile errors), but can still read old data (it will ignore the field).
table { c:int; a:int; b:int; }
This is NOT ok, as this makes the schemas incompatible. Old code reading newer data will interpret c as if it was a, and new code reading old data accessing a will instead receive b.
table { c:int (id: 2); a:int (id: 0); b:int (id: 1); }
This is ok. If your intent was to order/group fields in a way that makes sense semantically, you can do so using explicit id assignment. Now we are compatible with the original schema, and the fields can be ordered in any way, as long as we keep the sequence of ids.
table { b:int; }
NOT ok. We can only remove a field by deprecation, regardless of whether we use explicit ids or not.
table { a:uint; b:uint; }
This is MAYBE ok, and only in the case where the type change is the same size, like here. If old data never contained any negative numbers, this will be safe to do.
table { a:int = 1; b:int = 2; }
Generally NOT ok. Any 0 values in older data were not written to the buffer, and rely on the default value to be recreated. These values will now appear as 1 and 2 instead. There may be cases in which this is ok, but care must be taken.
table { aa:int; bb:int; }
Occasionally ok. You've renamed fields, which will break all code (and JSON files!) that use this schema, but as long as the change is obvious, this is not incompatible with the actual binary buffers, since those only ever address fields by id/offset.
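The table examples above can be replayed with a toy model in which stored data is addressed purely by field id. This is a simplified sketch in Python, not how FlatBuffers is implemented; `read` and `ids_by_position` are hypothetical helpers and every default is taken to be 0.

```python
def read(data, schema, field):
    """data maps field id -> value; schema maps field name -> (id, default)."""
    field_id, default = schema[field]
    return data.get(field_id, default)

def ids_by_position(names):
    """Without explicit ids, a field's id is its position in the table."""
    return {name: (i, 0) for i, name in enumerate(names)}

original = ids_by_position(["a", "b"])
extended = ids_by_position(["a", "b", "c"])          # new field at the end: ok
reordered = ids_by_position(["c", "a", "b"])         # new field in front: NOT ok
explicit = {"c": (2, 0), "a": (0, 0), "b": (1, 0)}   # explicit (id: n): ok

old_data = {0: 10, 1: 20}   # written with the original schema: a=10, b=20
```

With the reordered schema, reading a from old data returns 20, the value that was written as b, while the explicit-id schema still returns 10.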
Suppose we have the following schema:
union Foo { A, B }
We can add another variant at the end.
union Foo { A, B, another_a: A }
and this will be okay. Old code will not recognize another_a. However if we add another_a anywhere but the end, e.g.
union Foo { A, another_a: A, B }
this is not okay. When new code writes another_a, old code will misinterpret it as B (and vice versa). However you can explicitly set the union's "discriminant" value like so:
union Foo { A = 1, another_a: A = 3, B = 2 }
This is okay.
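The union cases above come down to which discriminant value each member receives. The following Python sketch models that assignment; `discriminants` is a hypothetical helper, and it assumes NONE takes the value 0 with members numbered from 1 unless given explicitly.

```python
def discriminants(members):
    """members: list of (name, explicit_value_or_None). NONE is taken as 0."""
    table, next_value = {0: "NONE"}, 1
    for name, explicit in members:
        value = next_value if explicit is None else explicit
        table[value] = name
        next_value = value + 1
    return table

old = discriminants([("A", None), ("B", None)])
appended = discriminants([("A", None), ("B", None), ("another_a", None)])
inserted = discriminants([("A", None), ("another_a", None), ("B", None)])
pinned = discriminants([("A", 1), ("another_a", 3), ("B", 2)])
```

Value 2 means B in the old, appended and pinned variants, but another_a in the inserted one, which is exactly the misinterpretation described above.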
union Foo { original_a: A = 1, another_a: A = 3, B = 2 }
Renaming fields will break code and any saved human readable representations, such as json files, but the binary buffers will be the same.
Most serialization formats (e.g. JSON or Protocol Buffers) make it very explicit in the format whether a field is present in an object or not, allowing you to use this as "extra" information.
FlatBuffers will not write fields that are equal to their default value, sometimes resulting in significant space savings. However, this also means we cannot disambiguate the meaning of non-presence as "written default value" or "not written at all". This only applies to scalar fields since only they support default values. Unless otherwise specified, their default is 0.
If you care about the presence of scalars, most languages support "optional scalars." You can set null as the default value in the schema. null is a value that's outside of all types, so the field is always written if add_field is called. The generated field accessor should use the local language's canonical optional type.
Some FlatBufferBuilder implementations have an option called force_defaults that circumvents this "not writing defaults" behavior; you can then use IsFieldPresent to query presence.
Another option that works in all languages is to wrap a scalar field in a struct. This way it will return null if it is not present. This will be slightly less ergonomic, but structs don't take up any more space than the scalar they represent.
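The three behaviors discussed in this section (plain defaults, null defaults, and forced defaults) can be compared in a toy model. This is an illustrative Python sketch, not any FlatBufferBuilder API; `write_scalar` and `read_scalar` are hypothetical helpers and a dictionary stands in for the stored table.

```python
def write_scalar(table, name, value, default, force_defaults=False):
    """Store a scalar unless it equals a non-null default (or when forced)."""
    if force_defaults or default is None or value != default:
        table[name] = value

def read_scalar(table, name, default):
    """Absent fields read back as the schema default (None models null)."""
    return table.get(name, default)

plain, optional, forced = {}, {}, {}
write_scalar(plain, "hp", 0, default=0)          # equals default: not stored
write_scalar(optional, "hp", 0, default=None)    # null default: always stored
write_scalar(forced, "hp", 0, default=0, force_defaults=True)
```

Only the plain table loses the distinction between "written as 0" and "never written"; the other two keep the field present.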
See [our intermediate representation](@ref intermediate_representation).