The IceCube Portable Binary Archive Format¶
I3Files are actually portable binary archives, modelled after an old Boost archive format. While it was removed from Boost, the archive lives on in IceCube software. This document attempts to explain the format for anyone doing very, very low-level work with I3Files.
Caution
If you don’t really need to know this, run now. Really, we mean it.
History¶
2006:
Decide we like portable binaries better than xml files or Root files.
Troy D. Straszheim creates this portable binary format using the Boost (1.34) third-party portable binary archive.
2009:
Move to Boost 1.38. The STL container serialization changed, so use patches in cmake to bring back the old format for us.
2011:
Claudio Kopper adds support for reading the binary archive in python, mostly to support pickling of I3Frames and I3FrameObjects.
2012:
Nathan Whitehorn decouples IceTray from Boost 1.38 by copying the portable binary archive to IceTray.
2013:
Updates to be compatible with the newest version of Boost (1.52 at the time).
2015:
Boost 1.58 makes large changes to the internals, rendering our patches invalid. A new strategy is needed. For now, ban all boost versions >= 1.58.
2016:
The new strategy is to fork boost::serialization at 1.57 and embed this into all software. Not ideal, but a good patch.
The Rules¶
First, some information on the size of various header objects:
Name
Type
Size (bytes)
tracking
uint8_t
1
version
uint8_t
1
class_id
int16_t
2
class_id_reference
int16_t
2
class_id_optional
NONE
0
class_name
NONE
0
object_id
uint32_t
4
object_reference
uint32_t
4
collection_size
uint32_t
4
Note that everything is saved as little endian. A converter is used on big endian platforms to achieve this portability. For more info on endianness, read the wikipedia article.
And now some pseudo-code for serialization (writing to binary format) and deserialization (reading from binary format).
Serialization¶
of objects¶
if serialized through pointer:
save_data
else:
if class_info and !initialized:
class_id_optional
tracking
version
if !tracking:
save_data
elif new object:
object_id
save_data
else:
object_reference(object_id)
of pointers¶
if !initialized:
class_id
if unregistered_class and is_polymorphic:
class_name
if class_info:
tracking
version
else:
class_id_reference(class_id)
if !tracking:
save_object_ptr
elif new_object:
object_id
save_object_ptr
else:
object_reference(object_id)
Deserialization¶
of objects¶
if serialized through pointer:
save_data
else:
if !initialized:
if class_info
class_id_optional
tracking
version
if tracking:
object_id or object_reference
if !is_object_reference:
load_data
of pointers¶
class_id
if is_null_pointer:
return
if is_abstract or is_polymorphic:
class_name
if !initialized:
if class_info:
class_id_optional
tracking
version
if tracking:
object_id or object_reference
if !is_object_reference:
load_object_ptr
I3FrameObject¶
The basic building blocks in IceCube software are made up of I3FrameObjects. Thus, this is where serialization begins.
I3FrameObject is an abstract class used to hold things that will go into an I3Frame. This requires that the object be serializable in order to be saved to file. See Making a class serializable for more details about how to create such objects, but just know that I3FrameObject is the base class.
The basic structure of the serialized blob is like such:
(your class) - 4 bytes
class_id - 2 bytes
tracking_id - 1 byte
version_id - 1 byte
I3FrameObject base class - 8 bytes
class_id - 2 bytes
tracking_id - 1 byte
version_id - 1 byte
object_id - 4 bytes
(serialized data for your class)
A very common binary blob header is 0x010000000000010001000000. This is because the I3FrameObject always has tracking enabled, version 0, and is the 0th class and 1st object serialized.
Tip
The easiest way to obtain the binary blob in python is:
frame['MyObject'].__getstate__()[1]
or, if you just have the object directly:
my_object.__getstate__()[1]
Note that this only works on objects that use the dataclass_suite for pybindings.
Internal Objects¶
If your I3FrameObject is not just a primitive (bool, int, double, string), but is instead composed of other objects, those need to be serializable as well.
There are two main cases, tracking vs non-tracking. In the tracking case, objects will have a header that will vary depending on if this is the first time the type is seen.
Tracking - First Occurrence¶
Object Header - 8 bytes
class_id - 2 bytes
tracking_id - 1 byte
version_id - 1 byte
object_id - 4 bytes
(serialized data)
Tracking - Second (or later) Occurrence¶
Object Header - 4 bytes
object_id - 4 bytes
(serialized data)
In the no tracking case, the class just needs to be registered the first time, and all other occurrences are strictly the serialized data of the object.
No Tracking - First Occurrence¶
Object Header - 2 bytes
class_id - 2 bytes
(serialized data)
No Tracking - Second (or later) Occurrence¶
(serialized data)
Standard Library Containers¶
A vector needs some special consideration because of optimizations based on contents. Lists and maps will follow the un-optimized approach.
Vector Optimization¶
If the vector contents are a primitive datatype, the array inside the vector can be serialized directly from memory:
Count - 4 bytes
memory copy of array - (sizeof(type) * count)
Non-Optimized Container¶
Count - 4 bytes
for each object in the container:
(serialized object)
Maps serialize each object as a std::pair of key,value.
I3Frame¶
An I3Frame is basically a map of frame object names to serialized I3FrameObjects.
A detailed format is:
I3 tag ‘[i3]’ - 4 bytes (0x5b69335d)
version - 4 bytes
stream - 3 bytes
tracking_id - 1 byte
version_id - 1 byte
value - 1 byte
size - 4 bytes
for each frame object:
key - [string]
type_name - [string]
buf - [serialized I3FrameObject]
checksum - 4 bytes
Note
The checksum is currently a crc32 checksum with the following bytes going into it in this order:
stream value
size
for each frame object:
key
type_name
buf
I3File¶
I3Files are now fairly straightforward - just a bunch of I3Frames. Since the archive format is a stream of binary data, I3Files are just one serialized I3Frame after another.
This creates a few interesting effects. First, and most negatively, there is no header for seeking directly to the Nth frame. (this is a much desired feature if you want to implement it :) On the positive side, I3Files can be any file-like object, including pipes or network sockets. This makes it easy to do live streaming or directly writing to disk as processing happens, without storing the whole file in memory.
Serialization Examples¶
Now that we have some theoretical knowledge, let’s go through some examples of how different objects are serialized.
I3Int¶
Say we have an I3Int(10). The serialization is:
0x0100000000000100010000000a000000
We break this up into pieces:
I3Int - 4 bytes (0x01000000)
class_id - 2 bytes = 1
tracking_id - 1 byte = 0
version_id - 1 byte = 0
I3FrameObject base class - 8 bytes (0x0000010001000000)
class_id - 2 bytes = 0
tracking_id - 1 byte = 1
version_id - 1 byte = 0
object_id - 4 bytes = 1
The actual int - 4 bytes (0x0a000000) = 10
I3Double¶
Say we have an I3Double(3.14159). The serialization is:
0x0100000000000100010000006e861bf0f9210940
We break this up into pieces:
I3Double - 4 bytes (0x01000000)
class_id - 2 bytes = 1
tracking_id - 1 byte = 0
version_id - 1 byte = 0
I3FrameObject base class - 8 bytes (0x0000010001000000)
class_id - 2 bytes = 0
tracking_id - 1 byte = 1
version_id - 1 byte = 0
object_id - 4 bytes = 1
The actual double - 8 bytes (0x6e861bf0f9210940) = 3.14159
I3String¶
Say we have an I3String(‘testing’). The serialization is:
0x01000000000001000100000000700000074657374696367
We break this up into pieces:
I3String - 4 bytes (0x01000000)
class_id - 2 bytes = 1
tracking_id - 1 byte = 0
version_id - 1 byte = 0
I3FrameObject base class - 8 bytes (0x0000010001000000)
class_id - 2 bytes = 0
tracking_id - 1 byte = 1
version_id - 1 byte = 0
object_id - 4 bytes = 1
The string length - 4 bytes (0x07000000) = 7
The actual string - 7 bytes (0x74657374696367) = ‘testing’
OMKey¶
Say we have an OMKey(25,45,0). The serialization is:
0x01010000000000190000002d00000000
Note that this is a little different than normal because the OMKey is not meant to be directly added to the frame. The object is not an I3FrameObject, and serializes differently to reflect that.
We break this up into pieces:
OMKey - 6 bytes (0x010100000000)
tracking_id - 1 byte = 1
version_id - 1 byte = 1
object_id - 4 bytes = 0
string number - 4 bytes (0x19000000) = 25
OM number - 4 bytes (0x2d000000) = 45
PMT number - 1 byte (0x00) = 0
I3VectorOMKey¶
Say we have an I3VectorOMKey() with [OMKey(35,56,0),OMKey(25,45,0)]. The serialization is:
0x01000000000001000000000002000000010102000000230000003800
0x00000003000000190000002d00000000
We break this up into pieces:
I3VectorOMKey - 4 bytes (0x01000000)
class_id - 2 bytes = 1
tracking_id - 1 byte = 0
version_id - 1 byte = 0
I3FrameObject base class - 8 bytes (0x0000010001000000)
class_id - 2 bytes = 0
tracking_id - 1 byte = 1
version_id - 1 byte = 0
object_id - 4 bytes = 1
Vector base class - 2 bytes (0x0000)
tracking_id - 1 byte = 0
version_id - 1 byte = 0
Count - 4 bytes (0x02000000) = 2
OMKey - 6 bytes (0x0101020000)
tracking_id = 1
version_id = 1
object_id = 2
string number - 4 bytes (0x23000000) = 35
OM number - 4 bytes (0x38000000) = 56
PMT number - 1 byte (0x00) = 0
OMKey - 4 bytes (0x030000)
object_id = 3
string number - 4 bytes (0x19000000) = 25
OM number - 4 bytes (0x2d000000) = 45
PMT number - 1 byte (0x00) = 0
[To Do: fill in more examples]