Working with tableio .root files
File Structure
Rootwriter creates a TTree for each key you book with tableio. The fields created by the I3Converter will be branches of this tree. All trees are stored inside a single output file along with a master tree, which all other trees are friends of. The master tree is usually called MasterTree, but that name can be configured (see the documentation of I3ROOTTableService).
All data can be accessed via the master tree.
Thus, if you open a tableio-created .root file:
$ root -l output.root
you may find the following contents:
root [1] .ls
TFile** output.root
TFile* output.root
KEY: TTree IceTopHLCVEMPulsesInfo;1 IceTopHLCVEMPulsesInfo
KEY: TTree IceTopSLCVEMPulses;1 IceTopSLCVEMPulses
KEY: TTree MCPrimary;1 MCPrimary
KEY: TTree MCPrimaryInfo;1 MCPrimaryInfo
KEY: TTree CleanedTankPulses;1 CleanedTankPulses
KEY: TTree CleanedTankPulsesInfo;1 CleanedTankPulsesInfo
KEY: TTree IceTopHLCVEMPulses;1 IceTopHLCVEMPulses
KEY: TTree ShowerCOG;1 ShowerCOG
KEY: TTree ShowerCombined;1 ShowerCombined
KEY: TTree ShowerCombinedInfo;1 ShowerCombinedInfo
KEY: TTree ShowerCombinedParams;1 ShowerCombinedParams
KEY: TTree ShowerPlane;1 ShowerPlane
KEY: TTree ShowerPlaneParams;1 ShowerPlaneParams
KEY: TTree MasterTree;1 MasterTree
As mentioned above, all data can be accessed via the MasterTree. You can check this by having a look at the tree viewer:
root [2] MasterTree->StartViewer()
The window that opens might look like this:
As you can see, all the other trees show up as if they were branches of MasterTree. The quickest way to create a plot is to double-click a branch in this viewer.
Tree Structure
Rootwriter TTrees are completely flat. Thus, each branch contains only one leaf. However, branches can be arrays, either of fixed length (e.g. for filter masks: condition_passed and prescale_passed) or of variable length (e.g. when booking I3RecoPulseSeriesMaps).
Each tree contains at least six branches:
UInt_t Run - the run number,
UInt_t Event - the event number,
UInt_t SubEvent - the sub-event number,
Int_t SubEventStream - the ID of the splitter module that made this stream,
Char_t SubEventStream_String[] - the name of the splitter module that made this stream as a string,
Bool_t exists - set to false if the corresponding object did not exist in the frame.
To keep the trees aligned, every tree contains one entry for each event, even if the corresponding object was missing from the frame. Therefore it is important to always check the value of the exists branch before using an entry.
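A minimal sketch of such a check in Python, assuming the tree is iterated entry by entry (as PyROOT allows) and that each entry exposes the exists branch as an attribute. The helper name existing_entries and the stand-in entries are hypothetical; they only make the example runnable without ROOT.

```python
from types import SimpleNamespace


def existing_entries(tree):
    """Yield only the entries whose object was present in the frame."""
    for entry in tree:
        # `exists` is false for placeholder rows that only keep the trees aligned.
        if entry.exists:
            yield entry


# Stand-in for a tree: three events, the second one without the object.
fake_tree = [
    SimpleNamespace(Event=1, exists=True, energy=1.0e6),
    SimpleNamespace(Event=2, exists=False, energy=0.0),
    SimpleNamespace(Event=3, exists=True, energy=2.5e6),
]

# Read the values inside the loop rather than storing the entry objects.
kept_events = [entry.Event for entry in existing_entries(fake_tree)]
print(kept_events)
```

With a real file you would pass a tree such as f.MCPrimary instead of fake_tree. Values are read immediately inside the loop because PyROOT may hand back the same object for every entry.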
If the object stored in the tree is an array-like structure (e.g. an I3RecoPulseSeriesMap), the data will be stored in variable-length arrays and an additional branch is added to the tree:
ULong64_t Count_<tree_name> - the number of items in the current event.
For fixed-length arrays the length is not stored anywhere in the ROOT file; it is always the length defined by the I3Converter.
Note
I3FilterResultMapConverter creates one branch for each filter. Each of these branches is an array of two bools: the first one for the condition_passed flag, the second one for prescale_passed.
In some cases, fixed and variable-length arrays are combined. For instance, when booking ATWD waveforms, a branch of type double[Count_<tree_name>][128] will be created. Each entry will be an array of 128 doubles.
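When such a branch is read into a flat buffer, the individual waveforms have to be recovered by index arithmetic. The following is a sketch only, assuming the values are laid out contiguously in row-major order (the usual C layout), which should be checked against real data. The helper split_rows and the made-up buffer contents are hypothetical.

```python
from array import array

N_SAMPLES = 128  # fixed inner length, as in the ATWD example above


def split_rows(flat, count, n_samples=N_SAMPLES):
    """Split a flat buffer holding count * n_samples values into rows."""
    return [list(flat[i * n_samples:(i + 1) * n_samples]) for i in range(count)]


# Stand-in buffer for one event with two waveforms (values are made up).
count = 2
flat = array("d", [float(k) for k in range(count * N_SAMPLES)])

waveforms = split_rows(flat, count)
print(len(waveforms), len(waveforms[0]))
```

Under this assumption, sample j of waveform i sits at index i * 128 + j of the flat buffer.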
Note
To work around issues with ROOT’s interpretation of branch types, rootwriter replaces all arrays (not single values) of type char or unsigned char with arrays of int16_t or uint16_t, respectively. Thus, the tree structure might differ from what one would expect from the converter implementation.
Besides opening a TreeViewer you can also use TTree::Print() to get information about the structure of a tree and the stored variable types:
root [3] IceTopHLCVEMPulses->Print();
******************************************************************************
*Tree :IceTopHLCVEMPulses: IceTopHLCVEMPulses *
*Entries : 348915 : Total = 417028258 bytes File Size = 89738443 *
* : : Tree compression factor = 4.64 *
******************************************************************************
*Br 0 :Count_IceTopHLCVEMPulses : *
* | ULong64_t Number of objects in each field *
*Entries : 348915 : Total Size= 2828723 bytes File Size = 414516 *
*Baskets : 35 : Basket Size= 386560 bytes Compression= 6.74 *
*............................................................................*
*Br 1 :Run : UInt_t run number *
*Entries : 348915 : Total Size= 18123225 bytes File Size = 827981 *
*Baskets : 202 : Basket Size= 2502144 bytes Compression= 21.85 *
*............................................................................*
*Br 2 :Event : UInt_t event number *
*Entries : 348915 : Total Size= 18123641 bytes File Size = 2239237 *
*Baskets : 202 : Basket Size= 2502144 bytes Compression= 8.08 *
*............................................................................*
*Br 3 :SubEvent : UInt_t sub-event number *
*Entries : 348915 : Total Size= 18124266 bytes File Size = 875798 *
*Baskets : 202 : Basket Size= 2502144 bytes Compression= 20.66 *
*............................................................................*
// etc
Note
MasterTree->Print() will only print the structure of MasterTree, which is probably not what you want. You will have to call Print() on each tree.
The tableio description field is stored in the branch titles. You can retrieve the description of an individual branch as follows:
root [4] IceTopHLCVEMPulses->GetBranch("Event")->GetTitle()
(const char* 0x255e5d8)"event number"
These descriptions are provided by the individual converters and are the same as those stored in the hdf header. Unfortunately, ROOT trees do not have a field where the tableio unit field can be stored.
Using C++
Of course, the easiest way to create a plot from a tableio ROOT file is to use the Draw() method, for instance:
MasterTree->Draw("IceTopHLCVEMPulses.charge");
This will fill all pulses in all events into a histogram.
However, there are more complicated cases where Draw() and simple cuts are insufficient and you have to loop over the tree by hand.
To do this, assign a variable to each branch you want to inspect and call TTree::GetEntry() inside a loop. For example:
double energy;
MasterTree->SetBranchAddress("MCPrimary.energy", &energy);
for (Long64_t evt = 0; evt < MasterTree->GetEntries(); ++evt) {
    MasterTree->GetEntry(evt);
    // do something with energy
}
For multi-row tables you need sufficiently large arrays and have to make use of Count_<tree_name>:
ULong64_t nPulses;
MasterTree->SetBranchAddress("IceTopHLCVEMPulses.Count_IceTopHLCVEMPulses", &nPulses);
double charge[MAX_PULSES]; // set MAX_PULSES to a number large enough for any event you might encounter
MasterTree->SetBranchAddress("IceTopHLCVEMPulses.charge", charge); // no & before charge!
for (Long64_t evt = 0; evt < MasterTree->GetEntries(); ++evt) {
    MasterTree->GetEntry(evt);
    // use the same unsigned type as nPulses to avoid a signed/unsigned comparison
    for (ULong64_t i = 0; i < nPulses; ++i) {
        // do something with charge[i]
    }
}
Warning
You might be tempted to simplify this task using TTree::MakeClass(). However, this can lead to undesired behaviour for multi-row tables. TTree::MakeClass() will allocate arrays for these tables whose length is inferred from the longest array occurring in the file used when running TTree::MakeClass(). If you then use the resulting code to read other files, longer arrays can overflow these buffers and crash the program with a segmentation fault.
When using this approach you need to know the types of the branches created by the converters. The easiest way to find out (besides checking the code) is TTree::Print(), as described above.
Using Python
PyROOT offers several ways to read data from ROOT trees. To work with PyROOT, simply import the ROOT module and open a file:
import ROOT
f = ROOT.TFile('output.root')
tree = f.MasterTree
Just like in C++ you can then use TTree.Draw() to create histograms:
tree.Draw("log10(MCPrimary.energy)")
Looping over trees is a little simpler in Python. All branches of a tree can be accessed as attributes of the tree. TTrees are iterable, as are branches that contain arrays. PyROOT automatically makes sure that all loops stop at the right point, so you do not have to care about the Count_<tree_name> branch.
An example:
for event in f.IceTopHLCVEMPulses:
    for charge in event.charge:
        pass  # do something with charge
Unfortunately, this is relatively slow.
Note
While branches are accessible as attributes of a tree, friend trees are not. Thus code like MasterTree.IceTopHLCVEMPulses.charge will not work.
A faster way to loop over trees is to assign variables to branches just as in C++. In Python this is only possible using array.array or numpy.array, because you have to pass a pointer to a fixed-type variable to TTree.SetBranchAddress().
Here is a simple example using numpy.arrays:
import ROOT
import numpy as n
f = ROOT.TFile('output.root')
t = f.MasterTree
energy = n.array([0], dtype=n.double)
t.SetBranchAddress('MCPrimary.energy', energy)
for evt in range(t.GetEntries()):
    t.GetEntry(evt)
    # do something with energy[0], e.g.
    print(energy[0])
In the same way you can also read arrays stored in trees. However, you have to know in advance the maximum number of entries to expect in any event. Let’s add to the example above:
count = n.array([0], dtype=n.uint64)
MAX_PULSES = 324  # for IceTop this is enough
charge = n.zeros(MAX_PULSES, dtype=n.double)
t.SetBranchAddress('IceTopHLCVEMPulses.Count_IceTopHLCVEMPulses', count)
t.SetBranchAddress('IceTopHLCVEMPulses.charge', charge)
for evt in range(t.GetEntries()):
    t.GetEntry(evt)
    for q in range(int(count[0])):
        # do something with charge, e.g.
        print(charge[q])
The obvious disadvantage of this way is that you have to write just as much code as in C++.
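Instead of hard-coding MAX_PULSES, the buffer size can be derived from the file being read. A minimal sketch, assuming TTree.GetMaximum() is called on the tree that owns the Count_<tree_name> branch and returns the largest value stored in it. The helper make_charge_buffer and the FakeTree class are hypothetical; FakeTree only mimics the one method used so that the example runs without ROOT.

```python
from array import array


def make_charge_buffer(pulse_tree, count_branch):
    """Allocate a double buffer large enough for the longest event."""
    # GetMaximum is expected to return the largest branch value as a float.
    max_count = int(pulse_tree.GetMaximum(count_branch))
    # Keep at least one slot so the buffer is never empty.
    return array("d", [0.0] * max(max_count, 1))


class FakeTree:
    """Hypothetical stand-in mimicking the one TTree method used here."""

    def __init__(self, counts):
        self._counts = counts

    def GetMaximum(self, branch_name):
        return float(max(self._counts))


charge = make_charge_buffer(FakeTree([3, 324, 17]), "Count_IceTopHLCVEMPulses")
print(len(charge))
```

With a real file, f.IceTopHLCVEMPulses would take the place of FakeTree. A buffer sized this way is only guaranteed to be large enough for the file it was computed from, so it has to be recomputed for every new file.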
A note on FilterResultMaps
I3FilterResultMapConverter creates a branch containing an array of two bools for each filter. The first of these two bools stores the condition_passed flag, the second one the prescale_passed flag. So to determine the rate of a filter (e.g. for comparison with monitoring data) you have to look at only one of them, namely the second one, for example:
MasterTree->Draw("FilterMask.IceTopSTA3_11[1]")
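The same selection can be made in an explicit loop. The sketch below counts the fraction of entries whose prescale_passed flag is set, assuming the filter tree is iterated entry by entry and each filter branch behaves like a two-element sequence. The helper prescale_pass_fraction and the stand-in entries are hypothetical. Turning this fraction into a rate additionally requires the livetime, which is not covered here.

```python
from types import SimpleNamespace


def prescale_pass_fraction(filter_tree, filter_name):
    """Return the fraction of existing entries with prescale_passed set."""
    n_total = 0
    n_passed = 0
    for entry in filter_tree:
        if not entry.exists:
            continue  # placeholder row: no filter mask in this frame
        flags = getattr(entry, filter_name)
        n_total += 1
        if flags[1]:  # index 0: condition_passed, index 1: prescale_passed
            n_passed += 1
    return float(n_passed) / n_total if n_total else 0.0


# Stand-in for the FilterMask tree with made-up flags.
fake_filter_mask = [
    SimpleNamespace(exists=True, IceTopSTA3_11=[True, True]),
    SimpleNamespace(exists=True, IceTopSTA3_11=[True, False]),
    SimpleNamespace(exists=False, IceTopSTA3_11=[False, False]),
    SimpleNamespace(exists=True, IceTopSTA3_11=[False, False]),
]

fraction = prescale_pass_fraction(fake_filter_mask, "IceTopSTA3_11")
print(fraction)
```

With a real file, f.FilterMask would be passed instead of fake_filter_mask.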