IceTray I/O

The dataio modules I3Writer, I3MultiWriter and I3Reader support on-the-fly compression and exclusion of various frame items via regular expressions.

Reading in .i3 files

Order the files passed to the I3Reader so that the matching GCD (Geometry, Calibration, DetectorStatus) frames are read before the P (Physics) frames.

Usage

Usage is straightforward. Available parameters are listed below (from icetray-inspect dataio):

I3Reader
 Filename (string)
   Description :  Filename to read.  Use either this or FilenameList, not both.
   Default     :  ""

 FilenameList (vector<string>)
   Description :  List of files to read, *IN SORTED ORDER*
   Default     :  []

 SkipKeys (vector<string>)
   Description :  Don't load frame objects with these keys
   Default     :  []

An example python script:

#!/usr/bin/env python3
from icecube.icetray import I3Tray
from icecube import dataclasses, dataio  # register frame-object types and the I/O modules

tray = I3Tray()
tray.Add("I3Reader", filename="pass1.i3")  # read frames from pass1.i3
tray.Add("Dump", "dump")                   # print the contents of every frame
tray.Execute()

Compression

The writers will automatically compress the data with gzip, bzip2, or zstd, choosing the algorithm from the filename suffix: .gz, .bz2, or .zst, respectively. This:

tray.Add("I3Writer", filename="mystuff.i3.gz")

will get you run-of-the-mill gzip compression. With CompressionLevel you can specify:

GZip Compression Level    Meaning
0                         no compression
1                         fastest
6                         default
9                         best compression

So this, which uses the default level 6:

tray.Add("I3Writer", filename="mystuff.i3.gz")

should get you the same result as just writing to disk and then gzipping, and this:

tray.Add("I3Writer", filename="mystuff.i3.gz", compressionlevel=9)

will compress better at the cost of speed.
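Python's standard gzip module uses the same 0 to 9 level scale, which makes it a convenient way to get a feel for the trade-off. This sketch compresses some made-up, repetitive stand-in bytes at each level from the table; it uses Python's gzip module, not dataio's writer, so the exact sizes say nothing about real .i3 files.

```python
import gzip

# Made-up, highly repetitive stand-in for serialized frame data.
data = b"F2kTrack00HitSel  I3Vector<int>  411\n" * 2000

# Compressed size at each level listed in the table above.
sizes = {level: len(gzip.compress(data, compresslevel=level)) for level in (0, 1, 6, 9)}
for level, size in sizes.items():
    print(f"level {level}: {size} bytes (input {len(data)} bytes)")
```

Level 0 output comes out slightly larger than the input, because the data is stored rather than compressed and gzip still adds its header and trailer.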

In addition, I3Reader will transparently read gzip, bzip2, zstd, or xz-compressed .i3 files, or .i3 files inside of compressed tar archives like the PFFilt files packaged and transferred over the satellite link by JADE. Any format that libarchive supports can be read.

General information about the Zstandard compression library can be found in the zstd online documentation, and more technical documentation in the zstd manual. The documentation starts with “zstd, short for Zstandard, is a fast lossless compression algorithm, targeting real-time compression scenarios at zlib-level and better compression ratios.”

Other compression strategies can similarly be selected by the filename suffix:

Compression Strategy    I3Writer File Suffix
gzip                    .gz
bzip2                   .bz2
zstd                    .zst

bzip2 compression levels have the same meaning as for gzip, on a scale of 1 to 9, with 1 the fastest and 9 the best compression. zstd compression levels cover a wider range:

ZSTD Compression Level    Meaning
0                         no compression
1                         fastest
4                         default
22                        best compression
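The suffix and level rules can be summarized as a small lookup. The helper `compression_for` and its table are hypothetical, not part of dataio: the suffixes, level ranges, and defaults are copied from the tables above, the bzip2 default is left as None because it is not stated here, and treating a name without a recognized suffix as uncompressed is an assumption.

```python
# Hypothetical helper (not part of dataio): suffixes, level ranges, and
# defaults are copied from the tables above.
_CODECS = {
    ".gz": ("gzip", range(0, 10), 6),
    ".bz2": ("bzip2", range(1, 10), None),  # default level not stated above
    ".zst": ("zstd", range(0, 23), 4),
}


def compression_for(filename, level=None):
    """Return (algorithm, level) for an output filename.

    Assumption: a name without a recognized suffix is written uncompressed,
    reported here as (None, None).
    """
    for suffix, (algorithm, valid_levels, default) in _CODECS.items():
        if filename.endswith(suffix):
            if level is None:
                return algorithm, default
            if level not in valid_levels:
                raise ValueError(
                    f"{algorithm} level must be between {valid_levels.start} "
                    f"and {valid_levels.stop - 1}, got {level}"
                )
            return algorithm, level
    return None, None


print(compression_for("mystuff.i3.gz"))       # ('gzip', 6)
print(compression_for("mystuff.i3.zst", 22))  # ('zstd', 22)
print(compression_for("plain.i3"))            # (None, None)
```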

Reading from and writing to remote locations (staging)

Sometimes the files you want to read are not available on your local filesystem. For example, if you’re running on a random node on the Open Science Grid, you will not have direct access to the /data/sim and /data/exp filesystems in Madison. For such situations, dataio has built-in support for “staging” files in and out of local storage. This support is activated by replacing the “I3Reader” module with the dataio.I3Reader tray segment. For example, to read files via GridFTP, use the following snippet:

from icecube import icetray, dataio
tray.Add(dataio.I3Reader, filenamelist=['gsiftp://gridftp.icecube.wisc.edu/data/sim/IceCube/2010/filtered/level3-cscd/CORSIKA-in-ice/9493/92000-92999/Level3_IC79_corsika.009493.092110.i3.bz2'])

The segment dataio.I3Reader is equivalent to:

tray.context['I3FileStager'] = dataio.get_stagers()
tray.AddModule('I3Reader', **kwargs)

Behind the scenes the stager will recognize supported URL schemes (currently file://, http://, https://, ftp://, gsiftp://, and scp://), download the file to a local directory, and provide the reader with a local filename to read instead. As soon as the file is no longer needed, it will be automatically deleted. The inverse operation works for writing as well: if the staging mechanism has been set up (using either of the snippets above), then I3Writer will also recognize URL schemes, write the output to a temporary file, and upload it to the destination when the file is closed. For example, the following snippet will write an .i3 file to my /data/user directory in Madison from anywhere in the world:
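The stage-in/stage-out life cycle described above can be modelled with two context managers. This is a simplified sketch, not dataio's stager code: `staged_input` and `staged_output` are hypothetical names, only file:// URLs are handled (a local copy stands in for a real download or upload), and plain paths pass through untouched.

```python
import contextlib
import os
import shutil
import tempfile
from urllib.parse import urlparse
from urllib.request import url2pathname


def _local_path(url):
    """Resolve a file:// URL to a local path (the only scheme this sketch handles)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise NotImplementedError(f"sketch only stages file:// URLs, not {parsed.scheme}://")
    return url2pathname(parsed.path)


@contextlib.contextmanager
def staged_input(name):
    """Yield a local copy of `name`; delete it as soon as the caller is done."""
    if not urlparse(name).scheme:      # plain path: read directly, no staging
        yield name
        return
    tmpdir = tempfile.mkdtemp()
    try:
        local = os.path.join(tmpdir, os.path.basename(urlparse(name).path))
        shutil.copyfile(_local_path(name), local)   # stands in for a download
        yield local
    finally:
        shutil.rmtree(tmpdir)          # remove the staged copy


@contextlib.contextmanager
def staged_output(name):
    """Yield a temporary local path; copy it to `name` when the caller is done."""
    if not urlparse(name).scheme:      # plain path: written directly
        yield name
        return
    destination = _local_path(name)
    tmpdir = tempfile.mkdtemp()
    try:
        local = os.path.join(tmpdir, os.path.basename(urlparse(name).path))
        yield local                    # the writer fills this temporary file
        shutil.copyfile(local, destination)         # stands in for an upload
    finally:
        shutil.rmtree(tmpdir)
```

If the body of a `with staged_output(...)` block raises, the copy step is skipped and the temporary directory is still removed, so a half-written file never reaches the destination.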

tray.Add('I3Writer', filename='gsiftp://gridftp-users.icecube.wisc.edu/data/user/jvansanten/foo.i3.bz2')

Plain POSIX path names, e.g. “/data/user/jvansanten/foo.i3.bz2”, will not be staged, but instead read or written directly. The hdfwriter.I3HDFWriter() and rootwriter.I3ROOTWriter() segments use the same staging mechanism. Currently, the following stager classes are implemented:

class icecube.dataio.I3FileStagerFile.I3FileStagerFile(blocksize=65536, ssl=None, retry=5)

Handles http://, https://, ftp://, and file:// URLs

Note

A username/password combination may be embedded in http URLs in the format specified in RFC 3986. This should only be used for “dummy” shared passwords like the standard IceCube password.

An ssl parameter is also available for SSL configuration.

class icecube.dataio.I3FileStagerFile.GridFTPStager(globus_url_copy='globus-url-copy', options=['-nodcau', '-rst', '-cd'])

Handles ftp:// and gsiftp:// URLs

Note

GridFTP requires that you have a proxy certificate either in the standard location or in the location specified by the environment variable X509_USER_PROXY. See the Globus Toolkit documentation for more information. You will also need to obtain a user certificate; see the Using_GridFTP wiki page.

class icecube.dataio.I3FileStagerFile.SCPStager

Handles scp:// URLs

Note

Since there is no way to enter your password, you must have public key authentication set up to use the scp stager. If you try to embed the password in the URL, an error will be raised.
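The scheme lists above can be collected into a lookup table. The hypothetical `candidate_stagers` function below reports which of the documented stager classes claim a given name. Which one is used when several match (ftp:// appears under two of them) is not specified here, so the sketch simply returns all of them; the password check mirrors the scp note.

```python
from urllib.parse import urlparse

# Scheme lists copied from the stager descriptions above.
STAGER_SCHEMES = {
    "I3FileStagerFile": {"http", "https", "ftp", "file"},
    "GridFTPStager": {"ftp", "gsiftp"},
    "SCPStager": {"scp"},
}


def candidate_stagers(name):
    """Return the documented stager classes whose schemes match `name`."""
    parsed = urlparse(name)
    if not parsed.scheme:
        return []  # plain POSIX path: read or written directly, no staging
    if parsed.scheme == "scp" and parsed.password:
        raise ValueError("scp staging needs public-key authentication, not a password in the URL")
    return sorted(s for s, schemes in STAGER_SCHEMES.items() if parsed.scheme in schemes)


print(candidate_stagers("/data/user/jvansanten/foo.i3.bz2"))  # []
print(candidate_stagers("gsiftp://gridftp-users.icecube.wisc.edu/data/user/jvansanten/foo.i3.bz2"))  # ['GridFTPStager']
print(candidate_stagers("ftp://example.org/foo.i3"))  # ['GridFTPStager', 'I3FileStagerFile']
```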

Additional Scenarios

SkipKeys

With SkipKeys you can tell the reader not to read (or the writer not to write) certain keys, that is, the names that frame objects are stored under. SkipKeys now takes a vector of Perl-style regular expressions instead of a space-separated list of strings.

So, given a frame that looks like this:

Frame: 5/8
Key:   1/59               Type                                      Size (bytes)
DrivingTime               I3Time                                    38
F2kEventHeader            I3EventHeader                             83
F2kHitSel_DummyTrig5      I3Vector<int>                             291
F2kHitSel_DummyTrig6      I3Vector<int>                             291
F2kHitSel_DummyTrig7      I3Vector<int>                             291
F2kHitSel_DummyTrig8      I3Vector<int>                             291
F2kHitSel_FinalHitSel     I3Vector<int>                             171
F2kHitSel_HitSel0         I3Vector<int>                             283
F2kHitSel_HitSel1         I3Vector<int>                             199
F2kHitSel_HitSel2         I3Vector<int>                             171
F2kMCPrimaryTrack00       I3Particle                                152
F2kMCTracks               I3Vector<I3Particle>                      9098
F2kMuonDAQ                I3Map<OMKey, I3AMANDAAnalogReadout>       4242
F2kMuonDAQ_uncalib        I3Map<OMKey, I3AMANDAAnalogReadout>       4242
F2kSoftwareTriggerFlags   I3Vector<std::string>                     78
F2kTrack00                I3Particle                                152
F2kTrack00HitSel          I3Vector<int>                             411
F2kTrack00Params          I3Map<std::string, double>                180
F2kTrack01                I3Particle                                152
F2kTrack01HitSel          I3Vector<int>                             411
F2kTrack01Params          I3Map<std::string, double>                180
F2kTrack02                I3Particle                                152
F2kTrack02HitSel          I3Vector<int>                             411
F2kTrack02Params          I3Map<std::string, double>                180
F2kTrack03                I3Particle                                152
F2kTrack03HitSel          I3Vector<int>                             411
F2kTrack03Params          I3Map<std::string, double>                180
F2kTrack04                I3Particle                                152
F2kTrack04HitSel          I3Vector<int>                             411
F2kTrack04Params          I3Map<std::string, double>                180
F2kTrack05                I3Particle                                152
F2kTrack05HitSel          I3Vector<int>                             411
F2kTrack05Params          I3Map<std::string, double>                180
F2kTrack06                I3Particle                                152
F2kTrack06HitSel          I3Vector<int>                             411
F2kTrack06Params          I3Map<std::string, double>                180
F2kTrack07                I3Particle                                152
F2kTrack07HitSel          I3Vector<int>                             411
F2kTrack07Params          I3Map<std::string, double>                180
F2kTrack08                I3Particle                                152
F2kTrack08HitSel          I3Vector<int>                             411
F2kTrack08Params          I3Map<std::string, double>                180
F2kTrack09                I3Particle                                152
F2kTrack09HitSel          I3Vector<int>                             411
F2kTrack09Params          I3Map<std::string, double>                180
F2kTrack10                I3Particle                                152
F2kTrack10HitSel          I3Vector<int>                             411
F2kTrack10Params          I3Map<std::string, double>                180
F2kTrack11                I3Particle                                152
F2kTrack11HitSel          I3Vector<int>                             411
F2kTrack11Params          I3Map<std::string, double>                43
F2kTrack12                I3Particle                                152
F2kTrack12HitSel          I3Vector<int>                             411
F2kTrack12Params          I3Map<std::string, double>                180
F2kTrack13                I3Particle                                152
F2kTrack13HitSel          I3Vector<int>                             411
F2kTrack13Params          I3Map<std::string, double>                180
F2kTriggers               I3Tree<I3Trigger>                         122

This:

tray.Add("I3Writer",
         filename="mystuff.i3.gz",
         skipkeys=["F2kHitSel_DummyTrig.*"])

will skip all the F2k dummy triggers.

This:

skipkeys = ["F2kTrack.*HitSel", ".*Bryant"]

will skip all the F2kTrack hit selections, and anything that ends with “Bryant”.

But note the dot-star in there: these are Perl-style regular expressions, not the filesystem globbing that you use in your shell when doing things like ‘ls *.f2k’. To match any single character (like ? in the shell), use a dot. To match anything any number of times, use dot-star, as in F2k.*.
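The glob-versus-regex difference is easy to check with Python's re and fnmatch modules. This sketch assumes a pattern must match the whole key (full-match semantics); Python's regex dialect is close to, but not identical with, the Perl-style syntax dataio accepts, so treat it as an approximation.

```python
import fnmatch
import re

key = "F2kTrack00HitSel"

# As a regex, "F2k*" means "F2" followed by zero or more "k", so it fails here.
print(re.fullmatch("F2k*", key))                                  # None
# Dot-star is the regex spelling of the shell's "*".
print(re.fullmatch("F2k.*", key) is not None)                     # True
# fnmatch.translate converts a shell glob into an equivalent regex.
print(re.fullmatch(fnmatch.translate("F2k*"), key) is not None)   # True
# A single dot matches exactly one character, like "?" in the shell.
print(re.fullmatch("F2kTrack0.HitSel", key) is not None)          # True
```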

The syntax is a little different, and they can be both absurdly powerful and, well, simply absurd, if you geek out on them:

skipkeys = [r"F2kTrack\d*(([02468]Params)|([13579]HitSel))"]

This, for instance, removes the Params from even-numbered tracks and the HitSels from odd-numbered tracks. That flexibility is the reason for vectors of regular expressions. But if you just want to type out every single key name, you certainly can:

skipkeys =  ["DrivingTime",
             "F2kEventHeader",
             "F2kHitSel_DummyTrig5",
             "F2kHitSel_DummyTrig6",
             "F2kHitSel_DummyTrig7",
             "F2kHitSel_DummyTrig8",
             "F2kHitSel_FinalHitSel",
             "F2kHitSel_HitSel0",
             "F2kHitSel_HitSel1",
             "F2kHitSel_HitSel2",
             "F2kMCPrimaryTrack00",
             "F2kMCTracks",
             "F2kMuonDAQ",
             "F2kMuonDAQ_uncalib",
             "F2kSoftwareTriggerFlags",
             "F2kTrack00",
             "F2kTrack00HitSel",
             # ... F2kTrack00Params through F2kTrack11HitSel go here ...
             "F2kTrack11Params",
             "F2kTrack12",
             "F2kTrack12HitSel",
             "F2kTrack12Params",
             "F2kTrack13",
             "F2kTrack13HitSel",
             "F2kTrack13Params",
             "F2kTriggers"]

will work too.
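The examples in this section can be replayed on key names from the frame dump with a short pure-Python filter. `apply_skipkeys` is a hypothetical stand-in for SkipKeys, and it assumes a key is skipped when a pattern matches the entire key (re.fullmatch). If dataio instead searched for a match anywhere in the key, ".*Bryant" would also catch a key like "BryantFit".

```python
import re


def apply_skipkeys(keys, skipkeys):
    """Return the keys that survive a SkipKeys-like filter.

    Assumption: a key is skipped when any pattern matches the whole key.
    """
    patterns = [re.compile(p) for p in skipkeys]
    return [key for key in keys if not any(p.fullmatch(key) for p in patterns)]


# A subset of the key names from the frame dump above.
keys = ["F2kHitSel_DummyTrig5", "F2kHitSel_DummyTrig6", "F2kHitSel_HitSel0", "F2kTriggers"]
keys += [f"F2kTrack{n:02d}{part}" for n in range(14) for part in ("", "HitSel", "Params")]

print(apply_skipkeys(keys, ["F2kHitSel_DummyTrig.*"])[:3])
# ['F2kHitSel_HitSel0', 'F2kTriggers', 'F2kTrack00']

kept = apply_skipkeys(keys, [r"F2kTrack\d*(([02468]Params)|([13579]HitSel))"])
print([k for k in kept if k.startswith("F2kTrack0")][:6])
# ['F2kTrack00', 'F2kTrack00HitSel', 'F2kTrack01', 'F2kTrack01Params', 'F2kTrack02', 'F2kTrack02HitSel']
```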

Dropping Orphan Streams

If a filter module operates only on ‘P’ frames, but an input file contains both ‘Q’ and ‘P’ frames, the output at the end can look like:

QQQQQPQQQQQQQQQPQQQQQQQPQQQQQQPQQQ

There are a lot of leftover ‘Q’ frames that we should drop to save space. The easy way to take care of that is DropOrphanStreams:

from icecube import icetray  # provides icetray.I3Frame.DAQ

tray.Add("I3Writer",
         Filename="outfile.i3",
         DropOrphanStreams=[icetray.I3Frame.DAQ])
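A simplified model of what DropOrphanStreams does to the sequence above: a Q frame is held back and written only if a P frame follows it before the next Q frame; otherwise it is dropped. `drop_orphans` is hypothetical, works on one-letter stop codes rather than real frames, and ignores other streams such as G, C, and D.

```python
def drop_orphans(stops, orphan="Q"):
    """Simplified DropOrphanStreams model on one-letter stop codes.

    An orphan-stream frame is held back and written only when a frame of
    another stream follows it; a newer orphan frame replaces a held one.
    """
    written = []
    pending = None
    for stop in stops:
        if stop == orphan:
            pending = stop               # no child yet: hold it back
        else:
            if pending is not None:
                written.append(pending)  # it has a child, so keep it
                pending = None
            written.append(stop)
    return "".join(written)              # a trailing held frame is dropped


print(drop_orphans("QQQQQPQQQQQQQQQPQQQQQQQPQQQQQQPQQQ"))  # QPQPQPQP
print(drop_orphans("QPPQ"))                                 # QPP
```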

Writing Multiple Files

The module I3MultiWriter splits the output into multiple data files. Its filename argument is actually a printf()-style format string, not a plain filename. This string must contain a %u conversion, which is replaced with the index of the file in the series written. For instance:

tray.Add("I3MultiWriter",
         Filename="foo/myfile-%u.i3.gz",
         SizeLimit=10**6)  # Files of 1MB size: double-star is the exponent operator

will cause the I3MultiWriter to write files foo/myfile-0.i3.gz, foo/myfile-1.i3.gz, foo/myfile-2.i3.gz, etc.

Probably you will want to specify something like

foo/myfile-%04u.i3.gz

where 04 in %04u means that the index number of the file will be left-padded with zeros to a width of 4:

foo/myfile-0000.i3.gz
foo/myfile-0001.i3.gz
foo/myfile-0002.i3.gz

etc. This way the files sort in the order they were generated, whether they are listed with ls or collected with glob() and sorted before being passed to the I3Reader.
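Python's % operator implements the same printf-style conversions, including %u, so it can preview the names I3MultiWriter would produce (assuming dataio formats the index the same way) and show why the padding matters for sorting.

```python
plain = ["foo/myfile-%u.i3.gz" % i for i in range(12)]
padded = ["foo/myfile-%04u.i3.gz" % i for i in range(12)]

print(padded[:3])  # ['foo/myfile-0000.i3.gz', 'foo/myfile-0001.i3.gz', 'foo/myfile-0002.i3.gz']
print(sorted(plain) == plain)    # False: "myfile-10" sorts before "myfile-2"
print(sorted(padded) == padded)  # True
```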

The other necessary parameter is SizeLimit, which specifies, in bytes, a soft limit on the size of each file. It is not a hard limit: a file is closed and the next one opened as soon as a frame write causes the current file to exceed this limit. On average, the files written will exceed this size by about half the size of one frame. One consequence of this behavior is that you can write one frame per file by specifying a SizeLimit of one byte.
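The soft-limit rule can be modelled in a few lines. `split_into_files` is hypothetical: it takes a list of frame sizes in bytes, ignores compression and file headers, and closes a file right after the write that pushes it past the limit, so with a limit of 1 byte every frame lands in its own file.

```python
def split_into_files(frame_sizes, size_limit):
    """Group frame sizes (bytes) into files under a soft SizeLimit.

    Hypothetical model: a file is closed right after the write that makes it
    exceed size_limit; compression and file headers are ignored.
    """
    files, current, current_size = [], [], 0
    for size in frame_sizes:
        current.append(size)
        current_size += size
        if current_size > size_limit:
            files.append(current)
            current, current_size = [], 0
    if current:  # the last file may stay under the limit
        files.append(current)
    return files


print(split_into_files([400] * 5, 1000))    # [[400, 400, 400], [400, 400]]
print(split_into_files([120, 90, 300], 1))  # [[120], [90], [300]]
```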

Splitting off the Geometry, Calibration, and DetectorStatus

This is useful in sim production. You use two writers: an I3Writer for geometry, calibration and detector status, and an I3MultiWriter for the physics:

tray.Add("I3Writer","gcdwriter",
         filename="split.gcd.i3",
         streams=["Geometry", "Calibration", "DetectorStatus"])

tray.Add("I3MultiWriter","physwriter",
         filename="split.physics.%04u.i3",
         streams=["Physics"],
         sizelimit=10**5)

The ‘streams’ parameter tells each writer which streams it should write. I3TrayInfo frames are written to all files. Stream names are case-sensitive.
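The routing rule for the two-writer setup can be modelled as follows. `route` is a hypothetical function: frames are represented by their stream names in tray order, each writer is a set of stream names compared case-sensitively, and "TrayInfo" stands in for the I3TrayInfo frames that every file receives.

```python
def route(frames, writers):
    """Hypothetical model of stream routing.

    frames:  list of stream names, in tray order
    writers: {filename: set of stream names that writer handles}
    """
    output = {filename: [] for filename in writers}
    for stream in frames:
        for filename, streams in writers.items():
            # Names are compared case-sensitively; TrayInfo goes to every file.
            if stream == "TrayInfo" or stream in streams:
                output[filename].append(stream)
    return output


writers = {
    "split.gcd.i3": {"Geometry", "Calibration", "DetectorStatus"},
    "split.physics.%04u.i3": {"Physics"},
}
frames = ["TrayInfo", "Geometry", "Calibration", "DetectorStatus", "Physics", "physics"]
print(route(frames, writers))
```

The lowercase "physics" frame ends up in neither file, which is the kind of silent loss the case-sensitivity warning is about.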

Reading multiple files with glob

To read multiple files use the parameter ‘FilenameList’. To generate the list of files from a directory, you might find the Python glob() function convenient:

from glob import glob

file_list = sorted(glob("/my/data/*.i3.gz"))  # glob() returns files in arbitrary order
tray.Add("I3Reader", FilenameList=file_list)

As usual with vector<string> parameters, you can pass a list literal:

tray.Add("I3Reader", FilenameList=["file1.i3.gz", "file2.i3.gz", "file3.i3.gz"])

The files will be read in order. When the end of one file is reached, the next will be opened.

You may mix compressed (.i3.gz) and noncompressed (.i3) files in any order.

If you specify both a ‘Filename’ and a ‘FilenameList’, I3Reader will call log_fatal(), complaining that the configuration is ambiguous and telling you to use one or the other.

Reading Geometry/Calibration/Status from a separate file

Simulation runs have the Geometry, Calibration, and Detector Status frames in a separate file from the physics. You want to read this GCD file first, and then the rest of them in order.

Python’s glob() function can generate the list of physics files for you. Assuming the GCD is in GCD_0340.i3.gz and the associated physics frames are in files physics_0340.00001.i3.gz through, say, physics_0340.00999.i3.gz:

from glob import glob

physics = glob("physics_0340.*.i3.gz")    # glob() the list of files from the disk

physics.sort()                            # sort() them (they probably won't glob in alphabetical order)

tray.Add("I3Reader", FilenameList=["GCD_0340.i3.gz"]+physics)
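The same GCD-first recipe can be wrapped in a reusable function. `gcd_then_physics` is a hypothetical helper: it sorts the glob() result, skips the GCD file if the pattern happens to match it too, and raises an error on an empty match, which usually means a typo in the pattern.

```python
import glob
import os


def gcd_then_physics(gcd_file, physics_pattern):
    """Hypothetical helper: GCD file first, then the sorted physics files."""
    physics = sorted(glob.glob(physics_pattern))
    if not physics:
        raise FileNotFoundError(f"no files match {physics_pattern!r}")
    # Guard against a pattern that also matches the GCD file itself.
    gcd = os.path.abspath(gcd_file)
    physics = [f for f in physics if os.path.abspath(f) != gcd]
    return [gcd_file] + physics


# Usage with the file names from the text above:
# tray.Add("I3Reader", FilenameList=gcd_then_physics("GCD_0340.i3.gz", "physics_0340.*.i3.gz"))
```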

Examples

There are some example python scripts using dataio in the resources/examples directory.