Working with arbitrary data#
In this tutorial, we demonstrate working with arbitrary data, rather than one of the pre-defined datasets.
Setup#
[1]:
import copy
import numpy as np
import matplotlib.pyplot as plt
import histlite as hl
import csky as cy
%matplotlib inline
[2]:
cy.plotting.mrichman_mpl()
/mnt/lfs7/user/ssclafani/software/external/csky/csky/plotting.py:92: MatplotlibDeprecationWarning: Support for setting the 'text.latex.preamble' or 'pgf.preamble' rcParam to a list of strings is deprecated since 3.3 and will be removed two minor releases later; set it to a single string instead.
r'\SetSymbolFont{operators} {sans}{OT1}{cmss} {m}{n}'
Get some real data#
[3]:
timer = cy.timing.Timer()
time = timer.time
[6]:
with time('ana setup: original'):
    ana = cy.get_analysis(cy.selections.Repository(), 'version-003-p02', cy.selections.PSDataSpecs.ps_2011)
    a = ana[0]
    data, sig, livetime = a.data, a.sig, a.livetime
Setting up Analysis for:
IC86_2011
Setting up IC86_2011...
Reading /data/ana/analyses/ps_tracks/version-003-p02/IC86_2011_MC.npy ...
Reading /data/ana/analyses/ps_tracks/version-003-p02/IC86_2011_exp.npy ...
Reading /data/ana/analyses/ps_tracks/version-003-p02/GRL/IC86_2011_exp.npy ...
Energy PDF Ratio Model...
* gamma = 4.0000 ...
Signal Acceptance Model...
* gamma = 4.0000 ...
Done.
0:00:14.625254 elapsed.
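Here data and sig are the experimental events and signal MC for IC86-2011, exposed as column-wise arrays: columns can be read as attributes (data.sigma, sig.true_dec) and assigned as items (data['sigma'] = ...), which is exactly what the next section does. A quick look at the sample might be something like the following (a hypothetical inspection, not part of the original notebook; the livetime-in-seconds convention is an assumption):

# Hypothetical inspection of the loaded sample; column names are taken from the code below.
print(len(data), len(sig))                 # number of experimental and signal MC events
print(np.degrees(np.median(data.sigma)))   # typical per-event angular error, in degrees
print(livetime / 86400.)                   # livetime in days, assuming it is stored in seconds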
Generate modified data#
You can choose your workflow here: modify the data and then define a DataSpec, or include the dataset modifications inside the DataSpec. I’ll go with the latter, and define a version of IC86 that is magically better in that its angular reconstruction is twice as good. By inheriting from the original spec, we reuse anything we don’t specifically redefine: in particular, the binning settings for the PDFs.
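For reference, the first workflow would look roughly like the sketch below: modify copies of the arrays up front, then point a spec at them through the same _path_data / _path_sig / _livetime attributes used in the class that follows. The names better_data, better_sig, IC86_2011_Premodified, and ana_pre are purely illustrative, and the sketch only tightens the sigma values; the full modification in the next cell also tightens the MC reconstructed directions.

# Sketch of the "modify the data first, then define a DataSpec" workflow.
# Assumption: the _path_* attributes accept in-memory arrays, as in the class defined below.
better_data = copy.deepcopy(data)
better_sig = copy.deepcopy(sig)
better_data['sigma'] = np.maximum(np.radians(0.2), .5 * better_data.sigma)
better_sig['sigma'] = np.maximum(np.radians(0.2), .5 * better_sig.sigma)

class IC86_2011_Premodified(cy.selections.PSDataSpecs.IC86_2011):
    """IC86-2011 spec fed with already-modified arrays."""
    _path_data = better_data
    _path_sig = better_sig
    _livetime = a.livetime

# ana_pre = cy.get_analysis(cy.selections.Repository(), 'version-003-p02', IC86_2011_Premodified)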
[7]:
class IC86_2011_MagicallyBetter(cy.selections.PSDataSpecs.IC86_2011):
    """Better version of IC86-2011"""

    def dataset_modifications(self, ds):
        """Improve angres by a factor of two."""
        sig = copy.deepcopy(ds.sig)
        data = copy.deepcopy(ds.data)
        sig['xra'] = .5 * sig.xra
        sig['xdec'] = .5 * sig.xdec
        # this particular geometric code has a silly (dec,ra) convention, please forgive
        sig['dec'], sig['ra'] = cy.coord.rotate_xaxis_to_source(
            sig.true_dec, sig.true_ra, sig.xdec, sig.xra, latlon=True)
        sigma_floor = np.radians(0.2)
        sig['sigma'] = np.maximum(sigma_floor, .5 * sig.sigma)
        data['sigma'] = np.maximum(sigma_floor, .5 * data.sigma)
        ds.sig, ds.data = sig, data

    # reuse the arrays already loaded above, rather than re-reading files from disk
    _path_data = data
    _path_sig = sig
    _livetime = a.livetime

with time('ana setup: modified'):
    ana2 = cy.get_analysis(cy.selections.Repository(), 'version-003-p02', IC86_2011_MagicallyBetter)
Setting up Analysis for:
IC86_2011_MagicallyBetter
Setting up IC86_2011_MagicallyBetter...
Reading /data/ana/analyses/ps_tracks/version-003-p02/GRL/IC86_2011_exp.npy ...
Energy PDF Ratio Model...
* gamma = 4.0000 ...
Signal Acceptance Model...
* gamma = 4.0000 ...
Done.
0:00:12.286926 elapsed.
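As a quick sanity check (a hypothetical follow-up, not part of the original notebook), we can confirm that the per-event angular errors in the new Analysis really are roughly half as large, up to the 0.2° floor applied in dataset_modifications:

# Compare median per-event angular errors (in degrees) between the original and
# "magically better" samples; the ratio should be close to 0.5.
a2 = ana2[0]
print(np.degrees(np.median(a.data.sigma)), np.degrees(np.median(a2.data.sigma)))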
Test performance#
[8]:
src = cy.sources(0, 0)
tr = cy.get_trial_runner(src=src, ana=ana, mp_cpus=10)
tr2 = cy.get_trial_runner(src=src, ana=ana2, mp_cpus=10)
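Before running full trial batches, it can be handy to spot-check a single trial in each configuration. The snippet below assumes the usual TrialRunner.get_one_fit interface (returning the fitted TS and parameters) and is not part of the original notebook:

# Hypothetical spot check: one trial with a bit of injected signal per configuration.
print(tr.get_one_fit(n_sig=10, seed=1))
print(tr2.get_one_fit(n_sig=10, seed=1))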
[9]:
with time('bg 1'):
    b = cy.dists.Chi2TSD(tr.get_many_fits(1000, seed=1))
Performing 1000 background trials using 10 cores:
1000/1000 trials complete.
0:00:02.381897 elapsed.
[10]:
sens_kw = dict(
    beta=0.9,        # fraction of signal trials required above the background median TS (90%)
    n_sig_step=1,    # step size for the initial n_sig scan
    batch_size=500,  # trials generated per batch
    tol=.03,         # stop once the relative error on n_sig is at about the 3% level
    seed=2,
)
[11]:
with time('sens 1'):
    sens = tr.find_n_sig(b.median(), **sens_kw)
Start time: 2021-07-30 15:13:30.866028
Using 10 cores.
* Starting initial scan for 90% of 50 trials with TS >= 0.052...
n_sig = 1.000 ... frac = 0.74000
n_sig = 2.000 ... frac = 0.88000
n_sig = 3.000 ... frac = 0.90000
* Generating batches of 500 trials...
n_trials | n_inj 0.00 1.20 2.40 3.60 4.80 6.00 | n_sig(relative error)
500 | 45.4% 73.6% 85.4% 95.0% 95.0% 98.2% | 2.895 (+/- 4.3%) [chi2.cdf]
1000 | 47.4% 72.7% 85.8% 94.4% 96.1% 98.2% | 2.928 (+/- 3.3%) [chi2.cdf]
1500 | 48.6% 72.6% 84.9% 93.9% 96.0% 98.2% | 3.009 (+/- 2.4%) [chi2.cdf]
End time: 2021-07-30 15:13:54.452319
Elapsed time: 0:00:23.586291
0:00:23.587761 elapsed.
[12]:
with time('bg 2'):
    b2 = cy.dists.Chi2TSD(tr2.get_many_fits(1000, seed=1))
Performing 1000 background trials using 10 cores:
1000/1000 trials complete.
0:00:02.341552 elapsed.
[13]:
with time('sens 2'):
    sens2 = tr2.find_n_sig(b2.median(), **sens_kw)
Start time: 2021-07-30 15:14:04.353769
Using 10 cores.
* Starting initial scan for 90% of 50 trials with TS >= 0.022...
n_sig = 1.000 ... frac = 0.80000
n_sig = 2.000 ... frac = 0.90000
* Generating batches of 500 trials...
n_trials | n_inj 0.00 0.80 1.60 2.40 3.20 4.00 | n_sig(relative error)
500 | 50.2% 67.6% 85.0% 87.8% 94.8% 95.8% | 2.454 (+/- 4.9%) [chi2.cdf]
1000 | 51.1% 69.7% 84.0% 89.6% 94.7% 97.0% | 2.358 (+/- 3.2%) [chi2.cdf]
1500 | 49.3% 70.6% 84.1% 89.3% 94.9% 96.7% | 2.345 (+/- 2.6%) [chi2.cdf]
End time: 2021-07-30 15:14:23.955296
Elapsed time: 0:00:19.601527
0:00:19.605097 elapsed.
Do these compare about right?
[14]:
sens['n_sig'] / sens2['n_sig'], np.sqrt(2)
[14]:
(1.2831309784417606, 1.4142135623730951)
I guess I buy it.
[13]:
print(timer)
ana setup: original | 0:00:15.504567
ana setup: modified | 0:00:14.002988
bg 1 | 0:00:02.314820
sens 1 | 0:00:21.714302
bg 2 | 0:00:02.306961
sens 2 | 0:00:50.869679
------------------------------------
total | 0:01:46.713317