Scaling up: analysis on a cluster#

In this tutorial, we’ll offer some hints on how to scale up an analysis to a cluster-based workflow.

Introduction#

In previous tutorials, we’ve demonstrated how to perform various types of analysis using csky. However, all of these have been done the “quick and dirty” way. We haven’t explored how csky supports “high performance computing”, i.e. performing large batches of trials on a cluster.

Here we will be assuming that the user already knows how to perform individual tasks on their cluster(s) of choice, as well as how to work with scripts as opposed to notebooks more generally. The snippets found in this tutorial will be building blocks that can be assembled into per-task scripts, or separate commands within a single-script interface based on e.g. the click or fire modules. We take a conventional time-integrated all-sky scan as our testing scenario.

Setup#

Most scripts will start similarly to our other tutorial notebooks. Here we’ll set up some imports, and load up an Analysis. Since we aren’t actually going to use the cluster, we’ll keep the computational load down by working with just IC86-2011.

[1]:
# for cluster jobs, you may want to start by
# disabling interactive matplotlib graphics:
# import matplotlib
# matplotlib.use('Agg')

import numpy as np
import matplotlib.pyplot as plt
import os

import histlite as hl
import csky as cy

# set up timer
timer = cy.timing.Timer()
time = timer.time

# set up Analysis
ana_dir = cy.utils.ensure_dir('/data/user/mrichman/csky_cache/ana')
repo = cy.selections.repo
with time('ana setup'):
    ana = cy.get_analysis(repo, 'version-003-p02', cy.selections.PSDataSpecs.ps_2011, dir=ana_dir)
cy.CONF['ana'] = ana
Setting up Analysis for:
IC86_2011
Setting up IC86_2011...
Reading /data/ana/analyses/ps_tracks/version-003-p02/IC86_2011_MC.npy ...
Reading /data/ana/analyses/ps_tracks/version-003-p02/IC86_2011_exp.npy ...
Reading /data/ana/analyses/ps_tracks/version-003-p02/GRL/IC86_2011_exp.npy ...
<- /data/user/mrichman/csky_cache/ana/IC86_2011.subanalysis.version-003-p02.npy
Done.

0:00:01.503862 elapsed.

We will create a directory in which to store trials:

[2]:
trials_dir = cy.utils.ensure_dir('./trials/IC86_2011')

Since this tutorial is a notebook, we will make use of interactive plotting:

[3]:
%matplotlib inline
[4]:
cy.plotting.mrichman_mpl()

By the way: have a matplotlib configuration you’d like to share? Feel free to add it to plotting.py!

Finally, since we’re not distributing this tutorial’s calculations over a cluster, but rather will perform them sequentially on a cobalt, we will enable multiprocessing. For real cluster submissions, it’s best to split up the work in a way that does not rely on within-job multiprocessing.

[5]:
cy.CONF['mp_cpus'] = 10

Computing background trials#

IceCube backgrounds are generally declination-dependent. Thus the first step towards an all-sky scan is to run background trials on a grid of declinations. A real analysis might space this grid in increments of 1° or 2°, but here we will use a 10° spacing just for demonstration.

[6]:
dec_degs = np.r_[-85:85.1:10]
[7]:
decs = np.radians(dec_degs)
sindecs = np.sin(decs)

The grid therefore looks as follows in \(\sin(\delta)\):

[8]:
fig, ax = plt.subplots(figsize=(5, .5))
ax.plot(sindecs, np.zeros_like(sindecs), '.-')
ax.set_yticks([])
ax.set_xlabel(r'$\sin(\delta)$');
_images/05_scaling_up_20_0.png

Our function for computing background trials might look as follows:

[9]:
# we'll save, and later load, trials from this dir
bg_dir = cy.utils.ensure_dir('{}/bg'.format(trials_dir))

def do_background_trials(dec_deg, N=1000, seed=0):
    # get trial runner
    dec = np.radians(dec_deg)
    src = cy.sources(0, dec)
    tr = cy.get_trial_runner(src=src)
    # run trials
    trials = tr.get_many_fits(N, seed=seed, logging=False)
    # save to disk
    dir = cy.utils.ensure_dir('{}/dec/{:+04.0f}'.format(bg_dir, dec_deg))
    filename = '{}/trials__N_{:06d}_seed_{:04d}.npy'.format(dir, N, seed)
    print('->', filename)
    # notice: trials.as_array is a numpy structured array, not a cy.utils.Arrays
    np.save(filename, trials.as_array)

First we test this function out:

[10]:
do_background_trials(5)
-> ./trials/IC86_2011/bg/dec/+005/trials__N_001000_seed_0000.npy

It seems to work, so let’s clean up the test file and then do just enough trials to get a clue. This is where you want to submit cluster jobs to call do_background_trials!

[11]:
os.remove('./trials/IC86_2011/bg/dec/+005/trials__N_001000_seed_0000.npy')
[12]:
# just enough to demonstrate the data handling mechanisms we will be using
n_jobs = 2
N = 1000

# in real life you should keep track of compute time within your jobs
with time('bg trials'):
    for dec_deg in dec_degs:
        for seed in range(n_jobs):
            do_background_trials(dec_deg, N=N, seed=seed)
-> ./trials/IC86_2011/bg/dec/-085/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-085/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/-075/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-075/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/-065/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-065/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/-055/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-055/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/-045/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-045/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/-035/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-035/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/-025/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-025/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/-015/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-015/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/-005/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/-005/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+005/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+005/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+015/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+015/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+025/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+025/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+035/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+035/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+045/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+045/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+055/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+055/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+065/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+065/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+075/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+075/trials__N_001000_seed_0001.npy
-> ./trials/IC86_2011/bg/dec/+085/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/bg/dec/+085/trials__N_001000_seed_0001.npy

0:00:57.027624 elapsed.

Computing signal trials#

Signal trials are computed in similar fashion. However, now we want to keep track of the signal strength in addition to the declination.

[13]:
# we'll save, and later load, trials from this dir
sig_dir = cy.utils.ensure_dir('{}/sig'.format(trials_dir))

def do_signal_trials(dec_deg, n_sig, N=1000, seed=0):
    # get trial runner
    dec = np.radians(dec_deg)
    src = cy.sources(0, dec)
    tr = cy.get_trial_runner(src=src)
    # run trials
    trials = tr.get_many_fits(N, n_sig, poisson=True, seed=seed, logging=False)
    # save to disk
    dir = cy.utils.ensure_dir('{}/dec/{:+04.0f}/nsig/{:05.1f}'.format(sig_dir, dec_deg, n_sig))
    filename = '{}/trials__N_{:06d}_seed_{:04d}.npy'.format(dir, N, seed)
    print('->', filename)
    # notice: trials.as_array is a numpy structured array, not a cy.utils.Arrays
    np.save(filename, trials.as_array)

Once again, we start with a quick test:

[14]:
do_signal_trials(5, 10)
-> ./trials/IC86_2011/sig/dec/+005/nsig/010.0/trials__N_001000_seed_0000.npy

Then we clean up the test file and “submit” the “jobs”:

[15]:
os.remove('./trials/IC86_2011/sig/dec/+005/nsig/010.0/trials__N_001000_seed_0000.npy')

(skip ahead to working with background trials)

[16]:
# just enough to demonstrate the data handling mechanisms we will be using
N = 1000
n_jobs = 1
n_sigs = np.r_[2:10:2, 10:30.1:4]

with time('signal trials'):
    for dec_deg in dec_degs:
        for n_sig in n_sigs:
            for seed in range(n_jobs):
                do_signal_trials(dec_deg, n_sig, N=N, seed=seed)
-> ./trials/IC86_2011/sig/dec/-085/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-085/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-075/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-065/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-055/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-045/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-035/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-025/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-015/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/-005/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+005/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+015/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+025/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+035/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+045/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+055/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+065/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+075/nsig/030.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/002.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/004.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/006.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/008.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/010.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/014.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/018.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/022.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/026.0/trials__N_001000_seed_0000.npy
-> ./trials/IC86_2011/sig/dec/+085/nsig/030.0/trials__N_001000_seed_0000.npy

0:07:15.878014 elapsed.

Working with background trials#

Now that our trials are on disk, we can load them up to inspect the background properties and to compute the sensitivity and discovery potential. First, you should confirm in your shell that the saved files look as expected. Here we’ll pipe through head but in real life you might want to pipe through less or vim -R - instead.

[17]:
!find trials/IC86_2011/bg | sort | head -n20
trials/IC86_2011/bg
trials/IC86_2011/bg/dec
trials/IC86_2011/bg/dec/-005
trials/IC86_2011/bg/dec/+005
trials/IC86_2011/bg/dec/-005/trials__N_001000_seed_0000.npy
trials/IC86_2011/bg/dec/+005/trials__N_001000_seed_0000.npy
trials/IC86_2011/bg/dec/-005/trials__N_001000_seed_0001.npy
trials/IC86_2011/bg/dec/+005/trials__N_001000_seed_0001.npy
trials/IC86_2011/bg/dec/-015
trials/IC86_2011/bg/dec/+015
trials/IC86_2011/bg/dec/-015/trials__N_001000_seed_0000.npy
trials/IC86_2011/bg/dec/+015/trials__N_001000_seed_0000.npy
trials/IC86_2011/bg/dec/-015/trials__N_001000_seed_0001.npy
trials/IC86_2011/bg/dec/+015/trials__N_001000_seed_0001.npy
trials/IC86_2011/bg/dec/-025
trials/IC86_2011/bg/dec/+025
trials/IC86_2011/bg/dec/-025/trials__N_001000_seed_0000.npy
trials/IC86_2011/bg/dec/+025/trials__N_001000_seed_0000.npy
trials/IC86_2011/bg/dec/-025/trials__N_001000_seed_0001.npy
trials/IC86_2011/bg/dec/+025/trials__N_001000_seed_0001.npy

csky provides the cy.bk module for book keeping. First let’s load the background trials:

[18]:
def ndarray_to_Chi2TSD(trials):
    return cy.dists.Chi2TSD(cy.utils.Arrays(trials))
[19]:
with time('load bg trials'):
    bg = cy.bk.get_all(
        # disk location
        '{}/dec'.format(bg_dir),
        # filename pattern
        'trials*npy',
        # how to combine items within each directory
        merge=np.concatenate,
        # what to do with items after merge
        post_convert=ndarray_to_Chi2TSD)
36 files loaded.

0:00:02.611293 elapsed.

The result is that now bg is a dict containing the background distributions for each declination:

[20]:
bg
[20]:
{5: Chi2TSD(2000 trials, eta=0.582, ndof=1.038, median=0.041 (from fit 0.046)),
 15: Chi2TSD(2000 trials, eta=0.602, ndof=1.116, median=0.086 (from fit 0.077)),
 25: Chi2TSD(2000 trials, eta=0.601, ndof=1.074, median=0.070 (from fit 0.066)),
 35: Chi2TSD(2000 trials, eta=0.514, ndof=1.118, median=0.003 (from fit 0.003)),
 45: Chi2TSD(2000 trials, eta=0.498, ndof=1.037, median=0.000 (from fit 0.000)),
 55: Chi2TSD(2000 trials, eta=0.498, ndof=1.071, median=0.000 (from fit 0.000)),
 65: Chi2TSD(2000 trials, eta=0.593, ndof=0.990, median=0.059 (from fit 0.059)),
 75: Chi2TSD(2000 trials, eta=0.384, ndof=1.043, median=0.000 (from fit 0.000)),
 85: Chi2TSD(2000 trials, eta=0.298, ndof=0.721, median=0.000 (from fit 0.000)),
 -5: Chi2TSD(2000 trials, eta=0.492, ndof=1.085, median=0.000 (from fit 0.000)),
 -15: Chi2TSD(2000 trials, eta=0.450, ndof=1.123, median=0.000 (from fit 0.000)),
 -25: Chi2TSD(2000 trials, eta=0.514, ndof=1.058, median=0.002 (from fit 0.002)),
 -35: Chi2TSD(2000 trials, eta=0.472, ndof=1.090, median=0.000 (from fit 0.000)),
 -45: Chi2TSD(2000 trials, eta=0.513, ndof=0.996, median=0.001 (from fit 0.001)),
 -55: Chi2TSD(2000 trials, eta=0.532, ndof=1.238, median=0.017 (from fit 0.018)),
 -65: Chi2TSD(2000 trials, eta=0.537, ndof=1.138, median=0.016 (from fit 0.015)),
 -75: Chi2TSD(2000 trials, eta=0.537, ndof=1.094, median=0.011 (from fit 0.014)),
 -85: Chi2TSD(2000 trials, eta=0.240, ndof=0.896, median=0.000 (from fit 0.000))}

We can plot all of these out on a big grid:

[21]:
# just choose enough nrow x ncol to fit the whole grid
nrow, ncol = 5, 4
fig, aaxs = plt.subplots(nrow, ncol, figsize=(12,12), sharex=True, sharey=True, dpi=200)
axs = np.ravel(aaxs)
# keep track of which ax's we already used
used_axs = []
for (i, dec_deg) in enumerate(dec_degs):
    ax = axs[i]
    # plot histogram
    b = bg[dec_deg]
    h = b.get_hist(bins=40, range=(0, 20))
    hl.plot1d(ax, h, crosses=True)
    # plot chi2 fit to nonzero values
    norm = h.integrate().values
    ts = np.linspace(.1, h.range[0][1], 100)
    ax.plot(ts, norm * b.pdf(ts))
    # set limits and label dec
    ax.semilogy(nonposy='clip')
    ax.set_ylim(.3, 3e3)
    ax.text(20, 1e3, r'$\delta={:.0f}$'.format(dec_deg), ha='right', va='center')
    used_axs.append(ax)
# hide unused ax's
for ax in axs:
    if ax not in used_axs:
        ax.set_visible(False)
# add x and y labels
for ax in aaxs[-1]:
    if ax in used_axs:
        ax.set_xlabel(r'$\sin(\delta)$')
for ax in aaxs[:,0]:
    ax.set_ylabel(r'trials per bin')
plt.tight_layout()
_images/05_scaling_up_46_0.png

We can also plot a summary of the \(\chi^2\) fit parameters:

[22]:
etas = np.array([bg[d].eta for d in dec_degs])
ndofs = np.array([bg[d].ndof for d in dec_degs])
locs = np.array([bg[d].loc for d in dec_degs])
scales = np.array([bg[d].scale for d in dec_degs])
[23]:
fig, ax = plt.subplots()
ax.plot(sindecs, etas, label=r'$\eta$')
ax.plot(sindecs, ndofs, label='ndof')
ax.plot(sindecs, scales, label='scale')
ax.plot(sindecs, locs, label='loc')
ax.legend(ncol=2)
ax.set_xlabel(r'$\sin(\delta)$')
ax.set_ylabel(r'value')
plt.tight_layout()
_images/05_scaling_up_49_0.png

At this point in a real analysis, you will want to decide whether to go back and perform more trials at these or additional declinations. While the \(\chi^2\) fits above are generally decent, we haven’t really done enough trials to pin down the discovery potential accurately (or perhaps even the sensitivity).

Since this is just a tutorial, we press on.

Working with signal trials: sensitivity and discovery potential#

We briefly check the signal trial structure on disk:

[24]:
!find trials/IC86_2011/sig | sort | head -n20
trials/IC86_2011/sig
trials/IC86_2011/sig/dec
trials/IC86_2011/sig/dec/-005
trials/IC86_2011/sig/dec/+005
trials/IC86_2011/sig/dec/-005/nsig
trials/IC86_2011/sig/dec/+005/nsig
trials/IC86_2011/sig/dec/-005/nsig/002.0
trials/IC86_2011/sig/dec/+005/nsig/002.0
trials/IC86_2011/sig/dec/-005/nsig/002.0/trials__N_001000_seed_0000.npy
trials/IC86_2011/sig/dec/+005/nsig/002.0/trials__N_001000_seed_0000.npy
trials/IC86_2011/sig/dec/-005/nsig/004.0
trials/IC86_2011/sig/dec/+005/nsig/004.0
trials/IC86_2011/sig/dec/-005/nsig/004.0/trials__N_001000_seed_0000.npy
trials/IC86_2011/sig/dec/+005/nsig/004.0/trials__N_001000_seed_0000.npy
trials/IC86_2011/sig/dec/-005/nsig/006.0
trials/IC86_2011/sig/dec/+005/nsig/006.0
trials/IC86_2011/sig/dec/-005/nsig/006.0/trials__N_001000_seed_0000.npy
trials/IC86_2011/sig/dec/+005/nsig/006.0/trials__N_001000_seed_0000.npy
trials/IC86_2011/sig/dec/-005/nsig/008.0
trials/IC86_2011/sig/dec/+005/nsig/008.0
sort: write failed: standard output: Broken pipe
sort: write error

We load the signal trials in similar fashion:

[25]:
with time('load sig trials'):
    sig = cy.bk.get_all(sig_dir, 'trials*npy', merge=np.concatenate, post_convert=cy.utils.Arrays)
180 files loaded.

0:00:00.832596 elapsed.

Now we have a deeply nested dict. To get to one specific batch of trials, we need to descend through the layers:

[26]:
sig['dec'][5]['nsig'][10]
[26]:
Arrays(1000 items | columns: gamma, ns, ts)

csky provides a function for doing this conveniently, including fuzzy matching for numerical values:

[27]:
cy.bk.get_best(sig, 'dec', 5, 'nsig', 10)
[27]:
Arrays(1000 items | columns: gamma, ns, ts)

You don’t need to descend all the way to the end of the tree:

[28]:
cy.bk.get_best(sig, 'dec', 5, 'nsig')
[28]:
{2.0: Arrays(1000 items | columns: gamma, ns, ts),
 4.0: Arrays(1000 items | columns: gamma, ns, ts),
 6.0: Arrays(1000 items | columns: gamma, ns, ts),
 8.0: Arrays(1000 items | columns: gamma, ns, ts),
 10.0: Arrays(1000 items | columns: gamma, ns, ts),
 14.0: Arrays(1000 items | columns: gamma, ns, ts),
 18.0: Arrays(1000 items | columns: gamma, ns, ts),
 22.0: Arrays(1000 items | columns: gamma, ns, ts),
 26.0: Arrays(1000 items | columns: gamma, ns, ts),
 30.0: Arrays(1000 items | columns: gamma, ns, ts)}

Maybe now the motivation for including the extra dec and nsig levels becomes clear: it makes it explicit, both on disk and in the in-memory data structure, what each level actually means.

We can use dicts of this form to compute sensitivity and discovery potential fluxes. But first, let’s get another dict of per-dec trial runners:

[29]:
with time('setup trial runners for sens/disc'):
    trs = {d: cy.get_trial_runner(src=cy.sources(0, d, deg=True)) for d in dec_degs}

0:00:01.046077 elapsed.

Then we define a little helper function:

[30]:
@np.vectorize
def find_n_sig(dec_deg, beta=0.9, nsigma=None):
    # get signal trials, background distribution, and trial runner
    sig_trials = cy.bk.get_best(sig, 'dec', dec_deg, 'nsig')
    b = cy.bk.get_best(bg, dec_deg)
    tr = cy.bk.get_best(trs, dec_deg)
    # determine ts threshold
    if nsigma is not None:
        ts = b.isf_nsigma(nsigma)
    else:
        ts = b.median()
    # include background trials in calculation
    trials = {0: b.trials}
    trials.update(sig_trials)
    # get number of signal events
    # (arguments prevent additional trials from being run)
    result = tr.find_n_sig(ts, beta, max_batch_size=0, logging=False, trials=trials, n_bootstrap=1)
    # return flux
    return tr.to_E2dNdE(result, E0=100, unit=1e3)

In your own analysis, you might define a slightly different helper function with options to return the number of events or the relative error rather than the flux; options to choose different E0 and/or unit; or options to consider different scenarios you may have computed such as different spectral indices, spectral cutoffs, source extensions, etc.

Now we call the function to compute the sensitivity and discovery potential on our grid of declinations:

[31]:
with time('compute sensitivity'):
    fluxs_sens = find_n_sig(dec_degs)
with time('compute discovery potential'):
    fluxs_disc = find_n_sig(dec_degs, beta=0.5, nsigma=5)

0:00:01.763108 elapsed.

0:00:01.817958 elapsed.

Finally, we make the plot:

[32]:
fig, ax = plt.subplots()
# '.-' dot-plus-line style not attractive, but useful for early-stage plotting
ax.semilogy(sindecs, fluxs_disc, '.-', color='C1', label=r'Discovery Potential ($5\sigma$)')
ax.semilogy(sindecs, fluxs_sens, '.-', color='C0', label=r'Sensitivity')
ax.set_xlabel(r'$\sin(\delta)$')
ax.set_ylabel(r'$E^2\cdot dN/dE\ \ [\text{TeV}\,\text{cm}^{-2}\,\text{s}]$')
ax.legend()
ax.grid()
plt.tight_layout()
_images/05_scaling_up_71_0.png

In real life, this is a moment for solemn contemplation. What does this plot tell us?

  • Is the declination dependence as expected for this selection? (yes) (approximately)

  • Are these values in the expected regime? (yes)

  • In particular, does this compare/contrast as expected with precedent? (yes) (see IC86-2011 plots)

  • Did we do enough background trials? (no) (visible fluctuations, possibly under-estimated discovery potential)

  • Is our declination grid sufficiently dense? (no) (behavior not well resolved near equator)

Before pressing ahead with an all-sky scan, the answers to all of these should be “yes”.

Since this is just a tutorial, we press on.

All-sky scan: p-values#

csky provides SkyScanTrialRunner for performing all-sky scans. To unblind one properly, we need not just a map of TS over the whole sky, but the corresponding p-values based on declination-dependent background distributions. Specifically, SkyScanTrialRunner accepts an argument ts_to_p=func, where it is expected that func(dec_in_radians, ts) -> p.

In general, a proper all-sky scan is performed on a healpix grid with pixels that are small compared to the angular resolution. Thus the scan grid should be dense compared to the required background distribution calculation grid. However, this means that you need to decide how to interpolate between declinations.

csky offers one possible solution: cy.dists.ts_to_p(). We define a trivial wrapper:

[33]:
def ts_to_p(dec, ts):
    # our background characterization was done on a grid of dec in degrees
    key = np.degrees(dec)
    # find and return the result
    return cy.dists.ts_to_p(bg, key, ts)

For example, consider a couple of arbitrary declinations and possible TS values:

[34]:
ts_to_p(np.array([np.pi/4, -np.pi/4]),
        np.array([10, .01]))
[34]:
array([0.00189439, 0.47481927])

What is cy.dists.ts_to_p doing behind the scenes? For any given declination, it finds the nearest smaller and larger background distribution declinations and computes p-values based on each one. Then it linearly interpolates — if the given declination is closer to the nearest smaller distribution, the final p-value will be closer to the one obtained from that distribution. Of course, if a given declination is beyond the smallest or largest background distribution declination, then that endpoint distribution alone is used. Because this method is strictly an interpolation, it ensures that the p-value map is no less smooth than the TS map.

Other methods are possible: for example, S. Coenders fit a smoothing spline to the ndof and \(\eta\) parameters in his 7yr analysis. I have found that this latter approach can be unstable in the sense that, at some declinations, the combination of independently reasonable background distribution fit parameters do not in fact give a good fit to the real background distribution at that declination. This can induce artifacts in the final p-value map.

All this may feel a bit abstract. Let’s go ahead and unblind the all-sky scan, to see how this plays out.

NOTE: In real life with fresh data, this step should definitely be done without TRUTH=True. Test on scrambled trials until unblinding approval! But we are safe here because this dataset was unblinded, and even publicly released, long ago.

[35]:
rad85 = np.radians(85)
sstr = cy.get_sky_scan_trial_runner(nside=128, min_dec=-rad85, max_dec=rad85, ts_to_p=ts_to_p)
[36]:
with time('unblind all-sky scan'):
    scan = sstr.get_one_scan(TRUTH=True, mp_cpus=15, logging=True)
Scanning 195880 locations using 15 cores:
     195880/195880 coordinates complete.

0:02:37.720980 elapsed.

Take a look at the shape of scan:

[37]:
scan.shape
[37]:
(4, 196608)

This is basically four healpix maps: \(-\log_{10}(p)\), \(\text{TS}\), \(\hat{n}_s\), and \(\hat\gamma\). Let’s plot them all.

[38]:
fig, axs = plt.subplots (2, 2, figsize=(10, 8), subplot_kw=dict (projection='aitoff'))
axs = np.ravel(axs)

# mask for regions with nonzero ns
mask = scan[2] > 0

# p and TS maps with classic skymap colorscheme
sp = cy.plotting.SkyPlotter(pc_kw=dict(cmap=cy.plotting.skymap_cmap))
ax = axs[0]
mesh, cb = sp.plot_map(ax, np.where(mask, scan[0], 0), n_ticks=2)
cb.set_label(r'$-\log_{10}(p)$')
ax = axs[1]
mesh, cb = sp.plot_map(ax, scan[1], n_ticks=2)
cb.set_label(r'TS')

# ns and gamma maps with viridis
sp = cy.plotting.SkyPlotter(pc_kw=dict(cmap='viridis'))
ax = axs[2]
mesh, cb = sp.plot_map(ax, np.where(mask, scan[2], np.nan), n_ticks=2)
cb.set_label(r'$\hat{n}_s$')
ax = axs[3]
mesh, cb = sp.plot_map(ax, np.where(mask, scan[3], np.nan), n_ticks=2)
cb.set_label(r'$\hat\gamma$')

# show GP, GC, and grid
for ax in axs:
    kw = dict(color='.5', alpha=.5)
    sp.plot_gp(ax, lw=.5, **kw)
    sp.plot_gc(ax, **kw)
    ax.grid(**kw)
plt.tight_layout()
_images/05_scaling_up_90_0.png

As you can see, the p-values are largely but not perfectly correlated with TS – as expected, since the background distributions vary with declination:

[39]:
fig, ax = plt.subplots(figsize=(3, 3))
hl.plot2d(ax, hl.hist((scan[1], scan[0]), bins=100), log=True)
ax.set_xlabel(r'TS')
ax.set_ylabel(r'$-\log_{10}(p)$')
plt.tight_layout()
_images/05_scaling_up_92_0.png

All-sky scan: on the cluster#

This is pretty much the end of the road for this tutorial. Similar methods to those above can be used to run all-sky scans on the cluster. The analyzer needs to make her own determination regarding how much information to save and how. For example, SkyScanTrialRunner can simply return information about the hottest spot for each trial, much as an ordinary TrialRunner returns the best fit rather than a scan over a likelihood space:

[40]:
sstr_16 = cy.get_sky_scan_trial_runner(nside=16, min_dec=-rad85, max_dec=rad85, ts_to_p=ts_to_p)
[41]:
ss_trials = sstr_16.get_many_fits(10, mp_cpus=10)
Performing 10 background trials using 10 cores:
         10/10 trials complete.
[42]:
ss_trials.as_dataframe
[42]:
dec gamma mlog10p ns ra ts
0 -0.894583 2.275326 3.399392 8.279833 5.256126 12.130737
1 -0.572419 3.316402 3.472688 10.036256 2.847068 10.729564
2 -0.041679 3.145972 3.664444 14.406262 1.668971 13.456195
3 -1.054780 3.922119 4.188652 10.515437 5.262168 15.164125
4 -0.523599 4.000000 4.966873 13.138896 2.699806 17.995175
5 0.000000 3.433455 3.499141 17.676977 2.405282 13.733677
6 -0.622827 2.423386 3.530386 10.082906 6.135923 10.614528
7 0.840223 2.826246 3.134401 13.705542 3.534292 11.802156
8 -0.840223 2.624530 3.388785 12.279441 2.524494 12.484693
9 -0.083430 2.085589 3.940884 13.425646 0.539961 12.689423

However, you might want to see the maps for yourself to get a feel for whether they function as expected. You might need more than just hottest spot information if you plan to apply some sort of population analysis or a custom spatial prior method. And anyways, modern analyses use much more data, even better angular resolution, sometimes time-dependent likelihoods, all of which mean the required compute time is increased (in the case of improved resolution, the problem is that higher pixel density is required). So most likely, the question isn’t how many scans you can do per job, but how many jobs you will need for each scan.

csky doesn’t yet offer a dedicated, built-in scan-splitting approach, but this can be achieved by setting min_dec and max_dec values such that jobs with equal random seed are broken up into declination bands.

In any case, regardless of the specific approach chosen, the final step in a traditional all sky scan is to compute a post-trials p-value. This is done by finding the fraction of background-only skies with a hottest spot of greater than or equal to the hottest \(-\log_{10}(p)\) in the real unblinded map.

Before unblinding, sanity checks include but may not be limited to:

  • Is nside large enough? (values should vary smoothly from pixel to pixel)

  • Is the hottest-spot \(\sin(\delta)\) uniformly distributed?

  • Can the post-trials correction distribution be obtained in reasonable compute time?

Timing summary#

[43]:
print(timer)
ana setup                         | 0:00:02.993551
bg trials                         | 0:00:57.027624
signal trials                     | 0:07:15.878014
load bg trials                    | 0:00:02.611293
load sig trials                   | 0:00:00.832596
setup trial runners for sens/disc | 0:00:01.046077
compute sensitivity               | 0:00:01.763108
compute discovery potential       | 0:00:01.817958
unblind all-sky scan              | 0:02:37.720980
--------------------------------------------------
total                             | 0:11:01.691201

Remarks#

In this tutorial, we’ve offered some tips on how to scale up an analysis from interactive exploration to serious cluster computing. We’ve looked specifically at the use case of a time-integrated all-sky scan with declination-dependent background. However, the methods used here should be widely applicable for single-hypothesis analyses (stacking, template, special-case follow-ups, etc.) as well as for time-dependent analyses.

I (M. Richman) explicitly encourage users to discuss implementation details such as scripting command line interfaces, working with specific clusters, etc. on Slack under #csky! And by all means: if you come up with different, maybe even more elegant ways of dealing with trial data on disk, let us know — it’s possible that your methods would make a nice addition to cy.bk.