pybdt.ml module¶
Wrapper for C++ backend.
- class pybdt.ml.BDTLearner(feature_names=[], weight_name='', bg_weight_name='', _=None)¶
Bases:
Learner
Train boosted decision trees.
- property beta¶
The AdaBoost scaling factor.
- property frac_random_events¶
The fraction of events to use for training each tree.
Set to 1.0 to use every event for every tree.
- property num_trees¶
The number of individual decision trees to train.
- property quiet¶
Whether to silence the training progress bar.
- set_defaults()¶
Reset default BDTLearner (and internal DTLearner) properties.
- property use_purity¶
Whether to use decision tree leaf purity information.
If this option is set to True, purity information will be used during training as described in J. Zhu, H. Zou, S. Rosset, T. Hastie, “Multi-class AdaBoost”, 2009.
- class pybdt.ml.BDTModel(feature_names=[], dtmodels=[], alphas=[], _=None)¶
Bases:
Model
Represent a boosted decision tree classifier.
- property alphas¶
The alphas, or weights, for each decision tree.
- property dtmodels¶
The DTModels that make up this BDTModel.
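How the alphas combine the individual tree outputs is not spelled out in this reference. A common AdaBoost convention, assumed here purely for illustration, is an alpha-weighted average of per-tree votes of +1 (signal) or -1 (background). The function name bdt_score and the sample numbers are hypothetical and not part of pybdt.

```python
def bdt_score(tree_votes, alphas):
    """Alpha-weighted average of per-tree votes (+1 signal, -1 background).

    Assumed convention for illustration; pybdt's exact formula may differ.
    """
    if len(tree_votes) != len(alphas):
        raise ValueError("need exactly one alpha per tree")
    total = sum(alphas)
    if total == 0:
        raise ValueError("alphas must not sum to zero")
    return sum(a * v for a, v in zip(alphas, tree_votes)) / total


# Three stand-in trees: the first votes signal and carries the largest weight.
votes = [+1, -1, +1]
alphas = [0.5, 0.25, 0.25]
score = bdt_score(votes, alphas)
print(score)
```

Under this convention the score lies between -1 and +1, and trees with larger alphas pull it more strongly.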
- event_variable_importance(event, sep_weight=True, tree_weight=True)¶
Get a dictionary of variable importance values.
- Parameters:
event (dict) – A mapping from variable names to float values.
sep_weight (bool) – Whether to weight nodes where a variable is used by separation gain achieved rather than weighting all nodes equally.
tree_weight (bool) – Whether to weight trees according to their performance on the training set rather than weighting all trees equally.
- Returns:
dict with variable name keys and float values from 0 to 1.
- get_subset_bdtmodel(n_i, n_f)¶
Get a BDTModel using DTModels number n_i through n_f.
- Parameters:
The parameters of this method follow the indexing convention of Python’s builtin range(i,j).
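Following the range(i, j) convention means that tree n_i is included and tree n_f is excluded. The sketch below illustrates this with plain lists standing in for the DTModels and alphas. The helper name subset is hypothetical and does not call pybdt.

```python
# Stand-ins for the DTModels and alphas of a five-tree BDTModel.
dtmodels = ["tree0", "tree1", "tree2", "tree3", "tree4"]
alphas = [0.9, 0.7, 0.5, 0.4, 0.2]


def subset(items, n_i, n_f):
    """Select items n_i up to, but not including, n_f, like range(n_i, n_f)."""
    return [items[k] for k in range(n_i, n_f)]


sub_trees = subset(dtmodels, 1, 4)
sub_alphas = subset(alphas, 1, 4)
print(sub_trees)   # trees 1, 2 and 3; tree 4 is excluded
print(sub_alphas)
```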
- get_subset_bdtmodel_list(dtmodel_indices)¶
Get a BDTModel using the DTModels at the indices listed in dtmodel_indices.
- get_trimmed_bdtmodel(threshold)¶
Get a BDTModel using only DTModels that differ enough from the preceding one.
- Parameters:
threshold (float) – The minimum percent change in alpha values of consecutive trees required in order to keep a given tree.
Warning
This method may not be useful, and should be considered experimental.
- property n_dtmodels¶
The number of DTModels in this BDTModel.
- variable_importance(sep_weight=True, tree_weight=True)¶
Get a dictionary of variable importance values.
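- Parameters:
sep_weight (bool) – Whether to weight nodes where a variable is used by separation gain achieved rather than weighting all nodes equally.
tree_weight (bool) – Whether to weight trees according to their performance on the training set rather than weighting all trees equally.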
- Returns:
dict with variable name keys and float values from 0 to 1.
- class pybdt.ml.CostComplexityPruner(strength=None, _=None)¶
Bases:
Pruner
Prune trees by eliminating nodes with the worst information-added to complexity-added ratio.
- static gain(node)¶
The weighted Gini separation gain of this node.
- Parameters:
node (DTNode) – The node.
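The exact formula behind the weighted Gini separation gain is not given here. One common form, assumed for this sketch, takes the Gini index of a node as p(1 - p), with p the signal fraction by weight, multiplies it by the node's total weight, and subtracts the two children from the parent. The function names and the (w_sig, w_bg) tuple layout are hypothetical.

```python
def gini_index(w_sig, w_bg):
    """Gini index p * (1 - p), with p the weighted signal fraction (assumed form)."""
    total = w_sig + w_bg
    if total == 0:
        return 0.0
    p = w_sig / total
    return p * (1.0 - p)


def weighted_gini_gain(parent, left, right):
    """Separation gain of a split; each argument is a (w_sig, w_bg) pair."""
    def weighted(node):
        w_sig, w_bg = node
        return (w_sig + w_bg) * gini_index(w_sig, w_bg)

    return weighted(parent) - weighted(left) - weighted(right)


# A split that moves most signal weight left and most background weight right.
gain = weighted_gini_gain((4.0, 4.0), (3.0, 1.0), (1.0, 3.0))
print(gain)
```

A split whose children have the same signal fraction as the parent yields zero gain under this definition.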
- property strength¶
The pruning strength.
Once the pruning sequence is computed, this is the percentage (0-100) of the prune operations which are actually executed.
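The relationship between strength and the pruning sequence can be sketched as follows. The helper select_prune_ops is hypothetical, and rounding the count down is an assumption, since the source does not say how fractional counts are handled.

```python
def select_prune_ops(prune_sequence, strength):
    """Return the leading part of prune_sequence that would be executed.

    strength is a percentage from 0 to 100. Rounding down is an assumption.
    """
    if not 0 <= strength <= 100:
        raise ValueError("strength must be between 0 and 100")
    n_execute = int(len(prune_sequence) * strength / 100.0)
    return prune_sequence[:n_execute]


# A stand-in pruning sequence, ordered from first to last prune operation.
sequence = ["node_a", "node_b", "node_c", "node_d"]
executed = select_prune_ops(sequence, 50)
print(executed)
```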
- class pybdt.ml.DTLearner(feature_names=[], weight_name='', bg_weight_name='', _=None)¶
Bases:
Learner
Train single decision trees.
- property linear_cuts¶
Space cuts linearly (default: True).
- property max_depth¶
The maximum depth to which to train each individual tree.
- property min_split¶
The minimum number of entries in a node which warrants further splitting.
- property num_cuts¶
The number of cuts to try at each potential split.
- property num_random_variables¶
The number of variables to consider using at each node.
Set to 0 to use every variable at every node.
- property separation_type¶
The separation type to use (one of ‘cross_entropy’, ‘gini’, or ‘misclass_error’: default is ‘gini’).
Warning
As of this writing, only ‘gini’ is known to be well-tested.
- set_defaults()¶
Reset default DTLearner properties.
- class pybdt.ml.DTModel(feature_names=[], root=None, _=None)¶
Bases:
Model
Represent a decision tree.
- event_variable_importance(event, sep_weight=True)¶
Get a dictionary of variable importance values.
- variable_importance(sep_weight=True)¶
Get a dictionary of variable importance values.
- Parameters:
sep_weight (bool) – Whether to weight nodes where a variable is used by separation gain achieved rather than weighting all nodes equally.
- Returns:
dict with variable name keys and float values from 0 to 1.
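The effect of sep_weight can be illustrated with a small tree walk. This is a sketch under stated assumptions: nodes are plain dicts rather than DTNode objects, and importance values are normalized so that they sum to 1, which the source does not confirm (it only states that values range from 0 to 1).

```python
def variable_importance(root, feature_names, sep_weight=True):
    """Sum per-variable contributions over all split nodes, then normalize."""
    totals = {name: 0.0 for name in feature_names}
    stack = [root]
    while stack:
        node = stack.pop()
        if "feature" not in node:  # leaf: no cut, nothing to count
            continue
        # Weight the node by its separation gain, or count every node equally.
        totals[node["feature"]] += node["sep_gain"] if sep_weight else 1.0
        stack.append(node["left"])
        stack.append(node["right"])
    norm = sum(totals.values())
    if norm == 0:
        return totals
    return {name: value / norm for name, value in totals.items()}


leaf = {}
tree = {
    "feature": "energy", "sep_gain": 3.0,
    "left": leaf,
    "right": {"feature": "zenith", "sep_gain": 1.0, "left": leaf, "right": leaf},
}
names = ["energy", "zenith", "charge"]
weighted = variable_importance(tree, names, sep_weight=True)
unweighted = variable_importance(tree, names, sep_weight=False)
print(weighted)
print(unweighted)
```

With sep_weight enabled, the high-gain cut on energy dominates. With it disabled, both cut variables count equally, and the unused variable stays at zero in either case.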
- class pybdt.ml.DTNode(w_sig, w_bg, n_sig, n_bg, sep_index, sep_gain=None, feature_id=None, feature_val=None, left=None, right=None, _=None)¶
Bases:
object
Represent a node in a decision tree.
- property feature_id¶
The id of the feature for this cut.
If this is a leaf, feature_id is +1 or -1 for signal or background, respectively.
- property feature_name¶
The name of the feature for this cut.
- property feature_val¶
The cut value for the feature specified by feature_id at this node.
- property is_leaf¶
Whether this node is a leaf.
- property left¶
The node for feature < feature_val.
- property max_depth¶
The maximum depth of the tree below this node.
- property n_bg¶
The number of training background events in this node.
- property n_leaves¶
The number of leaves below (and including) this node.
- property n_sig¶
The number of training signal events in this node.
- property n_total¶
The number of training signal + background events in this node.
- prune()¶
Prune the tree at this node.
This method prunes the tree at this node. The node becomes a leaf. If the purity is greater than 50%, it is a signal leaf; otherwise it is a background leaf.
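The pruning rule can be sketched with a minimal stand-in class. The class Node below is hypothetical and is not DTNode. Purity is assumed to be the signal weight divided by the total weight, and the leaf class is stored in feature_id as +1 or -1, following the feature_id description above.

```python
class Node:
    """Minimal stand-in for a tree node holding signal and background weight."""

    def __init__(self, w_sig, w_bg, left=None, right=None):
        self.w_sig = w_sig
        self.w_bg = w_bg
        self.left = left
        self.right = right
        self.feature_id = None  # set to +1 or -1 once the node becomes a leaf

    @property
    def is_leaf(self):
        return self.left is None and self.right is None

    @property
    def purity(self):
        # Assumed definition: signal weight over total weight.
        return self.w_sig / (self.w_sig + self.w_bg)

    def prune(self):
        """Drop both children; label the new leaf by its purity."""
        self.left = None
        self.right = None
        self.feature_id = +1 if self.purity > 0.5 else -1


root = Node(3.0, 1.0, left=Node(2.5, 0.5), right=Node(0.5, 0.5))
root.prune()
print(root.is_leaf, root.feature_id)
```

A node with purity of exactly 50% becomes a background leaf, since the signal condition is a strict inequality.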
- property purity¶
The purity of this node.
- property right¶
The node for feature >= feature_val.
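The left and right conventions imply a simple routing rule for a single event: go left when the feature value is below the cut, otherwise go right, until a leaf is reached. The sketch below uses a hypothetical Node class and classify function rather than the pybdt classes, and the feature names are made up.

```python
class Node:
    """Minimal stand-in node: either a cut (with children) or a leaf."""

    def __init__(self, feature_name=None, feature_val=None,
                 left=None, right=None, leaf_class=None):
        self.feature_name = feature_name
        self.feature_val = feature_val
        self.left = left
        self.right = right
        self.leaf_class = leaf_class  # +1 signal, -1 background (leaves only)

    @property
    def is_leaf(self):
        return self.left is None and self.right is None


def classify(node, event):
    """Follow cuts from node down to a leaf and return the leaf class."""
    while not node.is_leaf:
        if event[node.feature_name] < node.feature_val:
            node = node.left   # feature < feature_val
        else:
            node = node.right  # feature >= feature_val
    return node.leaf_class


tree = Node("energy", 10.0,
            left=Node(leaf_class=-1),
            right=Node("zenith", 0.5,
                       left=Node(leaf_class=+1),
                       right=Node(leaf_class=-1)))
print(classify(tree, {"energy": 10.0, "zenith": 0.1}))
```

An event whose value equals the cut exactly is routed to the right child.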
- property sep_gain¶
The separation gain from this node.
- property sep_index¶
The separation index at this node.
- property tree_size¶
The size of the tree below (and including) this node.
- property w_bg¶
The sum of background weight in this node.
- property w_sig¶
The sum of signal weight in this node.
- property w_total¶
The sum of signal and background weight in this node.
- class pybdt.ml.DataSet(data, subset='all', _=None)¶
Bases:
object
A pybdt-friendly representation of a set of events.
- eval(expr, names={})¶
Evaluate an expression in terms of variables in this DataSet.
- Parameters:
- Returns:
numpy.ndarray — one element per event
When expr is evaluated, each variable stored in the dataset will be available. If the dataset has a livetime set, ‘livetime’ will also be available.
Other allowed identifiers are np (Numpy) and scipy, in addition to anything specified in the names parameter.
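The idea of evaluating an expression against the dataset's variables can be sketched in pure Python. This stand-in evaluates the expression once per event and returns a list, whereas the real method returns a numpy.ndarray. The function eval_expression, the column names, and the use of math in place of np are all assumptions made for illustration.

```python
import math


def eval_expression(columns, expr, names=None, livetime=None):
    """Evaluate expr once per event, with each column value bound by name."""
    extra = dict(names or {})
    n_events = len(next(iter(columns.values())))
    results = []
    for i in range(n_events):
        local_vars = dict(extra)
        local_vars.update({name: values[i] for name, values in columns.items()})
        if livetime is not None:
            local_vars["livetime"] = livetime
        # eval() is unsafe on untrusted input; builtins are emptied here only
        # to keep the namespace limited to the names supplied above.
        results.append(eval(expr, {"__builtins__": {}}, local_vars))
    return results


columns = {"energy": [1.0, 10.0, 100.0], "zenith": [0.0, 0.5, 1.0]}
values = eval_expression(columns, "math.log10(energy) + zenith",
                         names={"math": math})
print(values)
```

Extra identifiers such as math are only visible to the expression because they are passed through names, mirroring the role of the names parameter described above.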
- get_subset(idx)¶
Get a subset of this dataset.
- Parameters:
idx (array of bools) – The subset of samples to keep.
- Returns:
pybdt.ml.DataSet.
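Selecting events with an array of bools works like a mask applied to every column. A minimal pure-Python sketch, assuming a dict of equal-length columns in place of a DataSet (the helper name and the column names are hypothetical):

```python
def get_subset(columns, idx):
    """Keep only the events whose entry in idx is true, for every column."""
    for name, values in columns.items():
        if len(values) != len(idx):
            raise ValueError("mask length must match column %r" % name)
    return {
        name: [v for v, keep in zip(values, idx) if keep]
        for name, values in columns.items()
    }


columns = {"energy": [1.0, 10.0, 100.0, 1000.0],
           "zenith": [0.1, 0.2, 0.3, 0.4]}
# Build the mask from a condition on one column, then apply it to all columns.
mask = [e >= 10.0 for e in columns["energy"]]
selected = get_subset(columns, mask)
print(selected)
```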
- property livetime¶
The livetime of this DataSet (or -1 if never specified).
- property n_events¶
The number of rows in this DataSet.
- property n_features¶
The number of columns in this DataSet.
- property names¶
The names of the features stored by this DataSet.
- to_dict()¶
Get a dictionary with all data from the DataSet.
- class pybdt.ml.ErrorPruner(strength=None, _=None)¶
Bases:
Pruner
Prune trees by eliminating the nodes which least improve the estimated error.
Warning
As of this writing, this pruning method is not yet well-tested and should be considered unsupported.
- node_error(node)¶
The expected error of this node (affected by strength parameter).
- Parameters:
node (DTNode) – The node.
- property strength¶
The pruning strength.
Once the pruning sequence is computed, this is the percentage (0-100) of the prune operations which are actually executed.
- class pybdt.ml.Learner(_)¶
Bases:
object
Train classification models.
- train(signal_dataset, background_dataset)¶
Train using the given DataSets.
- train_given_weights(signal_dataset, background_dataset, signal_weights, background_weights)¶
Train using the given DataSets and the given weights.
- class pybdt.ml.Model(_)¶
Bases:
object
Classify events based on some past training by a Learner.
Learners ultimately return Models upon training. The Model can be used to classify events using the score() methods.
- property feature_names¶
The names of the event features used by this Model.
- score(data, use_purity=False, quiet=False)¶
Obtain the score for a set of events.
- Parameters:
This convenience method calls Model.score_DataSet(), Model.score_dict(), or Model.score_event() as appropriate.
- score_DataSet(ds, use_purity=False, quiet=False)¶
Obtain the score for a DataSet object.
- score_dict(data, use_purity=False, quiet=False)¶
Obtain the score for a set of events.
- score_event(event, use_purity=False)¶
Obtain the score for a single event.
- class pybdt.ml.MultiModel1D(column, bins, bdts)¶
Bases:
Model
A collection of BDTs, one for each bin along a single axis.
- get_cut(cut_values)¶
A MultiModel1DCut for this MultiModel1D.
- Parameters:
cut_values (array-like) – The per-bin cut values.
- score_dict(data)¶
Obtain the score for a set of events.
- class pybdt.ml.MultiModel1DCut(multi_bdtmodel_1d, cut_values)¶
Bases:
object
A cut which takes one value per MultiModel1D bin.
- decision(thing, scores=None)¶
Return the cut decision for thing, possibly given scores.
- Parameters:
thing (DataSet or dict, or bin column array) – If DataSet or dict, see Model.score(); otherwise, this is an array of values by which the MultiModel1D is binned.
scores (array-like) – The score or scores for the event or events (score or scores are calculated if not given).
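A per-bin cut can be sketched as a lookup of each event's bin followed by a comparison against that bin's cut value. The sketch below makes several assumptions not confirmed by the source: bins are given as ascending edges, an event passes when its score is strictly greater than the cut, and events outside the binned range fail. The function names are hypothetical.

```python
from bisect import bisect_right


def bin_index(bin_edges, value):
    """Index of the bin containing value, or None if it lies outside the edges."""
    if value < bin_edges[0] or value >= bin_edges[-1]:
        return None
    return bisect_right(bin_edges, value) - 1


def decisions(bin_edges, cut_values, bin_column, scores):
    """One pass/fail decision per event, using the cut value of its bin."""
    out = []
    for value, score in zip(bin_column, scores):
        i = bin_index(bin_edges, value)
        out.append(i is not None and score > cut_values[i])
    return out


bin_edges = [0.0, 1.0, 2.0, 3.0]   # three bins
cut_values = [0.1, 0.2, 0.3]       # one cut per bin
bin_column = [0.5, 1.5, 2.5, 3.5]  # binned variable, one value per event
scores = [0.15, 0.15, 0.35, 0.9]
passed = decisions(bin_edges, cut_values, bin_column, scores)
print(passed)
```

The same score of 0.15 passes in the first bin and fails in the second, which is the point of keeping one cut value per bin.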
- class pybdt.ml.SameLeafPruner(_=None)¶
Bases:
Pruner
Prune trees where adjacent leaves yield the same class.
- class pybdt.ml.VineLearner(vine_feature, vine_feature_min, vine_feature_max, vine_feature_width, vine_feature_step, learner, _=None)¶
Bases:
Learner
Train VineModels.
- property learner¶
The underlying learner used for each vine bin.
- property quiet¶
Whether to silence the training progress bar.
- property vine_feature¶
The feature along which the vine bins are defined.
- property vine_feature_max¶
The maximum value of vine_feature covered by the vine bins.
- property vine_feature_min¶
The minimum value of vine_feature covered by the vine bins.
- property vine_feature_step¶
The step in vine_feature between consecutive vine bins.
- property vine_feature_width¶
The width in vine_feature of each vine bin.
- pybdt.ml.get_epsilon()¶
Get the global epsilon value.
- Returns:
The value used to trim purity from [0,1] to [epsilon,1-epsilon] when using purity for training or scoring.
- pybdt.ml.set_epsilon(eps)¶
Set the global epsilon value.
- Parameters:
eps (float) – The value to use to trim purity from [0,1] to [epsilon,1-epsilon] when using purity for training or scoring.
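Trimming purity to [epsilon, 1-epsilon] amounts to clamping the value at both ends. A minimal sketch with a hypothetical helper name; the epsilon used here is an arbitrary example, not the library default:

```python
def trim_purity(purity, epsilon):
    """Clamp purity from [0, 1] into [epsilon, 1 - epsilon]."""
    if not 0.0 <= epsilon <= 0.5:
        raise ValueError("epsilon must lie between 0 and 0.5")
    return min(max(purity, epsilon), 1.0 - epsilon)


epsilon = 0.001
trimmed = [trim_purity(p, epsilon) for p in (0.0, 0.4, 1.0)]
print(trimmed)
```

One plausible motivation, not stated in the source, is to keep pure leaves from producing infinities in expressions such as the logarithm of a purity ratio.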
- pybdt.ml.unwrapped(py_object)¶
Get the C++ instance underlying a pure-Python wrapped instance.
- Parameters:
py_object (object) – A pure-Python pybdt class instance.
- Returns:
A C++ pybdt class instance.
- pybdt.ml.wrapped(cpp_object)¶
Get a pure-Python wrapped instance.
- Parameters:
cpp_object (Boost.Python.instance) – A C++ pybdt class instance.
- Returns:
A pure-Python class instance.