Statement of the problem
Machine learning and classification are very broad topics. The current implementation of pybdt is restricted to one common problem in experimental and observational physics: given some observed event, how do we determine whether the event contains the signal we are searching for, as opposed to some background? In more general terms, pybdt addresses binary classification of events, and in this user manual we restrict our attention to that case.
Classification models are generated by a process referred to as training. Training requires an ensemble of known signal events and an ensemble of known background events. These are referred to as the signal training sample and background training sample. Each event consists of values for several variables, plus a weight. Classification models may then be used to obtain a score for each event. In pybdt, scores range from -1 to +1, where -1 is very background-like and +1 is very signal-like.
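To make the vocabulary above concrete, here is a minimal sketch of training samples and scores in plain Python. It does not use pybdt and does not reflect pybdt's API: the names `Event`, `weighted_mean`, and `toy_score` are hypothetical, and the scoring rule (position of one variable between the weighted sample means) is a deliberately simple stand-in for a trained classification model. It only illustrates that each event carries several variables plus a weight, and that a score of -1 means background-like while +1 means signal-like.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """One event: named variable values plus a weight (hypothetical container)."""
    variables: Dict[str, float]
    weight: float = 1.0


def weighted_mean(sample: List[Event], variable: str) -> float:
    """Weighted mean of one variable over a training sample."""
    total_weight = sum(event.weight for event in sample)
    return sum(event.weight * event.variables[variable] for event in sample) / total_weight


def toy_score(event: Event, signal: List[Event], background: List[Event],
              variable: str) -> float:
    """Toy score in [-1, +1]; NOT a decision tree, just an illustration.

    The background mean maps to -1, the signal mean maps to +1, and
    values beyond either mean are clamped to the end of the range.
    """
    mean_signal = weighted_mean(signal, variable)
    mean_background = weighted_mean(background, variable)
    if mean_signal == mean_background:
        raise ValueError("variable does not separate the two samples")
    midpoint = 0.5 * (mean_signal + mean_background)
    half_gap = 0.5 * (mean_signal - mean_background)
    raw = (event.variables[variable] - midpoint) / half_gap
    return max(-1.0, min(1.0, raw))


# Signal training sample and background training sample (made-up values).
signal_sample = [Event({"energy": 10.0}), Event({"energy": 14.0})]
background_sample = [Event({"energy": 2.0}, weight=2.0), Event({"energy": 5.0})]

# Score a new event whose variable lies halfway between the two sample means.
ambiguous_event = Event({"energy": 7.5})
ambiguous_score = toy_score(ambiguous_event, signal_sample, background_sample, "energy")
```

In practice the model would combine many variables rather than one, but the inputs (two weighted training samples) and the output (a score between -1 and +1 per event) have the same shape as described above.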