
Multivariate Data Analysis with TMVA

Peter Speckmayer (*) (CERN)
LCD Seminar, CERN, April 14, 2010

(*) On behalf of the present core developer team: A. Hoecker, P. Speckmayer, J. Stelzer, J. Therhaag, E. v. Toerne, H. Voss
And the contributors: Tancredi Carli (CERN, Switzerland), Asen Christov (Universität Freiburg, Germany), Krzysztof Danielowski (IFJ and AGH/UJ, Krakow, Poland), Dominik Dannheim (CERN, Switzerland), Sophie Henrot-Versille (LAL Orsay, France), Matthew Jachowski (Stanford University, USA), Kamil Kraszewski (IFJ and AGH/UJ, Krakow, Poland), Attila Krasznahorkay Jr. (CERN, Switzerland, and Manchester U., UK), Maciej Kruk (IFJ and AGH/UJ, Krakow, Poland), Yair Mahalalel (Tel Aviv University, Israel), Rustem Ospanov (University of Texas, USA), Xavier Prudent (LAPP Annecy, France), Arnaud Robert (LPNHE Paris, France), Doug Schouten (S. Fraser University, Canada), Fredrik Tegenfeldt (Iowa University, USA, until Aug 2007), Jan Therhaag (Universität Bonn, Germany), Alexander Voigt (CERN, Switzerland), Kai Voss (University of Victoria, Canada), Marcin Wolter (IFJ PAN Krakow, Poland), Andrzej Zemla (IFJ PAN Krakow, Poland).
On the web: http://tmva.sf.net/ (home), https://twiki.cern.ch/twiki/bin/view/TMVA/WebHome (tutorial)

Outline

Introduction:

- the reasons why we need sophisticated data analysis algorithms
- the classification/(regression) problem
- what is Multivariate Data Analysis and Machine Learning
- a little bit of statistics

Classifiers in TMVA:
- Cuts
- Kernel Methods and Likelihood Estimators
- Linear Fisher Discriminant
- Neural Networks
- Support Vector Machines
- Boosted Decision Trees
- General boosting
- Category classifier

Using TMVA

Toy examples

Literature / Software Packages ... a short/biased selection

Literature:
- T. Hastie, R. Tibshirani, J. Friedman, "The Elements of Statistical Learning", Springer 2001
- C.M. Bishop, "Pattern Recognition and Machine Learning", Springer 2006

Software packages for multivariate data analysis/classification:
- individual classifier software, e.g. JETNET: C. Peterson, T. Rognvaldsson, L. Loennblad
- attempts to provide all-inclusive packages:
  - StatPatternRecognition: I. Narsky, arXiv: physics/0507143, http://www.hep.caltech.edu/~narsky/spr.html
  - TMVA: Hoecker, Speckmayer, Stelzer, Therhaag, v. Toerne, Voss, arXiv: physics/0703039, http://tmva.sf.net or every ROOT distribution (not necessarily the latest TMVA version though)
  - WEKA: http://www.cs.waikato.ac.nz/ml/weka/
- huge data analysis library available in R: http://www.r-project.org/

Conferences: PHYSTAT, ACAT, CHEP

Event Classification in High-Energy Physics (HEP)

Most HEP analyses require discrimination of signal from background:

- event level (Higgs searches, ...)
- cone level (tau-vs-jet reconstruction, ...)
- track level (particle identification, ...)
- lifetime and flavour tagging (b-tagging, ...)
- parameter estimation (CP violation in the B system, ...)
- etc.

The multivariate input information used for this has various sources:
- kinematic variables (masses, momenta, decay angles, ...)
- event properties (jet/lepton multiplicity, sum of charges, ...)
- event shape (sphericity, Fox-Wolfram moments, ...)
- detector response (silicon hits, dE/dx, Cherenkov angle, shower profiles, muon hits, ...)
- etc.

Traditionally only a few powerful input variables were combined. New methods allow the use of up to 100 and more variables without loss of classification power, e.g. MiniBooNE: NIMA 543 (2005), or D0 single top: Phys. Rev. D78, 012005 (2008).

Regression

How to estimate a functional behaviour from a set of measurements?
- energy deposit in the calorimeter, distance between overlapping photons, ...
- entry location of the particle in the calorimeter or on a silicon pad, ...

Constant? Linear function? Nonlinear?

[Plots: f(x) versus x for a constant, a linear, and a nonlinear model]

Seems trivial? The human eye has good pattern recognition. But what if we have many input variables?

Regression: model the functional behaviour

Assume, for example, D variables that somehow characterize the shower in your calorimeter. A Monte Carlo or testbeam data sample with measured cluster observables plus known particle energy gives a calibration function (the energy is a surface in the D+1 dimensional space).

[Plots: 1-D example f(x) and 2-D example f(x,y), with events generated according to an underlying distribution]

Better known: (linear) regression, i.e. fitting a known analytic function. For the above 2-D example a reasonable function would be f(x,y) = a*x^2 + b*y^2 + c. What if we don't have a reasonable model? Then we need something more general, e.g. piecewise defined splines, kernel estimators, or decision trees to approximate f(x).
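As a concrete illustration of the fitted-function approach, here is a minimal ROOT sketch (toy data and truth parameters are assumed, not from the slides) that fits the quadratic model f(x,y) = a*x^2 + b*y^2 + c to simulated calibration points:

   // Minimal sketch: least-squares fit of f(x,y) = a*x^2 + b*y^2 + c
   // to toy calibration data (truth parameters chosen arbitrarily).
   #include "TGraph2D.h"
   #include "TF2.h"
   #include "TRandom3.h"
   #include <cstdio>

   void fit_calibration() {
      TRandom3 rng(42);
      TGraph2D g;
      for (int i = 0; i < 500; ++i) {
         double x = rng.Uniform(-1, 1), y = rng.Uniform(-1, 1);
         double e = 1.5*x*x + 0.7*y*y + 2.0 + rng.Gaus(0, 0.05); // truth + noise
         g.SetPoint(i, x, y, e);
      }
      TF2 f("f", "[0]*x*x + [1]*y*y + [2]", -1, 1, -1, 1);
      g.Fit(&f, "Q");                       // quiet least-squares fit of a, b, c
      std::printf("a = %.3f, b = %.3f, c = %.3f\n",
                  f.GetParameter(0), f.GetParameter(1), f.GetParameter(2));
   }

The nonparametric alternatives named above (splines, kernel estimators, decision trees) replace the fixed functional form when no such model is available.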

Event Classification

Suppose a data sample with two types of events: H0, H1. We have found discriminating input variables x1, x2, ... What decision boundary should we use to select events of type H1? Rectangular cuts? A linear boundary? A nonlinear one?

[Plots: three decision boundaries separating H0 from H1 in the (x1, x2) plane]

Rectangular cuts and linear boundaries are low-variance (stable), high-bias methods; nonlinear boundaries are high-variance, small-bias methods. How can we decide this in an optimal way? Let the machine learn it!

Multivariate Classification

A classifier maps the multiple input variables, R^N, onto one output variable: y(x) in R. Choosing a cut value on the classifier output y separates the events into classes, R -> {C1, C2}. The position of the cut depends on the type of study. (*The cut classifier is an exception: it maps directly from R^N to {Signal, Background}.)

Distributions of y(x): PDF_S(y) and PDF_B(y). The surface y(x) = const defines the decision boundary. The overlap of PDF_S(y) and PDF_B(y) affects the separation power and the purity.

Event Classification

P(Class = C | x) (or simply P(C|x)): probability that the event class is of type C, given the measured observables x = {x1, ..., xD} and the given mapping function y(x). Bayes' theorem decomposes the posterior probability as

   P(Class = C | y) = P(y | C) * P(C) / P(y)

where P(C) is the prior probability to observe an event of class C, i.e. the relative abundance of signal versus background, P(y|C) is the probability density for y given class C, and P(y) is the overall probability density to observe the actual measurement y(x), i.e.

   P(y) = sum over classes C of P(y | C) * P(C)

Bayes Optimal Classification

x = {x1, ..., xD}: measured observables; y = y(x).

   P(Class = C | y) = P(y | C) * P(C) / P(y)

The misclassification error is minimal if C is chosen such that it has maximum P(C|y). To select S(ignal) over B(ackground), place the decision on the posterior odds ratio.

Posterior odds ratio [or any monotonic function of P(S|y) / P(B|y)]:

   P(S | y) / P(B | y) = [ P(y | S) / P(y | B) ] * [ P(S) / P(B) ] > c

The likelihood ratio serves as discriminating function y(x); the cut value c determines efficiency and purity.

The prior odds ratio P(S)/P(B) is the relative probability of choosing a signal event (signal vs. background abundance).

Any Decision Involves a Risk

Decide to treat an event as signal or background.

Type-1 error (false positive): classify an event as class C even though it is not, i.e. accept a hypothesis although it is not true (reject the null hypothesis although it would have been the correct one). Consequence: loss of purity in the selection of signal events.

Type-2 error (false negative): fail to identify an event from class C as such, i.e. reject a hypothesis although it would have been true (fail to reject the null hypothesis although it is false). Consequence: loss of efficiency in selecting signal events.

Trying to select signal events (i.e. trying to disprove the null hypothesis stating it were only a background event):

                          truly is: Signal      truly is: Background
   accept as Signal:      correct               Type-1 error
   accept as Background:  Type-2 error          correct

A: region of the outcome of the test where you accept the event as signal.

Significance alpha (Type-1 error rate, = p-value): alpha = integral over A of P(x|B) dx = background selection efficiency; should be small.

Miss rate beta (Type-2 error rate): beta = integral over !A of P(x|S) dx; should be small. Power: 1 - beta = integral over A of P(x|S) dx = signal selection efficiency.

Neyman-Pearson Lemma

   y(x) = P(x | S) / P(x | B)

Neyman-Pearson (1933): the likelihood ratio used as selection criterion y(x) gives for each selection efficiency the best possible background rejection, i.e. it maximises the area under the Receiver Operating Characteristics (ROC) curve.

[ROC plot: 1 - eps_backgr. versus eps_signal, from (0,1) to (1,0). The diagonal corresponds to random guessing; curves bending towards the upper right indicate better classification; the limit is given by the likelihood ratio. Top left: few false positives but many missed; bottom right: many false positives but few missed.]

Varying the cut y(x) > cut moves the working point (efficiency and purity) along the ROC curve (a numerical sketch follows below). How to choose the cut? One needs to know the prior probabilities (S, B abundances):
- measurement of a signal cross section: maximum of S/sqrt(S+B), or equivalently (eps*p)
- discovery of a signal: maximum of S/sqrt(B)
- precision measurement: high purity (p)
- trigger selection: high efficiency (eps)
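To make the ROC construction concrete, here is a small self-contained sketch (toy assumption, not from the slides: one observable with signal ~ N(1,1) and background ~ N(0,1); the likelihood ratio is monotonic in x here, so a simple cut on x traces the Neyman-Pearson-optimal ROC curve):

   // Sketch: scan a cut on x and print the resulting ROC points
   // (signal efficiency vs. background rejection) for Gaussian toys.
   #include <cmath>
   #include <cstdio>

   double fracAbove(double cut, double mean) {          // P(x > cut | N(mean,1))
      return 0.5 * std::erfc((cut - mean) / std::sqrt(2.0));
   }

   int main() {
      for (double cut = -3.0; cut <= 4.0; cut += 0.5) {
         double effS = fracAbove(cut, 1.0);             // signal efficiency
         double rejB = 1.0 - fracAbove(cut, 0.0);       // 1 - background efficiency
         std::printf("cut = %5.2f   effS = %.3f   1-effB = %.3f\n", cut, effS, rejB);
      }
      return 0;
   }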

Neyman-Pearson Lemma (continued)

[ROC plot: the limit is given by the likelihood ratio]

If the discriminating function y(x) equals the true likelihood ratio, the optimal working point for a specific analysis lies somewhere on this limiting ROC curve. If y(x) differs from the true likelihood ratio, its ROC curve lies below it; a different classifier y'(x) might be better for a specific working point than y(x), and vice versa.

Note:

For the determination of your working point (e.g. S/sqrt(B)) you need the prior S and B probabilities: number of events/luminosity!

Realistic Event Classification

Unfortunately, the true probability density functions are typically unknown, so the Neyman-Pearson lemma doesn't really help us directly. Instead, use MC simulation or, more generally, a set of known (already classified) events. Use these training events to:
- try to estimate the functional form of P(x|C), from which the likelihood ratio can be obtained: e.g. D-dimensional histogram, kernel density estimators, MC-based matrix-element methods, ...
- find a discrimination function y(x) and a corresponding decision boundary (i.e. a hyperplane* in the feature space: y(x) = const) that optimally separates signal from background: e.g. linear discriminant, neural networks, ...

This is supervised (machine) learning. (*A hyperplane in the strict sense goes through the origin; here an affine set is meant, to be precise.)

Of course, there is no magic in here. We still need to:
- choose the discriminating variables
- choose the class of models (linear, non-linear, flexible or less flexible)
- tune the learning parameters (bias vs. variance trade-off)
- check the generalisation properties
- consider the trade-off between statistical and systematic uncertainties

What is TMVA

ROOT is the analysis framework used by most (HEP) physicists. The idea: rather than just implementing new MVA techniques and making them available in ROOT (i.e., like TMultiLayerPerceptron does):
- have one common platform/interface for high-end multivariate classifiers

- have common data pre-processing capabilities
- train and test all classifiers on the same data sample and evaluate them consistently
- provide a common analysis (ROOT scripts) and application framework
- provide access with and without ROOT, through macros, C++ executables or python

Multivariate Analysis Methods

Examples for classifiers and regression methods:
- rectangular cut optimisation
- projective and multidimensional likelihood estimator
- k-Nearest Neighbor algorithm
- Fisher, Linear and H-Matrix discriminants
- function discriminants

- artificial neural networks
- boosted decision trees
- RuleFit
- Support Vector Machine

Examples for preprocessing methods: decorrelation, Principal Value Decomposition, Gaussianisation.

Examples for combination methods: boosting, categorisation.

Data Preprocessing

Data Preprocessing: Decorrelation

Commonly realised for all methods in TMVA: removal of linear correlations by rotating the input variables.

- Cholesky decomposition: determine the square root C' of the covariance matrix C, i.e. C = C'C'^T; transform the original variables x into the decorrelated variable space x' by x' = C'^(-1) x.
- Principal component analysis: variable hierarchy, i.e. a linear transformation projecting onto the axes that achieve the largest variance:

     x_k^PC(i_event) = sum over v in variables of ( x_v(i_event) - xbar_v ) * v_v^(k),   for all k variables

  where xbar_v are the sample means and v^(k) is the eigenvector belonging to component k.

The matrix of eigenvectors V obeys the relation C V = V D, with D the diagonalised correlation matrix; thus PCA eliminates linear correlations.

Data Preprocessing: Decorrelation

[Plots: variable correlations, shown original, SQRT-decorrelated, and PCA-decorrelated]

Note that decorrelation is only complete if the correlations are linear and the input variables are Gaussian distributed; this is not a very accurate conjecture in general.
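A minimal sketch of the square-root (Cholesky) decorrelation with ROOT's linear-algebra classes (assuming a positive-definite covariance matrix; names are illustrative):

   // Decorrelate a variable vector x with the Cholesky ("SQRT") method:
   // C = C' C'^T, then x' = C'^{-1} x.
   #include "TMatrixD.h"
   #include "TDecompChol.h"
   #include "TVectorD.h"

   TVectorD decorrelate(const TMatrixD& cov, const TVectorD& x) {
      TDecompChol chol(cov);
      chol.Decompose();                                    // cov = U^T U
      TMatrixD cPrime(TMatrixD::kTransposed, chol.GetU()); // C' = U^T, lower triangular
      cPrime.Invert();
      return cPrime * x;                                   // decorrelated variables
   }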

Gaussian-isation

Improve decorrelation by pre-Gaussianisation of the variables.

First: "rarity" transformation to achieve a uniform distribution:

   x_k^flat(i_event) = integral from -inf to x_k(i_event) of p_k(x_k) dx_k,   for all k variables

where p_k is the PDF of variable k and the upper integration limit is the measured value. The integral can be solved in an unbinned way by event counting, or by creating non-parametric PDFs (see the likelihood section later).

Second: make the distribution Gaussian via the inverse error function, erf(x) = (2/sqrt(pi)) * integral from 0 to x of e^(-t^2) dt:

   x_k^Gauss(i_event) = sqrt(2) * erf^(-1)( 2 * x_k^flat(i_event) - 1 ),   for all k variables

Gaussian-isation

[Plots: signal and background distributions, original versus Gaussianised]

We cannot simultaneously Gaussianise both signal and background!
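The second step is a one-liner; a sketch using ROOT's TMath (the input is the rarity-transformed, i.e. CDF, value in [0,1]):

   // Map a uniformly distributed value onto a standard Gaussian:
   // x_Gauss = sqrt(2) * erf^{-1}(2*x_flat - 1)
   #include "TMath.h"

   double gaussianise(double xFlat) {
      return TMath::Sqrt2() * TMath::ErfInverse(2.0 * xFlat - 1.0);
   }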

How to Apply the Preprocessing Transformation?

Any type of preprocessing will be different for signal and background, but for a given test event we do not know the species! Not-so-good solution: choose one or the other, or an S/B mixture; as a result, none of the transformations will be perfect. This holds for most of the methods. Good solution: for some methods it is possible to test both the S and B hypotheses with their respective transformations and to compare them. Example, the projective likelihood ratio (products over k in variables):

   y_L(i_event) = prod_k p_k^S(x_k(i_event)) / [ prod_k p_k^S(x_k(i_event)) + prod_k p_k^B(x_k(i_event)) ]

With the signal transformation T^S and the background transformation T^B applied to the respective hypotheses, this becomes

   y_L^trans(i_event) = prod_k p_k^S(T^S x_k(i_event)) / [ prod_k p_k^S(T^S x_k(i_event)) + prod_k p_k^B(T^B x_k(i_event)) ]

The Classifiers

Rectangular Cut Optimisation

Simplest method: cut in a rectangular variable volume,

   x_cut(i_event) in {0, 1}, equal to 1 if x_v(i_event) lies within [x_v,min, x_v,max] for all v in variables, else 0.

Cuts usually benefit from prior decorrelation of the cut variables. Technical challenge: how to find the optimal cuts? MINUIT fails due to the non-unique solution space. TMVA uses Monte Carlo sampling, a Genetic Algorithm, and Simulated Annealing. A huge speed improvement of the volume search is obtained by sorting events in a binary tree.

Projective Likelihood Estimator (PDE Approach)

Much liked in HEP: probability density estimators for each input variable, combined in a likelihood estimator. Likelihood ratio for event i_event:

   y_L(i_event) = prod over k in variables of p_k^signal(x_k(i_event)) / sum over U in species of prod over k in variables of p_k^U(x_k(i_event))

where the p_k are the PDFs of the discriminating variables and the species are the signal and background types. The PDE introduces fuzzy logic. It ignores correlations between input variables: this is the optimal approach if the correlations are zero (or removed by linear decorrelation); otherwise there is a significant performance loss.
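A minimal sketch of this estimator for two species, with the per-variable PDFs represented as histograms (hS[k], hB[k] are hypothetical, assumed filled and normalised beforehand):

   // Projective likelihood ratio y_L from per-variable PDF histograms.
   #include "TH1D.h"
   #include <vector>

   double yLikelihood(std::vector<TH1D*>& hS, std::vector<TH1D*>& hB,
                      const std::vector<double>& x) {
      double pS = 1.0, pB = 1.0;
      for (size_t k = 0; k < x.size(); ++k) {            // product over variables
         pS *= hS[k]->GetBinContent(hS[k]->FindBin(x[k]));
         pB *= hB[k]->GetBinContent(hB[k]->FindBin(x[k]));
      }
      return pS / (pS + pB);                             // y_L in [0,1]
   }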

PDE Approach: Estimating PDF Kernels

Technical challenge: how to estimate the PDF shapes. Three ways:
- parametric fitting (function): difficult to automate for arbitrary PDFs
- nonparametric fitting: easy to automate, but can create artefacts or suppress information
- event counting: automatic and unbiased, but suboptimal

We have chosen to implement nonparametric fitting in TMVA:
- binned shape interpolation using spline functions and adaptive smoothing
- unbinned adaptive kernel density estimation (KDE) with Gaussian smearing

[Plots: example fits; the original distribution is Gaussian]

TMVA performs automatic validation of the goodness-of-fit.

Multidimensional PDE Approach

Use a single PDF per event class (signal, background) which spans Nvar dimensions.

PDE Range-Search (Carli-Koblitz, NIM A501, 576 (2003)): count the number of signal and background events in the vicinity of the test event; a preset or adaptive volume defines the vicinity.

[Plot: test event surrounded by H0 and H1 events in the (x1, x2) plane; e.g. y_PDERS(i_event, V) = 0.86]

Improve the y_PDERS estimate within V by using various Nvar-dimensional kernel estimators. Enhance the speed of the event counting in the volume by a binary tree search.

k-Nearest Neighbor

Better than searching within a volume (fixed or floating): count adjacent reference events until a statistically significant number is reached. The method is intrinsically adaptive, and the search is very fast with kd-tree event sorting.
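For illustration, a brute-force k-NN sketch (TMVA itself uses a kd-tree for speed; the Event type and all names here are illustrative). The classifier response is the signal fraction among the k nearest training events:

   // Brute-force k-nearest-neighbour classifier response (assumes k <= train.size()).
   #include <vector>
   #include <algorithm>
   #include <utility>

   struct Event { std::vector<double> x; bool isSignal; };

   double yKNN(const std::vector<Event>& train, const std::vector<double>& q, int k) {
      std::vector<std::pair<double,bool>> d;             // (distance^2, class)
      d.reserve(train.size());
      for (const Event& e : train) {
         double d2 = 0.0;
         for (size_t i = 0; i < q.size(); ++i) d2 += (e.x[i] - q[i]) * (e.x[i] - q[i]);
         d.push_back({d2, e.isSignal});
      }
      std::partial_sort(d.begin(), d.begin() + k, d.end()); // k smallest distances first
      int nSig = 0;
      for (int i = 0; i < k; ++i) nSig += d[i].second ? 1 : 0;
      return double(nSig) / k;                           // signal fraction among k nearest
   }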

Fisher's Linear Discriminant Analysis (LDA)

A well known, simple and elegant classifier. LDA determines an axis in the input-variable hyperspace such that a projection of events onto this axis pushes signal and background as far away from each other as possible, while confining events of the same class in close vicinity to each other.

[Plots: projection axes for H0 and H1 in the (x1, x2) plane]

The classifier response couldn't be simpler:

   y_Fi(i_event) = F_0 + sum over k in variables of x_k(i_event) * F_k

with the bias F_0 and the Fisher coefficients F_k. The Fisher coefficients are computed from the signal and background covariance matrices. Fisher requires distinct sample means between signal and background. It is the optimal classifier (Bayes limit) for linearly correlated Gaussian-distributed variables.
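A sketch of the coefficient computation with ROOT linear algebra, using the common convention F proportional to W^(-1)(mu_S - mu_B) with W the within-class covariance (normalisation conventions vary, so this is illustrative rather than TMVA's exact code):

   // Fisher projection axis: F = W^{-1} (mu_S - mu_B).
   #include "TMatrixD.h"
   #include "TVectorD.h"

   TVectorD fisherCoefficients(const TMatrixD& covS, const TMatrixD& covB,
                               const TVectorD& meanS, const TVectorD& meanB) {
      TMatrixD w(covS);
      w += covB;                        // within-class covariance matrix
      w.Invert();
      return w * (meanS - meanB);       // Fisher coefficients F_k, up to normalisation
   }

The bias F_0 is then fixed by the chosen convention for the response offset.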

Function Discriminant Analysis (FDA)

Fit any user-defined function of the input variables, requiring that signal events return 1 and background events return 0. Parameter fitting: Genetic Algorithm, MINUIT, MC, and combinations thereof. Easy reproduction of the Fisher result, but nonlinearities can be added. A very transparent discriminator.

Nonlinear Analysis: Artificial Neural Networks

Feed-forward Multilayer Perceptron: achieve a nonlinear classifier response by "activating" output nodes using nonlinear weights.

[Diagram: 1 input layer with the Nvar discriminating input variables x_i^(0), i = 1..Nvar; k hidden layers with M_k nodes and weights w_ij^(k); 1 output layer with 2 output classes (signal and background), x_1,2^(k+1)]

The node response is

   x_j^(k) = A( w_0j^(k) + sum from i=1 to M_(k-1) of w_ij^(k) * x_i^(k-1) )

with the activation function A(x) = 1 / (1 + e^(-x)). Weight adjustment uses analytical back-propagation.

Three different implementations in TMVA (all are Multilayer Perceptrons):
- TMlpANN: interface to ROOT's MLP implementation
- MLP: TMVA's own MLP implementation, for increased speed and flexibility
- CFMlpANN: ALEPH's Higgs search ANN, translated from FORTRAN
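A self-contained sketch of one feed-forward layer with this activation (weight layout assumed: w[j][0] is the bias w_0j, w[j][i+1] multiplies input i):

   // One layer of a multilayer perceptron with sigmoid activation.
   #include <vector>
   #include <cmath>

   double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

   std::vector<double> forwardLayer(const std::vector<double>& in,
                                    const std::vector<std::vector<double>>& w) {
      std::vector<double> out;
      for (const std::vector<double>& wj : w) {
         double s = wj[0];                               // bias weight w_0j
         for (size_t i = 0; i < in.size(); ++i) s += wj[i + 1] * in[i];
         out.push_back(sigmoid(s));                      // A(x) = 1/(1+e^-x)
      }
      return out;
   }

Stacking such layers and propagating the output-error gradient back through them yields the training algorithm the slide refers to.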

Decision Trees

Sequential application of cuts splits the data into nodes, where the final nodes (leaves) classify an event as signal or background.

Growing a decision tree:
- start with the root node
- split the training sample according to a cut on the best variable at this node
- splitting criterion: e.g., maximum Gini-index = purity * (1 - purity) (see the sketch after this list)
- continue splitting until the minimum number of events or the maximum purity is reached
- classify leaf nodes according to the majority of events, or give a weight; unknown test events are classified accordingly

Why not multiple branches (splits) per node? It fragments the data too quickly; also, multiple splits per node are equivalent to a series of binary node splits.
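A sketch of the splitting criterion: the Gini index of a node, and the gain of a candidate binary split that the tree growth maximises.

   // Gini index p(1-p) of a node and the Gini gain of a binary split.
   double gini(double nSig, double nBkg) {
      double n = nSig + nBkg;
      if (n <= 0.0) return 0.0;
      double p = nSig / n;                               // node purity
      return p * (1.0 - p);
   }

   double giniGain(double nSigL, double nBkgL, double nSigR, double nBkgR) {
      double nL = nSigL + nBkgL, nR = nSigR + nBkgR, n = nL + nR;
      double parent = gini(nSigL + nSigR, nBkgL + nBkgR);
      return parent - (nL / n) * gini(nSigL, nBkgL)      // weighted child impurity
                    - (nR / n) * gini(nSigR, nBkgR);
   }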

Decision Trees: Pruning

[Diagrams: decision tree before and after pruning]

Bottom-up pruning of a decision tree removes statistically insignificant nodes to reduce tree overtraining.

Boosted Decision Trees (BDT)

Data mining with decision trees is popular in science (so far mostly outside of HEP).

Advantages: easy to interpret; immune against outliers; weak variables are ignored (and don't (much) deteriorate performance).

Shortcomings: instability, since small changes in the training sample can dramatically alter the tree structure; sensitivity to overtraining (requires pruning).

Boosted decision trees combine a forest of decision trees, with differently weighted events in each tree (trees can also be weighted), by majority vote. E.g., AdaBoost: incorrectly classified events receive a larger weight in the next decision tree. Bagging (instead of boosting): random event weights, re-sampling with replacement. Boosting or bagging are means to create a set of basis functions; the final classifier is a linear combination (expansion) of these functions, which improves stability!
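An AdaBoost re-weighting step in miniature (illustrative sketch; the per-tree vote weight matches the boosting formula quoted on the generalised-boosting slide later):

   // One AdaBoost step: boost the weights of misclassified events and
   // return the weight of the current tree in the final vote.
   #include <vector>
   #include <cmath>

   double adaBoostStep(std::vector<double>& w, const std::vector<bool>& misclassified) {
      double err = 0.0, sum = 0.0;
      for (size_t i = 0; i < w.size(); ++i) {
         sum += w[i];
         if (misclassified[i]) err += w[i];
      }
      double fErr  = err / sum;                          // weighted error fraction
      double alpha = (1.0 - fErr) / fErr;                // boost factor, assumes fErr < 0.5
      for (size_t i = 0; i < w.size(); ++i)
         if (misclassified[i]) w[i] *= alpha;            // boost wrongly classified events
      return std::log(alpha);                            // weight of this tree in the vote
   }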

Predictive Learning via Rule Ensembles (RuleFit)

Following the RuleFit approach by Friedman-Popescu (Tech Rep, Stat. Dpt, Stanford U., 2003). The model is a linear combination of rules, where a rule is a sequence of cuts (r_m(x) = 1 if all cuts are satisfied, = 0 otherwise):

   y_RF(x) = a_0 + sum from m=1 to M_R of a_m * r_m(x) + sum from k=1 to n_R of b_k * x_k

with the sum of rules (first sum) acting on the normalised discriminating event variables, plus a linear Fisher term (second sum).

The problem to solve is: create the rule ensemble (use a forest of decision trees) and fit the coefficients a_m, b_k ("gradient direct regularization" minimising the risk, Friedman et al.). Pruning removes topologically equal rules (same variables in the cut sequence).

(Aside from the slide: one of the elementary cellular automaton rules (Wolfram 1983, 2002) specifies the next color in a cell depending on its color and its immediate neighbors; its rule outcomes are encoded in the binary representation 30 = 00011110 base 2.)

Support Vector Machine (SVM)

Best separation: maximum distance (margin) between the closest events (support vectors) and the hyperplane; the decision boundary is linear. If the data are non-separable, add a misclassification cost parameter to the minimisation function.

Linear case: find the hyperplane that best separates signal from background.

[Plots: separable and non-separable data in the (x1, x2) plane, with the optimal hyperplane, margin, and support vectors marked]

Non-linear cases: transform the variables into a higher-dimensional space where a linear boundary can fully separate the data. The explicit transformation is not required: kernel functions approximate the scalar products between the transformed vectors in the higher-dimensional space. Choose a kernel and fit the hyperplane using the techniques developed for the linear case.

[Plot: mapping (x1, x2) to (x1, x2, x3) that linearises a circular boundary]

Generalised Classifier Boosting

Principle (just as in BDT): multiple training cycles, where each time the wrongly classified events get a higher event weight.

Training sample -> classifier C^(0)(x) -> re-weight -> weighted sample -> classifier C^(1)(x) -> re-weight -> weighted sample -> classifier C^(2)(x) -> ... -> classifier C^(m)(x)

The response is the weighted sum of each classifier's response:

   y(x) = sum over i up to N_classifier of log( (1 - f_err^(i)) / f_err^(i) ) * C^(i)(x)

Boosting will be interesting especially for methods like Cuts, MLP, and SVM.

Categorising Classifiers

Multivariate training samples often have distinct sub-populations of data:
- a detector element may only exist in the barrel, but not in the endcaps
- a variable may have different distributions in the barrel, overlap, and endcap regions

Ignoring this dependence creates correlations between variables which must be learned by the classifier. Classifiers such as the projective likelihood, which do not account for correlations, significantly lose performance if the sub-populations are not separated.

Categorisation means splitting the data sample into categories defining disjoint data samples with the following (idealised) properties: events belonging to the same category are statistically indistinguishable; events belonging to different categories have different properties. In TMVA, all categories are treated independently for training and application (transparent for the user), but the evaluation is done for the whole data sample.

Using TMVA

A typical TMVA analysis consists of two main steps:
1. Training phase: training, testing and evaluation of classifiers using data samples with known signal and background composition

2. Application phase: using selected trained classifiers to classify unknown data samples

Illustration of these steps with toy data samples (see the TMVA tutorial).

A Simple Example for Training

   void TMVClassification()
   {
      // create an output file and the Factory
      TFile* outputFile = TFile::Open( "TMVA.root", "RECREATE" );
      TMVA::Factory *factory = new TMVA::Factory( "MVAnalysis", outputFile, "!V" );

      // give training/test trees
      TFile *input = TFile::Open("tmva_example.root");
      factory->AddSignalTree    ( (TTree*)input->Get("TreeS"), 1.0 );
      factory->AddBackgroundTree( (TTree*)input->Get("TreeB"), 1.0 );

      // register the input variables
      factory->AddVariable("var1+var2", 'F');
      factory->AddVariable("var1-var2", 'F');
      factory->AddVariable("var3",      'F');
      factory->AddVariable("var4",      'F');

      factory->PrepareTrainingAndTestTree("",
         "NSigTrain=3000:NBkgTrain=3000:SplitMode=Random:!V" );

      // select the MVA methods
      factory->BookMethod( TMVA::Types::kLikelihood, "Likelihood",
                           "!V:!TransformOutput:Spline=2:NSmooth=5:NAvEvtPerBin=50" );
      factory->BookMethod( TMVA::Types::kMLP, "MLP",
                           "!V:NCycles=200:HiddenLayers=N+1,N:TestRate=5" );

      // train, test and evaluate
      factory->TrainAllMethods();
      factory->TestAllMethods();
      factory->EvaluateAllMethods();

      outputFile->Close();
      delete factory;
   }

A Simple Example for an Application

   void TMVClassificationApplication()
   {
      // create the Reader
      TMVA::Reader *reader = new TMVA::Reader("!Color");

      // register the variables
      Float_t var1, var2, var3, var4;
      reader->AddVariable( "var1+var2", &var1 );
      reader->AddVariable( "var1-var2", &var2 );
      reader->AddVariable( "var3",      &var3 );
      reader->AddVariable( "var4",      &var4 );

      // book the classifier(s)
      reader->BookMVA( "MLP classifier", "weights/MVAnalysis_MLP.weights.txt" );

      // prepare the event loop
      TFile *input   = TFile::Open("tmva_example.root");
      TTree* theTree = (TTree*)input->Get("TreeS");
      // set branch addresses for the user TTree (branch names assumed here)
      Float_t userVar1, userVar2, userVar3, userVar4;
      theTree->SetBranchAddress( "var1", &userVar1 );
      theTree->SetBranchAddress( "var2", &userVar2 );
      theTree->SetBranchAddress( "var3", &userVar3 );
      theTree->SetBranchAddress( "var4", &userVar4 );

      for (Long64_t ievt = 3000; ievt < theTree->GetEntries(); ievt++) {
         theTree->GetEntry(ievt);

         // compute the input variables
         var1 = userVar1 + userVar2;
         var2 = userVar1 - userVar2;
         var3 = userVar3;
         var4 = userVar4;

         // calculate the classifier output
         Double_t out = reader->EvaluateMVA( "MLP classifier" );
         // do something with it
      }
      delete reader;
   }

Data Preparation

Data input format: ROOT TTree or ASCII. TMVA supports:
- selection of any subset or combination or function of the available variables
- application of pre-selection cuts (possibly independent for signal and background)
- global event weights for signal or background input files
- use of any input variable as an individual event weight
- various methods for splitting into training and test samples: block-wise, random, alternating, or user-defined training and test trees
- preprocessing of input variables (e.g., decorrelation)

A Toy Example (idealized)

Use a data set with 4 linearly correlated, Gaussian-distributed variables:

   ----------------------------------------
   Rank : Variable  : Separation
   ----------------------------------------
   1    : var4      : 0.606
   2    : var1+var2 : 0.182
   3    : var3      : 0.173
   4    : var1-var2 : 0.014
   ----------------------------------------

Preprocessing the Input Variables

Decorrelation of the variables before the training is useful for this example. Note that in cases with non-Gaussian distributions and/or nonlinear correlations, decorrelation may do more harm than good.

MVA Evaluation Framework

TMVA is not only a collection of classifiers, but an MVA framework. After training, TMVA provides ROOT evaluation scripts (through a GUI):
- plots of all signal (S) and background (B) input variables, with and without pre-processing
- correlation scatters and linear coefficients for S & B
- classifier outputs (S & B) for test and training samples (spot overtraining)
- classifier Rarity distribution
- classifier significance with optimal cuts
- B rejection versus S efficiency

Classifier-specific plots:
- likelihood reference distributions
- classifier PDFs (for probability output and Rarity)

- network architecture, weights and convergence
- rule fitting analysis plots
- visualisation of decision trees

Evaluating the Classifier Training (I)

Projective likelihood PDFs, MLP training, BDTs; average number of nodes before/after pruning: 4193 / 968.

Evaluating the Classifier Training (II)

Check for overtraining: compare the classifier output for test and training samples.

Remark on overtraining: it occurs when the classifier training has too few degrees of freedom because the classifier has too many adjustable parameters for too few training events. The sensitivity to overtraining depends on the classifier: e.g., Fisher is weakly, a BDT strongly sensitive. Compare the performance between training and test sample to detect overtraining, and actively counteract it: e.g., smooth the likelihood PDFs, prune decision trees, ...

Evaluating the Classifier Training (III)

Parallel Coordinates (ROOT class) visualisation of the input variables.

Evaluating the Classifier Training (IV)

There is no unique way to express the performance of a classifier; several benchmark quantities are computed by TMVA:
- signal efficiency at various background efficiencies (= 1 - rejection) when cutting on the classifier output
- the separation (a numerical sketch follows below):

     <S^2> = (1/2) * integral of [ yhat_S(y) - yhat_B(y) ]^2 / [ yhat_S(y) + yhat_B(y) ] dy
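A sketch of this integral evaluated from two classifier-output histograms (same binning assumed; with unit-normalised bin probabilities the bin width cancels):

   // Separation <S^2> of normalised signal and background output histograms.
   #include "TH1D.h"

   double separation(const TH1D& hS, const TH1D& hB) {
      double sep = 0.0;
      for (int b = 1; b <= hS.GetNbinsX(); ++b) {
         double s = hS.GetBinContent(b), bg = hB.GetBinContent(b);
         if (s + bg > 0.0) sep += 0.5 * (s - bg) * (s - bg) / (s + bg);
      }
      return sep;     // 0 for identical shapes, 1 for fully disjoint ones
   }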

- the Rarity, implemented such that the background is flat:

     R(y) = integral from -inf to y of yhat_B(y') dy'

- for other quantities see the Users Guide

Evaluating the Classifier Training (V)

Optimal cut for each classifier: determine the optimal cut (working point) on the classifier output.

Evaluating the Classifier Training (VI)

Input variable ranking: how discriminating is a variable? (taken from TMVA output; the top variable is best ranked):

   --- Fisher : ---------------------------------------------
   --- Fisher : Rank : Variable : Discr. power
   --- Fisher : ---------------------------------------------
   --- Fisher : 1    : var4     : 2.175e-01
   --- Fisher : 2    : var3     : 1.718e-01
   --- Fisher : 3    : var1     : 9.549e-02
   --- Fisher : 4    : var2     : 2.841e-02
   --- Fisher : ---------------------------------------------

Classifier correlation and overlap: do the classifiers select the same events as signal and background? If not, there is something to gain!

   --- Factory : Inter-MVA overlap matrix (signal):
   --- Factory : ------------------------------
   --- Factory :             Likelihood  Fisher
   --- Factory : Likelihood:     +1.000  +0.667
   --- Factory : Fisher:         +0.667  +1.000
   --- Factory : ------------------------------

Evaluating the Classifier Training (VII)

Check for overtraining (taken from TMVA output; a larger area marks a better classifier):

   Evaluation results ranked by best signal efficiency and purity (area)
   ------------------------------------------------------------------------------
   MVA           Signal efficiency at bkg eff. (error):       | Sepa-    Signifi-
   Methods:      @B=0.01    @B=0.10    @B=0.30    Area        | ration:  cance:
   ------------------------------------------------------------------------------
   Fisher      : 0.268(03)  0.653(03)  0.873(02)  0.882       | 0.444    1.189
   MLP         : 0.266(03)  0.656(03)  0.873(02)  0.882       | 0.444    1.260
   LikelihoodD : 0.259(03)  0.649(03)  0.871(02)  0.880       | 0.441    1.251
   PDERS       : 0.223(03)  0.628(03)  0.861(02)  0.870       | 0.417    1.192
   RuleFit     : 0.196(03)  0.607(03)  0.845(02)  0.859       | 0.390    1.092
   HMatrix     : 0.058(01)  0.622(03)  0.868(02)  0.855       | 0.410    1.093
   BDT         : 0.154(02)  0.594(04)  0.838(03)  0.852       | 0.380    1.099
   CutsGA      : 0.109(02)  1.000(00)  0.717(03)  0.784       | 0.000    0.000
   Likelihood  : 0.086(02)  0.387(03)  0.677(03)  0.757       | 0.199    0.682
   ------------------------------------------------------------------------------
   Testing efficiency compared to training efficiency (overtraining check)
   ------------------------------------------------------------------------------
   MVA           Signal efficiency: from test sample (from training sample)
   Methods:      @B=0.01         @B=0.10         @B=0.30
   ------------------------------------------------------------------------------
   Fisher      : 0.268 (0.275)   0.653 (0.658)   0.873 (0.873)
   MLP         : 0.266 (0.278)   0.656 (0.658)   0.873 (0.873)
   LikelihoodD : 0.259 (0.273)   0.649 (0.657)   0.871 (0.872)
   PDERS       : 0.223 (0.389)   0.628 (0.691)   0.861 (0.881)
   RuleFit     : 0.196 (0.198)   0.607 (0.616)   0.845 (0.848)
   HMatrix     : 0.058 (0.060)   0.622 (0.623)   0.868 (0.868)
   BDT         : 0.154 (0.268)   0.594 (0.736)   0.838 (0.911)
   CutsGA      : 0.109 (0.123)   1.000 (0.424)   0.717 (0.715)
   Likelihood  : 0.086 (0.092)   0.387 (0.379)   0.677 (0.677)
   ------------------------------------------------------------------------------

More Toy Examples

More Toys: Linear-, Cross-, Circular Correlations

Illustrate the behaviour of linear and nonlinear classifiers on three toy scenarios:
- linear correlations (same for signal and background)
- cross-linear correlations (opposite for signal and background)
- circular correlations (same for signal and background)

Weight Variables by Classifier Output

How well do the classifiers resolve the various correlation patterns? Consider the three toys again: linear correlations (same for signal and background), cross-linear correlations (opposite for signal and background), and circular correlations (same for signal and background).

[Animated plot sequence: events weighted by classifier output, shown successively for Likelihood, Likelihood-D (with decorrelation), PDERS, Fisher, MLP, and BDT on the three correlation patterns]

Final Classifier Performance

Background rejection versus signal efficiency curves for the linear, cross, and circular correlation examples.

Event Distribution: The "Schachbrett" Toy (chess board)

Performance achieved without parameter adjustments: nearest neighbour and BDTs are best "out of the box". After some parameter tuning, SVM and ANN (MLP) also perform close to the theoretical maximum.

[Plots: chess-board event distribution; events weighted by SVM response]

Categorising Classifiers

Let's try our standard example of 4 Gaussian-distributed input variables. Now var4 depends on a new variable eta (which may not be used for classification): for |eta| > 1.3 the signal and background Gaussian means are shifted with respect to |eta| < 1.3.

[Plots: var4 distributions for |eta| > 1.3 and |eta| < 1.3]

The optimal performance is recovered after splitting into categories. The category technique is heavily used in multivariate likelihood fits, e.g. RooFit (RooSimultaneousPdf).
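A booking sketch following the pattern of the TMVAClassificationCategory tutorial (variable and method titles are assumed; each |eta| region gets its own Fisher discriminant):

   // Book a Category method that trains separate Fishers per |eta| region.
   #include "TMVA/Factory.h"
   #include "TMVA/MethodCategory.h"
   #include "TCut.h"

   void bookCategories(TMVA::Factory* factory) {
      TMVA::MethodCategory* mcat = dynamic_cast<TMVA::MethodCategory*>(
         factory->BookMethod( TMVA::Types::kCategory, "FisherCat", "" ) );
      mcat->AddMethod( TCut("abs(eta)<=1.3"), "var1:var2:var3:var4",
                       TMVA::Types::kFisher, "Fisher_central", "!H:!V:Fisher" );
      mcat->AddMethod( TCut("abs(eta)>1.3"),  "var1:var2:var3:var4",
                       TMVA::Types::kFisher, "Fisher_forward", "!H:!V:Fisher" );
   }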

Summary

No Single Best Classifier

The classifiers (Cuts, Likelihood, PDERS/k-NN, H-Matrix, Fisher, MLP, BDT, RuleFit, SVM) are compared against the criteria: performance for no/linear correlations and for nonlinear correlations; speed of training and of response; robustness against overtraining and against weak input variables; behaviour under the curse of dimensionality; and transparency. [The per-classifier ratings were graphical symbols in the original table and are not recoverable here.] The properties of the Function Discriminant (FDA) depend on the chosen function.

Summary: MVAs for Classification and Regression

The most important classifiers implemented in TMVA:
- reconstructing the PDF and using the likelihood ratio: nearest neighbour (multidimensional likelihood), naive Bayesian classifier (1-dim (projective) likelihood)
- fitting the decision boundary directly: linear discriminant (Fisher), neural network, Support Vector Machine, boosted decision trees

Also covered: introduction to TMVA, training, testing/evaluation, toy examples.

TMVA Development and Distribution

TMVA is now shipped with ROOT; the project page is on SourceForge:
- home page: http://tmva.sf.net/
- SF project page: http://sf.net/projects/tmva
- mailing list: http://sf.net/mail/?group_id=152074
- tutorial TWiki: https://twiki.cern.ch/twiki/bin/view/TMVA/WebHome

An active project with fast response time on feature requests. Currently 6 core developers and ~25 contributors. >3500 downloads since March 2006 (not accounting for SVN checkouts and ROOT users). Written in C++, relying on core ROOT functionality. Integrated and distributed with ROOT since ROOT v5.11/03.

In Development

Multi-Class Classification

Binary classification: two classes, signal and background.

[Plot: six event classes, Class 1 through Class 6, in the feature plane]

Multi-class classification is the natural extension for many classifiers.

A (Brief) Word on Systematics & Irrelevant Input Variables

Treatment of Systematic Uncertainties

Assume the strongest variable, var4, suffers from a systematic uncertainty: a calibration uncertainty may shift the central value and hence worsen the discrimination power of var4.

(At least) two ways to deal with it:

1. Ignore the systematic in the training, and evaluate the systematic error on the classifier output. Drawbacks: var4 appears stronger in the training than it might be, so the performance is suboptimal, and the classifier response will strongly depend on var4.

2. Train with a shifted (= weakened) var4, and evaluate the systematic error on the classifier output. This cures the previous drawbacks.

If the classifier output distributions can be validated with data control samples, the second drawback is mitigated, but not the first one (the performance loss)!

[Plots: classifier output distributions for signal only, for the 1st and the 2nd way]

Stability with Respect to Irrelevant Variables

Toy example with 2 discriminating and 4 non-discriminating variables.

[ROC curves comparing the classifiers when using only the two discriminant variables versus using all variables in the classifiers]

Minimisation

A robust global minimum finder is needed at various places in TMVA.

Brute force method, Monte Carlo sampling: sample the entire solution space and choose the solution providing the minimum estimator. A good global minimum finder, but poor accuracy.

Default solution in HEP, (T)Minuit/Migrad [how much longer do we need to suffer ... ?]: gradient-driven search using a variable metric, can use a quadratic Newton-type solution. A poor global minimum finder that gets quickly stuck in the presence of local minima.

Specific global optimisers implemented in TMVA:
- Genetic Algorithm: biology-inspired optimisation algorithm
- Simulated Annealing: slow "cooling" of the system to avoid "freezing" in a local solution

TMVA allows one to chain minimisers: for example, one can use MC sampling to detect the vicinity of a global minimum, and then use Minuit to accurately converge to it.

Minimizers

[Plots: behaviour of Minuit, Monte Carlo, Genetic Algorithm, and Simulated Annealing on a test function]

How does linear decorrelation affect strongly nonlinear cases?

[Plots: original correlations and the same distributions after SQRT decorrelation]

Code Flow for Training and Application Phases

[Diagram: code flow for the training and application phases; see the TMVA tutorial]

Copyrights & Credits

TMVA is open source software; use & redistribution of the source are permitted according to the terms in the BSD license. There are several similar data mining efforts with rising importance in most fields of science and industry. Important for HEP: parallelised MVA training and evaluation pioneered by the Cornelius package (BABAR); also frequently used: the StatPatternRecognition package by I. Narsky (Cal Tech). Many implementations of individual classifiers exist.

Acknowledgments: The fast development of TMVA would not have been possible without the contribution and feedback from many developers and users, to whom we are indebted. We thank in particular the CERN summer students Matt Jachowski (Stanford) for the implementation of TMVA's new MLP neural network, Yair Mahalalel (Tel Aviv) and three genius Krakow mathematics students for significant improvements of PDERS, the Krakow student Andrzej Zemla and his supervisor Marcin Wolter for programming a powerful Support Vector Machine, as well as Rustem Ospanov for the development of a fast k-NN algorithm. We thank Doug Schouten (S. Fraser U) for improving the BDT, Jan Therhaag (Bonn) for a reimplementation of LD including regression, and Eckhard v. Toerne (Bonn) for improving the Cuts evaluation. Many thanks to Dominik Dannheim, Alexander Voigt and Tancredi Carli (CERN) for the implementation of the PDEFoam approach. We are grateful to Doug Applegate, Kregg Arms, René Brun and the ROOT team, Zhiyi Liu, Elzbieta Richter-Was, Vincent Tisserand and Alexei Volk for helpful conversations.
