Muon and electron tag and probe measurements and plotting. The framework is used to measure several correction factors, including:
- Selection Efficiency of the DoubleMuon trigger used for Tau-Embedding samples
- Selection Efficiency of the Muon ID used for Tau-Embedding samples
- Muon ID Scale Factors
- Muon Isolation Scale Factors
- Muon Trigger Scale Factors
- Electron ID Scale Factors
- Electron Isolation Scale Factors
- Electron Trigger Scale Factors
The following description of the workflow uses CROWN ntuples and was written in 06/2023. Several steps have to be performed in order to obtain the final correction factors:
- Production of input ntuples using CROWN and KingMaker
- Merging of input ntuples into one file for data, embedding and MC
- Creation of histograms for the tag and probe variables, based on the given configuration
- Fitting of the histograms and creation of the final correction factors
- Plotting of the results
- Export of the correction factors into a valid correctionlib JSON file
To set up KingMaker and CROWN, run
git clone --recursive git@github.com:KIT-CMS/KingMaker.git
cd KingMaker
source setup.sh KingMaker
This will set up an instance of KingMaker, including a checkout of CROWN. To add the CROWN configuration needed for the tag and probe measurements, run
git clone git@github.com:KIT-CMS/TauAnalysis-CROWN.git CROWN/analysis_configurations/tau
Finally, the TagAndProbe framework is set up using the checkout.sh script in the scripts folder. This will create a CMSSW environment and check out the TagAndProbe framework into the src folder. The script can be run using
wget https://raw.githubusercontent.com/KIT-CMS/TagAndProbe/master/scripts/checkout.sh
source checkout.sh
The configuration used for the TagAndProbe ntuples is part of the collection of Tau configurations for CROWN, which can be found in the https://github.com/KIT-CMS/TauAnalysis-CROWN repository. For the creation of the TagAndProbe ntuples, the tauembedding_tagandprobe.py configuration is used. Additional information on how CROWN configurations work and how they are used can be found in the CROWN documentation. Adapt the configuration according to your needs.
After the configuration is set up, the user has to define the samples to be processed. The most convenient way to do so is to use the sample_manager from KingMaker. First make sure that the KingMaker environment is set up correctly in your shell (the setup is done using source setup.sh KingMaker), then run the manager using
sample_manager
which will give you an interactive CLI to manage your samples. To create a new production file, select Create a production file:
The database contains 581 samples, split over 4 era(s) and 22 sampletype(s)
? What do you want to do? (Use arrow keys)
○ Add a new sample
○ Edit a sample (not implemented yet)
○ Delete a sample
○ Find samples (by nick)
○ Find samples (by DAS name)
○ Print details of a sample
» ○ Create a production file
○ Update genweight
○ Save and Exit
○ Exit without Save
Answer: Create a production file
In the next step, select the eras you want to process using the arrow keys and space bar
Select eras to be added (Use arrow keys to move, <space> to select, <a> to toggle, <i> to invert)
○ 2016postVFP
○ 2016preVFP
● 2018
» ● 2017
and then select the sampletypes you want to process. For the TagAndProbe measurement, these should be embedding, data and dyjets. Finally, choose a filename, and the manager will create a .txt file containing all selected samples. Double-check that the file contains all samples you want to process; a purely hypothetical example of such a file is shown below.
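A sample list is a plain text file with one sample nick per line. The nicks in this sketch are placeholders illustrating the format, not real database entries:
dyjets_2018_madgraph_nick
data_doublemuon_2018A_nick
embedding_2018A_mumu_nick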
To produce the ntuples, run
law run ProduceSamples --analysis tau --config tauembedding_tagandprobe --sample-list your_samplesfile.txt --production-tag your_productiontag --workers 100 --scopes mm,ee
The different options are:
- --analysis: The analysis to be run. This is used to select the correct configuration file. The analysis is always tau for the TagAndProbe ntuples.
- --config: The configuration to be used. This is the name of the configuration file without the .py extension. For the TagAndProbe ntuples, this is tauembedding_tagandprobe.
- --sample-list: The name of the file containing the samples to be processed. This is the file created by the sample manager.
- --production-tag: The production tag to be used. This is used to create the output directory. The samples will be stored on the grid, using this tag as identifier.
- --workers: The number of workers to be used. One worker is able to submit and track one sample at a time. To submit and process all samples in parallel, use a high number of workers.
- --scopes: The scopes to be used. This is a comma-separated list of scopes. For the TagAndProbe ntuples, this should be mm for Embedding Selection Corrections and Muon Scale Factors and ee for Electron Scale Factors.
law will now process all samples listed in the sample-list file. This will take some time, depending on the number of samples and workers used. After the processing is finished, the output files will be stored on the grid. The default in KingMaker uses the T3 GridKa storage; a different storage location can be configured in the KingMaker configuration file.
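To check how far the production has progressed without resubmitting anything, law's status-printing option can be appended to the same command. This is a sketch and assumes the law version shipped with KingMaker exposes the common --print-status option (the -1 requests full task depth):
law run ProduceSamples --analysis tau --config tauembedding_tagandprobe --sample-list your_samplesfile.txt --production-tag your_productiontag --print-status -1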
After the ntuples are produced, run the merge_outputs.sh script via
./merge_outputs.sh your_productiontag (muon|electron) your_era
The script will collect the samples from the grid and merge them into one file per sample. The merged files will be stored in /ceph/${USER}/Run2UltraLegacy/scalefactors/${your_productiontag}/${your_era}UL. After the merging, the set_inputfiles.yaml file, which defines the input files for the following steps, has to be adapted to point to the merged files.
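Since the exact structure of set_inputfiles.yaml is not reproduced here, the following is a purely hypothetical sketch of how an entry mapping an era and the sample types to the merged files might look; all keys and file names are assumptions:
2018UL: # hypothetical era key
  data: /ceph/${USER}/Run2UltraLegacy/scalefactors/your_productiontag/2018UL/data.root
  embedding: /ceph/${USER}/Run2UltraLegacy/scalefactors/your_productiontag/2018UL/embedding.root
  mc: /ceph/${USER}/Run2UltraLegacy/scalefactors/your_productiontag/2018UL/mc.root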
The configurations can be found in the configurations folder. The configuration files are named according to the era and channel they are used for. They contain a list of the different scale factors to measure. The configuration for each scale factor may look like the example shown below:
Iso_pt_eta_bins:
name: "Iso_pt_eta_bins"
var: "m_vis(100,50,150)"
tag: " iso_tag < 0.15 && fsr_photon_veto_tag < 0.5 && fsr_photon_veto_probe < 0.5 && dz_tag < 0.2 && dxy_tag < 0.045 && id_medium_tag && trg_IsoMu27_tag && pt_tag > 28 && id_medium_probe"
probe: "iso_probe < 0.15"
binvar_x: "pt_probe"
bins_x:
[
10., 15., 20., 22., 24., 26., 28., 30., 32., 34., 36., 38., 40., 45., 50., 60., 80., 100., 200., 1000.,
]
binvar_y: "abs(eta_probe)"
bins_y: [0, 0.9, 1.2, 2.1, 2.4]
# Settings for fits
BKG: "Exponential"
SIG: "DoubleVCorr"
# Settings for plotting
TITLE: "Iso_{#mu,rel} < 0.15"
y_range: [0.6, 1.0]
ratio_y_range: [0.8, 1.2]
info: "Scale factor for the muon isolation iso_mu < 0.15 (binned in eta). Scale factor is provided for Embedding (emb) and Simulation (mc)"
header: "Scale factor for the muon isolation iso_mu < 0.15 (binned in eta)"
The different required settings are:
- name: The name of the scale factor. This is used to name the output files.
- var: The variable used for the histogram. This is the variable used for the fit. Usually, the mass of the dilepton system, called m_vis, is used.
- tag: The selection string used to define the tag lepton. Internally, the terms _tag and _probe are replaced by _1 and _2 to ensure that every lepton can be used as tag and probe (see the sketch after this list).
- probe: The selection string used to define the probe lepton. This is the quantity for which the efficiency is measured.
- binvar_x: The variable used for the x-axis of the histogram. This is usually the pt of the probe lepton.
- bins_x: The binning used for the x-axis of the histogram. This is a list of bin edges.
- binvar_y: The variable used for the y-axis of the histogram. This is usually the eta of the probe lepton.
- bins_y: The binning used for the y-axis of the histogram. This is a list of bin edges.
- BKG: The background model used for the fit. This is the name of the background model as defined in the ./scripts/fitTagAndProbe_script.py file.
- SIG: The signal model used for the fit. This is the name of the signal model as defined in the ./scripts/fitTagAndProbe_script.py file.
- TITLE: The title of the histogram. This is used for plotting.
- y_range: The range of the y-axis of the efficiency plot.
- ratio_y_range: The range of the y-axis of the ratio plot.
- info: A short description of the scale factor. This is used for the correctionlib file.
- header: A short description of the scale factor. This is used for the correctionlib file.
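The leg switching can be pictured with the following minimal Python sketch; the helper function is purely illustrative and not part of the framework, which implements this replacement in its own scripts:
# Illustrative sketch of the tag/probe leg switching described above;
# expand_selection is a hypothetical helper, not a framework function.
def expand_selection(selection: str, tag_leg: int) -> str:
    probe_leg = 2 if tag_leg == 1 else 1
    return selection.replace("_tag", f"_{tag_leg}").replace("_probe", f"_{probe_leg}")

cut = "iso_tag < 0.15 && id_medium_probe"
print(expand_selection(cut, tag_leg=1))  # iso_1 < 0.15 && id_medium_2
print(expand_selection(cut, tag_leg=2))  # iso_2 < 0.15 && id_medium_1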
Quantities that do not require the leg-switching procedure are stored in <file>_double_object_quantities.yaml files; they are currently used e.g. for the efficiency of the DZ filter of the trg_Mu17TrkMu8_DZ_Mu17 trigger.
Finally, the scale factors are calculated using
channel="embeddingselection" # "muon" or "electron"
era="2016postVFPUL" # "2016preVFPUL", "2017UL", ...
python3 scripts/TagAndProbe.py --channel $channel --era $era
# In case of embeddingselection only: Addition of measured dz quantity of trg_Mu17TrkMu8_DZ_Mu17
if [[ "$channel" == embeddingselection ]]
then
python3 scripts/TagAndProbe.py --channel $channel --era $era --no-leg-switching --mode="UPDATE"
fi
python3 scripts/runTagAndProbeFits.py --channel $channel --era $era --fit --plot
The resulting fits and efficiency plots are all stored in the output folder. They should be checked carefully, and all previous steps should be repeated if necessary.
It is also possible to run multiple eras and channels in parallel by modifying channels, eras and output_dir in run_scalefactor_creation.sh.
In the last step, the scale factors are stored in correctionlib files. This is done using the translate_to_crosspog_json.py script. The script can be run using
python3 scripts/translate_to_crosspog_json.py -e your_era -c (embeddingselection|muon|electron) -o output
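To sanity-check the exported file, it can be opened with the correctionlib Python bindings. In this sketch the file name and the correction key are hypothetical and depend on your era, channel and configuration:
# List the corrections contained in an exported file; path and key are examples.
import correctionlib

cset = correctionlib.CorrectionSet.from_file("output/muon_2018UL.json")
print(list(cset))  # names of all corrections in the file
# sf = cset["Iso_pt_eta_bins"].evaluate(...)  # inputs depend on the correction definition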
Congratulations, you have created the scale factors for the UL Run2 Legacy analysis!
One can also extract scale factors from the ROOT workspace and write them to the correctionlib format. The code is based on the KIT Tag and Probe repository.
Before the execution, one should activate CROWN's tau environment, e.g.
source /work/${USER}/CROWN/init.sh tau
Execution:
python3 create_crosspog_json_workspace.py -e your_era -c muon -o output
Due to the drop in the DZ efficiency of the used double muon trigger (HLT_Mu17_TrkIsoVVL_Mu8_TrkIsoVVL_DZ) that was not accounted for, an additional measurement of this efficiency is required. Unfortunately, this filter information is not available at NanoAOD level, which requires creating a custom NanoAOD (discussed here) or a FriendTree (to be updated) containing the filter decision as a quality bit.
This unfolding correction is applied only to data; an MC production is not needed.
For the NanoAOD creation, you require CMSSW_10_6_35 with the PhysicsTools package (git cms-addpkg PhysicsTools/NanoAOD). grid-control is advisable if the production is not performed locally.
Modify the 2016 muon trigger table in CMSSW_10_6_35/src/PhysicsTools/NanoAOD/python/triggerObjects_cff.py
by adding the two lines:
"4096*filter('hltDiMuonGlb17Glb8RelTrkIsoFiltered0p4') + " \ # -> "bit 12"
"8192*filter('hltDiMuonGlb17Glb8RelTrkIsoFiltered0p4DzFiltered0p2')" # -> "bit 13"
Compile and execute the corresponding run config. Starting from here: MiniAODNanoAOD -> ReReco-Run2016 link of your choice -> ReqMgr -> ConfigCacheID. There you need to modify the execution string according to your needs, e.g. to
RECO --conditions 106X_dataRun2_v35 --customise Configuration/DataProcessing/Utils.addMonitoring --datatier NANOAOD --era Run2_2016,run2_nanoAOD_106Xv2 --eventcontent NANOAOD --filein _placeholder_.root --fileout file:ReReco-Run2016G-DoubleMuon-UL2016_MiniAODv2_NanoAODv9-00002.root --nThreads 2 --no_exec --python_filename ReReco-Run2016G-DoubleMuon-UL2016_MiniAODv2_NanoAODv9-00002_0_cfg.py --scenario pp --step NANO --data -n 10000
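Once such a NanoAOD is produced, the added filter decisions end up in the standard TrigObj_filterBits branch. Below is a minimal sketch for checking the new DZ bit, assuming uproot is available (any NanoAOD reader works) and using the file name set via --fileout above:
# Check the custom DZ filter bit (bit 13, i.e. 8192, added to the trigger table above).
import uproot

events = uproot.open(
    "ReReco-Run2016G-DoubleMuon-UL2016_MiniAODv2_NanoAODv9-00002.root:Events"
)
bits = events["TrigObj_filterBits"].array()
passes_dz = (bits & (1 << 13)) != 0  # True for trigger objects passing the DZ filter
print(passes_dz)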
If grid-control is used, the generated .py file needs to be extended with the following:
####@FILE_NAMES@, @SKIP_EVENTS@, @MAX_EVENTS@
def customise_for_gc(process):
import FWCore.ParameterSet.Config as cms
from IOMC.RandomEngine.RandomServiceHelper import RandomNumberServiceHelper
try:
maxevents = int(__MAX_EVENTS__)
process.maxEvents = cms.untracked.PSet(
input=cms.untracked.int32(max(-1, maxevents))
)
except Exception:
pass
# Dataset related setup
try:
primaryFiles = [__FILE_NAMES__]
if not hasattr(process, "source"):
print("creating input source for grid-control")
process.source = cms.Source(
"PoolSource",
skipEvents=cms.untracked.uint32(__SKIP_EVENTS__),
fileNames=cms.untracked.vstring(primaryFiles),
)
else:
print("input source already exists, adapting it for grid-control")
process.source.skipEvents = cms.untracked.uint32(__SKIP_EVENTS__)
process.source.fileNames = cms.untracked.vstring(primaryFiles)
try:
secondaryFiles = [__FILE_NAMES2__]
process.source.secondaryFileNames = cms.untracked.vstring(secondaryFiles)
except Exception:
pass
try:
lumirange = [__LUMI_RANGE__]
if len(lumirange) > 0:
process.source.lumisToProcess = cms.untracked.VLuminosityBlockRange(
lumirange
)
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(-1))
except Exception:
pass
except Exception:
pass
if hasattr(process, "RandomNumberGeneratorService"):
randSvc = RandomNumberServiceHelper(process.RandomNumberGeneratorService)
randSvc.populate()
print(
"Generator random seed: %s"
% process.RandomNumberGeneratorService.generator.initialSeed
)
process.AdaptorConfig = cms.Service(
"AdaptorConfig",
enable=cms.untracked.bool(True),
stats=cms.untracked.bool(True),
)
# Generator related setup
try:
if hasattr(process, "generator") and process.source.type_() != "PoolSource":
process.source.firstLuminosityBlock = cms.untracked.uint32(
1 + __GC_JOB_ID__
)
print(
"Generator random seed: %s"
% process.RandomNumberGeneratorService.generator.initialSeed
)
except Exception:
pass
# Print GlobalTag for DBS3 registration - output is taken from edmConfigHash
try:
print("globaltag:%s" % process.GlobalTag.globaltag.value())
except Exception:
pass
return process
process = customise_for_gc(process)
A possible grid-control config file might look like this:
[global]
task = CMSSW
backend = condor
;;backend = cream
;;cmdargs = -G -c -m 0
workdir create = True
;; ADAPT
workdir = /work/${USER}/GridControlNanoAODProduction/DoubleMuon_Run2016G-UL/workdir
[jobs]
in flight = 1000
wall time = 05:00:00
memory = 3500
max retry = 1
[CMSSW]
dataset refresh = 4:00:00
project area = $CMSSW_BASE/
;; ADAPT
config file = ReReco-Run2016G-DoubleMuon-UL2016_MiniAODv2_NanoAODv9-00002_0_cfg.py
instrumentation = False
dataset splitter = EventBoundarySplitter
events per job = 100000
;; ADAPT
dataset = DoubleMuon_Run2016G-UL : list:/work/${USER}/GridControlNanoAODProduction/DoubleMuon_Run2016G-UL/filelist-DoubleMuon_Run2016G-UL.txt # filelist via i.e. dasgoclient
se runtime = True
partition lfn modifier = <xrootd>
depends = glite
parameter factory = ModularParameterFactory
partition lfn modifier dict =
<xrootd> => root://cms-xrd-global.cern.ch//
<xrootd:eu> => root://xrootd-cms.infn.it//
<xrootd:us> => root://cmsxrootd.fnal.gov//
<xrootd:desy> => root://dcache-cms-xrootd.desy.de:1094/
<srm:nrg> => srm://cmssrm-kit.gridka.de:8443/srm/managerv2?SFN=/pnfs/gridka.de/cms/disk-only/
<dcap:nrg> => dcap://dcnrgdcap.gridka.de:22125//pnfs/gridka.de/dcms/disk-only/
<xrootd:nrg> => root://cmsxrootd-redirectors.gridka.de/
<dcap:gridka> => dcap://dccmsdcap.gridka.de:22125//pnfs/gridka.de/cms/disk-only/
<xrootd:gridka> => root://cmsxrootd.gridka.de//
<dcap:aachen> => dcap://grid-dcap-extern.physik.rwth-aachen.de/pnfs/physik.rwth-aachen.de/cms/
[storage]
;; ADAPT
se output files = ReReco-Run2016G-DoubleMuon-UL2016_MiniAODv2_NanoAODv9-00002.root
;; ADAPT
se output pattern = ReReco-Run2016G-DoubleMuon-UL2016_MiniAODv2_NanoAODv9-00002/@FOLDER@/@XBASE@_@GC_JOB_ID@.@XEXT@
;; ADAPT
se path = srm://cmssrm-kit.gridka.de:8443/srm/managerv2?SFN=/pnfs/gridka.de/cms/disk-only/store/user/${USER}/gc_storage/PrivateNanoAODProduction/ReReco-Run2016G-DoubleMuon-UL2016_MiniAODv2_NanoAODv9-00002
scratch space used = 7500
[condor]
JDLData = Requirements=(TARGET.ProvidesCPU==True)&&(TARGET.ProvidesIO==True) +REMOTEJOB=True accounting_group=cms.higgs request_disk=8000 universe=docker docker_image=mschnepf/slc7-condocker
proxy = VomsProxy
[constants]
GC_GLITE_LOCATION = /cvmfs/grid.cern.ch/umd-c7ui-latest/etc/profile.d/setup-c7-ui-example.sh
[parameters]
parameters = transform('FOLDER', 'GC_JOB_ID % 100 + 1')
Adjust at least the lines marked with ;; ADAPT according to your needs. For details on how to set up and execute grid-control, please refer to the grid-control documentation and examples.