Specifying Datasets

You Will Learn:

  • What dataset source types are supported by ServiceX

  • How to define a dataset in Python and YAML for each source type

Physics analyses use many data types stored in many different locations. The storage location determines how a dataset is defined, while the data type requires no special configuration. ServiceX currently accepts four dataset source types: Rucio, EOS, the CERN Open Data Portal, and network accessible files.

Dataset Storage Options

Each source type below is shown in both Python and YAML. The Python examples refer to the dataset module of the servicex package, so it must be imported from servicex before they will run.

See also

For how datasets are added to a sample, see Samples.

Rucio

This declaration looks up a dataset by querying the Rucio data management system. The name is assumed to refer to a Rucio dataset or container.

Python

"Dataset": servicex.dataset.Rucio("my.rucio.dataset.name")

YAML

Dataset: !Rucio my.rucio.dataset.name

EOS

For files stored on EOS, two access methods are available. For discrete file selection, FileList is recommended; for entire directories or wildcard patterns, XRootD is the appropriate dataset type.

Danger

The ServiceX instance must have permission to read these files. In particular, if generic members of your experiment can’t access the files, ServiceX will probably not be able to either.

Python

FileList

"Dataset": servicex.dataset.FileList([
    "root://eospublic.cern.ch//eos/opendata/mystuff/file1.root",
    "root://eospublic.cern.ch//eos/opendata/mystuff/file2.root",
])

XRootD

Added in version 3.0.1.

"Dataset": servicex.dataset.XRootD("root://eospublic.cern.ch//eos/opendata/mystuff/*")

YAML

FileList

Dataset: !FileList ["root://eospublic.cern.ch//eos/opendata/mystuff/file1.root", "root://eospublic.cern.ch//eos/opendata/mystuff/file2.root"]

XRootD

Added in version 3.0.1.

Dataset: !XRootD root://eospublic.cern.ch//eos/opendata/mystuff/*
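The choice between FileList and XRootD described above can be sketched as a small helper. This is only an illustration: eos_dataset_yaml is a hypothetical function, not part of ServiceX, and treating a "*" or a trailing "/" as the sign of a wildcard or directory is an assumption made for this sketch. It uses only the standard library and returns the YAML line as text.

```python
from typing import List, Union


def eos_dataset_yaml(location: Union[str, List[str]]) -> str:
    """Return a YAML ``Dataset:`` line for EOS files (hypothetical helper).

    A single string containing "*" or ending in "/" is treated as a
    wildcard pattern or directory and mapped to !XRootD (an assumption
    of this sketch). Anything else is mapped to !FileList.
    """
    if isinstance(location, str):
        if "*" in location or location.endswith("/"):
            return f"Dataset: !XRootD {location}"
        # A single plain file becomes a one-element file list.
        location = [location]
    quoted = ", ".join(f'"{url}"' for url in location)
    return f"Dataset: !FileList [{quoted}]"


if __name__ == "__main__":
    print(eos_dataset_yaml("root://eospublic.cern.ch//eos/opendata/mystuff/*"))
    print(eos_dataset_yaml([
        "root://eospublic.cern.ch//eos/opendata/mystuff/file1.root",
        "root://eospublic.cern.ch//eos/opendata/mystuff/file2.root",
    ]))
```

Whether ServiceX can actually read the resulting locations still depends on the permissions noted in the warning above.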

CERN Open Data Portal

Datasets from the CERN Open Data Portal are referenced by their numeric record ID.

Python

"Dataset": servicex.dataset.CERNOpenData(179)

YAML

Dataset: !CERNOpenData 179
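Because the record ID is numeric, a configuration script may want to check it before writing the YAML entry. The following is a minimal sketch: cern_open_data_yaml is a hypothetical helper, not a ServiceX function, and the requirement that the ID be positive is an assumption of this example.

```python
from typing import Union


def cern_open_data_yaml(record_id: Union[int, str]) -> str:
    """Return a YAML ``Dataset:`` line for a record ID (hypothetical helper)."""
    # bool is a subclass of int, so it is rejected explicitly.
    if isinstance(record_id, bool) or not isinstance(record_id, (int, str)):
        raise TypeError("record ID must be an int or a numeric string")
    number = int(record_id)  # raises ValueError for non-numeric strings
    if number <= 0:
        # Assumption of this sketch: record IDs are positive integers.
        raise ValueError("record ID is assumed to be positive")
    return f"Dataset: !CERNOpenData {number}"


if __name__ == "__main__":
    print(cern_open_data_yaml(179))
```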

Network Accessible Files

Files accessible via HTTP or XRootD protocols can be provided directly as a list of URLs.

Danger

The ServiceX instance must have permission to read these files. In particular, if generic members of your experiment can’t access the files, ServiceX will probably not be able to either.

Python

"Dataset": servicex.dataset.FileList(["http://server/file1.root", "root://server/file2.root"])

YAML

Dataset: !FileList ["http://server/file1.root", "root://server/file2.root"]
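Since a file list mixes HTTP and XRootD URLs, it can help to catch malformed entries before they are submitted. The sketch below is one possible pre-check using only the standard library. check_file_urls is a hypothetical helper, and accepting "https" alongside "http" and "root" is an assumption, since the text above names only HTTP and XRootD. It checks the URL form only and says nothing about whether ServiceX has permission to read the files.

```python
from typing import Iterable, List
from urllib.parse import urlparse

# Schemes accepted by this sketch. "https" is an assumption; the
# documentation names only HTTP and XRootD.
ALLOWED_SCHEMES = {"http", "https", "root"}


def check_file_urls(urls: Iterable[str]) -> List[str]:
    """Return the URLs as a list, or raise ValueError on a bad entry."""
    checked = []
    for url in urls:
        parsed = urlparse(url)
        # Require a known scheme and a host part, e.g. root://server/...
        if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
            raise ValueError(f"Unsupported or malformed file URL: {url!r}")
        checked.append(url)
    return checked


if __name__ == "__main__":
    files = check_file_urls(["http://server/file1.root", "root://server/file2.root"])
    print(files)
```

The returned list can then be passed to FileList in the same way as the literal list in the example above.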