---
title: "The DataPackageR YAML configuration file."
author: "Greg Finak <gfinak@fredhutch.org>"
date: "2018-10-24"
vignette: >
  %\VignetteIndexEntry{DataPackageR YAML configuration.}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
  %\usepackage{graphicx}
---
Data package builds are controlled using the `datapackager.yml` file. This file is created in the package source tree when the user creates a package using `datapackage_skeleton()`. It is automatically populated with the names of the `code_files` and `data_objects` passed in to `datapackage_skeleton()`.

The structure of a correctly formatted `datapackager.yml` file is shown below:
```yaml
configuration:
  files:
    subsetCars.Rmd:
      enabled: yes
  objects: cars_over_20
  render_root:
    tmp: '450393'
```
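In R, a config like this is naturally held as a nested list. The sketch below builds one by hand to show how the three properties nest under `configuration`. It assumes `enabled: yes` is represented as `TRUE` (as YAML 1.1 parsers read it); it is an illustration, not the exact object DataPackageR creates.

```r
# Assumption: a nested list mirroring the datapackager.yml above,
# with `enabled: yes` represented as TRUE. Built by hand for illustration.
cfg <- list(
  configuration = list(
    files = list(
      subsetCars.Rmd = list(enabled = TRUE)
    ),
    objects = "cars_over_20",
    render_root = list(tmp = "450393")
  )
)

names(cfg$configuration)                            # "files" "objects" "render_root"
names(cfg$configuration$files)                      # "subsetCars.Rmd"
cfg$configuration$files[["subsetCars.Rmd"]]$enabled # TRUE
```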
The main section of the file is the `configuration:` section. It has three properties:

- `files:` The files (`R` or `Rmd`) to be processed by DataPackageR. They are processed in the order shown. Users running multi-script workflows with dependencies between the scripts need to ensure the files are listed, and therefore processed, in the correct order. Here, `subsetCars.Rmd` is the only file to process. The name is transformed to an absolute path within the package. Each file has just one property:
  - `enabled:` A logical flag (`yes` or `no`) indicating whether the file should be rendered during the build or skipped. This is useful for 'turning off' long-running processing tasks that have not changed. Disabling processing of a file will not overwrite existing documentation or data objects created during previous builds.
- `objects:` The names of the data objects created by the processing files, to be stored in the package. These names are compared against the objects created in the render environment by each file. The names must match.
- `render_root:` The directory where the `Rmd` or `R` files will be rendered. Defaults to a randomly named subdirectory of `tempdir()`. This allows workflows that use multiple scripts and create file system artifacts to function correctly by simply writing to and reading from the working directory.
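The rules in the list above can be illustrated with a small base-R sketch. The helpers `files_to_render()`, `check_objects()`, and `make_render_root()` are hypothetical and are not part of DataPackageR; the sketch assumes the configuration is a nested list mirroring the YAML, with `yes`/`no` read as `TRUE`/`FALSE`. `check_objects()` only verifies that every declared object was created, which is one reading of "the names must match".

```r
# Hypothetical helpers, NOT DataPackageR functions: a base-R sketch of the
# build rules, assuming the config is a nested list mirroring the YAML.

# Files are taken in the order listed; files with enabled = FALSE are skipped.
files_to_render <- function(cfg) {
  files <- cfg$configuration$files
  enabled <- vapply(files, function(f) isTRUE(f$enabled), logical(1))
  names(files)[enabled]
}

# Every declared object name must exist in the environment a file rendered in.
check_objects <- function(declared, render_env) {
  missing <- setdiff(declared, ls(render_env))
  if (length(missing) > 0) {
    stop("declared but not created: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# A randomly named subdirectory of tempdir(), in the spirit of the default.
make_render_root <- function() {
  root <- file.path(tempdir(), as.character(sample.int(999999, 1)))
  dir.create(root, showWarnings = FALSE)
  root
}

cfg <- list(configuration = list(
  files = list(
    subsetCars.Rmd = list(enabled = TRUE),
    anotherFile.Rmd = list(enabled = FALSE)
  ),
  objects = "cars_over_20"
))

files_to_render(cfg)  # only "subsetCars.Rmd"; anotherFile.Rmd is disabled

# Stand-in for the environment subsetCars.Rmd was rendered in.
render_env <- new.env()
assign("cars_over_20", data.frame(speed = c(21, 24)), envir = render_env)
check_objects(cfg$configuration$objects, render_env)
```

Keeping the ordering rule separate from the object check mirrors the two independent concerns in the list: which files run, and whether they produced what the config declares.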
The structure of the YAML is simple enough to understand but complex enough that it can be a pain to edit by hand. DataPackageR therefore provides a number of API calls to construct, read, modify, and write the YAML config file.
Make an R object representing a YAML config file. The YAML config shown above was created by:

```r
# Note: this is done by datapackage_skeleton().
# The user doesn't usually need to call
# construct_yml_config() directly.
yml <- DataPackageR::construct_yml_config(
  code = "subsetCars.Rmd",
  data = "cars_over_20"
)
```
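For intuition about what `construct_yml_config()` assembles from its `code` and `data` arguments, here is a hypothetical base-R builder, `build_config()`, that produces the same nested shape as the YAML above. It is not DataPackageR's implementation, and the random `tmp` label is an assumption that only imitates the numbers seen in the examples.

```r
# Hypothetical sketch, NOT DataPackageR's construct_yml_config():
# every code file starts enabled; data names become the objects entry.
build_config <- function(code, data) {
  files <- rep(list(list(enabled = TRUE)), length(code))
  names(files) <- code
  list(configuration = list(
    files = files,
    objects = data,
    # Assumption: a random numeric label, imitating the `tmp` values shown.
    render_root = list(tmp = as.character(sample.int(999999, 1)))
  ))
}

yml_sketch <- build_config(code = "subsetCars.Rmd", data = "cars_over_20")
names(yml_sketch$configuration$files)  # "subsetCars.Rmd"
```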
Read a YAML config file from a package path into an R object. Here, the YAML config file is read from the `mtcars20` example.

```r
library(DataPackageR)

# Returns an R object representation of
# the config file.
mtcars20_config <- yml_find(
  file.path(tempdir(), "mtcars20")
)
```
List the `objects` in a config read by `yml_find()`.

```r
yml_list_objects(yml)
```

```
cars_over_20
```
List the `files` in a config read by `yml_find()`.

```r
yml_list_files(yml)
```

```
subsetCars.Rmd
```
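If the config is viewed as a nested list mirroring the YAML (an assumption about its representation), listing objects and files amounts to reading one element and one set of names. The functions below are hypothetical base-R counterparts of `yml_list_objects()` and `yml_list_files()`, not their actual source.

```r
# Assumption: the config is a nested list mirroring datapackager.yml.
cfg <- list(configuration = list(
  files = list(subsetCars.Rmd = list(enabled = TRUE)),
  objects = "cars_over_20",
  render_root = list(tmp = "450393")
))

# Hypothetical counterparts of yml_list_objects() and yml_list_files().
list_objects <- function(cfg) cfg$configuration$objects
list_files <- function(cfg) names(cfg$configuration$files)

list_objects(cfg)  # "cars_over_20"
list_files(cfg)    # "subsetCars.Rmd"
```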
Disable compilation of named files in a config read by `yml_find()`.

```r
yml_disabled <- yml_disable_compile(
  yml,
  filenames = "subsetCars.Rmd"
)
```

```yaml
configuration:
  files:
    subsetCars.Rmd:
      enabled: no
  objects: cars_over_20
  render_root:
    tmp: '912178'
```
Enable compilation of named files in a config read by `yml_find()`.

```r
yml_enabled <- yml_enable_compile(
  yml,
  filenames = "subsetCars.Rmd"
)
```

```yaml
configuration:
  files:
    subsetCars.Rmd:
      enabled: yes
  objects: cars_over_20
  render_root:
    tmp: '912178'
```
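As the two outputs show, disabling and enabling only change the `enabled` flag of the named files, and each call's result is assigned to a new name. A hypothetical base-R equivalent, `set_enabled()`, on an assumed nested-list representation looks like this. It is not DataPackageR's code; stopping on an unknown file name is a design choice of the sketch.

```r
# Assumption: the config is a nested list mirroring datapackager.yml.
cfg <- list(configuration = list(
  files = list(subsetCars.Rmd = list(enabled = TRUE)),
  objects = "cars_over_20"
))

# Hypothetical helper: set `enabled` for the named files and return a copy.
set_enabled <- function(cfg, filenames, value) {
  for (f in filenames) {
    if (is.null(cfg$configuration$files[[f]])) {
      stop("file not in config: ", f)
    }
    cfg$configuration$files[[f]]$enabled <- value
  }
  cfg
}

cfg_disabled <- set_enabled(cfg, "subsetCars.Rmd", FALSE)
cfg_enabled <- set_enabled(cfg_disabled, "subsetCars.Rmd", TRUE)
cfg_disabled$configuration$files[["subsetCars.Rmd"]]$enabled  # FALSE
```

Because R copies on modification, `cfg` itself still has `enabled = TRUE` after both calls.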
Add named files to a config read by `yml_find()`.

```r
yml_twofiles <- yml_add_files(
  yml,
  filenames = "anotherFile.Rmd"
)
```

```yaml
configuration:
  files:
    subsetCars.Rmd:
      enabled: yes
    anotherFile.Rmd:
      enabled: yes
  objects: cars_over_20
  render_root:
    tmp: '912178'
```
Add named objects to a config read by `yml_find()`.

```r
yml_twoobj <- yml_add_objects(
  yml_twofiles,
  objects = "another_object"
)
```

```yaml
configuration:
  files:
    subsetCars.Rmd:
      enabled: yes
    anotherFile.Rmd:
      enabled: yes
  objects:
  - cars_over_20
  - another_object
  render_root:
    tmp: '912178'
```
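On an assumed nested-list representation, adding a file appends an enabled entry after the existing ones, and adding an object extends a character vector. In R both `objects` forms above are plain character vectors; the switch from `objects: cars_over_20` to a `- ` sequence is presumably just how a length-one vector versus a longer one is serialized. `add_files()` and `add_objects()` are hypothetical, not DataPackageR's code.

```r
# Assumption: the config is a nested list mirroring datapackager.yml.
cfg <- list(configuration = list(
  files = list(subsetCars.Rmd = list(enabled = TRUE)),
  objects = "cars_over_20"
))

# Hypothetical: append new files as enabled; leave existing entries untouched.
add_files <- function(cfg, filenames) {
  for (f in filenames) {
    if (is.null(cfg$configuration$files[[f]])) {
      cfg$configuration$files[[f]] <- list(enabled = TRUE)
    }
  }
  cfg
}

# Hypothetical: add object names without duplicating existing ones.
add_objects <- function(cfg, objects) {
  cfg$configuration$objects <- union(cfg$configuration$objects, objects)
  cfg
}

cfg_twofiles <- add_files(cfg, "anotherFile.Rmd")
cfg_twoobj <- add_objects(cfg_twofiles, "another_object")
names(cfg_twoobj$configuration$files)  # "subsetCars.Rmd" "anotherFile.Rmd"
cfg_twoobj$configuration$objects       # "cars_over_20" "another_object"
```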
Remove named files from a config read by `yml_find()`.

```r
yml_twoobj <- yml_remove_files(
  yml_twoobj,
  filenames = "anotherFile.Rmd"
)
```

```yaml
configuration:
  files:
    subsetCars.Rmd:
      enabled: yes
  objects:
  - cars_over_20
  - another_object
  render_root:
    tmp: '912178'
```
Remove named objects from a config read by `yml_find()`.

```r
yml_oneobj <- yml_remove_objects(
  yml_twoobj,
  objects = "another_object"
)
```

```yaml
configuration:
  files:
    subsetCars.Rmd:
      enabled: yes
  objects: cars_over_20
  render_root:
    tmp: '912178'
```
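Removal is the inverse: on an assumed nested-list representation it drops entries by name while keeping the order of the rest. `remove_files()` and `remove_objects()` below are hypothetical base-R sketches, not DataPackageR's implementation.

```r
# Assumption: the config is a nested list mirroring datapackager.yml.
cfg <- list(configuration = list(
  files = list(
    subsetCars.Rmd = list(enabled = TRUE),
    anotherFile.Rmd = list(enabled = TRUE)
  ),
  objects = c("cars_over_20", "another_object")
))

# Hypothetical: keep only the files whose names are not being removed.
remove_files <- function(cfg, filenames) {
  files <- cfg$configuration$files
  cfg$configuration$files <- files[setdiff(names(files), filenames)]
  cfg
}

# Hypothetical: drop the named objects, preserving the order of the rest.
remove_objects <- function(cfg, objects) {
  cfg$configuration$objects <- setdiff(cfg$configuration$objects, objects)
  cfg
}

cfg_onefile <- remove_files(cfg, "anotherFile.Rmd")
cfg_oneobj <- remove_objects(cfg_onefile, "another_object")
names(cfg_oneobj$configuration$files)  # "subsetCars.Rmd"
cfg_oneobj$configuration$objects       # "cars_over_20"
```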
Write a modified config to its package path.

```r
yml_write(yml_oneobj, path = "path_to_package")
```

A config read by `yml_find()` carries an attribute holding the path to the package, so the user doesn't need to pass `path` to `yml_write()` for such a config. Here `yml_oneobj` was derived from `construct_yml_config()` rather than read by `yml_find()`, so the path is passed explicitly.
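The path-attribute behaviour can be sketched in base R. `resolve_config_file()` is hypothetical and is not `yml_write()`; it assumes the package path is stored as `attr(config, "path")` and that the file inside the package is named `datapackager.yml`. An explicit `path` wins; otherwise the attribute is used; with neither, it stops.

```r
# Hypothetical helper, NOT DataPackageR's yml_write(): resolve where a config
# should be written. Assumes the package path is stored in attr(config, "path").
resolve_config_file <- function(config, path = NULL) {
  if (is.null(path)) {
    path <- attr(config, "path")
  }
  if (is.null(path)) {
    stop("config has no 'path' attribute; pass `path` explicitly")
  }
  file.path(path, "datapackager.yml")
}

# A config as if read from a package: it carries its package path.
cfg_read <- structure(list(configuration = list(objects = "cars_over_20")),
                      path = "path_to_package")
resolve_config_file(cfg_read)  # "path_to_package/datapackager.yml"

# A freshly constructed config has no such attribute, so a path is required.
cfg_new <- list(configuration = list(objects = "cars_over_20"))
resolve_config_file(cfg_new, path = "path_to_package")
```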