Tweaks for best practices and miscellaneous cleanup #108

Merged 11 commits on Apr 1, 2024
10 changes: 2 additions & 8 deletions .Rbuildignore
@@ -1,21 +1,15 @@
^CRAN-RELEASE$
^Meta$
^doc$
^.*\.Rproj$
^\.Rproj\.user$
^README\.Rmd$
^README-.*\.png$
^docs$
README.html
^codecov\.yml$
^\.travis\.yml$
^CODE_OF_CONDUCT\.md$
^appveyor\.yml$
bibliography.bib
^codemeta\.json$
^\.github$
cran-comments.md
^revdep$
^$
^cran-comments\.md$
^DataPackageR\.Rproj$
^CRAN-SUBMISSION$
^LICENSE\.md$
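
As a note on the mix of anchored and unanchored entries in this file: according to "Writing R Extensions", each line of `.Rbuildignore` is a Perl-style regular expression matched case-insensitively against paths relative to the package root. The sketch below approximates that rule with `grepl()`. The helper name `matches` and the example paths are hypothetical, and this is not the code `R CMD build` itself runs.

```r
# Two patterns from the diff above: one unanchored, one anchored and escaped.
unanchored <- "cran-comments.md"
anchored   <- "^cran-comments\\.md$"

# Hypothetical file paths, relative to the package root.
paths <- c("cran-comments.md", "inst/cran-commentsXmd.txt", "R/build.R")

# Hypothetical helper approximating the documented matching rule:
# Perl-style regular expression, case-insensitive.
matches <- function(pattern, x) {
  grepl(pattern, x, perl = TRUE, ignore.case = TRUE)
}

# The unanchored pattern also hits the second path, because "." matches any
# character and the pattern may match anywhere in the path.
print(matches(unanchored, paths))

# The anchored, escaped pattern matches only the intended top-level file.
print(matches(anchored, paths))
```

Under these assumptions, anchoring with `^` and `$` and escaping the dot keeps unrelated files from being excluded from the build by accident.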
10 changes: 3 additions & 7 deletions .gitignore
@@ -1,11 +1,7 @@
Meta
doc
.Rproj.user
.Rhistory
.RData
README.html
preprocessData.Rproj
codecov.yml
.travis.yml
/revdep/.cache.rds
check/
.DS_Store
.httr-oauth
inst/doc
2 changes: 0 additions & 2 deletions CRAN-RELEASE

This file was deleted.

5 changes: 3 additions & 2 deletions DESCRIPTION
@@ -66,8 +66,9 @@ Suggests:
testthat,
covr,
data.tree,
URL: https://docs.ropensci.org/DataPackageR/ (website)
https://github.com/ropensci/DataPackageR/
URL:
https://github.com/ropensci/DataPackageR/,
https://docs.ropensci.org/DataPackageR/
BugReports: https://github.com/ropensci/DataPackageR/issues
SystemRequirements: pandoc (>= 1.12.3) - http://pandoc.org
Language: en-US
4 changes: 2 additions & 2 deletions LICENSE
@@ -1,2 +1,2 @@
YEAR: 2018
COPYRIGHT HOLDER: Greg Finak
YEAR: 2024
COPYRIGHT HOLDER: Greg Finak
21 changes: 21 additions & 0 deletions LICENSE.md
@@ -0,0 +1,21 @@
# MIT License

Copyright (c) 2024 Greg Finak

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
88 changes: 88 additions & 0 deletions R/DataPackageR-package.R
@@ -0,0 +1,88 @@
#' DataPackageR
#'
#' A framework to automate the processing, tidying and packaging of raw data into analysis-ready
#' data sets as R packages.
#'
#' DataPackageR will automate running of data processing code,
#' storing tidied data sets in an R package, producing
#' data documentation stubs, tracking data object fingerprints (md5 hashes)
#' and tracking and incrementing a "DataVersion" string
#' in the DESCRIPTION file of the package when raw data or data
#' objects change.
#' Code to perform the data processing is passed to DataPackageR by the user.
#' The user also specifies the names of the tidy data objects to be stored,
#' documented and tracked in the final package. Raw data should be read from
#' "inst/extdata" but large raw data files can be read from sources external
#' to the package source tree.
#'
#' Configuration is controlled via the config.yml file created at the package root.
#' Its properties include a list of R and Rmd files that are to be rendered / sourced and
#' which read data and do the actual processing.
#' It also includes a list of R object names created by those files. These objects
#' are stored in the final package and accessible via the \code{data()} API.
#' The documentation for these objects is accessible via "?object-name", and md5
#' fingerprints of these objects are created and tracked.
#'
#' The Rmd and R files used to process the objects are transformed into vignettes
#' accessible in the final package so that the processing is fully documented.
#'
#' A DATADIGEST file in the package source keeps track of the data object fingerprints.
#' A DataVersion string is added to the package DESCRIPTION file and updated when these
#' objects are updated or changed on subsequent builds.
#'
#' Once the package is built and installed, the data objects created in the package are accessible via
#' the \code{data()} API.
#'
#' Calling \code{datapackage_skeleton()} and passing in R / Rmd file names and R object names
#' constructs a skeleton data package source tree and an associated \code{config.yml} file.
#'
#' Calling \code{package_build()} sets the build process in motion.
#' @examples
#' # A simple Rmd file that creates one data object
#' # named "tbl".
#' if(rmarkdown::pandoc_available()){
#' f <- tempdir()
#' f <- file.path(f,"foo.Rmd")
#' con <- file(f)
#' writeLines("```{r}\n tbl = data.frame(1:10) \n```\n",con=con)
#' close(con)
#'
#' # construct a data package skeleton with a temporary name and pass
#' # in the Rmd file name with full path, and the name of the object(s) it
#' # creates.
#'
#' pname <- basename(tempfile())
#' datapackage_skeleton(name=pname,
#' path=tempdir(),
#' force = TRUE,
#' r_object_names = "tbl",
#' code_files = f)
#'
#' # call package_build to run the "foo.Rmd" processing and
#' # build a data package.
#' package_build(file.path(tempdir(), pname), install = FALSE)
#'
#' # "install" the data package
#' devtools::load_all(file.path(tempdir(), pname))
#'
#' # read the data version
#' data_version(pname)
#'
#' # list the data sets in the package.
#' data(package = pname)
#'
#' # The data objects are in the package source under "/data"
#' list.files(pattern="rda", path = file.path(tempdir(),pname,"data"), full.names = TRUE)
#'
#' # The documentation that needs to be edited is in "/R"
#' list.files(pattern="R", path = file.path(tempdir(), pname,"R"), full.names = TRUE)
#' readLines(list.files(pattern="R", path = file.path(tempdir(),pname,"R"), full.names = TRUE))
#' # view the documentation with
#' ?tbl
#' }
#' @name DataPackageR-package
#' @keywords internal
'_PACKAGE'

## usethis namespace: start
## usethis namespace: end
NULL
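
The roxygen text above describes md5 fingerprints of data objects and a "DataVersion" string that is incremented when the objects change. The sketch below illustrates that idea with base R only. The helper `fingerprint`, the starting version, and the choice to bump the last version component are all assumptions for illustration; this is not DataPackageR's internal implementation.

```r
# Hypothetical helper: md5 fingerprint of an R object, computed by writing
# its serialized bytes to a temporary file and hashing that file.
fingerprint <- function(obj) {
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(serialize(obj, NULL), f)
  unname(tools::md5sum(f))
}

# Two versions of a tidy data object: the second has one more row.
tbl_v1 <- data.frame(x = 1:10)
tbl_v2 <- data.frame(x = 1:11)

old_digest <- fingerprint(tbl_v1)
new_digest <- fingerprint(tbl_v2)

# Hypothetical starting version; assume the last component is bumped
# whenever the fingerprint differs from the recorded one.
data_version <- "0.1.0"
if (!identical(old_digest, new_digest)) {
  parts <- unclass(package_version(data_version))[[1]]
  parts[3] <- parts[3] + 1L
  data_version <- paste(parts, collapse = ".")
}
print(data_version)
```

In this sketch an unchanged object yields the same fingerprint within a session, so the version string would be left alone on a rebuild that produces identical data.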
85 changes: 0 additions & 85 deletions R/processData.R
@@ -1,88 +1,3 @@
#' DataPackageR
#'
#' A framework to automate the processing, tidying and packaging of raw data into analysis-ready
#' data sets as R packages.
#'
#' DataPackageR will automate running of data processing code,
#' storing tidied data sets in an R package, producing
#' data documentation stubs, tracking data object finger prints (md5 hash)
#' and tracking and incrementing a "DataVersion" string
#' in the DESCRIPTION file of the package when raw data or data
#' objects change.
#' Code to perform the data processing is passed to DataPackageR by the user.
#' The user also specifies the names of the tidy data objects to be stored,
#' documented and tracked in the final package. Raw data should be read from
#' "inst/extdata" but large raw data files can be read from sources external
#' to the package source tree.
#'
#' Configuration is controlled via the config.yml file created at the package root.
#' Its properties include a list of R and Rmd files that are to be rendered / sourced and
#' which read data and do the actual processing.
#' It also includes a list of r object names created by those files. These objects
#' are stored in the final package and accessible via the \code{data()} API.
#' The documentation for these objects is accessible via "?object-name", and md5
#' fingerprints of these objects are created and tracked.
#'
#' The Rmd and R files used to process the objects are transformed into vignettes
#' accessible in the final package so that the processing is fully documented.
#'
#' A DATADIGEST file in the package source keeps track of the data object fingerprints.
#' A DataVersion string is added to the package DESCRIPTION file and updated when these
#' objects are updated or changed on subsequent builds.
#'
#' Once the package is built and installed, the data objects created in the package are accessible via
#' the \code{data()} API, and
#' Calling \code{datapackage_skeleton()} and passing in R / Rmd file names, and r object names
#' constructs a skeleton data package source tree and an associated \code{config.yml} file.
#'
#' Calling \code{build_package()} sets the build process in motion.
#' @examples
#' # A simple Rmd file that creates one data object
#' # named "tbl".
#' if(rmarkdown::pandoc_available()){
#' f <- tempdir()
#' f <- file.path(f,"foo.Rmd")
#' con <- file(f)
#' writeLines("```{r}\n tbl = data.frame(1:10) \n```\n",con=con)
#' close(con)
#'
#' # construct a data package skeleton named "MyDataPackage" and pass
#' # in the Rmd file name with full path, and the name of the object(s) it
#' # creates.
#'
#' pname <- basename(tempfile())
#' datapackage_skeleton(name=pname,
#' path=tempdir(),
#' force = TRUE,
#' r_object_names = "tbl",
#' code_files = f)
#'
#' # call package_build to run the "foo.Rmd" processing and
#' # build a data package.
#' package_build(file.path(tempdir(), pname), install = FALSE)
#'
#' # "install" the data package
#' devtools::load_all(file.path(tempdir(), pname))
#'
#' # read the data version
#' data_version(pname)
#'
#' # list the data sets in the package.
#' data(package = pname)
#'
#' # The data objects are in the package source under "/data"
#' list.files(pattern="rda", path = file.path(tempdir(),pname,"data"), full = TRUE)
#'
#' # The documentation that needs to be edited is in "/R"
#' list.files(pattern="R", path = file.path(tempdir(), pname,"R"), full = TRUE)
#' readLines(list.files(pattern="R", path = file.path(tempdir(),pname,"R"), full = TRUE))
#' # view the documentation with
#' ?tbl
#' }
#' @name DataPackageR-package
'_PACKAGE'


.validate_render_root <- function(x) {
# catch an error if it doesn't exist
render_root <-
File renamed without changes.
6 changes: 3 additions & 3 deletions README.Rmd
@@ -23,7 +23,7 @@ DataPackageR is used to reproducibly process raw data into packaged, analysis-re
<!-- badges: start -->
[![CRAN](https://www.r-pkg.org/badges/version/DataPackageR)]( https://CRAN.R-project.org/package=DataPackageR)
[![R-CMD-check](https://github.com/ropensci/DataPackageR/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/DataPackageR/actions)
[![Coverage status](https://codecov.io/gh/ropensci/DataPackageR/branch/master/graph/badge.svg)](https://codecov.io/github/ropensci/DataPackageR?branch=master)
[![Coverage status](https://codecov.io/gh/ropensci/DataPackageR/branch/main/graph/badge.svg)](https://app.codecov.io/github/ropensci/DataPackageR?branch=main)
[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![](https://badges.ropensci.org/230_status.svg)](https://github.com/ropensci/software-review/issues/230)
[![DOI](https://zenodo.org/badge/29267435.svg)](https://doi.org/10.5281/zenodo.1292095)
@@ -75,9 +75,9 @@ You have diverse raw data sets that you need to preprocess and tidy in order to:

- **Package size limits.**

R packages have a 5MB size limit, at least on CRAN. BioConductor has explicit [data package](https://www.bioconductor.org/developers/package-guidelines/#package-types) types that can be larger and use git LFS for very large files.
R packages have a 10MB size limit, at least on [CRAN](https://cran.r-project.org/web/packages/policies.html). BioConductor [ExperimentHub](http://contributions.bioconductor.org/data.html#data) may be able to support larger data packages.

Sharing large volumes of raw data in an R package format is still not ideal, and there are public biological data repositories better suited for raw data: e.g., [GEO](https://www.ncbi.nlm.nih.gov/geo/), [SRA](https://www.ncbi.nlm.nih.gov/sra), [ImmPort](https://www.immport.org:443/shared/immport-open/public/home/home), [ImmuneSpace](https://immunespace.org/), [FlowRepository](https://flowrepository.org/).
Sharing large volumes of raw data in an R package format is still not ideal, and there are public biological data repositories better suited for raw data: e.g., [GEO](https://www.ncbi.nlm.nih.gov/geo/), [SRA](https://www.ncbi.nlm.nih.gov/sra), [ImmPort](https://www.immport.org/), [ImmuneSpace](https://immunespace.org/), [FlowRepository](http://flowrepository.org/).

Tools like [datastorr](https://github.com/ropenscilabs/datastorr) can help with this and we hope to integrate them into DataPackageR in the future.

11 changes: 4 additions & 7 deletions README.md
@@ -11,7 +11,7 @@ analysis-ready data sets.
[![CRAN](https://www.r-pkg.org/badges/version/DataPackageR)](https://CRAN.R-project.org/package=DataPackageR)
[![R-CMD-check](https://github.com/ropensci/DataPackageR/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/DataPackageR/actions)
[![Coverage
status](https://codecov.io/gh/ropensci/DataPackageR/branch/master/graph/badge.svg)](https://codecov.io/github/ropensci/DataPackageR?branch=master)
status](https://codecov.io/gh/ropensci/DataPackageR/branch/main/graph/badge.svg)](https://app.codecov.io/github/ropensci/DataPackageR?branch=main)
[![Project Status: Active – The project has reached a stable, usable
state and is being actively
developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
@@ -82,18 +82,15 @@ purpose is to contain, access, and / or document data sets.

- **Package size limits.**

R packages have a 5MB size limit, at least on CRAN. BioConductor has
explicit [data
package](https://www.bioconductor.org/developers/package-guidelines/#package-types)
types that can be larger and use git LFS for very large files.
R packages have a 10MB size limit, at least on [CRAN](https://cran.r-project.org/web/packages/policies.html). BioConductor [ExperimentHub](http://contributions.bioconductor.org/data.html#data) may be able to support larger data packages.

Sharing large volumes of raw data in an R package format is still
not ideal, and there are public biological data repositories better
suited for raw data: e.g., [GEO](https://www.ncbi.nlm.nih.gov/geo/),
[SRA](https://www.ncbi.nlm.nih.gov/sra),
[ImmPort](https://www.immport.org:443/shared/immport-open/public/home/home),
[ImmPort](https://www.immport.org/),
[ImmuneSpace](https://immunespace.org/),
[FlowRepository](https://flowrepository.org/).
[FlowRepository](http://flowrepository.org/).

Tools like [datastorr](https://github.com/traitecoevo/datastorr)
can help with this and we hope to integrate them into DataPackageR in