ropensci · slager · Apr 1, 2024 · Mar 31, 2024 · Mar 31, 2024 · Mar 31, 2024
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -1,21 +1,15 @@
-^CRAN-RELEASE$
 ^Meta$
 ^doc$
 ^.*\.Rproj$
 ^\.Rproj\.user$
 ^README\.Rmd$
-^README-.*\.png$
-^docs$
-README.html
 ^codecov\.yml$
-^\.travis\.yml$
 ^CODE_OF_CONDUCT\.md$
-^appveyor\.yml$
 bibliography.bib
 ^codemeta\.json$
 ^\.github$
-cran-comments.md
 ^revdep$
-^$
 ^cran-comments\.md$
 ^DataPackageR\.Rproj$
+^CRAN-SUBMISSION$
+^LICENSE\.md$
diff --git a/.gitignore b/.gitignore
@@ -1,11 +1,7 @@
-Meta
-doc
 .Rproj.user
 .Rhistory
 .RData
-README.html
-preprocessData.Rproj
-codecov.yml
-.travis.yml
 /revdep/.cache.rds
-check/
+.DS_Store
+.httr-oauth
+inst/doc
diff --git a/CRAN-RELEASE b/CRAN-RELEASE
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -66,8 +66,9 @@ Suggests:
     testthat,
     covr,
     data.tree,
-URL: https://docs.ropensci.org/DataPackageR/ (website)
-    https://github.com/ropensci/DataPackageR/
+URL:
+    https://github.com/ropensci/DataPackageR/,
+    https://docs.ropensci.org/DataPackageR/
 BugReports: https://github.com/ropensci/DataPackageR/issues
 SystemRequirements: pandoc (>= 1.12.3) - http://pandoc.org
 Language: en-US
diff --git a/LICENSE b/LICENSE
@@ -1,2 +1,2 @@
-YEAR: 2018
-COPYRIGHT HOLDER: Greg Finak
+YEAR: 2024
+COPYRIGHT HOLDER: Greg Finak
diff --git a/LICENSE.md b/LICENSE.md
@@ -0,0 +1,21 @@
+# MIT License
+
+Copyright (c) 2024 Greg Finak
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/R/DataPackageR-package.R b/R/DataPackageR-package.R
@@ -0,0 +1,88 @@
+#' DataPackageR
+#'
+#' A framework to automate the processing, tidying and packaging of raw data into analysis-ready
+#' data sets as R packages.
+#'
+#' DataPackageR will automate running of data processing code,
+#' storing tidied data sets in an R package, producing
+#' data documentation stubs, tracking data object fingerprints (md5 hash)
+#' and tracking and incrementing a "DataVersion" string
+#' in the DESCRIPTION file of the package when raw data or data
+#' objects change.
+#' Code to perform the data processing is passed to DataPackageR by the user.
+#' The user also specifies the names of the tidy data objects to be stored,
+#' documented and tracked in the final package. Raw data should be read from
+#' "inst/extdata" but large raw data files can be read from sources external
+#' to the package source tree.
+#'
+#' Configuration is controlled via the config.yml file created at the package root.
+#' Its properties include a list of R and Rmd files that are to be rendered / sourced and
+#' which read data and do the actual processing.
+#' It also includes a list of R object names created by those files. These objects
+#' are stored in the final package and accessible via the \code{data()} API.
+#' The documentation for these objects is accessible via "?object-name", and md5
+#' fingerprints of these objects are created and tracked.
+#'
+#' The Rmd and R files used to process the objects are transformed into vignettes
+#' accessible in the final package so that the processing is fully documented.
+#'
+#' A DATADIGEST file in the package source keeps track of the data object fingerprints.
+#' A DataVersion string is added to the package DESCRIPTION file and updated when these
+#' objects are updated or changed on subsequent builds.
+#'
+#' Once the package is built and installed, the data objects created in the package are accessible via
+#' the \code{data()} API.
+#' Calling \code{datapackage_skeleton()} and passing in R / Rmd file names and R object names
+#' constructs a skeleton data package source tree and an associated \code{config.yml} file.
+#'
+#' Calling \code{package_build()} sets the build process in motion.
+#' @examples
+#' # A simple Rmd file that creates one data object
+#' # named "tbl".
+#' if(rmarkdown::pandoc_available()){
+#' f <- tempdir()
+#' f <- file.path(f,"foo.Rmd")
+#' con <- file(f)
+#' writeLines("```{r}\n tbl = data.frame(1:10) \n```\n",con=con)
+#' close(con)
+#'
+#' # construct a data package skeleton with a temporary name (pname) and pass
+#' # in the Rmd file name with full path, and the name of the object(s) it
+#' # creates.
+#'
+#' pname <- basename(tempfile())
+#' datapackage_skeleton(name=pname,
+#'    path=tempdir(),
+#'    force = TRUE,
+#'    r_object_names = "tbl",
+#'    code_files = f)
+#'
+#' # call package_build to run the "foo.Rmd" processing and
+#' # build a data package.
+#' package_build(file.path(tempdir(), pname), install = FALSE)
+#'
+#' # "install" the data package
+#' devtools::load_all(file.path(tempdir(), pname))
+#'
+#' # read the data version
+#' data_version(pname)
+#'
+#' # list the data sets in the package.
+#' data(package = pname)
+#'
+#' # The data objects are in the package source under "/data"
+#' list.files(pattern="rda", path = file.path(tempdir(),pname,"data"), full.names = TRUE)
+#'
+#' # The documentation that needs to be edited is in "/R"
+#' list.files(pattern="R", path = file.path(tempdir(), pname,"R"), full.names = TRUE)
+#' readLines(list.files(pattern="R", path = file.path(tempdir(),pname,"R"), full.names = TRUE))
+#' # view the documentation with
+#' ?tbl
+#' }
+#' @name DataPackageR-package
+#' @keywords internal
+'_PACKAGE'
+
+## usethis namespace: start
+## usethis namespace: end
+NULL
diff --git a/R/processData.R b/R/processData.R
@@ -1,88 +1,3 @@
-#' DataPackageR
-#'
-#' A framework to automate the processing, tidying and packaging of raw data into analysis-ready
-#' data sets as R packages.
-#'
-#' DataPackageR will automate running of data processing code,
-#' storing tidied data sets in an R package, producing
-#' data documentation stubs, tracking data object finger prints (md5 hash)
-#' and tracking and incrementing a "DataVersion" string
-#' in the DESCRIPTION file of the package when raw data or data
-#' objects change.
-#' Code to perform the data processing is passed to DataPackageR by the user.
-#' The user also specifies the names of the tidy data objects to be stored,
-#' documented and tracked in the final package. Raw data should be read from
-#' "inst/extdata" but large raw data files can be read from sources external
-#' to the package source tree.
-#'
-#' Configuration is controlled via the config.yml file created at the package root.
-#' Its properties include a list of R and Rmd files that are to be rendered / sourced and
-#' which read data and do the actual processing.
-#' It also includes a list of r object names created by those files. These objects
-#' are stored in the final package and accessible via the \code{data()} API.
-#' The documentation for these objects is accessible via "?object-name", and md5
-#' fingerprints of these objects are created and tracked.
-#'
-#' The Rmd and R files used to process the objects are transformed into vignettes
-#' accessible in the final package so that the processing is fully documented.
-#'
-#' A DATADIGEST file in the package source keeps track of the data object fingerprints.
-#' A DataVersion string is added to the package DESCRIPTION file and updated when these
-#' objects are updated or changed on subsequent builds.
-#'
-#' Once the package is built and installed, the data objects created in the package are accessible via
-#' the \code{data()} API, and
-#' Calling \code{datapackage_skeleton()} and passing in R / Rmd file names, and r object names
-#' constructs a skeleton data package source tree and an associated \code{config.yml} file.
-#'
-#' Calling \code{build_package()} sets the build process in motion.
-#' @examples
-#' # A simple Rmd file that creates one data object
-#' # named "tbl".
-#' if(rmarkdown::pandoc_available()){
-#' f <- tempdir()
-#' f <- file.path(f,"foo.Rmd")
-#' con <- file(f)
-#' writeLines("```{r}\n tbl = data.frame(1:10) \n```\n",con=con)
-#' close(con)
-#'
-#' # construct a data package skeleton named "MyDataPackage" and pass
-#' # in the Rmd file name with full path, and the name of the object(s) it
-#' # creates.
-#'
-#' pname <- basename(tempfile())
-#' datapackage_skeleton(name=pname,
-#'    path=tempdir(),
-#'    force = TRUE,
-#'    r_object_names = "tbl",
-#'    code_files = f)
-#'
-#' # call package_build to run the "foo.Rmd" processing and
-#' # build a data package.
-#' package_build(file.path(tempdir(), pname), install = FALSE)
-#'
-#' # "install" the data package
-#' devtools::load_all(file.path(tempdir(), pname))
-#'
-#' # read the data version
-#' data_version(pname)
-#'
-#' # list the data sets in the package.
-#' data(package = pname)
-#'
-#' # The data objects are in the package source under "/data"
-#' list.files(pattern="rda", path = file.path(tempdir(),pname,"data"), full = TRUE)
-#'
-#' # The documentation that needs to be edited is in "/R"
-#' list.files(pattern="R", path = file.path(tempdir(), pname,"R"), full = TRUE)
-#' readLines(list.files(pattern="R", path = file.path(tempdir(),pname,"R"), full = TRUE))
-#' # view the documentation with
-#' ?tbl
-#' }
-#' @name DataPackageR-package
-'_PACKAGE'
-
-
 .validate_render_root <- function(x) {
   # catch an error if it doesn't exist
   render_root <-

diff --git a/R/01.R → R/zzz.R b/R/01.R → R/zzz.R
diff --git a/README.Rmd b/README.Rmd
@@ -23,7 +23,7 @@ DataPackageR is used to reproducibly process raw data into packaged, analysis-re
 <!-- badges: start -->
 [![CRAN](https://www.r-pkg.org/badges/version/DataPackageR)]( https://CRAN.R-project.org/package=DataPackageR)
 [![R-CMD-check](https://github.com/ropensci/DataPackageR/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/DataPackageR/actions)
-[![Coverage status](https://codecov.io/gh/ropensci/DataPackageR/branch/master/graph/badge.svg)](https://codecov.io/github/ropensci/DataPackageR?branch=master)
+[![Coverage status](https://codecov.io/gh/ropensci/DataPackageR/branch/main/graph/badge.svg)](https://app.codecov.io/github/ropensci/DataPackageR?branch=main)
 [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
 [![](https://badges.ropensci.org/230_status.svg)](https://github.com/ropensci/software-review/issues/230)
 [![DOI](https://zenodo.org/badge/29267435.svg)](https://doi.org/10.5281/zenodo.1292095)
@@ -75,9 +75,9 @@ You have diverse raw data sets that you need to preprocess and tidy in order to:
 
 - **Package size limits.**
 
-  R packages have a 5MB size limit, at least on CRAN. BioConductor has explicit [data package](https://www.bioconductor.org/developers/package-guidelines/#package-types) types that can be larger and use git LFS for very large files. 
+  R packages have a 10MB size limit, at least on [CRAN](https://cran.r-project.org/web/packages/policies.html). Bioconductor [ExperimentHub](http://contributions.bioconductor.org/data.html#data) may be able to support larger data packages.
 
-  Sharing large volumes of raw data in an R package format is still not ideal, and there are public biological data repositories better suited for raw data: e.g.,  [GEO](https://www.ncbi.nlm.nih.gov/geo/), [SRA](https://www.ncbi.nlm.nih.gov/sra), [ImmPort](https://www.immport.org:443/shared/immport-open/public/home/home), [ImmuneSpace](https://immunespace.org/), [FlowRepository](https://flowrepository.org/).
+  Sharing large volumes of raw data in an R package format is still not ideal, and there are public biological data repositories better suited for raw data: e.g.,  [GEO](https://www.ncbi.nlm.nih.gov/geo/), [SRA](https://www.ncbi.nlm.nih.gov/sra), [ImmPort](https://www.immport.org/), [ImmuneSpace](https://immunespace.org/), [FlowRepository](http://flowrepository.org/).
 
   Tools like [datastorr](https://github.com/ropenscilabs/datastorr) can help with this and we hope to integrate the into DataPackageR in the future.
 

diff --git a/README.md b/README.md
@@ -11,7 +11,7 @@ analysis-ready data sets.
 [![CRAN](https://www.r-pkg.org/badges/version/DataPackageR)](https://CRAN.R-project.org/package=DataPackageR)
 [![R-CMD-check](https://github.com/ropensci/DataPackageR/workflows/R-CMD-check/badge.svg)](https://github.com/ropensci/DataPackageR/actions)
 [![Coverage
-status](https://codecov.io/gh/ropensci/DataPackageR/branch/master/graph/badge.svg)](https://codecov.io/github/ropensci/DataPackageR?branch=master)
+status](https://codecov.io/gh/ropensci/DataPackageR/branch/main/graph/badge.svg)](https://app.codecov.io/github/ropensci/DataPackageR?branch=main)
 [![Project Status: Active – The project has reached a stable, usable
 state and is being actively
 developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
@@ -82,18 +82,15 @@ purpose is to contain, access, and / or document data sets.
 
 -   **Package size limits.**
 
-    R packages have a 5MB size limit, at least on CRAN. BioConductor has
-    explicit [data
-    package](https://www.bioconductor.org/developers/package-guidelines/#package-types)
-    types that can be larger and use git LFS for very large files.
+    R packages have a 10MB size limit, at least on [CRAN](https://cran.r-project.org/web/packages/policies.html). Bioconductor [ExperimentHub](http://contributions.bioconductor.org/data.html#data) may be able to support larger data packages.
 
     Sharing large volumes of raw data in an R package format is still
     not ideal, and there are public biological data repositories better
     suited for raw data: e.g., [GEO](https://www.ncbi.nlm.nih.gov/geo/),
     [SRA](https://www.ncbi.nlm.nih.gov/sra),
-    [ImmPort](https://www.immport.org:443/shared/immport-open/public/home/home),
+    [ImmPort](https://www.immport.org/),
     [ImmuneSpace](https://immunespace.org/),
-    [FlowRepository](https://flowrepository.org/).
+    [FlowRepository](http://flowrepository.org/).
 
     Tools like [datastorr](https://github.com/traitecoevo/datastorr)
     can help with this and we hope to integrate the into DataPackageR in