package for managing R project directories/repositories #17

Open
ataustin opened this issue Mar 3, 2020 · 14 comments

Comments

@ataustin

ataustin commented Mar 3, 2020

This is only the seed of an idea.

Walking into someone else's project directory can be daunting, given differences in organization philosophies. Equally challenging is reorganizing your own projects when you have a change in how you wish to organize your files.

I'd like to have a package that analyzes a directory and creates a high-level visual aid of the contents. The package could also help users reorganize their files by creating a mapping from the existing structure to a new desired structure. I'm imagining some kind of text file as a blueprint that specifies the mapping. The blueprint is then used to reorganize the files, and can also be used to undo the process if desired. Upon performing the reorganization, the package might also search scripts for file paths (relative or absolute) and change them to match the new structure so that code continues to run without further alteration.
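A minimal base R sketch of the blueprint idea follows. It assumes the blueprint is a table (for example a CSV) with `from` and `to` columns holding paths relative to the project root, and that scripts refer to files by those same root-relative paths. The helpers `read_blueprint()`, `apply_blueprint()`, `undo_blueprint()` and `rewrite_paths()` are hypothetical names, not an existing package API. The path rewriting only catches exact quoted string literals.

```r
# Hypothetical blueprint format: columns "from" and "to", both holding paths
# relative to the project root.
read_blueprint <- function(file) {
  bp <- utils::read.csv(file, stringsAsFactors = FALSE)
  stopifnot(all(c("from", "to") %in% names(bp)))
  bp[, c("from", "to")]
}

# Move each `from` path to its `to` path, creating target directories as needed.
apply_blueprint <- function(bp, root = ".") {
  from <- file.path(root, as.character(bp$from))
  to <- file.path(root, as.character(bp$to))
  missing_src <- !file.exists(from)
  if (any(missing_src)) {
    stop("Missing source paths: ", paste(bp$from[missing_src], collapse = ", "))
  }
  if (any(file.exists(to))) stop("Refusing to overwrite existing targets")
  for (d in unique(dirname(to))) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  ok <- file.rename(from, to)
  if (!all(ok)) stop("Failed to move: ", paste(bp$from[!ok], collapse = ", "))
  invisible(bp)
}

# Undoing is applying the same blueprint with the columns swapped.
# Directories created by apply_blueprint() are not removed.
undo_blueprint <- function(bp, root = ".") {
  swapped <- data.frame(from = bp$to, to = bp$from, stringsAsFactors = FALSE)
  apply_blueprint(swapped, root)
}

# Naive script update: replace quoted string literals equal to an old path
# with the new path. Paths built with file.path() or paste() are not found.
rewrite_paths <- function(bp, scripts) {
  for (s in scripts) {
    code <- readLines(s, warn = FALSE)
    for (i in seq_len(nrow(bp))) {
      for (q in c("\"", "'")) {
        code <- gsub(paste0(q, bp$from[i], q), paste0(q, bp$to[i], q),
                     code, fixed = TRUE)
      }
    }
    writeLines(code, s)
  }
  invisible(scripts)
}
```

Because undo is just the swapped blueprint, one file can drive both directions; a fuller version would probably also record which directories it created so that undo can remove them.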

@emilyriederer
Collaborator

This is very intriguing! It almost sounds like a structure-based versus style-based parallel to lintr.

@jdblischak

Sounds cool! Some suggestions:

  • Since this will require lots of file/directory manipulation, the fs package will be useful for ensuring the package works across operating systems
  • The function tree from the package cli is a good option for visualizing the filesystem
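To combine the two suggestions, the data frame layout that cli::tree() expects (first column node ids, second column a list of child ids, optional third column labels) can be built from an fs listing. This is a rough sketch; `dir_tree_data()` is a hypothetical helper, and fs::dir_tree() already prints a similar tree in a single call.

```r
library(fs)
library(cli)

# Hypothetical helper: turn a directory listing into the data frame shape
# used by cli::tree(): node id, list of child ids, and a printable label.
dir_tree_data <- function(root = ".") {
  root <- as.character(path_abs(root))
  paths <- c(root, as.character(dir_ls(root, recurse = TRUE)))
  parents <- as.character(path_dir(paths))
  # A node's children are the listed paths whose parent directory is that node
  children <- lapply(paths, function(p) paths[parents == p])
  data.frame(
    stringsAsFactors = FALSE,
    id = paths,
    children = I(children),
    label = path_file(paths)
  )
}
```

Calling `tree(dir_tree_data("some/project"))` then prints the hierarchy, rooted at the first row.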

@eddelbuettel

eddelbuettel commented Mar 4, 2020

I think this is hard without a goal metric to compare against. We have dozens of packages that set up a preferred package/directory structure. Do you want to / need to pick one?

fs is nice but base R does all that. I once gave a lightning talk showing how existing functions like file.info() already return a data frame, which makes pulling the results into data.table a breeze, after which grouping is of course "free" and "fast". R for System Admin lightning talk slides
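As a sketch of that route, assuming the data.table package is installed: file.info() uses the paths as row names, so a recursive listing can be turned into a table and grouped by file extension in one step. `dir_summary()` is a hypothetical name; fs::dir_info() returns a comparable table if fs is already a dependency.

```r
library(data.table)

# Hypothetical helper: count files and total bytes per extension below `root`.
dir_summary <- function(root = ".") {
  files <- list.files(root, recursive = TRUE, full.names = TRUE)
  # file.info() returns one row per path, with the path as the row name
  info <- as.data.table(file.info(files), keep.rownames = "path")
  info[, ext := tolower(tools::file_ext(path))]
  info[, .(n_files = .N, total_bytes = sum(size)), by = ext][order(-total_bytes)]
}
```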

@ataustin
Author

ataustin commented Mar 4, 2020

@emilyriederer I love the analogy to lintr! Never thought of it that way, but this package could try to enforce a chosen structure. Might be tricky but at the least, a good thought exercise.

@jdblischak Awesome suggestions! I wonder whether this package idea would dovetail in any way into your workflowr? I feel silly for having never known about that package before.

@eddelbuettel I definitely think that defining the scope of the package would give shape to the needed dependencies. I'm glad there's some interest in this idea though -- I never know when the problems I'm trying to solve are way too niche.

@alistaire47

pkgnet's function network mapping might be a useful starting point? ...but still a hard problem

@jdblischak

> I feel silly for having never known about that package before.

@ataustin Don't be so hard on yourself! There are more than 15K packages on CRAN :-)

> I wonder whether this package idea would dovetail in any way into your workflowr?

Potentially. I have a vignette on migrating an existing project to workflowr. If you develop a method to make that transition easier, we can add it to the vignette.

@eddelbuettel

@jdblischak Well having you here answers my earlier question "which of many competing alternatives gets picked" ;-)

@eddelbuettel

@alistaire47 pkgnet is indeed nice, but note that the starting point is a non-packaged directory. Does it ingest that?

@jdblischak

> fs is nice but base R does all that.

@eddelbuettel How can I compute a relative path using base R? I do this a lot in workflowr, and I imagine it would come up in a project that involves reorganizing files and directories.

> fs::path_rel("a/b/c/d", start = "a/b/z")
../c/d
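For comparison, a purely lexical base R version of this computation is short, but only under strong assumptions: both arguments are relative (or both absolute), contain no `..` components, and are compared case-sensitively, so it is not correct on case-insensitive file systems such as the Windows default. `path_rel_base()` is a hypothetical helper; it never touches the file system, which sidesteps normalizePath() behaving differently for paths that do not exist.

```r
# Hypothetical helper: lexical relative path from `start` to `path`.
path_rel_base <- function(path, start = ".") {
  parts <- function(p) {
    p <- gsub("\\", "/", p, fixed = TRUE)   # treat Windows separators as "/"
    x <- strsplit(p, "/", fixed = TRUE)[[1]]
    x[x != "" & x != "."]                   # drop empty and "." components
  }
  p <- parts(path)
  s <- parts(start)
  # Length of the shared leading components
  n <- 0L
  while (n < min(length(p), length(s)) && p[n + 1L] == s[n + 1L]) n <- n + 1L
  up <- rep("..", length(s) - n)            # climb out of the rest of `start`
  rest <- if (n < length(p)) p[(n + 1L):length(p)] else character(0)
  out <- c(up, rest)
  if (length(out) == 0L) "." else paste(out, collapse = "/")
}

path_rel_base("a/b/c/d", start = "a/b/z")   # "../c/d"
```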

@eddelbuettel

Ahh, so you are projecting from your existing requirements (which may or may not have the need for the noted dependency) to a new and (as of now, mostly unspec'ed) "project". Now I see how one ends up with a dozen dependencies in a package. Learning something new every day 😉

That path_rel is indeed an interesting problem I never had (esp. not finding a "shortest to print" output). What matters to me is that R has basically all the POSIX functionality it needs itself to work reliably on three OSs with a lot of file ops during package build, test, installation, removal, ...

That feature set is usually good enough for me to normalize a path, get info, expand a glob, etc pp. But I'd have to think about this intersection problem. Ex ante, though, I am not sure it must be solved here. For new project X we may just move/create/delete/rename/... files below a starting point, which works with and without fs, so 🤷‍♂️
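To make the base R side concrete, the operations listed above map onto plain base functions. This is only an illustrative sketch; the throwaway project created with tempfile() and the file names are made up for the example.

```r
# Throwaway starting point; every operation below stays under `root`.
root <- tempfile("project-")
dir.create(file.path(root, "data", "raw"), recursive = TRUE)   # create
writeLines("id,value", file.path(root, "data", "raw", "a.csv"))
writeLines("id,value", file.path(root, "data", "raw", "b.csv"))

csvs <- Sys.glob(file.path(root, "data", "raw", "*.csv"))      # expand a glob
info <- file.info(csvs)                                         # get info
info[, c("size", "mtime")]

norm_root <- normalizePath(root, winslash = "/")                # normalize (path exists)

dir.create(file.path(root, "inputs"))
file.rename(csvs, file.path(root, "inputs", basename(csvs)))   # move/rename
unlink(file.path(root, "data"), recursive = TRUE)               # delete a tree
list.files(root, recursive = TRUE)   # "inputs/a.csv" "inputs/b.csv"
```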

@eddelbuettel

eddelbuettel commented Mar 4, 2020

@jdblischak Got curious, found these two quick SO hits:

That puts it squarely in the "do I reimplement something potentially with new bugs just to avoid a dependency" versus "do I add a dependency though it increases my overall 'surface' of exposure to change and potential breakage". No clear winners there. As always, it depends (and fs is a fine package).

@jdblischak

> That puts it squarely in the "do I reimplement something potentially with new bugs just to avoid a dependency" versus "do I add a dependency though it increases my overall 'surface' of exposure to change and potential breakage". No clear winners there. As always, it depends (and fs is a fine package).

I've done both :-)

For workflowr, I first wrote it using base R: https://github.com/jdblischak/workflowr/blob/v0.8.0/R/utility.R#L85

Then I switched to R.utils. Then after many hours of debugging errors caused by normalizePath() and unlink(), I switched to fs for all my path computations. It's not perfect either, but Jim is very responsive to updating fs whenever I find something unexpected.

Given my experience with normalizePath() and unlink(), I think that fs is a reasonable dependency for any package that does a lot of file manipulation.
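The thread does not say which specific errors came up, so the following is only an illustration of two documented base R behaviours that commonly surprise file-manipulation code. Both are described in ?unlink and ?normalizePath; the directory is a throwaway created with tempfile().

```r
d <- tempfile("demo-")
dir.create(d)

# unlink() without recursive = TRUE does not delete directories, not even
# empty ones, and ?unlink documents that this is not counted as a failure.
unlink(d)
dir.exists(d)   # TRUE: the directory is still there
unlink(d, recursive = TRUE)
dir.exists(d)   # FALSE

# For a path that does not exist, ?normalizePath says the result is
# system-dependent: either the input element or an absolute form of it.
normalizePath(file.path("not", "there"), mustWork = FALSE)
```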

@alistaire47

> @alistaire47 pkgnet is indeed nice, but note that the starting point is a non-packaged directory. Does it ingest that?

Nope! But some of its code for calculating network metrics on functions might be a useful starting point for figuring out how to map and rearrange directories of code. Without the structural guarantees you get with a package, though, this task is a lot harder.
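Since pkgnet expects a package, a crude first approximation for a loose directory might be an edge list of which scripts source() which files. This is a hypothetical base R sketch, not pkgnet's method or API: `source_edges()` only detects literal, quoted paths passed directly to source().

```r
# Hypothetical helper: build a from/to edge list of literal source() calls
# in all .R files below `root`. Computed paths (e.g. file.path(...)) and
# other ways of loading code are not detected.
source_edges <- function(root = ".") {
  scripts <- list.files(root, pattern = "\\.[Rr]$", recursive = TRUE)
  call_re <- "source\\(\\s*[\"'][^\"']+[\"']"
  edges <- lapply(scripts, function(s) {
    code <- readLines(file.path(root, s), warn = FALSE)
    hits <- unlist(regmatches(code, gregexpr(call_re, code)))
    if (length(hits) == 0) return(NULL)
    # Keep only the quoted path from each match
    targets <- sub("^source\\(\\s*[\"']([^\"']+)[\"']$", "\\1", hits)
    data.frame(from = s, to = targets, stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, edges)
  if (is.null(out)) out <- data.frame(from = character(0), to = character(0))
  out
}
```

The resulting data frame could then be handed to a graph package (for example igraph::graph_from_data_frame()) to compute network metrics on scripts rather than functions.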

@ataustin
Author

ataustin commented Mar 6, 2020

This doesn't eliminate the need for dependencies, but in my current work I build file paths using the rprojroot package to get absolute file paths to the top level of some project directory, then relative file paths underneath that using file.path which ostensibly works in an OS-independent manner.
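A minimal sketch of that pattern, assuming rprojroot is installed: find_root() walks up from a starting directory until a criterion matches, and file.path() builds everything below that. The ".project-root" marker file and "survey.csv" are hypothetical names used only for illustration.

```r
library(rprojroot)

# Throwaway project for illustration; ".project-root" is a hypothetical
# marker file. Real projects often use is_rstudio_project, is_git_root,
# or has_file("DESCRIPTION") as the criterion instead.
proj <- tempfile("proj-")
scripts_dir <- file.path(proj, "analysis", "scripts")
dir.create(scripts_dir, recursive = TRUE)
file.create(file.path(proj, ".project-root"))

# Search upward from a nested directory for the marker file
root <- find_root(has_file(".project-root"), path = scripts_dir)

# Build paths below the root with file.path() rather than hard-coded separators
raw_data <- file.path(root, "data", "raw", "survey.csv")
```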


5 participants