New package idea: Call Git directly from R #22

Open
jdblischak opened this issue Mar 3, 2020 · 1 comment

Proposal

Create a new R package that directly calls the Git executable. The idea would be to support the main subcommands with their most commonly-used options.

Why?

Why would we want to do this when the R packages git2r and gert already exist? I have used git2r extensively (and gert follows a similar strategy), and I have found various shortcomings due to the fact that they use libgit2, a minimal Git library, and not Git itself:

  1. libgit2 is known to be much slower than Git. See libgit2/libgit2#4230 ("git_status_list is slower than git status") and libgit2/libgit2#5038 ("Potential performance improvements").
  2. Using the libgit2 API requires the user to be more familiar with the underlying data structures of Git than should be necessary. For example, let's say you want to know which files were modified in a given commit. With Git, it's as simple as git log --stat, whereas in libgit2 you have to compare the tree pointed to by the commit to the tree pointed to by its parent commit (and have fun handling the edge case where one of those is a merge commit!). I think that if you know how to accomplish something with Git, you should be able to accomplish it from the R wrapper.
  3. Installation is more complicated. You either have to compile libgit2 from source or install it via a package manager (e.g. apt install libgit2-dev). If an R user uses Git so often that they want to take the time to automate Git commands in an R script, it is pretty likely that they already have Git installed on their machine.
  4. Pushing and pulling from remote repositories, e.g. GitHub, requires authentication via HTTPS or SSH. With Git, this usually "just works". Since libgit2 is a minimal library that you likely compiled yourself, it can be tricky to connect it to the required system libraries to accomplish this. The git2r issue tracker (and workflowr's, since it wraps git2r) is full of failed attempts to push/pull from a remote repository.

How?

I think this package would be ideal for a hackathon-style event because 1) the endpoints are clear, and 2) it is easy to split up tasks between people. After planning the overall strategy, individuals or small teams can each take on the task of wrapping one of the Git subcommands.

Various Git commands have options for making the results more machine-parseable, e.g. git log --format=raw and git status --porcelain.
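
To illustrate why the machine-parseable formats help, here is a rough sketch of turning git status --porcelain output (the short v1 format: two status characters, a space, then the path) into a data frame. The function name parse_status_porcelain and the example lines are made up for illustration, and the sketch ignores renames ("old -> new") and quoted paths.

```r
# Hypothetical helper (not part of any existing package): parse lines of
# `git status --porcelain` (v1 format) into a data frame.
parse_status_porcelain <- function(lines) {
  lines <- lines[nzchar(lines)]        # drop empty lines
  data.frame(
    index    = substr(lines, 1, 1),    # first character: status in the index
    worktree = substr(lines, 2, 2),    # second character: status in the working tree
    path     = substring(lines, 4),    # path starts after the separating space
    stringsAsFactors = FALSE
  )
}

# Made-up example input, in the shape that
# system2("git", c("status", "--porcelain"), stdout = TRUE) would return.
example <- c(" M R/status.R", "A  R/log.R", "?? notes.txt")
status <- parse_status_porcelain(example)
print(status)
```

Keeping the parser separate from the call to Git means it can be tested on fixed input without needing a repository.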

We can call the Git executable with system2() and shQuote(), or alternatively use the sys package.
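
As a minimal sketch of the system2() route (the name git_run and its error handling are assumptions, not a settled design), a low-level runner could look something like this:

```r
# Hypothetical low-level runner: call the Git executable with the given
# arguments and return its output as a character vector of lines.
git_run <- function(args, git = Sys.which("git")) {
  if (!nzchar(git)) {
    stop("Git executable not found on the PATH")
  }
  # system2() does not quote its arguments, so quote each one with shQuote().
  # With stdout = TRUE, a non-zero exit code is reported as a warning and as
  # a "status" attribute on the result; we turn it into an error instead.
  out <- suppressWarnings(
    system2(git, shQuote(args), stdout = TRUE, stderr = TRUE)
  )
  status <- attr(out, "status")
  if (!is.null(status) && status != 0) {
    stop("git ", paste(args, collapse = " "), " failed:\n",
         paste(out, collapse = "\n"))
  }
  as.character(out)
}

# Only call Git if it is actually installed on this machine.
if (nzchar(Sys.which("git"))) {
  print(git_run("--version"))
}
```

Each subcommand wrapper could then be a thin function on top of a runner like this, which is what would make the work easy to split up.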

If the R script is being run outside of the Git repository, we can use the -C argument to provide the path, e.g. git -C path/to/repo status.
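
A small sketch of how the -C option could be threaded through every wrapper (git_args and its repo parameter are hypothetical names): the option has to come before the subcommand, so it is easiest to prepend it when the argument vector is built.

```r
# Hypothetical helper: build the argument vector for a Git call, optionally
# prefixed with -C <repo> so the command runs as if started in that directory.
git_args <- function(subcommand, ..., repo = NULL) {
  args <- c(subcommand, ...)
  if (!is.null(repo)) {
    # Expand a leading "~" here, because a quoted argument would not be
    # expanded by the shell later on.
    args <- c("-C", path.expand(repo), args)
  }
  args
}

args <- git_args("status", "--porcelain", repo = "path/to/repo")
print(args)
# The result could then be passed on, e.g. system2("git", shQuote(args), stdout = TRUE)
```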

Challenges

  1. How to securely pass a password from R to the Git executable? Maybe we can use the credentials package, which is what gert uses.
  2. How to represent the results from Git in R? Should we prefer data frames as much as possible or use a more object-oriented approach?
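
To make the data frame option concrete, here is one possible sketch for git log. It assumes the log was requested with a custom format such as --format=%H%x1f%an%x1f%aI%x1f%s, i.e. hash, author name, ISO 8601 author date and subject separated by the ASCII unit separator. The function name parse_log and the example lines are invented for illustration.

```r
# Hypothetical parser: one line per commit, fields separated by "\x1f".
parse_log <- function(lines, sep = "\x1f") {
  fields <- strsplit(lines, sep, fixed = TRUE)
  # Pick the i-th field of every line; a missing field (e.g. an empty
  # subject, which strsplit() drops) becomes NA.
  pick <- function(i) vapply(fields, function(x) x[i], character(1))
  data.frame(
    commit  = pick(1),
    author  = pick(2),
    date    = pick(3),
    subject = pick(4),
    stringsAsFactors = FALSE
  )
}

# Made-up example input (not from a real repository).
lines <- c(
  paste("a1b2c3d", "Ada", "2020-03-03T10:00:00+00:00", "Add status wrapper", sep = "\x1f"),
  paste("d4e5f6a", "Bob", "2020-03-04T11:30:00+00:00", "", sep = "\x1f")
)
log_df <- parse_log(lines)
print(log_df)
```

A data frame like this is easy to filter and join, while an object-oriented design might fit better for results that are not naturally tabular.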

eddelbuettel commented Mar 3, 2020

I dunno. Put me down as not liking system() or system2() when push comes to shove. And doesn't the world have enough git interfaces already?

But by all means. It will be a lot of work, and even more work to come close in features to what is already out there and used every day...
