Updates to various files
larsvilhuber committed Dec 31, 2023
1 parent babfa9c commit f17375d
Showing 8 changed files with 114 additions and 4 deletions.
3 changes: 3 additions & 0 deletions 01_run_it_again.md → 01-run_it_again.md
@@ -2,6 +2,9 @@

The very first test is that your code must run, beginning to end, top to bottom, without error, and ideally without any user intervention. This should in principle (re)create all figures, tables, and numbers you include in your paper.

## TL;DR

This is pretty much the most basic test of reproducibility. If you cannot run your code, you cannot reproduce your results, nor can anybody else. So just re-run the code.
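
A minimal sketch of what "just re-run the code" can look like, assuming a hypothetical top-level shell script that calls each step in order and stops at the first error. The function name `run_steps` and the placeholder steps are illustrative assumptions, not part of any particular replication package.

```bash
# Hypothetical driver script: run each step in order, stop at the first error.
set -eu

run_steps() {
    # Each argument is one command line, executed in the order given.
    for step in "$@"; do
        echo "Running: ${step}"
        if ! sh -c "${step}"; then
            echo "FAILED: ${step}" >&2
            return 1
        fi
    done
    echo "All steps completed without error."
}

# Placeholder steps (assumptions). In a real package, these might be calls
# such as "stata-mp -b do main.do" or "Rscript main.R".
run_steps "echo 'step 1: prepare data'" "echo 'step 2: run analysis'"
```

If any step fails, the function reports which one and returns a non-zero status, so a failed run is hard to overlook.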

## Exceptions

File renamed without changes.
42 changes: 42 additions & 0 deletions 03-creating_log_files.md
@@ -0,0 +1,42 @@
(creating-log-files)=
# Creating log files

A log file, a transcript, or some other evidence can document that you have actually run your code. Certain journals may even require it.

## TL;DR

- Log files are a way to document that you have run your code.
- In particular for code that runs for a very long time, or that uses data that cannot be shared, log files may be the only way to document basic reproducibility.

## Overview

Most statistical software has ways to keep a record that it has run, with the details of that run. Some make it easier than others. In some cases, you may need to instruct your code to be "verbose", or to "log" certain events. In other cases, you may need to use a command-line option to the software to create a log file.

```{warning}
I do note that we are typically only looking to document what the statistical code does, at a high level. We are not looking to document system calls, fine-grained data access, etc. Computer scientists and IT security mavens may be interested in such details, but economists are typically not.
```
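
When the software itself offers no convenient logging option, one generic approach is to capture its output from the command line. The sketch below assumes a hypothetical helper called `run_logged`; the log file name and the placeholder command are illustrative only.

```bash
set -eu

run_logged() {
    # Usage: run_logged LOGFILE COMMAND [ARGS...]
    logfile="$1"
    shift
    status=0
    {
        echo "Command: $*"
        echo "Started: $(date)"
        # Run the command; remember its exit status without aborting.
        "$@" || status=$?
        echo "Finished: $(date), exit status ${status}"
    } > "${logfile}" 2>&1
    return "${status}"
}

# Placeholder command (assumption); substitute the call to your own software.
run_logged "example_run.log" echo "pretend this is the statistical software"
cat "example_run.log"
```

The log records the command, its start and end times, all of its output, and its exit status, which is usually enough to show that a run completed.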

## Examples

### Explicit log files

We start by describing how to explicitly generate log files as part of the statistical processing code.

::::{tab-set}


:::{tab-item} Stata

global logdir "${rootdir}/logs"
cap mkdir "$logdir"
local c_date = c(current_date)
local cdate = subinstr("`c_date'", " ", "_", .)
local c_time = c(current_time)
local ctime = subinstr("`c_time'", ":", "_", .)
local globallog = "$logdir/logfile_`cdate'-`ctime'-`c(username)'.log"
log using "`globallog'", name(global) replace text

:::

::::
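
The same idea, a script that opens its own timestamped log, can be sketched for a shell script. This is only an illustration under stated assumptions: the variable names (`rootdir`, `logdir`, `logfile`) mirror the Stata example but are otherwise hypothetical, and the processing step is a placeholder.

```bash
set -eu

# Hypothetical locations, mirroring the Stata example.
rootdir="${rootdir:-$(pwd)}"
logdir="${rootdir}/logs"
mkdir -p "${logdir}"

# Timestamp and user name for the log file name.
stamp="$(date +%Y-%m-%d-%H_%M_%S)"
logfile="${logdir}/logfile_${stamp}-$(id -un).log"

# Save the original stdout/stderr, then send all output to the log.
exec 3>&1 4>&2
exec > "${logfile}" 2>&1

echo "Log opened: $(date)"
echo "placeholder for the actual processing steps"
echo "Log closed: $(date)"

# Restore stdout/stderr, the rough equivalent of closing the log.
exec 1>&3 2>&4 3>&- 4>&-
echo "Log written to ${logfile}"
```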

File renamed without changes.
8 changes: 7 additions & 1 deletion 29-new-computer.md
@@ -1,3 +1,9 @@
# Use a new computer

Some authors may have a fresh, or extra, computer lying around. Use that to download the replication package, and see if it runs.
The ultimate isolated environment is an otherwise untouched computer. Some authors, or the IT departments at their institutions, may have a fresh, recently imaged computer lying around, with the relevant software (say, Stata or Python) already installed.

```{tip}
If you ever change institutions, employers, or simply buy a new laptop - this might be you!
```

Use such an "unblemished" computer to download the replication package, and see if it runs. Keep track of any additional configuration steps that you had not anticipated.
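
One way to keep track of missing pieces on a fresh machine is a small pre-flight check. The sketch below assumes a hypothetical helper `check_tools`; the tool names passed to it are placeholders to be replaced with whatever your package actually needs.

```bash
check_tools() {
    # Print one line per tool; return non-zero if any tool is missing.
    missing=0
    for tool in "$@"; do
        if command -v "${tool}" >/dev/null 2>&1; then
            echo "found:   ${tool} at $(command -v "${tool}")"
        else
            echo "MISSING: ${tool}"
            missing=$((missing + 1))
        fi
    done
    [ "${missing}" -eq 0 ]
}

# Hypothetical tool list; replace with what your package needs
# (for example stata-mp, python3, or Rscript).
check_tools sh ls || echo "Note the missing tools in your README."
```

Anything reported as missing is a candidate for the setup instructions in the replication package's README.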
29 changes: 28 additions & 1 deletion 30-docker.md
@@ -1,11 +1,38 @@
# Use of containers


## TL;DR

- Containers are a way to simulate a "computer within a computer", which can be used to run code in an isolated environment. They are relatively lightweight, and are starting to be used as part of replication packages in economics.
- They do not work in all situations, and require some more advanced technical skills.
- Using containers only to test for reproducibility is easier than shipping them in a replication package, and should be considered part of your toolkit.
- Several online services make such testing (and development) easy.

## Overview

Coming soon.

Containers can be shared via online systems (Docker Hub, Singularity Hub, etc.), or via files (`.tar` files, etc.). While the former is convenient, the latter is more robust for archival purposes.

```{warning}
Commercial container sharing services regularly purge containers from their services if they are not actively used, or if a subscription is not maintained. While the core infrastructure containers, such as for Python or R, are likely to be maintained for a long time, commercial companies can change their preservation policies at any time, with little warning.
```
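
Given that risk, saving an image to a file is worth considering. The following is a sketch, assuming Docker is installed and the image is already available locally; `docker save` and `docker load` are standard Docker CLI commands, while the archive file name is a hypothetical choice.

```bash
# Image name taken from the example in this document; archive name is hypothetical.
image="dataeditors/stata17:2023-08-29"
archive="stata17_2023-08-29.tar"

if command -v docker >/dev/null 2>&1 \
   && docker image inspect "${image}" >/dev/null 2>&1; then
    # Write the locally available image to a tar file.
    docker save -o "${archive}" "${image}"
    echo "Saved ${image} to ${archive}"
else
    echo "Docker or the image is not available locally; nothing saved."
fi

# To restore the image later from the file:
#   docker load -i stata17_2023-08-29.tar
```

The resulting `.tar` file can be deposited alongside other archival materials, subject to the license terms of the software inside the image.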


## Examples

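```bash
# Mount the current directory at /project inside the container, make it the
# working directory, and run main.do in Stata batch mode (-b).
# Note: the image may also require a valid Stata license file to be mounted.
docker run -it --rm \
  -v "$(pwd)":/project \
  -w /project \
  dataeditors/stata17:2023-08-29 \
  -b do main.do
```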

## Additional resources

- [Docker](https://www.docker.com/) is a free, open-source container manager, which allows users to create containers using "recipes" (called `Dockerfiles`). While the underlying technology is usually Linux, [Docker Desktop](https://www.docker.com/products/docker-desktop) (commercial, free for most academic uses) allows users to run containers on Windows, macOS, and Linux.
- [OrbStack](https://www.orbstack.com/) is a container manager for macOS (commercial, free for typical academic usage). It is compatible with Docker.
- [Apptainer](https://www.apptainer.io/), formerly known as [Singularity](https://sylabs.io/singularity/), is a free, open-source container manager. It can use Docker images, but has its own syntax for "recipes". It is fundamentally Linux-based, and available on many university HPC clusters.

Various other container managers are available for both Linux and Windows (Azure) based clouds (`podman`, etc.). They should all be able to run Docker containers.
30 changes: 30 additions & 0 deletions 40-virtual_machines.md
@@ -0,0 +1,30 @@
(virtual-machines)=
# Virtual Machines


## TL;DR

- Virtual machines are a way to create a "computer within a computer", which can be used to run code in an isolated environment. They are not usually part of replication packages in economics, and I do not suggest you use them as such, but they can be used to test replication packages.

## Overview

For the sake of completeness, note that you can achieve the same outcome as with a brand-new computer by running a virtual machine on your own system. Virtual machines are routinely used in computer science and other domains (including as class assignments in CS courses). Basic software is free, and there are standards for sharing virtual machine files (the specifications and the actual contents).

```{warning}
Virtual machines are not typically used in economics, and in particular not as a key component of replication packages. They are presented here primarily as an advanced tool to **test** replication packages.
```

## Examples

None at this point.

## Additional resources

- [Oracle VirtualBox](https://www.virtualbox.org/) is free, open-source virtual machine software, originally developed by Innotek, which was later acquired by Sun Microsystems (now part of Oracle). It is available for Windows, macOS, and Linux.
- [VMware Workstation Player](https://www.vmware.com/products/workstation-player.html) is commercial virtual machine software, with a free "player" version for Windows and Linux.

Naturally, one would like to have virtual machines be reproducibly created, and this is possible using tools such as:

- [Vagrant](https://www.vagrantup.com/) is a free, open-source virtual machine manager, which allows users to create virtual machines using "recipes", similar to Dockerfiles. It is available for Windows, macOS, and Linux.
- [Multipass](https://multipass.run/) is a free, open-source virtual machine manager. While it can only handle creating Linux VMs, the tool itself is available for Windows, macOS, and Linux.

6 changes: 4 additions & 2 deletions _toc.yml
@@ -6,9 +6,11 @@ root: index
parts:
- caption: Simple ways to test replication packages
chapters:
- file: 01_hands_off_running
- file: 02_environments
- file: 01-run_it_again
- file: 02-hands_off_running
- file: 10-environments
- caption: More complex ways to test replication packages
chapters:
- file: 29-new-computer
- file: 30-docker
- file: 40-virtual_machines
