Commit

Various updates
larsvilhuber committed Jan 1, 2024
1 parent f17375d commit c8db2d1
Showing 6 changed files with 105 additions and 6 deletions.
21 changes: 21 additions & 0 deletions 01-run_it_again.md
@@ -15,3 +15,24 @@ What happens when some of these re-runs are very long? See later in this chapter
### Making the code run takes you a very long time

While the code, once set to run, can do so on its own, *you* might need to spend a lot of time getting all the various pieces to run. This should be a warning sign: if it takes you a long time to get it to run, or to manually reproduce the results, it might take others even longer. Furthermore, it may suggest that you haven't been able to re-run your own code very often, which can be correlated with fragility or even lack of reproducibility. We address this partially in the [next section](hands-off-running).

## What this does

This ensures

- that your code runs without problem, after all the debugging.

## What this does not do

This does not ensure


- that your code runs without manual intervention.
- that your code generates a log file that you can inspect, and that you could share with others.
- that it will run on somebody else's computer
  - because it does not guarantee that all the software is there
  - because it does not guarantee that all the directories for input or output are there
  - because many intermediate files might be present that are not in the replication package
  - because it does not guarantee that all the directory names are correctly adjusted everywhere in your code
- that it actually produces all the outputs
  - because some outputs might be present from test runs
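
The last point, outputs left over from earlier test runs, can be checked mechanically. The following is a minimal sketch in Bash, not part of the original text: the function name `stale_outputs` and the file names in the usage comment are hypothetical. It compares each expected output against a marker file created just before the run, and reports outputs that are missing or that were not rewritten.

```bash
# Sketch: list expected outputs that a fresh run did NOT regenerate.
# Hypothetical helper: stale_outputs MARKER FILE [FILE...]
stale_outputs() {
  local marker="$1"
  shift
  local f
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      # The output does not exist at all
      echo "MISSING: $f"
    elif [ ! "$f" -nt "$marker" ]; then
      # The output exists, but is not newer than the marker
      echo "STALE: $f"
    fi
  done
}

# Possible usage (file names are placeholders):
#   touch run_started.marker
#   ... run all of your code ...
#   stale_outputs run_started.marker tables/table1.tex figures/figure1.png
```

If the function prints nothing, every listed output was rewritten after the marker was created. Deleting all outputs before the run achieves the same goal more bluntly.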
58 changes: 53 additions & 5 deletions 02-hands_off_running.md
@@ -1,16 +1,22 @@
(hands-off-running)=
# Hands-off running

The very first test is that your code must run, beginning to end, top to bottom, without error, and ideally without any user intervention. This should in principle (re)create all figures, tables, and numbers you include in your paper.
Let's ramp it up a bit. Your code must run, beginning to end, top to bottom, without error, and without any user intervention. This should in principle (re)create all figures, tables, and numbers you include in your paper.

Many users may not be set up to run everything in one single top-to-bottom run. It helps to have a `main file` that runs all the other files, or all code, in the correct order, but this is not a prerequisite.

```{warning}
We have seen users who appear to highlight code and to run it interactively, in pieces, using the program file as a kind of notepad. This is not reproducible, and should be avoided. It is fine for debugging.
```

## Examples
## TL;DR

- Create a "main" file that runs all the other files in the correct order.
- Run this file, without user intervention.
- It should run without error.

## Creating a main or master script

The main script is key to enabling "hands-off running". I will show here a few simple examples for single-software replication packages. We will discuss more complex examples in one of the next chapters.


::::{tab-set}
@@ -159,10 +165,52 @@ run(fullfile(rootdir, '04_figures.m'))
run(fullfile(rootdir, '05_appendix.m'))
```

Follow instructions here to run MATLAB without a GUI, in hands-off mode, creating a log file.
Run this script, and it should run all the other ones. Note that there are various other ways to achieve a similar goal, for instance, by treating each MATLAB file as a function.

:::

:::{tab-item} Bash

Bash is a cross-platform command-line shell that many users may have encountered when using Git on Windows ("Git Bash"). It is also installed by default on macOS and Linux. It can run the command-line versions of most statistical software, and is thus a good candidate for a main script. Note that it introduces an additional dependency: the replicator now needs to have Bash installed. It is also not entirely platform-agnostic, because the calls to other software may differ across platforms, though that problem afflicts any multi-software main script. In particular, on most Windows machines, the statistical software is not in the `%PATH%` by default, and may need to be called with the full path to the executable.

We will discuss software search paths more fully in the [section on environments](environments).


```bash
# main.bash
# This is a simple example of a main file in Bash
# It runs all the other files in the correct order

# Set the root directory
rootdir=$(pwd)
# equivalent:
# rootdir=$PWD

# Run the data preparation file
# Example for calling Stata
# "stata-mp" must be in your path!
stata-mp -b do "01_data_prep.do"

# Run the analysis file
# "python" must be in your path, and it must be the desired Python version!
python 02_analysis.py

# Run the table file
# "Rscript" must be in your path.
Rscript 03_tables.R

# Run the figure file
Rscript "04_figures.R"

# Run the appendix file
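# Here, we use MATLAB. Running MATLAB is *never* platform-independent.
# As in the MATLAB main script, the file is called through run(), because a
# name starting with a digit cannot be typed as a command. "exit" closes
# MATLAB once the script is done, so that the main script can finish.
# Linux:
matlab -nodisplay -r "addpath(genpath('.')); run('05_appendix.m'); exit"
# Windows:
#start matlab -nosplash -minimize -r "addpath(genpath('.')); run('05_appendix.m'); exit"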
```

:::

::::
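
Because the Bash main script relies on every program being on the search path, it can help to check for them before anything runs. The following is a minimal sketch, not part of the original text: the function name `missing_programs` is hypothetical, and the program names in the usage comment are simply those called in the Bash example above.

```bash
# Sketch: print the name of every required program that is not on the PATH.
# Hypothetical helper: missing_programs PROGRAM [PROGRAM...]
missing_programs() {
  local prog
  for prog in "$@"; do
    # "command -v" succeeds only if the shell can find the program
    if ! command -v "$prog" >/dev/null 2>&1; then
      echo "$prog"
    fi
  done
}

# Possible usage at the top of main.bash:
#   missing=$(missing_programs stata-mp python Rscript matlab)
#   if [ -n "$missing" ]; then
#     echo "Not found on PATH:" $missing >&2
#     exit 1
#   fi
```

Failing early with a clear message is friendlier to a replicator than a failure several hours into the run. The check only confirms that a program with that name exists, not that it is the desired version.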

@@ -172,12 +220,12 @@ This ensures

- that your code runs without problem, after all the debugging.
- that your code runs without manual intervention.
- that your code generates a log file that you can inspect, and that you could share with others.

## What this does not do

This does not ensure

- that your code generates a log file that you can inspect, and that you could share with others.
- that it will run on somebody else's computer
  - because it does not guarantee that all the software is there
  - because it does not guarantee that all the directories for input or output are there
28 changes: 28 additions & 0 deletions 03-creating_log_files.md
@@ -27,6 +27,7 @@ We start by describing how to explicitly generate log files as part of the stati

:::{tab-item} Stata

```stata
global logdir "${rootdir}/logs"
cap mkdir "$logdir"
local c_date = c(current_date)
@@ -35,7 +36,34 @@ local c_time = c(current_time)
local ctime = subinstr("`c_time'", ":", "_", .)
local globallog = "$logdir/logfile_`cdate'-`ctime'-`c(username)'.log"
log using "`globallog'", name(global) replace text
```

:::

:::{tab-item} R

```R
# This will only log output ("stdout") and warnings/messages ("stderr"), but not the commands themselves!
# "rootdir" must already be defined, for instance in the main script.

logfile.name <- paste0(
  "logfile_", Sys.Date(), "-",
  format(Sys.time(), format = "%H_%M"), "-",
  Sys.info()["user"], ".log"
)
globallog <- file(file.path(rootdir, logfile.name), open = "wt")
# Send output to the logfile (split = TRUE also keeps it on the console)
sink(globallog, split = TRUE)
# Send warnings and messages to the logfile as well
sink(globallog, type = "message")

# ... the rest of your code runs here ...

## At the very end, revert output back to the console
sink(type = "message")
sink()
close(globallog)
```

:::

:::{tab-item} MATLAB

```matlab
% The "diary" function should achieve this. Not a MATLAB expert!
```
:::

::::
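
When the main script is written in Bash, a log can also be captured from outside the statistical software. The following is a minimal sketch, not part of the original text: the function names `make_logname` and `run_logged` are hypothetical, and the file name pattern merely mirrors the Stata and R examples above.

```bash
# Sketch: build a timestamped log file name inside a given directory.
# Hypothetical helper: make_logname LOGDIR
make_logname() {
  local logdir="$1"
  printf '%s/logfile_%s-%s.log\n' \
    "$logdir" "$(date +%Y-%m-%d-%H_%M)" "${USER:-unknown}"
}

# Sketch: run a command, sending stdout and stderr to a log file.
# Hypothetical helper: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
  local logfile="$1"
  shift
  # Create the log directory if it does not exist yet
  mkdir -p "$(dirname "$logfile")"
  # The function returns the exit status of the command itself
  "$@" >"$logfile" 2>&1
}

# Possible usage:
#   run_logged "$(make_logname logs)" Rscript 03_tables.R
```

Unlike `sink(..., split = TRUE)` in the R example, this sketch writes only to the file and shows nothing on the console. Whether the commands themselves appear in the log still depends on the software being called.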
1 change: 1 addition & 0 deletions 10-environments.md
@@ -1 +1,2 @@
(environments)=
# Using environments
2 changes: 1 addition & 1 deletion _config.yml
@@ -4,7 +4,7 @@
title: Self-checking replication packages
author: Lars Vilhuber
logo: assets/ssde-logo.jpeg
copyright: 2023
copyright: "2023"
exclude_patterns : ["*venv*",".git*",".devcontainer"]

# Force re-execution of notebooks on each build.
1 change: 1 addition & 0 deletions _toc.yml
@@ -8,6 +8,7 @@ parts:
chapters:
- file: 01-run_it_again
- file: 02-hands_off_running
- file: 03-creating_log_files
- file: 10-environments
- caption: More complex ways to test replication packages
chapters: