forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathDocumenting-packages.rmd
210 lines (144 loc) · 9.95 KB
/
Documenting-packages.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
---
title: Documenting packages
layout: default
---
# Package level documentation
Package level documentation exists to help the user understand how to use the package as a whole. Function level documentation is useful once you know exactly what you want to do, and you've discovered the appropriate function: once you've done all that, function-level documentation helps you use the function appropriately. Package-level documentation gets you to point where you can do that: it explains what the package does as a whole, breaks the functions down into useful categories, and shows you how to combine the functions (and maybe functions from other packages) to solve problems.
There are XXX components to package level documentation. None of them are compulsory, but the more you make, the easier it will be for others to use your package.
* `README`, `NEWS`, `CITATION` are specially formatted plain-text
documentation that provide a brief overview of your package, what's changed
recently, and how to cite your package.
* Vignettes provide long form pdf documentation
* Demos are pure R files that provide case studies linking together multiple
components of the package
* It's also a good idea to include "function" documentation for your package -
that's described in the following chapter.
## `README`
The `README` file lives in the package directory. It should be fairly short (3-4 paragraphs) and answer the following questions:
* Why should someone use your package?
* How does it compare to other existing solutions?
* What are the main functions?
If you're using github, this will appear on the package home page. I also recommend using it when you announce a new version of your package.
Some examples from our packages follow. Note that most of these use markdown, http://daringfireball.net/projects/markdown/, a plain text formatting language to add headings, basic text formatting and bullets. A brief introduction to markdown is included in the appendix. If you use markdown, you should call you readme file `README.md`.
### `plyr`
plyr is a set of tools for a common set of problems: you need to __split__
up a big data structure into homogeneous pieces, __apply__ a function to
each piece and then __combine__ all the results back together. For
example, you might want to:
* fit the same model each patient subsets of a data frame
* quickly calculate summary statistics for each group
* perform group-wise transformations like scaling or standardising
It's already possible to do this with base R functions (like split and the
apply family of functions), but plyr makes it all a bit easier with:
* totally consistent names, arguments and outputs
* convenient parallelisation through the foreach package
* input from and output to data.frames, matrices and lists
* progress bars to keep track of long running operations
* built-in error recovery, and informative error messages
* labels that are maintained across all transformations
Considerable effort has been put into making plyr fast and memory
efficient, and in many cases plyr is as fast as, or faster than, the
built-in equivalents.
A detailed introduction to plyr has been published in JSS: "The
Split-Apply-Combine Strategy for Data Analysis",
http://www.jstatsoft.org/v40/i01/. You can find out more at
http://had.co.nz/plyr/, or track development at
http://github.com/hadley/plyr. You can ask questions about plyr (and data
manipulation in general) on the plyr mailing list. Sign up at
http://groups.google.com/group/manipulatr.
### `stringr`
Strings are not glamorous, high-profile components of R, but they do play
a big role in many data cleaning and preparations tasks. R provides a
solid set of string operations, but because they have grown organically
over time, they can be inconsistent and a little hard to learn.
Additionally, they lag behind the string operations in other programming
languages, so that some things that are easy to do in languages like Ruby
or Python are rather hard to do in R. The `stringr` package aims to remedy
these problems by providing a clean, modern interface to common string
operations.
More concretely, `stringr`:
* Processes factors and characters in the same way.
* Gives functions consistent names and arguments.
* Simplifies string operations by eliminating options that you don't need
95% of the time.
* Produces outputs than can easily be used as inputs. This includes
ensuring that missing inputs result in missing outputs, and zero length
inputs result in zero length outputs.
* Completes R's string handling functions with useful functions from
other programming languages.
## `NEWS`
The `NEWS` file should list all changes that have occurred since the last release of the package.
The following sample shows the `NEWS` file from the `stringr` package.
stringr 0.5
===========
* new `str_wrap` function which gives `strwrap` output in a more
convenient format
* new `word` function extract words from a string given user defined
separator (thanks to suggestion by David Cooper)
* `str_locate` now returns consistent type when matching empty string
(thanks to Stavros Macrakis)
* new `str_count` counts number of matches in a string.
* `str_pad` and `str_trim` receive performance tweaks - for large vectors
this should give at least a two order of magnitude speed up
* str_length returns NA for invalid multibyte strings
* fix small bug in internal `recyclable` function
`NEWS` has a special format, but it's not well documented. The basics are:
* The information for each version should start with the name of the package
and its version number, followed by a line of `=`s.
* Each change should be listed with a bullet. If a bullet continues over
multiple lines, the second and subsequent lines need to be indented by at
least two spaces. (I usually add a blank line between each bullet to make it
easier to read.)
* If you have many changes, you can use subheadings to divide them into
sections. A subheading should be all upper case and flush left.
* I use markdown formatting inside the bullets. This doesn't help the
formatting in R, but is useful if you want to publish the `NEWS` file
elsewhere.
You can use `devtools::show_news()` to display the `NEWS` using R's built-in parser and check that it appears correctly. `show_news()` defaults to showing just the news for the most recent version of the package. You can override this by using argument `latest = FALSE`.
## `CITATION`
The `CITATION` file lives in the `inst` directory and is intimately connected to the `citation()` function which tells you how to cite R and R packages. Calling `citation()` without any arguments tells you how to cite base R:
To cite R in publications use:
R Core Team (2012). R: A language and environment for statistical
computing. R Foundation for Statistical Computing, Vienna, Austria.
ISBN 3-900051-07-0, URL http://www.R-project.org/.
A BibTeX entry for LaTeX users is
@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2012},
note = { {ISBN} 3-900051-07-0},
url = {http://www.R-project.org/},
}
We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also ‘citation("pkgname")’ for
citing R packages.
This is generated from a `CITATION` file that looks like this:
bibentry("Manual",
title = "R: A Language and Environment for Statistical Computing",
author = person("R Core Team"),
organization = "R Foundation for Statistical Computing",
address = "Vienna, Austria",
year = version$year,
note = "{ISBN} 3-900051-07-0",
url = "http://www.R-project.org/",
mheader = "To cite R in publications use:",
mfooter =
paste("We have invested a lot of time and effort in creating R,",
"please cite it when using it for data analysis.",
"See also", sQuote("citation(\"pkgname\")"),
"for citing R packages.", sep = " ")
)
As you can see, it's pretty simple: you only need to learn one new function, `bibentry()`. The most important arguments, are `bibtype` (the first argument, which can "Article", "Book", "PhDThesis" and so on), and then the standard bibliographic information like `title,`, `author`, `year`, `publisher`, `journal`, `volume`, `issue`, `pages` and so on (they are all described in detail in `?bibEntry`). The header (`mheader`) and footer (`mfooter`) are optional, and are useful places for additional exhortations.
## Vignettes
Vignettes are long-form pdf guides to your package. They are sweave documents that live in the `vignettes/` directory. To write a vignette you must be familiar with both Sweave and LaTeX. Describing these is outside the scope of this book, but some useful resources are:
* http://biostat.mc.vanderbilt.edu/wiki/Main/SweaveLatex
* Resource 2
* Resource 3
If you write a nice vignette, you might want to consider submitting it to the Journal of Statistical Software or the R Journal. Both are electronic only journals and peer-reviewing can be very helpful for improving the quality of your vignette and the related software.
## Demos
A demo is very much like a function example, but is longer, and shows how to use multiple functions together. Demos are `.R` files that live in the `demo/` package directory, and are accessed with the `demo()` function.
(NOT YET IMPLEMENTED) The `demos` directory also needs an index. The easiest way to generate that index is to add a roxygen comment with `@demoTitle` tag:
#' @demoTitle my title
The roxygen process that turns this comment into an index is described in the next chapter.