Prepare for CRAN. (#9)
* Prepare for CRAN.

* Update win server details for rhub.
jonthegeek authored Mar 3, 2022
1 parent 4d5c279 commit f893df5
Showing 5 changed files with 35 additions and 11 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,6 +1,6 @@
Package: wordpiece.data
Title: Data for Wordpiece-Style Tokenization
Version: 1.0.2.9000
Version: 2.0.0
Authors@R: c(
person(given = "Jonathan",
family = "Bratt",
10 changes: 7 additions & 3 deletions NEWS.md
@@ -1,11 +1,15 @@
# wordpiece.data 2.0.0

- Breaking change: The wordpiece vocabularies are now character vectors, rather than named integer vectors. Update wordpiece to version 2.1.2 or later for compatibility.

# wordpiece.data 1.0.2

* Corrected type of loaded vocabularies from double to integer.
- Corrected type of loaded vocabularies from double to integer.

# wordpiece.data 1.0.1

* Initial CRAN release.
- Initial CRAN release.

# wordpiece.data 1.0.0

* Added a `NEWS.md` file to track changes to the package.
- Added a `NEWS.md` file to track changes to the package.
2 changes: 1 addition & 1 deletion README.Rmd
@@ -39,7 +39,7 @@ remotes::install_github("macmillancontentscience/wordpiece.data")

The datasets included in this package were retrieved from huggingface (specifically, [cased](https://huggingface.co/bert-base-cased/resolve/main/vocab.txt) and [uncased](https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt)).
They were then processed using the {[wordpiece](https://github.com/macmillancontentscience/wordpiece)} package.
This is a bit circular, because this package will be used as a dependency for the wordpiece package.
This is a bit circular, because this package is a dependency for the wordpiece package.

```{r process-datasets, eval = FALSE}
vocab_txt <- tempfile(fileext = ".txt")
7 changes: 3 additions & 4 deletions README.md
@@ -36,8 +36,8 @@ and
[uncased](https://huggingface.co/bert-base-uncased/resolve/main/vocab.txt)).
They were then processed using the
{[wordpiece](https://github.com/macmillancontentscience/wordpiece)}
package. This is a bit circular, because this package will be used as a
dependency for the wordpiece package.
package. This is a bit circular, because this package is a dependency
for the wordpiece package.

``` r
vocab_txt <- tempfile(fileext = ".txt")
@@ -87,8 +87,7 @@ function to load data used by
library(wordpiece.data)

head(wordpiece_vocab())
#> [PAD] [unused0] [unused1] [unused2] [unused3] [unused4]
#> 0 1 2 3 4 5
#> [1] "[PAD]" "[unused0]" "[unused1]" "[unused2]" "[unused3]" "[unused4]"
```

## Code of Conduct
25 changes: 23 additions & 2 deletions cran-comments.md
@@ -1,12 +1,33 @@
# Resubmission

## Changes

* Breaking change: The wordpiece vocabularies are now character vectors, rather than named integer vectors.

## Test environments
* local R installation, R 4.1.1 (Windows 10)
* local R installation, R 4.1.2 (Windows 10)
* win-builder (devel)
* Windows Server 2008 R2 SP1, R-devel, 32/64 bit (rhub)
* Windows Server 2022, R-devel, 64 bit (rhub)
* Ubuntu Linux 20.04.1 LTS, R-release, GCC (rhub)
* Fedora Linux, R-devel, clang, gfortran (rhub)

There is a NOTE when testing for Windows Server:

```
* checking for detritus in the temp directory ... NOTE
Found the following files/directories:
'lastMiKTeXException'
```

I cannot reproduce this error on my Windows machine, and a web search indicated that it is likely nothing. This package is very simple and I can't find anything that could possibly trigger that error.

## R CMD check results

0 errors | 0 warnings | 0 notes

* These words in DESCRIPTION are NOT misspelled: Tokenization, tokenize, wordpiece, Wordpiece.


## Reverse dependencies

wordpiece 2.1.2 handles the difference between this version of wordpiece.data and the previous version.
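
The breaking change in this commit (named integer vectors becoming character vectors) can be illustrated with a minimal sketch. The three-token vocabulary below is a hypothetical toy, not the packaged data, and the sketch ignores any classes or attributes the real vocabulary objects may carry. It assumes, based on the old `head(wordpiece_vocab())` output in the README diff, that the old values were zero-based ids and that the new vector is ordered by those ids.

```r
# Hypothetical toy vocabulary in the pre-2.0.0 shape: a named integer vector
# whose names are tokens and whose values are zero-based ids.
old_vocab <- c("[PAD]" = 0L, "[unused0]" = 1L, "[unused1]" = 2L)

# The 2.0.0 shape: a plain character vector of tokens, ordered by id.
new_vocab <- names(old_vocab)[order(old_vocab)]

# With the new shape, a zero-based id can be recovered from a token's
# position in the vector (an assumption of this sketch, not a package API).
token_id <- match("[unused0]", new_vocab) - 1L
```

Code that indexed the old vectors by name (for example `old_vocab[["[PAD]"]]`) would need a lookup like the `match()` call above, which is presumably the kind of difference wordpiece 2.1.2 accounts for.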
