Version 2.0.3 on CRAN (#160)
* Add Zenodo DOI Badge. (#118) [Closes #74]

* Add Zenodo DOI Badge.

* Fix link [Closes #74]

* speed up travis builds (#125)

* removed the distribution and sudo entries from travis config - faster?

* adding back sudo false and adding cache packages option

* small updates to contributing guide (#133)

* add example for running styler

move contributing to .github folder

* ignore the .github path

* fix indexing order of operations error [fixes #119]. (#134)

* added functionality to change mtry and sparsity in Urerf (#120)

* added functionality to change mtry and sparsity in Urerf

* ran styler on modified files and removed white space.

* added tests for new RandMat functions.

* Added functionality to split based on BIC score using Mclust (#124)

* added functionality to change mtry and sparsity in Urerf

* Added functionality to split based on BIC score

* Add LinearCombo arg to the Urerf fn

* Add fast version of BIC

* fix some minor errors (#141)

Ran through styler and fixed some roxygen import and documentation.

* fix issue #91 based on discussion in the comments. (#140)

* fix issue #91 based on discussion in the comments.
add some helper functions
add test for new way of computing feature importance

* remove need for library(Matrix) and update function parameters.
fix documentation typos
[issue #91]

* update test-FeatureImportance
move `flipWeights` to helperFunctions

* update Feature Importance to be more readable [@ben].
Merge RunFeature* into the same file.
Update README with correct output names.

* check-as-cran warning will now cause TravisCI to fail. (#142)

* Print tree (#136)

* added functionality to change mtry and sparsity in Urerf

* ran styler on modified files and removed white space.

* added tests for new RandMat functions.

* added PrintTree function and modified NAMESPACE file to call PrintTree (I'm not sure this last step was necessary, but it doesn't hurt).

* Add documentation and adjust the formatting of the output.

* the double comparison now relies on machine epsilon. (#149)

* the double comparison now relies on machine epsilon.

* fix for test not passing

* move an assignment out of an if condition. (#151)

Fixes issue #135

* Packed forest submodule (#152)

* add packedForest submodule

* update submodule to latest commit
add readme for submodule operations

* update submodule readme

* update submodule

* update submodule (#154)

* update submodule (#155)

* Draft of v2.0.3 for CRAN (#156)

* Draft of v2.0.3 for CRAN
no warnings, errors, or notes on my Mac.

* run README.Rmd

* update submodule (#159)
MrAE authored Feb 6, 2019
1 parent ae07faa commit 9873b68
Showing 41 changed files with 1,201 additions and 116 deletions.
2 changes: 1 addition & 1 deletion .#build.sh
Expand Up @@ -6,7 +6,7 @@ Rscript -e "Rcpp::compileAttributes()"
# Rscript -e "install.packages('devtools', repos = 'http://cran.us.r-project.org')"
# Rscript -e "install.packages('roxygen2', repos = 'http://cran.us.r-project.org')"
## RUN styler on directory
# Rscript -e "styler::style_dir(style = tidyverse_style)"
# Rscript -e "styler:::style_pkg(style=tidyverse_style)"
Rscript -e "devtools::document('R')"

R CMD build --resave-data .
Expand Down
2 changes: 1 addition & 1 deletion .#build_win.ps1
Expand Up @@ -5,7 +5,7 @@ Rscript.exe -e "Rcpp::compileAttributes()"
# Rscript.exe -e "install.packages('devtools', repos = 'http://cran.us.r-project.org')"
# Rscript.exe -e "install.packages('roxygen2', repos = 'http://cran.us.r-project.org')"
## RUN styler on directory
# Rscript.exe -e "styler::style_dir(style = tidyverse_style)"
# Rscript.exe -e "styler:::style_pkg(style=tidyverse_style)"
Rscript.exe -e "devtools::document('R')"

R.exe CMD build --resave-data .
Expand Down
4 changes: 3 additions & 1 deletion .Rbuildignore
Expand Up @@ -10,4 +10,6 @@
^.*\.so$
^.*\.Rproj$
^\.Rproj\.user$
^CONTRIBUTING.md$
^\.github$
^src/packedForest$
^src/submodule_readme.md$
8 changes: 7 additions & 1 deletion CONTRIBUTING.md → .github/CONTRIBUTING.md
Expand Up @@ -18,6 +18,13 @@ You are here to help on R-RerF? First off, thank you! Please read the followin
### Formatting

* Run your code through [styler](http://styler.r-lib.org/) auto-formater

```R
install.packages("styler")
library(styler)
styler:::style_pkg(style=tidyverse_style)
```

* Avoid modifying formatting outside the scope of your pull request
* Use **TRUE** and **FALSE**, not **T** and **F**
* Check for unnecessary whitespace with `git diff --check` before committing
Expand All @@ -28,7 +35,6 @@ We use the [testthat](https://github.com/r-lib/testthat) library for testing in

* New features need tests
* Tests should be fast, ideally each test should complete in under 5 seconds
* Mark longer running tests with
* Bug fixes need [testthat](https://github.com/r-lib/testthat) functions (test the condition that was failing)

### Make your Pull Request
Expand Down
4 changes: 4 additions & 0 deletions .gitmodules
@@ -0,0 +1,4 @@
[submodule "src/packedForest"]
path = src/packedForest
url = https://github.com/neurodata/packedForest
branch = master
9 changes: 6 additions & 3 deletions .travis.yml
@@ -1,11 +1,14 @@
language: r
dist: trusty
sudo: false
cache: packages

env:
global:
- WARNINGS_ARE_ERRORS=1

r:
- release

sudo: false

addons:
apt:
packages:
Expand Down
8 changes: 4 additions & 4 deletions DESCRIPTION
@@ -1,8 +1,8 @@
Package: rerf
Type: Package
Title: Randomer Forest
Version: 2.0.2
Date: 2018-12-03
Version: 2.0.3
Date: 2019-02-06
Authors@R: c(
person("Jesse", "Patsolic", role = c("ctb", "cre"), email = "software@neurodata.io"),
person("Benjamin", "Falk", role = "ctb", email = "falk.ben@jhu.edu"),
Expand All @@ -26,11 +26,11 @@ Description: R-RerF (aka Randomer Forest (RerF) or Random Projection
algorithms is where the random linear combinations occur: Forest-RC
combines features at the per tree level whereas RerF takes linear
combinations of coordinates at every node in the tree.
Depends: R (>= 3.3.0)
Depends: R (>= 3.3.0), Rcpp (>= 1.0.0)
License: Apache License 2.0 | file LICENSE
URL: https://github.com/neurodata/R-RerF
BugReports: https://github.com/neurodata/R-RerF/issues
Imports: parallel, RcppZiggurat, utils, stats, dummies
Imports: parallel, RcppZiggurat, utils, stats, dummies, mclust
Suggests: roxygen2 (>= 5.0.0), testthat
LinkingTo: Rcpp, RcppArmadillo
SystemRequirements: GNU make
Expand Down
5 changes: 5 additions & 0 deletions NAMESPACE
Expand Up @@ -5,6 +5,7 @@ export(FeatureImportance)
export(OOBPredict)
export(PackPredict)
export(Predict)
export(PrintTree)
export(RandMatBinary)
export(RandMatContinuous)
export(RandMatCustom)
Expand All @@ -18,8 +19,11 @@ export(RandMatTSpatch)
export(RerF)
export(StrCorr)
export(Urerf)
import(Rcpp)
importFrom(RcppZiggurat,zrnorm)
importFrom(dummies,dummy)
importFrom(mclust,Mclust)
importFrom(mclust,mclustBIC)
importFrom(parallel,clusterEvalQ)
importFrom(parallel,clusterExport)
importFrom(parallel,clusterSetRNGStream)
Expand All @@ -39,5 +43,6 @@ importFrom(stats,sd)
importFrom(utils,combn)
importFrom(utils,flush.console)
importFrom(utils,object.size)
importFrom(utils,tail)
importFrom(utils,write.table)
useDynLib(rerf)
22 changes: 20 additions & 2 deletions NEWS.md
@@ -1,12 +1,30 @@
Changes in 2.0.2:
## Changes in 2.0.3:

* The `PrintTree` function has been added to aid in viewing the
cut-points, features, and other statistics in a particular tree of a
forest.

* Urerf now supports using the Bayesian information criterion (BIC) from
the `mclust` package for determining the best split.

* Feature importance calculations now correctly handle features whose
weight vectors parametrize the same line. Also, when the projection
weights are continuous we tabulate how many times a unique combination
of features was used, ignoring the weights.

* An issue where the `split.cpp` function split the data `A` into `{A, {}}`
has been resolved by computing equivalence within some factor of
machine precision instead of exactly.

## Changes in 2.0.2:

* The option `rho` in the RerF function has been re-named to `sparsity`
to match with the algorithm explanation.

* The default parameters sent to the RandMat\* functions now properly
account for categorical columns.

* The defualts have changed for the following parameters:
* The defaults have changed for the following parameters:
* `min.parent = 1`
* `max.depth = 0`
* `stratify = TRUE`
Expand Down
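
Note on the 2.0.3 NEWS entry about `split.cpp`: the actual fix lives in the package's C++ code, which is not shown in this diff. The following is only a minimal R sketch of the general idea of comparing doubles within a factor of machine precision. The helper name `nearly_equal` and the default `factor = 4` are assumptions made for illustration, not taken from the package.

```R
# Hypothetical helper (not part of rerf): treat two doubles as equal when
# they differ by no more than a small multiple of machine epsilon,
# scaled by the magnitude of the values being compared.
nearly_equal <- function(a, b, factor = 4) {
  scale <- max(abs(a), abs(b), 1)
  abs(a - b) <= factor * .Machine$double.eps * scale
}

x <- 0.1 + 0.2
y <- 0.3

exact.match <- x == y             # FALSE: the two sums differ in the last bits
close.match <- nearly_equal(x, y) # TRUE: the difference is below the tolerance
```

With an exact comparison, values that differ only by rounding error can all land on one side of a cut-point, which is one way a split such as `{A, {}}` could arise.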
15 changes: 1 addition & 14 deletions R/BuildTree.R
Expand Up @@ -298,19 +298,6 @@ BuildTree <- function(X, Y, FUN, paramList, min.parent, max.depth, bagging, repl
# them accordingly
MoveLeft <- Xnode[1L:NdSize] <= ret$BestSplit

# Move samples left or right based on split
if (sum(MoveLeft) == 0 || sum(!MoveLeft) == 0) {
treeMap[CurrentNode] <- currLN <- currLN - 1L
ClassProb[currLN * -1, ] <- ClProb
NodeStack <- NodeStack[-1L] # pop node off stack
Assigned2Node[[CurrentNode]] <- NA # remove saved indexes
CurrentNode <- NodeStack[1L] # point to top of stack
if (is.na(CurrentNode)) {
break
}
next
}

Assigned2Node[[NextUnusedNode]] <- Assigned2Node[[CurrentNode]][MoveLeft]
Assigned2Node[[NextUnusedNode + 1L]] <- Assigned2Node[[CurrentNode]][!MoveLeft]

Expand Down Expand Up @@ -354,7 +341,7 @@ BuildTree <- function(X, Y, FUN, paramList, min.parent, max.depth, bagging, repl
currLN <- currLN * -1L
# create tree structure and populate with mandatory elements
tree <- list(
"treeMap" = treeMap[1L:NextUnusedNode - 1L], "CutPoint" = CutPoint[1L:currIN], "ClassProb" = ClassProb[1L:currLN, , drop = FALSE],
"treeMap" = treeMap[1L:(NextUnusedNode - 1L)], "CutPoint" = CutPoint[1L:currIN], "ClassProb" = ClassProb[1L:currLN, , drop = FALSE],
"matAstore" = matAstore[1L:matAindex[currIN + 1L]], "matAindex" = matAindex[1L:(currIN + 1L)], "ind" = NULL, "rotmat" = NULL,
"rotdims" = NULL, "delta.impurity" = NULL
)
Expand Down
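
Note on the `treeMap` indexing change above: in R the `:` operator binds more tightly than binary `-`, so `1L:n - 1L` is parsed as `(1L:n) - 1L`. The sketch below uses made-up values (`n` and `x` are illustrative, not from the package) to show the difference between the two forms.

```R
n <- 5L
x <- c(10, 20, 30, 40, 50)

idx.old <- 1L:n - 1L    # parsed as (1L:n) - 1L, i.e. 0 1 2 3 4
idx.new <- 1L:(n - 1L)  # the intended range, i.e. 1 2 3 4

# A zero index is silently dropped when subsetting, so for this input
# both forms happen to select the same elements.
old.subset <- x[idx.old]
new.subset <- x[idx.new]
```

Because the zero index is dropped, the old form can return the same elements as the new one for inputs like this. The parenthesized form states the intended range explicitly instead of relying on that behavior.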
108 changes: 96 additions & 12 deletions R/FeatureImportance.R
Expand Up @@ -4,32 +4,115 @@
#'
#' @param forest a forest trained using the RerF function with argument store.impurity = TRUE
#' @param num.cores number of cores to use. If num.cores = 0, then 1 less than the number of cores reported by the OS are used. (num.cores = 0)
#' @param type character string specifying which method to use in
#' calculating feature importance.
#' \describe{
#' \item{'C'}{specifies that unique combinations of features
#' should be *c*ounted across trees.}
#' \item{'R'}{feature importance will be calculated as in *R*andomForest.}
#' \item{'E'}{calculates the unique projections up to *e*quivalence if
#' the vector of projection weights parametrizes the same line in
#' \eqn{R^p}.}
#' }
#'
#' @return feature.imp
#' @return a list with 3 elements,
#' \describe{
#' \item{\code{imp}}{The vector of scores/counts, corresponding to each feature.}
#' \item{\code{features}}{The features/projections used.}
#' \item{\code{type}}{The code for the method used.}
#' }
#'
#' @examples
#' library(rerf)
#' num.cores <- 1L
#' forest <- RerF(as.matrix(iris[, 1:4]), iris[[5L]], num.cores = 1L, store.impurity = TRUE)
#' feature.imp <- FeatureImportance(forest, num.cores = 1L)
#'
#' imp.C <- FeatureImportance(forest, num.cores, "C")
#' imp.R <- FeatureImportance(forest, num.cores, "R")
#' imp.E <- FeatureImportance(forest, num.cores, "E")
#'
#' fRF <- RerF(as.matrix(iris[, 1:4]), iris[[5L]],
#' FUN = RandMatRF, num.cores = 1L, store.impurity = TRUE)
#'
#' fRF.imp <- FeatureImportance(forest = fRF, num.cores = num.cores)
#'
#' @export
#' @importFrom parallel detectCores makeCluster clusterExport parSapply stopCluster
#' @importFrom utils object.size

FeatureImportance <- function(forest, num.cores = 0L) {
FeatureImportance <- function(forest, num.cores = 0L, type = NULL) {

## choose method to use for calculating feature importance
if(is.null(type)){
if(identical(forest$params$fun, rerf::RandMatRF)){
type <- "R"
} else if (identical(forest$params$fun, rerf::RandMatBinary)) {
type <- "E"
} else {
type <- "C"
}
}

num.trees <- length(forest$trees)
num.splits <- sapply(forest$trees, function(tree) length(tree$CutPoint))

unique.projections <- vector("list", sum(num.splits))
forest.projections <- vector("list")

idx.start <- 1L
## Iterate over trees in the forest to save all projections used
for (t in 1:num.trees) {
idx.end <- idx.start + num.splits[t] - 1L
unique.projections[idx.start:idx.end] <- lapply(1:num.splits[t], function(nd) forest$trees[[t]]$matAstore[(forest$trees[[t]]$matAindex[nd] + 1L):forest$trees[[t]]$matAindex[nd + 1L]])
idx.start <- idx.end + 1L
tree.projections <-
lapply(1:num.splits[t], function(nd) {
forest$trees[[t]]$matAstore[(forest$trees[[t]]$matAindex[nd] + 1L):forest$trees[[t]]$matAindex[nd + 1L]]
})

forest.projections <- c(forest.projections, tree.projections)
}
unique.projections <- unique(unique.projections)

CompImportanceCaller <- function(tree, ...) RunFeatureImportance(tree = tree, unique.projections = unique.projections)
## Calculate the unique projections used according to the distribution
## of weights
if (identical(type, "C")) {
message("Message: Computing feature importance as counts of unique feature combinations.\n")
## compute the unique combinations of features used in the
## projections
unique.projections <- unique(lapply(forest.projections, getFeatures))

CompImportanceCaller <- function(tree, ...) {
RunFeatureImportanceCounts(tree = tree, unique.projections = unique.projections)
}
varlist <- c("unique.projections", "RunFeatureImportanceCounts")
}

if (identical(type, "R")) {
message("Message: Computing feature importance for RandMatRF.\n")
## Compute the unique projections without the need to account for
## 180-degree rotations.
unique.projections <- unique(forest.projections)

CompImportanceCaller <- function(tree, ...) {
RunFeatureImportance(tree = tree, unique.projections = unique.projections)
}
varlist <- c("unique.projections", "RunFeatureImportance")
}

if (identical(type, "E")) {
message("Message: Computing feature importance for RandMatBinary.\n")
## compute the unique projections properly accounting for
## projections that differ by a 180-degree rotation.
unique.projections <- uniqueByEquivalenceClass(
forest$params$paramList$p,
unique(forest.projections)
)

CompImportanceCaller <- function(tree, ...) {
RunFeatureImportanceBinary(
tree = tree,
unique.projections = unique.projections
)
}
varlist <- c("unique.projections", "RunFeatureImportanceBinary")
}



if (num.cores != 1L) {
if (num.cores == 0L) {
Expand All @@ -41,7 +124,7 @@ FeatureImportance <- function(forest, num.cores = 0L) {
if ((utils::object.size(forest) > 2e9) |
.Platform$OS.type == "windows") {
cl <- parallel::makeCluster(spec = num.cores, type = "PSOCK")
parallel::clusterExport(cl = cl, varlist = c("unique.projections", "RunFeatureImportance"), envir = environment())
parallel::clusterExport(cl = cl, varlist = varlist, envir = environment())
feature.imp <- parallel::parSapply(cl = cl, forest$trees, FUN = CompImportanceCaller)
} else {
cl <- parallel::makeCluster(spec = num.cores, type = "FORK")
Expand All @@ -58,5 +141,6 @@ FeatureImportance <- function(forest, num.cores = 0L) {
sort.idx <- order(feature.imp, decreasing = TRUE)
feature.imp <- feature.imp[sort.idx]
unique.projections <- unique.projections[sort.idx]
return(feature.imp <- list(imp = feature.imp, proj = unique.projections))

return(feature.imp <- list(imp = feature.imp, features = unique.projections, type = type))
}
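
Note on type `'E'`: the bodies of `uniqueByEquivalenceClass` and `flipWeights` are not shown in this diff, so the following is not their implementation. It is a minimal sketch of one way to treat projections that differ by a 180-degree rotation (a sign flip) as the same projection, using dense weight vectors. The helper name `canonical_sign` and the example vectors are assumptions made for illustration.

```R
# Hypothetical helper (not part of rerf): flip a weight vector so that its
# first nonzero entry is positive. Two vectors that differ only by an
# overall sign then map to the same representative.
canonical_sign <- function(w) {
  nz <- which(w != 0)
  if (length(nz) > 0L && w[nz[1L]] < 0) {
    w[nz] <- -w[nz] # negate only the nonzero entries
  }
  w
}

projections <- list(
  c(1, 0, -1, 0),
  c(-1, 0, 1, 0), # same line as the first, opposite direction
  c(0, 1, 1, 0)
)

unique.up.to.sign <- unique(lapply(projections, canonical_sign))
```

This sketch only handles sign flips. Weight vectors that are other scalar multiples of one another also parametrize the same line and would need an additional normalization step.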