Package recordSwapping has been moved to sdcMicro!
Please use package `sdcMicro` and see `?sdcMicro::recordSwap()` or `vignette('recordSwapping')` for details and examples.
CRAN: https://cran.r-project.org/web/packages/sdcMicro/index.html
Development: https://github.com/sdcTools/sdcMicro
recordSwapping was an R package for record swapping. Its aim was to develop a core library using pure C++11 code to implement targeted record swapping (TRS). Since version 1.0.1, the functionality has been included in sdcMicro (versions >= 5.7.1), where any future development will also take place.
The current files of the "core" library can be found at github.com/sdcTools/sdcMicro/tree/master/src/recordSwap. This folder contains the pure C++11 source files `recordSwap.cpp` and `recordSwap.h`, which should be used if the functionality is to be included in other software tools. The C++ code is not directly embedded using Rcpp, so that it can be more easily integrated into other projects that do not depend on R libraries.
Within sdcMicro, the library is called by an Rcpp wrapper (`src/recordSwap_R.cpp`), which in turn is called from the top-level R function `recordSwap()`.
The changes between versions are listed below; changes in versions > 1.0.1 will be documented in sdcMicro.
- Fixed a bug when producing the log file.
- If a household has no suitable donor household once, this household is discarded from swapping for all subsequent swaps in all other geographic hierarchies. Otherwise, a household that is at risk and has no suitable donor at a specific geographic hierarchy could still be swapped in a lower geographic hierarchy; the household would then be swapped, but the swap would not resolve its at-risk status.
- Improved input checks for `recordSwap()`, thanks @Kyoshido for PR 67093f7.
- Corrected spelling in vignette and documentation, thanks @Kyoshido for PR 4dfb068.
- Included parameters `int &count_swapped_records` and `int &count_swapped_hid` in the C++ function `recordSwap()`, which count the number of swapped records and swapped households, respectively. These are convenience parameters for mu-Argus.
- Included parameters `std::string log_file_name` and `log_file_name` in the C++ function `recordSwap()` and the R function `recordSwap()`, respectively. They contain the path for writing a log file. The log file contains a list of household IDs (`hid`) that could not be swapped and is only created if any such households exist.
- Changed the definition of parameter `k_anonymity`: a household is now treated as a high-risk household if `counts < k_anonymity` for at least 1 person in the household (previously `counts <= k_anonymity`). This definition is now consistent with parameter `k` of function `sdcMicro::localSuppression()`.
- Made `recordSwap()` an S3 method, which can also accept an `sdcMicro` object as input.
- Included unit tests for `recordSwap()` when using `sdcMicro` objects.
- Included unit tests for `infoLoss()`.
- Cleaned up code and unit tests such that `R CMD check` runs without notes, warnings and errors.
- Fixed warnings when compiling C++ code, such as `comparing unsigned integer ...` or `unused variable ...`.
- Fixed minor typos in README, help pages and vignettes.
- Included function `infoLoss()` to calculate various information loss measures after `recordSwap()` was applied. Function `infoLoss()` is heavily inspired by function `cellKey::ck_cnt_measures()` but accepts the micro data as input instead of frequency or magnitude counts.
- Changed variable names and content of the dummy data created by `createDat`.
- Updated documentation and fixed some typos.
- Extended unit tests.
- Argument `data` of the R function `recordSwap()` no longer needs to contain only integer values. Only the variables needed by the underlying C++ function, defined through parameters `hid`, `hierarchy`, `risk_variables`, `similar` and `carry_along`, need to be of integer type.
- Fixed a bug where fewer households than indicated through the swaprate were drawn.
- Added version number to header file.
- Developed a different method for distributing the number of draws over all geographic hierarchies.
- Changed the parameter order in the R and C++ function `recordSwap()` for more consistent documentation. The function call to the C++ function changed to:

```cpp
std::vector< std::vector<int> > recordSwap(std::vector< std::vector<int> > data, int hid,
                                           std::vector<int> hierarchy,
                                           std::vector< std::vector<int> > similar,
                                           double swaprate,
                                           std::vector< std::vector<double> > risk, double risk_threshold,
                                           int k_anonymity, std::vector<int> risk_variables,
                                           std::vector<int> carry_along,
                                           int seed = 123456)
```
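To illustrate the changed `k_anonymity` rule, here is a minimal C++ sketch. The helper `is_high_risk_household()` is hypothetical and not part of `recordSwap.h`; it assumes that `counts` already holds, for each person of one household, the frequency count of that person's `risk_variables` combination in the relevant geographic unit.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of recordSwap.h): applies the k_anonymity
// rule to one household. counts[p] is assumed to be the frequency count
// for person p of the household.
bool is_high_risk_household(const std::vector<int>& counts, int k_anonymity) {
  assert(k_anonymity > 0);
  for (int c : counts) {
    // Strict comparison; earlier versions used `c <= k_anonymity`.
    if (c < k_anonymity) {
      return true;  // a single person below k makes the whole household high-risk
    }
  }
  return false;
}
```

With `k_anonymity = 3`, a household whose persons have counts `{3, 4}` is not high-risk under the strict rule, whereas it would have been under the previous `<=` definition.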
- Improved the interface of the R function `recordSwap()`. Parameters `hid`, `hierarchy`, `similar` and `risk_variables` can now be given as column indices or column names of `data`. Please note that indices in R start with 1 but in C++ they start with 0. The R function `recordSwap()`, which is basically just a wrapper for the C++ function, converts column names or indices into the correct format for the C++ function. So the call to `recordSwap()` from R expects indices starting from 1.
- Added parameter `std::vector<int> carry_along` to the C++ function `recordSwap()`. Like the parameter `hierarchy`, `carry_along` expects column indices of `data`, and these values are swapped in addition to the ones defined in `hierarchy`. These variables do, however, not interfere with risk calculation, sampling or finding a donor.
- Added parameter `carry_along` to the R function `recordSwap()`; it expects either column indices or column names.
- Added parameter `return_swapped_id` to the R function `recordSwap()`. If `return_swapped_id` is `TRUE`, an additional column is returned which holds the `hid` of the household each record was swapped with. If this new column has the same value as `hid`, the record was not swapped.
- Improved documentation and vignette.
- Some parameter changes to the C++ function `recordSwap()`: `similar` is now a vector of vectors, allowing for multiple similarity profiles.
- Changed `std::vector<int> risk` to `std::vector<int> risk_variables` for a more descriptive name.
- Changed `int th` to `int k_anonymity` for a more descriptive name.
- Added parameter `double risk_threshold`, which can be used to set a custom risk threshold. A household with a risk greater than or equal to `risk_threshold` is treated as a risky household (not yet supported by the R wrapper).
- Added parameter `std::vector<std::vector<double>> risk`, which can be used to supply custom risks for each household in each hierarchy level (not yet supported by the R wrapper).

```cpp
recordSwap(std::vector< std::vector<int> > data, std::vector<std::vector<int>> similar,
           std::vector<int> hierarchy, std::vector<int> risk_variables, int hid,
           int k_anonymity, double swaprate, double risk_threshold,
           std::vector<std::vector<double>> risk, int seed = 123456)
```
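The `risk_threshold` rule can be sketched in the same way. The function `flag_risky()` below is hypothetical and not part of the library; it assumes a layout in which `risk[i]` holds the risks of record `i` over all hierarchy levels, so that `risk[i][level]` is the risk of that record's household at one level.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of recordSwap.h.
// Assumed layout: risk[i][level] is the risk of the household of record i
// at the given hierarchy level. Returns one flag per record.
std::vector<bool> flag_risky(const std::vector<std::vector<double>>& risk,
                             std::size_t level, double risk_threshold) {
  std::vector<bool> risky;
  risky.reserve(risk.size());
  for (const std::vector<double>& record_risk : risk) {
    assert(level < record_risk.size());  // every record needs a risk at this level
    // ">=": a risk exactly equal to the threshold already counts as risky
    risky.push_back(record_risk[level] >= risk_threshold);
  }
  return risky;
}
```

Because the sketch works record by record, records of the same household are expected to carry the same household risk; aggregating the flags to one value per household is left out.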
- Fixed some bugs concerning the application of `swaprate`.
- `data` as well as `risk` are used internally such that `data[0]` or `risk[0]` contain the micro data or risk over all hierarchies for the first individual, `data[1]` or `risk[1]` contain the micro data or risk over all hierarchies for the second individual, and so on.
- Documentation and parameter descriptions have been updated in the corresponding help files.
- First prototype version of record swapping.
- Contains the function `recordSwap()` as the main function of this package. `recordSwap()` is an R wrapper which calls an underlying Rcpp function, which in turn calls the C++ function `recordSwap()`:

```cpp
recordSwap(std::vector< std::vector<int>> data, std::vector<int> similar,
           std::vector<int> hierarchy, std::vector<int> risk, int hid,
           int th, double swaprate, int seed = 123456)
```
- The parameter `data` contains the household data and is used internally such that `data[0]` contains the values of each individual for the first column of the data, `data[1]` contains the values of each individual for the second column of the data, and so on. This also implies that `data[0][0]` addresses the first value of the first column.
- The procedure expects the data to be ordered by household ID (column `hid`). Ordering inside each household is irrelevant.
- Various internal helper functions are included. These are, however, not intended for flexible use and are only called from within `recordSwap()`.
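The two `data` layouts mentioned in this changelog differ: in the first prototype `data[j]` holds column `j`, while later versions use `data[i]` for individual `i`. The hypothetical helper `to_row_major()` below converts the former into the latter, and the hypothetical `is_ordered_by_hid()` checks the ordering by household ID that the procedure expects. Both are illustrative sketches, not functions of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: converts the column-major prototype layout
// (columns[j][i] = value of column j for individual i) into the
// row-major layout of later versions (rows[i][j]).
std::vector<std::vector<int>> to_row_major(
    const std::vector<std::vector<int>>& columns) {
  if (columns.empty()) return {};
  const std::size_t n_rows = columns[0].size();
  std::vector<std::vector<int>> rows(n_rows, std::vector<int>(columns.size()));
  for (std::size_t j = 0; j < columns.size(); ++j) {
    assert(columns[j].size() == n_rows);  // all columns must have equal length
    for (std::size_t i = 0; i < n_rows; ++i) {
      rows[i][j] = columns[j][i];
    }
  }
  return rows;
}

// Hypothetical check: true if the rows are sorted by the household ID
// stored in column `hid` (non-decreasing order).
bool is_ordered_by_hid(const std::vector<std::vector<int>>& rows, std::size_t hid) {
  return std::is_sorted(rows.begin(), rows.end(),
                        [hid](const std::vector<int>& a, const std::vector<int>& b) {
                          return a[hid] < b[hid];
                        });
}
```

The ordering check only compares the `hid` column, so the order of individuals inside one household does not matter, matching the statement above.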
The functionality can be used by installing sdcMicro and using `recordSwap()`:

```r
install.packages("sdcMicro")
```

The application of TRS using the record-swapping library is shown in a package vignette

```r
vignette("recordSwapping", package = "sdcMicro")
```

which is also available online. Further information is available in the help pages (`?sdcMicro::recordSwap`).