From caa2f1fca49ebf9a00120faed9c63fb848344739 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Mon, 9 Sep 2024 16:39:54 -0400 Subject: [PATCH 01/19] adding my slide about benefits of sharing --- 04-Data_Sharing.Rmd | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/04-Data_Sharing.Rmd b/04-Data_Sharing.Rmd index 76cae3c..9efbad7 100644 --- a/04-Data_Sharing.Rmd +++ b/04-Data_Sharing.Rmd @@ -46,6 +46,19 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g117c57cc481_1_37") ``` +## Benefits of data sharing + +In addition to these benefits to yourself, data sharing has other far reaching benefits. It can help support faster advances in science and medicine, by reducing the need to collect new data, which reduces costs, time and effort, including the effort and burden that is required to collect data on or from patients. + +It also helps support researchers at institutes that do not have as many resources to collect data. + +Ultimately it can also therefore help patients benefit from research faster, as faster advances can be made through more efficient research. 
+ ```{r, fig.align='center', echo = FALSE, fig.alt= "Data Sharing can also help with costs related to collecting data, reduces the time and effort to collect new data, including the burden on patients, it allows research to be more efficient, the same data can be used for multiple studies, which is especially helpful if combining different kinds of data and researchers don't necessarily have the ability to collect each kind of data, it supports researchers at institutions that have fewer resources, and it helps patients get the benefits of research faster.", out.width="100%"} +ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g2fddd2b0ce1_0_0") +``` + + ## Data repositories The best way to share your data is by putting it somewhere that others can download it (and it can be kept private when necessary). There's many repositories out there that handle this for you. From f11c5362a611411f0a93829816ed4d52da4006aa Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Mon, 9 Sep 2024 16:46:36 -0400 Subject: [PATCH 02/19] fixing some small things I noticed --- 04-Data_Sharing.Rmd | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-) diff --git a/04-Data_Sharing.Rmd b/04-Data_Sharing.Rmd index 9efbad7..33d24a1 100644 --- a/04-Data_Sharing.Rmd +++ b/04-Data_Sharing.Rmd @@ -88,11 +88,12 @@ The journal you submit to may have a recommendation of one over another. If not, - [CyVerse Data Commons Repository](https://cyverse.org/data-commons) - [Data Dryad](https://datadryad.org/stash) - [FigShare](https://help.figshare.com/article/how-to-upload-and-publish-your-data) -- [ZENODO](https://help.zenodo.org/) +- [Zenodo](https://help.zenodo.org/) +- [GitHub](https://github.com/) ### Small datasets -Data sets that are small and atypical format can be published as supplementary files as a part of a manuscript. 
+Data sets that are small or have an atypical format can be published as supplementary files as a part of a manuscript. ## Data Submission tips @@ -108,12 +109,12 @@ To make your data truly shared, you need to take the time to make sure it is wel There are two files you should make sure to include to help describe and organize your data project: - [A main README file](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/documenting-analyses.html#readmes) that orients others to what is included in your data. -- A metadata file that samples that are included, how they are connected, and when appropriate following privacy ethics, describes clinical features. +- A metadata file that describes what data are included and how they are connected. - [Standards for genomic metadata](https://genestack.com/assets/pdfs/The%20importance%20of%20metadata%20in%20genomics%20and%20the%20FAIR%20principles%20ebook.pdf) ### Use consistent and clear names -- Make sure that sample and data IDs used are consistent across the project - make sure to include a metadata file that describes in detail your samples in a way that is clear without any prior knowledge of the project. +- Make sure that sample and data IDs used are consistent across the project. Include a metadata file that describes your samples in a way that is clear to those who might not have any prior knowledge of the project. - Sample and data IDs should keep with standard formatting otherwise known in the field. - Features names should avoid using genomic coordinates as these may change with new genome versions. @@ -124,10 +125,12 @@ Reproducible projects are able to be re-run by others to obtain the same results **The main requirements for a reproducible project are:** - The data can be freely obtained from a repository (this maybe summarized data for the purposes of data privacy). -- The code can be freely obtained from GitHub (or another similar repository). 
-- The software versions used to obtain the results are made clear by documentation or providing a Docker container. +- The code can be freely obtained from [GitHub](https://github.com/) (or another similar repository). +- The software versions used to obtain the results are made clear by documentation or providing a [Docker](https://www.docker.com/) container (more advanced option). - The code and data are well described and organized with a system that is consistent. +Check out our [introductory](https://jhudatascience.org/Reproducibility_in_Cancer_Informatics/introduction.html) and [advanced](https://jhudatascience.org/Adv_Reproducibility_in_Cancer_Informatics/introduction.html) courses about reproducibility for more information. + ### Have someone else review your code and data! The best way to find out if your data are useable by others is to have someone else look it over! From 49e11910decbc3bbe29683aea603834f63ebb54d Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Mon, 9 Sep 2024 16:49:10 -0400 Subject: [PATCH 03/19] add some formatting to ethics ch --- 05-Data_Ethics.Rmd | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index 183a54d..c561569 100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -14,11 +14,23 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN ## What is data ethics? -Data ethics involves the consideration of data collection, maintenance, security, privacy, and sharing, and mindfulness about how our research can ultimately impact (or not impact as the case may be for research that lacks inclusivity and equity) research participants and other individuals. Importantly, we do not yet have established societal norms or protocols for every aspect of medical research, and many topics are still under debate especially when it comes to cutting edge research. 
However, general principles of ethics can be helpful and involve practices for research integrity, consideration for social justice, data security, and transparency. +Data ethics involves the consideration of: + +- data collection +- data maintenance +- data security +- data privacy +- data sharing + +It also involves mindfulness about how our research can ultimately impact (or not impact as the case may be for research that lacks inclusivity and equity) research participants and other individuals. + +Importantly, we do not yet have established societal norms or protocols for every aspect of medical research, and many topics are still under debate especially when it comes to cutting edge research. However, general principles of ethics can be helpful and involve practices for research integrity, consideration for social justice, data security, and transparency. ### Before and after research -Data ethics requires thoughtfulness *both* throughout the planning and research process to produce research that benefits society and does as little harm as possible, as well as mindfulness for what happens after the research is complete and published. Researchers need to consider both how their work will resolve unanswered questions and who the research might help, as well as consider how others might use or misuse their data, code, and results in the future [@lipworth_ethics_2017; @teoli_informatics_2021]. +Data ethics requires thoughtfulness *both* throughout the planning and research process to produce research that benefits society and does as little harm as possible, as well as mindfulness for what happens after the research is complete and published. + +Researchers need to consider both how their work will resolve unanswered questions and who the research might help, as well as consider how others might use or misuse their data, code, and results in the future [@lipworth_ethics_2017; @teoli_informatics_2021]. 
### Considerations before From c7be9af8548fa1109492b6967e2fdb41c9511ee7 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Mon, 9 Sep 2024 16:50:55 -0400 Subject: [PATCH 04/19] adding some of Jody's thoughts --- 02-Data_Privacy.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/02-Data_Privacy.Rmd b/02-Data_Privacy.Rmd index 4de5adc..ad74dc9 100644 --- a/02-Data_Privacy.Rmd +++ b/02-Data_Privacy.Rmd @@ -19,7 +19,7 @@ Cancer research often involves the collection of information about research part ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g20f61f033e7_18_318") ``` -Note that these are general definitions and whether something counts as PII or PHI has to be evaluated in a case-by-case basis. +Note that these are general definitions and whether something counts as PII or PHI has to be evaluated on a case-by-case basis by an expert such as an Institutional Review Board (IRB) member or compliance officer. ## PII (personal identifiable information) From 5978b0841f27e56a4caac6d8fc2bdf8817ad2aac Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Mon, 9 Sep 2024 17:13:01 -0400 Subject: [PATCH 05/19] adding diff between data privacy and security --- 02-Data_Privacy.Rmd | 12 ++++++++++++ book.bib | 12 ++++++++++++ 2 files changed, 24 insertions(+) diff --git a/02-Data_Privacy.Rmd b/02-Data_Privacy.Rmd index ad74dc9..66404c8 100644 --- a/02-Data_Privacy.Rmd +++ b/02-Data_Privacy.Rmd @@ -21,6 +21,18 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN Note that these are general definitions and whether something counts as PII or PHI has to be evaluated on a case-by-case basis by an expert such as an Institutional Review Board (IRB) member or compliance officer. + +## Privacy vs Security + +So what exactly is privacy? There are a couple of major ways to think about this. 
+ +The first is the legal standpoint: keeping people from finding information about other individuals. In other words, there are legal restrictions like HIPAA that help protect the rights of individuals by keeping others from accessing information about them. + +Beyond what is required by law, which may vary depending on the country where you perform research, there are ethical guidelines that define, beyond legal ramifications, why someone should protect the privacy of data. In other words, the legal system defines what we have to do, while ethics defines what we should do. + +Data privacy has a close relationship with data security. Both are concerned with keeping the data from being accessed by those who should not have access. However, security is more concerned with the **actual process** of protecting the data from unauthorized people, as well as protecting the data from other forms of damage, while privacy is more concerned with who can access the data and how they can use it [@bambauer_privacy_2013]. + + ## PII (personal identifiable information) PII (personal identifiable information) are aspects of a person that could allow you to identify a person. 
diff --git a/book.bib b/book.bib index ae7aa50..e12b61d 100644 --- a/book.bib +++ b/book.bib @@ -967,3 +967,15 @@ @article{broman_identification_2015 year = {2015}, pages = {2177--2186}, } + + +@article{bambauer_privacy_2013, + title = {Privacy {Versus} {Security}}, + volume = {103}, + language = {en}, + number = {3}, + journal = {THE JOURNAL OF CRIMINAL LAW \& CRIMINOLOGY}, + author = {Bambauer, Derek E}, + year = {2013}, + file = {Bambauer - Privacy Versus Security.pdf:/Users/carriewright/Zotero/storage/7EKT8DJ9/Bambauer - Privacy Versus Security.pdf:application/pdf}, +} From 9c762c3dcc07f6c7848477b1b59e8533ec3cb8a7 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Mon, 9 Sep 2024 17:21:51 -0400 Subject: [PATCH 06/19] adding breach notification stuff --- 02-Data_Privacy.Rmd | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/02-Data_Privacy.Rmd b/02-Data_Privacy.Rmd index 66404c8..3ba277a 100644 --- a/02-Data_Privacy.Rmd +++ b/02-Data_Privacy.Rmd @@ -75,7 +75,7 @@ What is the risk of PII getting into the hands of people it shouldn't? Why was t PII can pose a risk for identity theft, which can have financial, professional, criminal, and personal consequences [@dinardi_14_2022], as criminals can get loans and credit card in other people's names, as well as commit crimes under the guise of other people's identities. This can result in reputation loss and loss of opportunities. -In addition, the leak of PII can also pose a safety risk, as criminals can identify the likely locations of specific individuals if performing targeted crimes. +A leak of PII can also pose a safety risk, as criminals can identify the likely locations of specific individuals if performing targeted crimes. In addition, a leak of PII might breach patients’ trust in an organization's ability to keep their data safe, and patients may therefore be less interested in engaging with the organization. 
```{r, fig.align='center', echo = FALSE, fig.alt= "PII risk involves identity theft: creation of financial documents in someone else's name or criminal activity in someone else's name and safety risk: specific individuals can be found", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g20f61f033e7_18_484") @@ -128,7 +128,8 @@ Some PII is always PHI, like health insurance numbers or clinical data such as r PHI poses an additional risk rather than just typical PII. -That is because the health information related to PHI, can be used to determine if an individual has a particular condition or health risk and this information could be used against the individual when it comes to employment or insurance. +That is because the health information related to PHI, can be used to determine if an individual has a particular condition or health risk and this information could be used against the individual when it comes to employment or insurance. This is particularly an issue if conditions are not known by others or the condition is stigmatizing. + ```{r, fig.align='center', echo = FALSE, fig.alt= "PHI poses additional risks for employment and insurance. Future or current employers could discrimanate against people with certain health conditions, Insurance companies could enforce higher rates based on a preexisting condition.",out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g20f61f033e7_18_676") @@ -243,6 +244,10 @@ If compliance is not resolved, then the covered entity may have to pay fines. Currently if an individual is not aware of a violation the fine can be quite small, but if it is a repeated issue of willful neglect, they can be fined on the order of `$`50,000! 
If the entity committed the violation for malicious reasons for personal gain, they can face much higher fines, up to `$`250,000 and may face jail time of up to 10 years [@violations_2018]. +If it is deemed that a breach has occurred, the organization responsible for the breach is required to let affected individuals know. See [here](https://www.hhs.gov/hipaa/for-professionals/breach-notification/index.html) for more information. + + + ### Common Violations Common violations of HIPAA taken from @violations_2018 are: From e837bcd849033f6fc35f28393d6af4ece18ae95b Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Mon, 9 Sep 2024 17:52:28 -0400 Subject: [PATCH 07/19] adding Jody's thoughts on data security --- 03-Data_Security.Rmd | 64 +++++++++++++++++++++++--------------------- 1 file changed, 34 insertions(+), 30 deletions(-) diff --git a/03-Data_Security.Rmd b/03-Data_Security.Rmd index b06f9e6..0468382 100644 --- a/03-Data_Security.Rmd +++ b/03-Data_Security.Rmd @@ -101,11 +101,11 @@ Make sure you also only access secure WiFi networks. One way to ensure this is i ### Passwords -Some people suggest using sentences that are easy for you to remember, you could consider a line of lyrics from song or poem that you like, or maybe a movie. Modify part of it to include symbols and numbers [@passwords]. +Passwords need to be effective. This means they should be hard to guess by people who should not have them. Some people suggest using sentences that are easy for you to remember, you could consider a line of lyrics from song or poem that you like, or maybe a movie. Modify part of it to include symbols and numbers [@passwords]. Don't share your password and keep it safe! -If you have a Mac, you could consider storing it in your [Keychain](https://support.apple.com/en-ie/guide/mac-help/mchlf375f392/mac), alternatively if you have a different type of computer or don't like the Mac Keychain, consider [Dashlane](https://www.dashlane.com/) or other password manger services. 
Luckily both of these options do not come at any extra cost and can be helpful for storing all the passwords we use regularly safely. These are especially good options if your password is difficult for you to remember. +We highly suggest you consider a password manager to keep your passwords extra safe and secure. If you have a Mac, you could consider storing them in your [Keychain](https://support.apple.com/en-ie/guide/mac-help/mchlf375f392/mac). Alternatively, if you have a different type of computer or don't like the Mac Keychain, consider [Dashlane](https://www.dashlane.com/) or other password manager services. Luckily both of these options do not come at any extra cost and can be helpful for safely storing all the passwords we use regularly. These are especially good options if your passwords are difficult for you to remember. ```{r, fig.align='center', echo = FALSE, fig.alt= "Cartoon - One character says: Hey, what do you have there?. The other character says: Oh just bringing my passwords with me in case I forget. I’ve secured them carefully on paper with invisible ink, in a cypher with its own code, inside a fireproof box with a lock. The original character says: That’s very impressive. You could also just use a password manager. The other character says: Oh that might be good… because this fireproof box is quite heavy!", out.width= "100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1B4LwuvgA6aUopOHEAbES1Agjy7Ex2IpVAoUIoBFbsq0/edit#slide=id.g10e2895be5b_58_54") @@ -121,11 +121,11 @@ This process is used by the computer processor or CPU called a CPU cache, as wel This general process is also used for accessing data on servers or websites. For servers, a cache is often stored on the server side, while often for websites, a cache is stored locally on the user's computer in the long-term memory so that when users access the site again, access will be faster. 
Web browser caches can store your browsing history, cookies (a unique ID for the website to identify the user), as well as sensitive data. -Although this makes your work faster, caching poses some security risk. They provide additional locations where hackers could access your sensitive data. Furthermore, the data in caches are often not encrypted, making such data more vulnerable. One way to avoid the security risk associated with your cached data, is clearing your caches [@caching_security]. +Although this makes your work faster, caching poses some security risk. Caching provides additional locations where hackers could access your sensitive data. Furthermore, the data in caches are often not encrypted, making such data more vulnerable. One way to avoid the security risk associated with your cached data is clearing your caches [@caching_security]. -In the case of a CPU or software cache, this can be important in case your laptop gets stolen, or if you decide to sell your laptop. The easiest way to clear such a cache is to simply shutdown your computer. If you have taken our computing course, you will learn that data stored in short-term memory (like RAM) requires electricity, and it will disappear when your computer is no longer connected to power [@caching_security]. Note that it can take a few minutes for such memory to disappear. +In the case of a CPU or software cache, security concerns are heightened if your laptop is stolen or lost, or if you decide to sell it. The easiest way to clear a CPU or software cache is to simply shut down your computer. If you have taken our computing course, you will have learned that data stored in short-term memory (like RAM) requires electricity, and it will disappear when your computer is no longer connected to power [@caching_security]. Note that it can take a few minutes for such memory to disappear. -See [here](https://www.upguard.com/blog/cache) for instructions on how to clear browser caches. 
It's a good idea to clear your browser cache relatively often, and possibly more often if you access sensitive data on your computer regularly. +See [here](https://www.upguard.com/blog/cache) for instructions on how to clear browser caches. It's a good idea to clear your browser cache relatively often (ideally before any time you take your computer out where it could be stolen or lost), and possibly more often if you access sensitive data on your computer regularly. ### External drives @@ -138,12 +138,12 @@ There are a few major reasons why external drives, especially flash and USB driv There are several strategies you can take to avoid these issues, if you must use such a drive [@drives_CISA_2019, @durken_how_2021]: -- never use a drive that you find randomly, ideally only use drives you get from a reputable manufacturer, even drives at conferences could pose a security risk -- use drives that have encryption (either buy drives that already have it, which is recommended or add encryption to your drive) -- disable AutoRun software which allows drives to automatically be opened. See [here](https://www.lifewire.com/disable-autorun-on-a-pc-153344) for how to do this. Note that this is mostly an issue for Windows computers. -- remove the drive when not in use (it improves the life of the drive) -- keep all your software up to date to get important security updates -- editing files directly on the drive can limit the life span of the drive, instead simply copy paste files back and forth from your computer +- Never use a drive that you find randomly. Ideally, only use drives you get from a reputable manufacturer. Even drives at conferences could pose a security risk. +- Use drives that have encryption (either buy drives that already have it, which is recommended, or add encryption to your drive). +- Disable AutoRun software which allows drives to automatically be opened. See [here](https://www.lifewire.com/disable-autorun-on-a-pc-153344) for how to do this. 
Note that this is mostly an issue for Windows computers. +- Remove the drive when not in use (it improves the life of the drive and therefore protects data on the drive from disappearing or getting damaged). +- Keep all your software up to date to get important security updates. +- Avoid editing files directly on the drive, as this can limit the life span of the drive. Instead, copy files back and forth from your computer. If you really need to use a drive that is questionable, check out this [article](https://www.popsci.com/safely-open-USB-flash-drive/) for practices to *more* safely open such a drive. @@ -151,7 +151,8 @@ If you really need to use a drive that is questionable, check out this [article] ## Data masking -Encryption is actually just one of the more complex methods of a larger concept called data masking for protecting sensitive data. There are other methods for obscuring parts of the data besides the complexity of encryption, such as the following: +Data masking refers to several methods for protecting sensitive data. One example you might be familiar with is encryption. Encryption is a relatively complex method of data masking. There are other methods for obscuring parts of the data that are less complex, such as simply removing pieces of information, showing only parts of data values, or shuffling the data. + ```{r, fig.align='center', echo = FALSE, fig.alt= "Data Masking Methods can include: deletion or nulling out by making a value in the data invisible to others. 
Masking out: showing only part of a data value, Number and date variance - changing or varying the data values by a small amount so that it is still meaningful but less traceable, Shuffling - moving around the data points (for example the order of participants) - again meaningful but less traceable, Substitution - changing a field that renders the value less traceable - for example using de-identification numbers instead of patient names, Encryption - complex encoding or translation of data values following a key of instructions", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g1016753ce66_0_0") @@ -159,7 +160,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN ### De-identification -Data de-identification is the process of removing any values that could be personally identifiable. In other words, it is the process of obscuring the identity of the individuals who have data values within a data set. +Data de-identification is the process of removing any values that could be used to identify an individual. In other words, it is the process of obscuring the identity of the individuals who have data values within a data set. ```{r, fig.align='center', echo = FALSE, fig.alt= "De-identification of data is a balance between removing identifiable information and preserving useful information. It requires an expert to understand the particular risks of that data.", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g20f61f033e7_18_1215") @@ -168,9 +169,9 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN Re-identification is the process of determining the identity of such individuals based on data [@de-identification_2022]. Typically an ID number is assigned to each individual to allow for re-identification. 
-The key for these ID numbers and the original identifier information should be stored in a safe manner with a limited number of individuals with access. The ID number should ***not*** be created by using any identifiable information (for example birthday -15*32 followed by the first digit of Zip code). In some cases people use what is called a [cryptographic hash function](https://en.wikipedia.org/wiki/Cryptographic_hash_function) which is a condensed representation of an individual's data (including identifiable information) to a standard ID length. Such functions are HIPAA compliant under special circumstances, such as restricted access because they are very difficult to decipher. These hash function IDs have an advantage of allowing for the same individual across different datasets to be tracked (but are not intended for re-identification), which can be helpful for longitudinal studies [@hipaa_regulations]. See [here](https://www.bricker.com/industries-practices/health-care/insights-resources/resource/hipaa-privacy-regulations-other-requirements-relating-to-uses-and-disclosures-of-protected-health-information-re-identification-%C2%A7-164514c-368) for additional information about HIPAA de-identification regulations. +The key for these ID numbers and the original identifier information should be stored in a safe manner with a limited number of individuals with access. The ID number should ***not*** be created by using any identifiable information (for example birthday -15*32 followed by the first digit of Zip code). In some cases people use what is called a [cryptographic hash function](https://en.wikipedia.org/wiki/Cryptographic_hash_function) which is a condensed representation of an individual's data (including identifiable information) to a standard ID length. Such functions are HIPAA compliant under special circumstances, such as restricted access because they are very difficult to decipher. 
These hash function IDs have an advantage of allowing for the same individual across different datasets to be tracked, which can be helpful for longitudinal studies [@hipaa_regulations]. Hash function IDs are not intended for re-identification. See [here](https://www.bricker.com/industries-practices/health-care/insights-resources/resource/hipaa-privacy-regulations-other-requirements-relating-to-uses-and-disclosures-of-protected-health-information-re-identification-%C2%A7-164514c-368) for additional information about HIPAA de-identification regulations. -While somewhat useful for protecting the identity of those included in data, de-identification is not necessarily robust to newer re-identification methods. Thus other security, privacy, and protection methods are necessary. +While somewhat useful for protecting the identity of those included in data, de-identification alone is not necessarily robust to newer re-identification methods. Thus other security, privacy, and protection methods are necessary. The HIPAA Privacy Rule specifies two methods or standards for de-identification: Safe Harbor and Expert Determination. @@ -182,14 +183,14 @@ While it is possible to de-identify your data yourself, unless you are an expert #### Safe harbor -This method is the more extreme of the two and results in more loss of data. -A list of 18 standard identifiers must be removed from the data. These are the same 18 PHI categories of identifiers that we described in the last chapter, including name, IP addresses, etc [@rights_ocr_guidance_2012]. Additional rules are required as well. For example zip codes may remain in the data if they are shortened to the initial 3 digits and meet certain population threshold criteria. Dates can only remain at the year level, with extra protection for individuals over 89 years of age. However other abbreviations, such as name initials are not permitted. +The safe harbor method is the more extreme of the two and results in more loss of data. 
+A list of 18 standard identifiers must be removed from the data. These are the same 18 PHI categories of identifiers that we described in the last chapter, including name, IP addresses, etc. [@rights_ocr_guidance_2012]. Additional rules apply as well; see [here](https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html#safeharborguidance). For example, zip codes may remain in the data if they are shortened to the initial 3 digits and meet certain population threshold criteria. Dates can only remain at the year level, with extra protection for individuals over 89 years of age. However, other abbreviations, such as name initials, are not permitted. #### Expert Determination -This method attempts to preserve more of the original data, and data is removed based on the risk of re-identification. As the name describes, an expert evaluates the risk for re-identification of the various values within the data including what the risk might be with if the data were to be combined with other data sources. Ultimately data is removed until there is only a very small risk of someone being able to re-identify the individuals with data values among the dataset. The expert records what methods they used an the analysis that they did to determine the risk. +The Expert Determination method attempts to preserve more of the original data, and data is removed based on the risk of re-identification. As the name describes, an expert evaluates the risk for re-identification of the various values within the data, including what the risk might be if the data were to be combined with other data sources. Ultimately data is removed until there is only a very small risk of someone being able to re-identify the individuals with data values among the dataset. The expert records what methods they used and the analysis that they did to determine the risk. 
@@ -203,7 +204,9 @@ An example of generalization would be that all individuals over a certain age co In some cases removing protected information can result in data loss that could impede research initiatives. However, generally the limitations of de-identification are the risks that they pose for re-identification. -For Expert Determination, there is no specific degree requirement or training to be considered an "expert", however typically they are individuals with a statistical background. There is also no specific threshold of what constitutes a very small risk. This is determined by the expert. Additionally, there is no required length of time that a determination expires. It is highly suggested that additional evaluations occur over time as the risks change as technology changes and as the uniqueness of the individuals within the dataset evolves. For example if there are much fewer people over a certain age still living in a sparsely populated area, they could become identifiable. +For the Safe Harbor method, de-identification based on 18 PHI categories does not necessarily protect from re-identification, particularly given the ubiquity of large data sets. Even though it is the more stringent method, Safe Harbor has privacy limitations. There are other aspects about the data that could be or could become identifiable. For example, if an individual has a very unique occupation (such as state senator) or combination of certain remaining characteristics, they could become identifiable. Another example is if an individual was a very unique clinical case. + +For the Expert Determination method, the determination of risk of re-identification is subjective. There is no specific degree requirement or training to be considered an "expert", however typically they are individuals with a statistical background and people who have had experience in making risk determinations. There is also no specific threshold of what constitutes a very small risk. 
This is determined by the expert. Additionally, there is no set period after which a determination expires. It is highly suggested that additional evaluations occur over time, as the risks change as technology changes and as the uniqueness of the individuals within the dataset evolves. For example, if there are far fewer people over a certain age still living in a sparsely populated area, they could become identifiable. ```{r, fig.align='center', echo = FALSE, fig.alt= "De-identification can be tricky! This shows an example of a woman named Jane Doe who is 96, using an age group of 95-100 could be problematic if few people are in that age bracket in the region studied.", out.width="100%"} @@ -214,9 +217,6 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN There is also no standard single method to assess re-identification risk. Instead there are a set of methods that follow major principles of risk assessment. They are based on the ability to reproduce the data values for the individual, the availability of identifiable data values from other sources, and the uniqueness or distinguishability of the data values. However, whatever methods are used, the expert is required to document this and make it available to the OCR if they request it. - -Even safe harbor which is a more stringent method, has privacy limitations. There are other aspects about the data that could be or could become identifiable. For example, if an individual has a very unique occupation (such as state senator) or combination of certain remaining characteristics, they could become identifiable. Another example is if an individual was a very unique clinical case. - ### Encryption Encryption is one of the most well-known methods for keeping data safe. It is used as a last method in case unauthorized users can access to data, and it is also used to protect data when transferring it. 
@@ -290,20 +290,24 @@ For more about SSH and SSL: ## Data erasure -It turns out that when you delete a file (even after emptying the trash), it isn't as "deleted" as you might think. This is because when a file is deleted, the data for that file actually stays on the storage hardware, and it's simply the computer's ability to find the data that is hindered. However, there is software that can help people recover data on storage hardware. This can be a great security issue, as sensitive data can remain on people's hardware when they get a new computer or stop using particular data on a server. +Erasing data (data erasure) is not as easy as it may seem. It turns out that when you delete a file (even after emptying the trash), it isn't as "deleted" as you might think. This is because when a file is deleted, the data for that file actually stays on the storage hardware, and it's simply the computer's ability to find the data that is hindered. However, there is software that can help people recover data on storage hardware. This can be a great security issue, as sensitive data can remain on people's hardware when they get a new computer or stop using particular data on a server. -One method to ensure that the deleted data is really eliminated is to physically destroy the hardware that it was stored on. However, this isn't always necessary, as there are methods using software. This option is great because the hardware can be reused without allowing future users potential access to your data. As you might imagine, this is the preferred method for erasing data on shared computing resources like servers [@Holland_2020]. These methods erase the data by overwriting the data with random digital information [@wikipedia_erasure_2021]. +One method to ensure that the deleted data is really eliminated is to physically destroy the hardware that it was stored on. Alternatively, there are methods using software. 
Software methods are often optimal because the hardware can be reused without allowing future users potential access to your data. As you might imagine, software methods are preferred for erasing data on shared computing resources like servers [@Holland_2020]. These methods erase the data by overwriting the data with random digital information [@wikipedia_erasure_2021]. ## Data resiliency -If you are working with precious data directly on your laptop, in case something happens to a computer, it's a good idea to think about storing the data with multiple locations. +If you are working with precious data directly on your laptop, it’s a good idea to think about storing the data in multiple locations in case something happens to your computer. > The traditional 3-2-1 backup rule recommends that you keep at least three copies of your data, two different copies stored on separate formats and at least one additional copy at an offsite location.[@durken_how_2021] For example, you might think about having one copy on your password protected laptop, one encrypted and zipped copy on your password protected external drive at home, and another encrypted copy on your password protected external drive in your office. -However, in the age of cloud computing and servers, this rule needs to be updated a bit. It has been suggested that if you make use of remote computing options, instead the 3 should indicate that there should be 3 additional copies to the original, 2 indicates that there should be copies stored in two different places on a server or two regions in a cloud (where possible), and the 1 indicates that you should have one copy of the data that is closer to where it was originally created just in case something happens with the server or cloud that you are using [@posey_modernizing_2021]. +However, in the age of cloud computing and servers, this rule needs to be updated a bit. 
It has been suggested that if you make use of remote computing options, instead there should be: + + - 3 copies of the data in addition to the original + - 2 different places on a server or 2 regions in a cloud (where possible) where the data is stored + - 1 copy of the data stored closer to where it was originally created, just in case something happens with the server or cloud that you are using [@posey_modernizing_2021]. @@ -332,7 +336,7 @@ Here are some things to look out for: You can look up the sender to verify if the person is who you expect. If the email looks suspicious but still potentially real, you could contact the individual on linked-in or elsewhere to verify if the person actually contacted you. If their email does not match their name this is also extra suspicious. For example, if you get an email signed by George and the email is `Peter125@network.org`, this could very well be a phishing attempt. -2) Check the senders email. +2) Check the sender's email. Make sure that the email address from the sender looks like what you recognize. If you know the sender and they send you an email with an unusual email address, be careful. You can send them an email to their typical address to verify if it is really them. For example if your boss Karen, sends you an email from `kit345@TSU.edu` or `titan@hotmail.com` and typically her email is `karenw3@TSU.edu`, you should be suspicious. @@ -340,7 +344,7 @@ Make sure that the email address from the sender looks like what you recognize. Avoid clicking on links in emails as much as possible! If you know that your colleague is sending you a link and you see it right away, that is probably trustworthy, but if your admin sends you a link out of the blue, you should be careful. If you must click a link, first make sure that the link looks like what you would expect. 
Second, send a follow-up communication using another method (phone, slack, different email address), or email a different colleague that you know works with the individual to make sure the individual actually sent the link instead of a hacker. Make sure that you don't get the phone number or other information to validate if the individual really sent you the email from the suspicious email itself. -An example of these types of phishing methods is if your colleague hasn't told you that he is sending a Google doc link and you receive an email from him with a link, then do not click it before verifying that the person really intended to send it. Yet another example is if an administrator sends you a link for you to update your password. Typically they will instead have you go to whatever portal you need to go to manually on your own to update your password. Keep in mind that phishing criminals can make the emails look very legitimate! +An example of these types of phishing methods is if your colleague hasn't told you that he is sending a Google doc link and you receive an email from him with a link, then do not click it before verifying that the person really intended to send it. Yet another example is if an administrator sends you a link for you to update your password. Typically they will instead have you go to whatever portal you need to go to manually on your own to update your password. **Keep in mind that phishing criminals can make the emails look very legitimate!** Here is a [real example](https://www.csun.edu/it/phishing-examples) of a such a phishing email from California State University Northridge: @@ -418,7 +422,7 @@ See [here](https://web.archive.org/web/20220609154035/https://www.hipaajournal.c ## Summary -In summary, we covered the following concepts in this chapter: +In summary, we covered issues related to data security in this chapter. 
We presented the following concepts: - Authentication is the process of verifying the identity of users and servers in a communication. Users provide their credentials (username and password), while servers present certificates to confirm their identity. - Authorization is the process of ensuring that someone has permission to access a file or computing resource in a particular way. @@ -428,7 +432,7 @@ In summary, we covered the following concepts in this chapter: - Strong passwords should be used, preferably in the form of sentences with symbols and numbers. Password managers like Keychain, Dashlane, or other services can help securely store passwords. - Computers use caching to store recent data for faster access. Clearing caches regularly is essential to avoid security risks and potential exposure of sensitive data. - USB drives can pose security risks due to portability, malware, and memory issues. It's best to use reputable drives, enable encryption, disable AutoRun software, and remove the drive when not in use. -- Data masking obscures sensitive data to protect privacy. De-identification and encryption are common methods. De-identification removes identifiers, while encryption encodes data to prevent unauthorized access. +- Data masking obscures sensitive data to protect privacy. De-identification and encryption are common methods. De-identification removes identifiers, usually using the Safe Harbor or Expert Determination method. Encryption encodes data to prevent unauthorized access. - Encryption involves encoding data with keys to protect it from unauthorized access. Asymmetric encryption uses two keys (public and private), while symmetric encryption uses one key for both encryption and decryption. - SSL (Secure Socket Layer) and SSH (Secure Shell) are protocols that establish secure connections and encrypt data. SSL is commonly used for websites, while SSH is used to connect to remote computers. 
- When files are deleted, the data remains on storage hardware, posing a security risk. Software-based data erasure methods overwrite the data, ensuring it cannot be recovered, allowing hardware to be reused securely. From 3f3ddb51a72ac458d176d008a49693d7e4ee7682 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Mon, 9 Sep 2024 17:59:30 -0400 Subject: [PATCH 08/19] adding Jody's edits through ch4 data sharing --- 04-Data_Sharing.Rmd | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/04-Data_Sharing.Rmd b/04-Data_Sharing.Rmd index 33d24a1..b1889bc 100644 --- a/04-Data_Sharing.Rmd +++ b/04-Data_Sharing.Rmd @@ -46,6 +46,8 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g117c57cc481_1_37") ``` +4. It also provides more opportunities for others to replicate your results, which could help advance not only your career, but our understanding of science and medicine. + ## Benefits of data sharing In addition to these benefits to yourself, data sharing has other far reaching benefits. It can help support faster advances in science and medicine, by reducing the need to collect new data, which reduces costs, time and effort, including the effort and burden that is required to collect data on or from patients. @@ -65,7 +67,7 @@ The best way to share your data is by putting it somewhere that others can downl Below are some of the standard repositories for data you should consider. 
-**For a longer list of repositories, we also advise consulting this [Nature guidance on data repositories](https://www.nature.com/sdata/policies/repositories).** +**For a longer list of repositories, we also advise consulting this [Guide on data repositories](https://www.nature.com/sdata/policies/repositories) published by Nature.** ### Genomic Data Repositories @@ -149,17 +151,17 @@ For more details on how to make data and code reproducible tips, see our [Intro [REDCap](https://www.project-redcap.org/) is a very widely used browser-based software application for managing surveys and databases. It is very often used for clinical data. In fact, it is so widely used that there is a [conference](https://i2b2transmart.org/redcapcon2022/) dedicated to it. -REDCap allows for multi-institutional work, as well as compliance with HIPAA, [21 CFR Part 11](https://www.ecfr.gov/current/title-21/chapter-I/subchapter-A/part-11) for data for the FDA, [FISMA](https://www.techtarget.com/searchsecurity/definition/Federal-Information-Security-Management-Act) for government data, HIPAA, and [GDPR](https://gdpr-info.eu/) for data for the European Union. It was developed by a team at Vanderbilt University in 2004. It is not open-source, however it is free to use for non-commercial research [@redcap_2022]. +REDCap is a platform that allows for multi-institutional work and is compliant with multiple regulations including HIPAA, [21 CFR Part 11](https://www.ecfr.gov/current/title-21/chapter-I/subchapter-A/part-11) (FDA data), [FISMA](https://www.techtarget.com/searchsecurity/definition/Federal-Information-Security-Management-Act) (government data), and [GDPR](https://gdpr-info.eu/) (data for the European Union). It was developed by a team at Vanderbilt University in 2004. It is not open-source; however, it is free to use for non-commercial research [@redcap_2022]. 
You can find out more about how to use REDCap at the [REDCap website](https://projectredcap.org/) which includes instructional [videos](https://projectredcap.org/resources/videos/) and other resources. -There are several things to keep in mind when using REDCap from an ethical standpoint. +There are several things to keep in mind when using REDCap to ensure that data privacy and security are protected. 1) Roles -REDCap allows for various roles to be established for users on a project. Thus access to certain data and tasks can be restricted to certain individuals. As described previously, it is a good idea to restrict access to the smallest number of individuals necessary. +REDCap allows for various roles to be established for users on a project. Thus access to certain data and tasks can be restricted to certain individuals. As described previously, according to the Principle of Least Privilege, it is a good idea to restrict access to the smallest number of individuals necessary. -You can modify these roles using the `User Rights` menu. +You can modify roles using the `User Rights` menu. ```{r, fig.align='center', echo = FALSE, fig.alt= "REDCap User Rights Menu", out.width="100%"} ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g133b14b2804_28_17") @@ -171,7 +173,7 @@ This will first show you who has what role on the project and their rights. You ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g133b14b2804_28_22") ``` -These roles should be verified by your institutional review board (IRB) before beginning a study. Changes to roles should also be reviewed by your IRB. +Roles should be verified by your institutional review board (IRB) before beginning a study. Changes to roles should also be reviewed by your IRB. 
2) Reports @@ -220,5 +222,5 @@ In summary, in this chapter we covered the following concepts: - There are many data repositories where you can store and share your data, including general repositories like Data Dryad and FigShare, and repositories specific to certain data types like genomics or imaging data. - When sharing data, be sure to organize and document your data well with things like a README file, consistent naming conventions, and metadata. Follow reproducibility practices whenever possible. - Tools like REDCap can help manage clinical data while ensuring security, privacy, and reproducibility through features like role-based access controls, data auditing, and locking data after collection. -- Checking with your IRB first before sharing data, sharing code, or using new tools can help ensure that data is shared and accessed responsibly. Ideally such plans should be reviewed by your IRB before you begin a study. It is often possible to safely publicly share the code used to analyze protected data, as long as you don't reveal aspects of the the data in code. Your local IRB may be able to help you learn how to do so. +- Checking with your IRB first before sharing data, sharing code, or using new tools can help ensure that data is shared and accessed responsibly. Ideally such plans should be reviewed by your IRB before you begin a study. It is often possible to safely publicly share the code used to analyze protected data, as long as you don't reveal aspects of the data in the code. Your local IRB may be able to help you learn how to do so. 
From 8bc0ba6fc0ec2fd95911209b6013805aa64323f5 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Thu, 12 Sep 2024 14:10:32 -0400 Subject: [PATCH 09/19] adding another point fro Jody --- 02-Data_Privacy.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/02-Data_Privacy.Rmd b/02-Data_Privacy.Rmd index 3ba277a..c31f4fe 100644 --- a/02-Data_Privacy.Rmd +++ b/02-Data_Privacy.Rmd @@ -187,7 +187,7 @@ So what does this mean for the data you handle? - Summarized cohort data - Data in which individuals have been aggregated together is generally safe. For example, a file that includes an average age calculated across all individuals or a large subset would generally be considered safe. However, this may not always be the case with individuals with very rare conditions. + Data in which individuals have been aggregated together is generally safe. For example, a file that includes an average age calculated across all individuals or a large subset would generally be considered safe. However, this may not always be the case with individuals with very rare conditions. There can also be exceptions to the assumption of safety and/or anonymity when cohort data involves specific groups of people. - De-identified data From 8b7cfd7bae9f6af9abcdb66e0b887a69a6a402aa Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Thu, 12 Sep 2024 14:29:28 -0400 Subject: [PATCH 10/19] adding more from Jody --- 05-Data_Ethics.Rmd | 8 +++++--- book.bib | 17 ++++++++++++++++- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index c561569..cb3fa24 100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -24,7 +24,8 @@ Data ethics involves the consideration of: It also involves mindfulness about how our research can ultimately impact (or not impact as the case may be for research that lacks inclusivity and equity) research participants and other individuals. 
-Importantly, we do not yet have established societal norms or protocols for every aspect of medical research, and many topics are still under debate especially when it comes to cutting edge research. However, general principles of ethics can be helpful and involve practices for research integrity, consideration for social justice, data security, and transparency. +Importantly, we do not yet have established societal norms or protocols for every aspect of medical research, particularly with respect to new types of data and new technologies, and many topics are still under debate especially when it comes to cutting edge research. However, general principles of ethics can be helpful and involve practices for research integrity, consideration for social justice, data security, and transparency. Health care and research ethics can also be helpful in evaluating practices for data management and use. + ### Before and after research @@ -36,7 +37,7 @@ Researchers need to consider both how their work will resolve unanswered questio Ethical research should involve consideration of how data should be collected, so that certain individuals are not left out of reaping the benefits of important research. For example, women, non-binary individuals, disabled individuals, and people of certain ethnic backgrounds, and intersections of various demographic factors have been historically left out of clinical trials or when included, their data was inadequately recorded [@clark_increasing_2019]. For example, clinical trials often have questions about sex or gender with limited binary options (overlooking [people without a binary sex](https://en.wikipedia.org/wiki/Intersex) and [non-binary gendered](https://en.wikipedia.org/wiki/Non-binary_gender) individuals) resulting in a lack of collection of important information that could impact clinical outcomes, research results, and communication about results [@chen_approach_2019]. 
-Beyond this, even basic studies have historically often neglected to evaluate female animal models which can provide a greater understanding of how the research may successfully translate to more individuals. Yet another example is the historical lack of diversity in genomic reference datasets. To learn more about how social injustice, sexism and other societal aspects have influence bioethical and therefore data ethics practices, see @Farmer_2004. +Beyond this, even basic studies have historically often neglected to evaluate female animal models which can provide a greater understanding of how the research may successfully translate to more individuals. Yet another example is the historical lack of diversity in genomic reference datasets. To learn more about how social injustice, sexism, and other societal aspects have influenced bioethical and therefore data ethics practices, see @Farmer_2004. ## After Considerations @@ -53,13 +54,14 @@ In some cases open awareness about patients with certain types of cancers or dis However, such information can put these individuals at risk for difficulty with insurance and employment, as well as at risk for other forms of discrimination. Furthermore, research data often also contains basic information about individuals, such as their address, which can be potentially deleterious for the safety of those individuals. New forms of research data from apps on our phone such as social media data collection, can pose more complicated risks based on data collection about the behaviors of research participants [@seh_breaches_2020]. -Beyond the risk that data breaches pose to research participants, such breaches also cause harm to the research institutes where the breach occurred. Reputations and funding opportunities can be greatly compromised. +Beyond the risk that data breaches pose to research participants, such breaches also cause harm to the research institutes where the breach occurred. 
Reputations and funding opportunities can be greatly compromised. Transparency and/or informed consent are discussed below as ways to mitigate these risks. Why else does data protection matter at the individual level? If data gets manipulated or corrupted, this can result in false research findings, altered treatment plans by physicians, and more @seh_breaches_2020. +If patients are concerned that information will be used against them, there is some evidence that they are less likely to be forthcoming and honest with their providers. This poses concerns for data quality as well as trust in clinicians and health systems [@nong_discrimination_2022]. We will discuss what can be done to reduce the risks of research participants and others from your research. diff --git a/book.bib b/book.bib index e12b61d..23e687d 100644 --- a/book.bib +++ b/book.bib @@ -977,5 +977,20 @@ @article{bambauer_privacy_2013 journal = {THE JOURNAL OF CRIMINAL LAW \& CRIMINOLOGY}, author = {Bambauer, Derek E}, year = {2013}, - file = {Bambauer - Privacy Versus Security.pdf:/Users/carriewright/Zotero/storage/7EKT8DJ9/Bambauer - Privacy Versus Security.pdf:application/pdf}, +} + +@article{nong_discrimination_2022, + title = {Discrimination, trust, and withholding information from providers: {Implications} for missing data and inequity}, + volume = {18}, + issn = {2352-8273}, + shorttitle = {Discrimination, trust, and withholding information from providers}, + url = {https://www.sciencedirect.com/science/article/pii/S2352827322000714}, + doi = {10.1016/j.ssmph.2022.101092}, + abstract = {Quality care requires collaborative communication, information exchange, and decision-making between patients and providers. Complete and accurate data about patients and from patients are especially important as high volumes of data are used to build clinical decision support tools and inform precision medicine initiatives. 
However, systematically missing data can bias these tools and threaten their effectiveness. Data completeness relies in many ways on patients being comfortable disclosing information to their providers without prohibitive concerns about security or privacy. Patients are likely to withhold information in the context of low trust relationships with providers, but it is unknown how experiences of discrimination in the healthcare system also relate to non-disclosure. In this study, we assess the relationship between withholding information from providers, experiences of discrimination, and multiple types of patient trust. Using a nationally representative sample of US adults (n = 2,029), weighted logistic regression modeling indicated a statistically significant relationship between experiences of discrimination and withholding information from providers (OR 3.7; CI [2.6–5.2], p {\textless} .001). Low trust in provider disclosure of conflicts of interest and low trust in providers' responsible use of health information were also positively associated with non-disclosure. We further analyzed the relationship between non-disclosure and the five most common types of discrimination (e.g., discrimination based on race, education/income, weight, gender, and age). We observed that all five types were statistically significantly associated with non-disclosure (p {\textless} .05). These results suggest that experiences of discrimination and specific types of low trust have a meaningful association with a patient's willingness to share information with their provider, with important implications for the quality of data available for medical decision-making and care. 
Because incomplete information can contribute to lower quality care, especially in the context of data-driven decision-making, patients experiencing discrimination may be further disadvantaged and harmed by systematic data missingness in their records.}, + urldate = {2024-09-12}, + journal = {SSM - Population Health}, + author = {Nong, Paige and Williamson, Alicia and Anthony, Denise and Platt, Jodyn and Kardia, Sharon}, + month = jun, + year = {2022}, + pages = {101092}, } From 1d52132efc0dbdbb2986a664e81c03d069ee86a0 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Thu, 12 Sep 2024 15:45:10 -0400 Subject: [PATCH 11/19] trying to add belmont report --- 05-Data_Ethics.Rmd | 77 +++++++++++++++++++++++++++++++++++++++++++--- book.bib | 12 ++++++++ 2 files changed, 85 insertions(+), 4 deletions(-) diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index cb3fa24..7ca0849 100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -63,10 +63,14 @@ If data gets manipulated or corrupted, this can result in false research finding If patients are concerned that information will be used against them, there is some evidence that they are less likely to be forthcoming and honest with their providers. This poses concerns for data quality as well as trust in clinicians and health systems [@nong_discrimination_2022]. +Perpetuation of inequity is often cyclical. Considerations before research shape our options after research. For example, if people are excluded from the research process, data models are more likely to be biased against those populations. + We will discuss what can be done to reduce the risks of research participants and others from your research. + + ## Data ethics history To have an understanding of current theories about how to best deal with our research ethic conundrums it is helpful to be aware of the history of biomedical research in general. 
@@ -90,9 +94,9 @@ See [here](https://jhudatascience.org/Informatics_Research_Leadership/promoting- -## General Ethics Code +## Principles of Bioethics -Several general concepts for Healthcare ethics, and by extension medical research ethics have been described in several commonly used ways, including the four pillars and the seven guiding principles. +Several general concepts for healthcare ethics, and by extension medical research ethics, have been described in several commonly used ways, including the four pillars and the seven guiding principles. In the wake of medical and scientific abuses during WWII and beyond, several ethical principles and codes emerged. The Belmont Report (1979) defines the core bioethical pillars that drive ethical analysis in healthcare and research even today. ### The [four pillars](https://educationprojects.co.uk/medical-ethics-ethical-dilemmas-in-healthcare/) (this discussion is from @melvin_medical_2020): @@ -123,7 +127,8 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN ### The NIH Clinical Center [Seven Principles](https://www.nih.gov/health-information/nih-clinical-research-trials-you/guiding-principles-ethical-research) -This discussion is from @Principles_2015. +The NIH published its @Principles_2015, which stem from the four pillars described above to provide a framework for ensuring the protection of people who volunteer for clinical research. 
+ 1)**Social and clinical value** @@ -156,10 +161,30 @@ Individuals should be treated with respect for the entirety of the process inclu - respecting their right to change their mind, including providing them any new information about risks or benefits that might cause them to change their mind - respecting their welfare and providing treatment if needed and removing individuals for their welfare if needed - respecting their welfare and their right to knowledge by letting them know what was learned from the research + +## Ethical Principles for Data + +These guidelines are also very useful for ensuring inclusive, transparent, open, and respectful data management practices: + +- [CARE Principles for Indigenous Data Governance](https://www.gida-global.org/care), which largely focus on the self-determination of indigenous people and the usage of their data, as well as consideration for the impact and purpose of data: + + - **C** stands for: Collective Benefit + - **A** stands for: Authority to Control + - **R** stands for: Responsibility + - **E** stands for: Ethics + +- [FAIR Principles](https://www.go-fair.org/fair-principles/) aim to promote open data sharing: + + - **F** stands for: Findable + - **A** stands for: Accessible + - **I** stands for: Interoperable + - **R** stands for: Reusable + +It is encouraged to consider both the CARE and FAIR principles together. ## Concept of Consent -We have already talked about the concept of informed consent. Obtaining consent should also include (based on @IRB_Iowa and @wikipedia_informed_2023 and the author's thoughts): +We have already talked about the concept of informed consent. Obtaining consent should also include the following elements (based on @IRB_Iowa and @wikipedia_informed_2023 and the author's thoughts): - Individuals should not feel pressured and should have adequate time to make the decision. 
- Individuals should not experience undue influence, be coerced or be manipulated - They should not feel pressured by the individual recruiting, such as a boss or someone else of power or by offers in exchange for participation that would sway the decision. @@ -168,6 +193,7 @@ We have already talked about the concept of informed consent. Obtaining consent - Individuals should have the capacity to understand the risks and benefits (this involves consideration for language barriers, intellectual capacity, emotional capacity, stress, sleep loss and other forms of physical strain) - Individuals should be able to withdraw consent at anytime - Individuals should be respected through-out the process including consideration for the cultural values of the recruited populations + - Consent forms and processes should be reviewed by people with diverse expertise, such as understanding of ethics, equity, and patients and community experience ## Medical Ethics Timeline @@ -199,6 +225,49 @@ See [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2280818/pdf/canfamphys00 The United Nations adopted the concept of "free consent" (similar to informed consent) into international law [@wikipedia_informed_2023]. +### The Belmont Report (1979) + +The [Belmont Report](https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf) was written to describe guidelines for human subjects in biomedical and behavioral research. The report aims to provide a general framework for ethical consideration of research. It states that: + +> These principles cannot always be applied so as to resolve beyond dispute particular ethical problems. The objective is to provide an analytical framework that will guide the resolution of ethical problems arising from research involving human subjects [@belmont_1979]. + +Here we briefly describe some of the major aspects of the report [@belmont_1979]. 
+ +There are 3 ethical principles defined: + +1) Respect for Persons + +People should be allowed autonomy to use their judgement to make decisions for themselves. Those that cannot make all decisions for themselves, such as children or those who are incapacitated should be protected. + +2) Beneficence + +Harm to human subjects should be minimized and benefits should be maximized. + +3) Justice + +Benefits and burdens of research should be distributed equally. + +> Justice demands both that these not provide advantages only to those who can afford them and that such research should not unduly involve persons from groups unlikely to be among the beneficiaries of subsequent applications of the research [@belmont_1979] + +The application of these principles should involve the following: + +1) Informed Consent + +Consent should involve: information, comprehension, and voluntariness. + +2) Assessment of Risks and Benefits + +Potential risks and benefits should be thoroughly evaluated, including whether human subjects are truly necessary. + +> Benefits and risks must be "balanced" and shown to be "in a favorable ratio." [@belmont_1979] + +3) Selection of Subjects + +There must be fair procedures and outcomes in the selection of research subjects. Less burdened individuals should be called upon first to take on research burdens. + +Individuals who might be in conditions where they might be utilized for research more readily (such as those who are incarcerated or institutionalized) should be protected. + + ### Health Insurance Portability and Accountability Act (HIPAA) (1996) Medical confidentially became law in the United States. Protected health information and identifiable health information must not be shared with anyone outside of certain covered entities without consent. Covered entities include: clinicians, insurance companies, and health care government agencies.
diff --git a/book.bib b/book.bib index 23e687d..6ffe8d6 100644 --- a/book.bib +++ b/book.bib @@ -994,3 +994,15 @@ @article{nong_discrimination_2022 year = {2022}, pages = {101092}, } + +@article{belmont_1979, + title = {The {Belmont} {Report}}, + url = {https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf}, +} + year = {1979}, + date = {1979}, + abstract = {On July 12, 1974, the National Research Act (Pub. L. 93-348) was signed into law, there-by creating the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. One of the charges to the Commission was to identify the basic ethical principles that should underlie the conduct of biomedical and behavioral research involving human subjects and to develop guidelines which should be followed to assure that such research is conducted in accordance with those principles. In carrying out the above, the Commission was directed to consider: (i) the boundaries between biomedical and behavioral research and the accepted and routine practice of medicine, (ii) the role of assessment of risk-benefit criteria in the determination of the appropriateness of research involving human subjects, (iii) appropriate guidelines for the selection of human subjects for participation in such research and (iv) the nature and definition of informed consent in various research settings.}, + language = {en}, + author = {{The} {National} {Commission} {for} {the} {Protection} {of} {Human} {Subjects} {of} {Biomedical} {and} {Behavior Research}}, + urldate = {2024-09-12}, +} \ No newline at end of file From cfa17b0c5b75c86b6bb7547d2ef81489d46c1423 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Thu, 12 Sep 2024 16:07:23 -0400 Subject: [PATCH 12/19] more on helsinki --- 05-Data_Ethics.Rmd | 8 ++++- book.bib | 81 +++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 87 insertions(+), 2 deletions(-) diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index 7ca0849..771014d 
100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -221,6 +221,12 @@ The United States [American Medical Association](https://code-medical-ethics.ama See [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2280818/pdf/canfamphys00158-0229.pdf) by @higgins_history_1989 and [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7132445/) by @moskop_hippocrates_2005 for more about the history of medical ethics codes. +### Declaration of Helsinki (1964) + +The [Declaration of Helsinki](https://www.wma.net/what-we-do/medical-ethics/declaration-of-helsinki/) was published by the World Medical Association (WMA) and is considered "the world’s most widely recognised ethical principle for medical research involving humans" [@kurihara_declaration_2024]. It describes a set of principles for "medical research involving human subjects, including research on identifiable human material and data." It has been amended several times and the WMA aims to keep it up to date. + +It outlines that research subjects' welfare is the priority and that they have a right to self-determination and to informed consent. Risks and benefits should be carefully considered, and research should be discontinued if risks are determined to be too high [@wma_1964]. + ### International Covenant on Civil and Political Rights (1966) The United Nations adopted the concept of "free consent" (similar to informed consent) into international law [@wikipedia_informed_2023]. @@ -346,7 +352,7 @@ While personality traits were identified, the above reasons suggest that if rese ## Consequences of research misconduct -Research misconduct either due to malicious intent or unintentional neglect can have far reaching consequences. This section is based on @davis_causal_2007 and @national_academies_of_sciences_incidence_2017. +Research misconduct either due to malicious intent or unintentional neglect can have far reaching consequences.
This section is based on @davis_causal_2007 and @national_academies_of_sciences_2017. diff --git a/book.bib b/book.bib index 6ffe8d6..c131f5a 100644 --- a/book.bib +++ b/book.bib @@ -542,7 +542,7 @@ @misc{myheritage_2018 } -@book{national_academies_of_sciences_detailed_2017, +@book{national_academies_of_sciences_2017, title = {Detailed {Case} {Histories}}, url = {https://www.ncbi.nlm.nih.gov/books/NBK475955/}, abstract = {The following five detailed case histories of specific cases of actual and alleged research misconduct are included in an appendix to raise key issues and impart lessons that underlie the committee's findings and recommendations without breaking up the flow of the report. In several cases, including the translational omics case at Duke University and the Goodwin case at the University of Wisconsin, the committee heard directly from some of those involved.}, @@ -1005,4 +1005,83 @@ @article{belmont_1979 language = {en}, author = {{The} {National} {Commission} {for} {the} {Protection} {of} {Human} {Subjects} {of} {Biomedical} {and} {Behavior Research}}, urldate = {2024-09-12}, +} + + +@misc{wma_1964, + title = {{WMA} - {The} {World} {Medical} {Association}-{Declaration} of {Helsinki}}, + url = {https://www.wma.net/what-we-do/medical-ethics/declaration-of-helsinki/}, + abstract = {WMA - The World Medical Association}, + language = {en-US}, + urldate = {2024-09-12}, + author = {{The} {World} {Medical} {Association}}, + year = {1964}, +} + +@article{kumar_theoretical_2010, + title = {A {Theoretical} {Comparison} of the {Models} of {Prevention} of {Research} {Misconduct}}, + volume = {17}, + issn = {0898-9621}, + url = {https://doi.org/10.1080/08989621003641132}, + doi = {10.1080/08989621003641132}, + abstract = {The current methods of dealing with research misconduct involve detection and rectification after the incident has already occurred. 
This method of monitoring scientific integrity exerts considerable negative effects on the concerned persons and is also wasteful of time and resources. Time has arrived for research administrators to focus seriously on prevention of misconduct. In this article, preventive models suggested earlier by Weed and Reason have been combined to arrive at six models of prevention. This is an effort to streamline the thinking regarding misconduct prevention, so that the advantages and disadvantages of each can be weighed and the method most appropriate for the institute chosen.}, + number = {2}, + urldate = {2023-06-01}, + journal = {Accountability in Research}, + author = {Kumar, Malhar N.}, + month = mar, + year = {2010}, + pmid = {20306348}, + note = {Publisher: Taylor \& Francis +\_eprint: https://doi.org/10.1080/08989621003641132}, + keywords = {models of prevention, prevention of research misconduct, research ethics, research misconduct investigation}, + pages = {51--66}, +} + +@article{mousavi_review_2020, + title = {A review of the current concerns about misconduct in medical sciences publications and the consequences}, + volume = {28}, + issn = {1560-8115}, + url = {https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7214560/}, + doi = {10.1007/s40199-020-00332-1}, + abstract = {Background +In the new era of publication, scientific misconduct has become a focus of concern including extreme variability of plagiarism, falsification, fabrication, authorship issues, peer review manipulation, etc. Along with, overarching theme of “retraction” and “predatory journals” have emphasized the importance of studying related infrastructures. + +Methods +Information used in this review was provided through accessing various databases as Google Scholar, Web of Science, Scopus, PubMed, Nature Index, Publication Ethics and Retraction Watch. 
Original researches, expert opinions, comments, letters, editorials, books mostly published between 2010 and 2020 were gathered and categorized into three sections of “Common types of misconduct”,” Reasons behind scientific misconduct” and “Consequences”. Within each part, remarkable examples from the past 10 years cited in Retraction Watch are indicated. At last, possible solution on combating misconduct are suggested. + +Results +The number of publications are on the dramatic rise fostering a competition under which scholars are pushed to publish more. Consequently, due to several reasons including poor linguistic and illustration skills, not adequate evaluation, limited experience, etc. researchers might tend toward misbehavior endangering the health facts and ultimately, eroding country, journal/publisher, and perpetrator’s creditability. The reported incident seems to be enhanced by the emergence of predatory with publishing about 8 times more papers in 2014 than which is in 2010. So that today, 65.3\% of paper retraction is solely attributing to misconduct, with plagiarism at the forefront. As well, authorship issues and peer-review manipulation are found to have significant contribution besides further types of misconduct in this duration. + +Conclusion +Given the expansion of the academic competitive environment and with the increase in research misconduct, the role of any regulatory sector, including universities, journals/publishers, government, etc. 
in preventing this phenomenon must be fully focused and fundamental alternation should be implemented in this regard.}, + number = {1}, + urldate = {2023-06-01}, + journal = {DARU Journal of Pharmaceutical Sciences}, + author = {Mousavi, Taraneh and Abdollahi, Mohammad}, + month = feb, + year = {2020}, + pmid = {32072484}, + pmcid = {PMC7214560}, + pages = {359--369}, + annote = {undefined}, +} + + +@article{kurihara_declaration_2024, + title = {Declaration of {Helsinki}: ethical norm in pursuit of common global goals}, + volume = {11}, + issn = {2296-858X}, + shorttitle = {Declaration of {Helsinki}}, + url = {https://www.frontiersin.org/journals/medicine/articles/10.3389/fmed.2024.1360653/full}, + doi = {10.3389/fmed.2024.1360653}, + abstract = {{\textless}p{\textgreater}The World Medical Association’s Declaration of Helsinki is in the process of being revised. The following amendments are recommended to be incorporated in pursuit of the common goal of promoting health for all. 1. Data-driven research that facilitates broad informed consent and dynamic consent, assuring participant’s rights, and the sharing of individual participant data (IPD) and research results to promote open science and generate social value. 2. Risk minimisation in a placebo-controlled study and post-trial access to the best-proven interventions for all who need them. 3. 
A future-oriented research framework for co-creation with all the relevant stakeholders.{\textless}/p{\textgreater}}, + language = {English}, + urldate = {2024-09-12}, + journal = {Frontiers in Medicine}, + author = {Kurihara, Chieko and Kerpel-Fronius, Sandor and Becker, Sander and Chan, Anthony and Nagaty, Yasmin and Naseem, Shehla and Schenk, Johanna and Matsuyama, Kotone and Baroutsou, Varvara}, + month = apr, + year = {2024}, + note = {Publisher: Frontiers}, + keywords = {Data-driven research, Declaration of Helsinki, Health for all, placebo, post-trial access, stakeholder involvement}, } \ No newline at end of file From aa0279cc86f35caa5eaa1305006a09ee562610ff Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Thu, 12 Sep 2024 16:23:22 -0400 Subject: [PATCH 13/19] adding beecher --- 05-Data_Ethics.Rmd | 4 ++++ book.bib | 35 +++++++++++++++++++++++++++++++++++ 2 files changed, 39 insertions(+) diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index 771014d..8452a48 100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -231,6 +231,10 @@ It outlines that research subjects welfare is the priority, that they have a rig The United Nations adopted the concept of "free consent" (similar to informed consent) into international law [@wikipedia_informed_2023]. +### Beecher's Ethics and clinical research (1966) + +In 1966, Henry Beecher published an article called ["Ethics and clinical research"](https://www.scielosp.org/pdf/bwho/2001.v79n4/367-372/en), outlining serious ethical issues in biomedical research at the time. This encouraged the creation of additional guidelines [@beecher_ethics_1966; @stark_unintended_2016]. + ### The Belmont Report (1979) The [Belmont Report](https://www.hhs.gov/ohrp/sites/default/files/the-belmont-report-508c_FINAL.pdf) was written to describe guidelines for human subjects in biomedical and behavioral research. The report aims to provide a general framework for ethical consideration of research. 
It states that: diff --git a/book.bib b/book.bib index c131f5a..03ee0ab 100644 --- a/book.bib +++ b/book.bib @@ -1084,4 +1084,39 @@ @article{kurihara_declaration_2024 year = {2024}, note = {Publisher: Frontiers}, keywords = {Data-driven research, Declaration of Helsinki, Health for all, placebo, post-trial access, stakeholder involvement}, +} + + +@article{stark_unintended_2016, + title = {The unintended ethics of {Henry} {K} {Beecher}}, + volume = {387}, + issn = {0140-6736, 1474-547X}, + url = {https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(16)30743-7/fulltext}, + doi = {10.1016/S0140-6736(16)30743-7}, + language = {English}, + number = {10036}, + urldate = {2024-09-12}, + journal = {The Lancet}, + author = {Stark, Laura}, + month = jun, + year = {2016}, + pmid = {27312298}, + note = {Publisher: Elsevier}, + pages = {2374--2375}, +} + + +@article{beecher_ethics_1966, + title = {Ethics and {Clinical} {Research}}, + volume = {274}, + issn = {0028-4793}, + url = {https://www.nejm.org/doi/full/10.1056/NEJM196606162742405}, + doi = {10.1056/NEJM196606162742405}, + abstract = {HUMAN experimentation since World War II has created some difficult problems with the increasing employment of patients as experimental subjects when it must be apparent that they would not have been available if they had been truly aware of the uses that would be made of them. Evidence is at hand that many of the patients in the examples to follow never had the risk satisfactorily explained to them, and it seems obvious that further hundreds have not known that they were the subjects of an experiment although grave consequences have been suffered as a direct result of experiments described . . 
.}, + number = {24}, + urldate = {2024-09-12}, + journal = {New England Journal of Medicine}, + author = {Beecher, Henry K.}, + month = jun, + year = {1966} } \ No newline at end of file From b7935fbf5709d95edafaaae41295b793408b4e1b Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Thu, 12 Sep 2024 16:36:46 -0400 Subject: [PATCH 14/19] adding Jody to about and adding small editions to current concerns --- 06-Current_Data_Concerns.Rmd | 18 +++++++++++++----- About.Rmd | 2 +- 2 files changed, 14 insertions(+), 6 deletions(-) diff --git a/06-Current_Data_Concerns.Rmd b/06-Current_Data_Concerns.Rmd index 4a805b6..a895f02 100644 --- a/06-Current_Data_Concerns.Rmd +++ b/06-Current_Data_Concerns.Rmd @@ -15,13 +15,18 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN ## Current Ethical Issues -There are several issues that researchers and research participants, and really all individuals engaging in health care face. We will discuss some of these briefly. +There are several current issues that researchers and research participants, and really all individuals engaging in health care, face. We are facing new and bigger ethical issues in data due to: + +- The increasing practice of reusing data (often in new ways), which means consent processes cannot fully describe risks and benefits +- Large-scale data collection, which sometimes doesn't involve consent and is often done by companies for other reasons (for example, to create an AI tool that can distinguish images, a company may incidentally also collect images of faces) +- New data sharing technologies, which provide increasing opportunities for security failures + +We will discuss some of these new issues briefly. ### Consent for Data Reuse One major current ethical issue that we face now, is the consequences of the reuse of shared data. As we have described, there are major benefits of sharing data. It can allow researchers to really maximize their efforts.
However, there are also negative potential consequences as well. Furthermore, it is still unclear what is exactly possible with our data, for both good and bad uses, as technology continues to advance. - Previous management strategies for informed consent originate from research that predates the large scale data sharing that we now use today. In those cases, informed consent was a bit more straight forward to achieve. Now that data is often reused more often, it is often less obvious how data will be used for research purposes in the future, thus it is less obvious how to inform potential research participants what participation really means. Ideally we want to protect participants and family, while also maximizing the research potential of useful data. So how do we do this? Although ethical guidelines about this type of consent are evolving as research and technology evolve, here are some current methods for consent as described in @mckeown_consent_2021. @@ -71,13 +76,13 @@ A [review](https://journals.sagepub.com/doi/10.1177/2053951720982012) of recent 4. Legal Considerations - Some countries or nations have regulations about what can be done with data particularly when it comes to transmission to other nations. -The authors of the review [@hummel_data_2021] caution that the inconsistency with which data sovereignty is defined could lead to negative consequences if efforts are not made to define the term when it is used. For example, if someone's concept of data sovereignty is strictly of a technical definition, then important aspects related to transparency or self-determination may be overlooked. They note that Indigenous data sovereignty or data sovereignty as it pertains to indigenous populations, is more more clearly defined, and could be used as a model for other uses of the term in other contexts. 
+The authors of the review [@hummel_data_2021] caution that the inconsistency with which data sovereignty is defined could lead to negative consequences if efforts are not made to define the term when it is used. For example, if someone's concept of data sovereignty is strictly of a technical definition, then important aspects related to transparency or self-determination may be overlooked. They note that Indigenous data sovereignty, or data sovereignty as it pertains to indigenous populations, is more clearly defined and could be used as a model for other uses of the term in other contexts. Also see the [CARE Principles for Indigenous Data Governance](https://www.gida-global.org/care), described in the previous chapter, for more information about methods to protect data sovereignty for indigenous populations. The meaning of the terminology is likely to evolve over time, however, this indicates the complexity of data handling ethics that we are currently encountering and will continue to encounter. ### Finding Artifacts -In some cases researchers will determine aspects about the potential health or genetic risk of an individual as an artifact of performing other research. This leads to the question of if those individuals should be informed about these findings. +In some cases researchers will have “incidental findings” outside of the scope of the intended research. These incidental findings reveal aspects about the potential health or genetic risk of an individual as an artifact of performing other research. This leads to the question of whether those individuals should be informed about these findings. Depending on the nature of the research, the potential for finding incidental findings will vary.
The [Secretary’s Advisory Committee on Human Research Protections (SACHRP)](https://www.hhs.gov/ohrp/sachrp-committee/recommendations/attachment-f-august-2-2017/index.html) at the US Department of Health and Human Services [@protections_ohrp_attachment_2017] offers guidelines about this topic. Furthermore, if the research requires FDA regulations, than there is more defined guidance and requirements about incidental findings. Yet, for other forms of research, the determination will occur mostly with the institutional IRB. @@ -121,7 +126,7 @@ To be more mindful of future consequences, researchers could also ask their rese ## Recent Incidences -With advances in technology allowing for cheaper and easier production of medical datasets more than ever before, we saw the creation of databanks and other shared data resources. This resulted in new ethical issues. +With advances in technology allowing for cheaper and easier production of medical datasets more than ever before, we have seen the creation of databanks and other shared data resources. This has resulted in new ethical issues. Here are some examples that exemplify more current data ethics issues: @@ -129,6 +134,7 @@ Here are some examples that exemplify more current data ethics issues: Commercial use of data is yet another possible use of research data. There is one example in which such a situation may have occurred, although there sources about the incident are conflicting [@kramer_surescripts_2019]. ReMy Health is a data analytics company that processes raw patient prescription and insurance data and provides this data to other companies. It was using data from [Surescripts](https://surescripts.com/), a prescription and health record data company and providing it in a processed form for Amazon's PillPack (https://www.pillpack.com/), a prescription delivery service. 
ReMy Health or one of its customers was accused of providing unauthorized access of prescription and patient health insurance information, which was believed to be for pharmaceutical companies for marketing decisions about what medications to market [@chiruvella_ethical_2021]. Surescripts then decided to revoke access for ReMy health to their data, thus hindering access to PillPack. However, Surescripts who made the allegations against ReMy Health has also had complaints of being threatening toward other companies, so it is a bit unclear exactly what happened [@kramer_surescripts_2019]. However, ultimately this resulted in a difficult situation for patients to receive their prescriptions and illustrates how data breaches or misuse by a single party when the data is utilized by multiple parties can get complicated [@kramer_surescripts_2019]. +Consent forms are now required to disclose the potential for commercialization of products, and institutions navigate relationships with commercial companies in different ways. ### Data Breaches @@ -145,6 +151,8 @@ Places that report data breaches - based on @@seh_breaches_2020: 5) Verizon-DBIR (yearly investigations by Verizon Enterprises, established in 2008) +The Department of Health and Human Services now requires that affected individuals be notified of data breaches. Breaches affecting more than 500 people must also be reported publicly. See [here](https://www.hhs.gov/hipaa/for-professionals/breach-notification/index.html) for more information.
+ ### Data mistakes and neglect diff --git a/About.Rmd b/About.Rmd index d027b91..1465e9d 100644 --- a/About.Rmd +++ b/About.Rmd @@ -17,7 +17,7 @@ These credits are based on our [course contributors table guidelines](https://gi |**Pedagogy**|| |Lead Content Instructor|[Carrie Wright](https://carriewright11.github.io)| |Content Contributors|[Candace Savonen](https://www.cansavvy.com/) (sections of Data Sharing, Data Security, Data Ethics, and quizzes)| -|Content Editors/Reviewers|[Candace Savonen](https://www.cansavvy.com/), [Jeff Leek](https://jtleek.com/)| +|Content Editors/Reviewers|[Candace Savonen](https://www.cansavvy.com/), [Jeff Leek](https://jtleek.com/), [Jodyn Platt](https://sph.umich.edu/faculty-profiles/platt-jodyn.html)| |Content Directors| [Jeff Leek](https://jtleek.com/)| |Content Consultant (General)| [Elana Fertig](https://fertiglab.com/)| |Content Consultants (REDCap section)| Jennifer Durham| From b89ad238e2eca1829e32fad6dbff6019798333f1 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Thu, 12 Sep 2024 16:45:30 -0400 Subject: [PATCH 15/19] fix spelling --- 03-Data_Security.Rmd | 2 +- 04-Data_Sharing.Rmd | 2 +- 05-Data_Ethics.Rmd | 2 +- resources/dictionary.txt | 9 ++++++++- 4 files changed, 11 insertions(+), 4 deletions(-) diff --git a/03-Data_Security.Rmd b/03-Data_Security.Rmd index 0468382..4663f6f 100644 --- a/03-Data_Security.Rmd +++ b/03-Data_Security.Rmd @@ -138,7 +138,7 @@ There are a few major reasons why external drives, especially flash and USB driv There are several strategies you can take to avoid these issues, if you must use such a drive [@drives_CISA_2019, @durken_how_2021]: -- Never use a drive that you find randomly. Ieally only use drives you get from a reputable manufacturer. Even drives at conferences could pose a security risk. +- Never use a drive that you find randomly. Ideally only use drives you get from a reputable manufacturer. Even drives at conferences could pose a security risk. 
- Use drives that have encryption (either buy drives that already have it, which is recommended or add encryption to your drive). - Disable AutoRun software which allows drives to automatically be opened. See [here](https://www.lifewire.com/disable-autorun-on-a-pc-153344) for how to do this. Note that this is mostly an issue for Windows computers. - Remove the drive when not in use (it improves the life of the drive and therefore protects data on the drive from disappearing or getting damaged). diff --git a/04-Data_Sharing.Rmd b/04-Data_Sharing.Rmd index b1889bc..469a5bb 100644 --- a/04-Data_Sharing.Rmd +++ b/04-Data_Sharing.Rmd @@ -46,7 +46,7 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g117c57cc481_1_37") ``` -4. It also provides more opportunties for others to replicate your results, which could help advance not only your career, but our understanding of science and medicine. +4. It also provides more opportunities for others to replicate your results, which could help advance not only your career, but our understanding of science and medicine. ## Benefits of data sharing diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index 8452a48..bdd9220 100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -247,7 +247,7 @@ There are 3 ethical principals defined: 1) Respect for Persons -People should be allowed autonomy to use their judgement to make decisions for themselves. Those that cannot make all decisions for themselves, such as children or those who are incapacitated should be protected. +People should be allowed autonomy to use their judgment to make decisions for themselves. Those that cannot make all decisions for themselves, such as children or those who are incapacitated should be protected. 
2) Beneficence diff --git a/resources/dictionary.txt b/resources/dictionary.txt index 9fcb74c..524c200 100644 --- a/resources/dictionary.txt +++ b/resources/dictionary.txt @@ -75,6 +75,7 @@ fastq FedRAMP Fertig FigShare +Findable FireCloud FISMA funders @@ -106,6 +107,7 @@ IBM Imbeaud impactful inclusivity +Interoperable iPlant IRB itcrtraining @@ -148,6 +150,7 @@ omics OpenBEL ORI overcommitted +patients’ PII PillPack PKCS @@ -198,14 +201,18 @@ Unsplash useable Vick voiceprints +voluntariness VPN Wheelan +WMA workspaces WUFlux www XNAT XQuartz Xsede -ZENODO +Zenodo 20th 23andMe + + From dc34f29f8a3a9efdb480f187eab7b3e1bea313ea Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Thu, 12 Sep 2024 16:55:56 -0400 Subject: [PATCH 16/19] spacing --- 03-Data_Security.Rmd | 2 +- 05-Data_Ethics.Rmd | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/03-Data_Security.Rmd b/03-Data_Security.Rmd index 4663f6f..08b88ae 100644 --- a/03-Data_Security.Rmd +++ b/03-Data_Security.Rmd @@ -123,7 +123,7 @@ This general process is also used for accessing data on servers or websites. For Although this makes your work faster, caching poses some security risk. Caching provides additional locations where hackers could access your sensitive data. Furthermore, the data in caches are often not encrypted, making such data more vulnerable. One way to avoid the security risk associated with your cached data, is clearing your caches [@caching_security]. -In the case of a CPU or software cache, security concerns are heightened if your laptop gets stolen, lost, or if you decide to sell your laptop. The easiest way to clear sa CPU or software cache is to simply shutdown your computer. If you have taken our computing course, you will learn that data stored in short-term memory (like RAM) requires electricity, and it will disappear when your computer is no longer connected to power [@caching_security]. Note that it can take a few minutes for such memory to disappear. 
+In the case of a CPU or software cache, security concerns are heightened if your laptop gets stolen, lost, or if you decide to sell your laptop. The easiest way to clear a CPU or software cache is to simply shutdown your computer. If you have taken our computing course, you will learn that data stored in short-term memory (like RAM) requires electricity, and it will disappear when your computer is no longer connected to power [@caching_security]. Note that it can take a few minutes for such memory to disappear. See [here](https://www.upguard.com/blog/cache) for instructions on how to clear browser caches. It's a good idea to clear your browser cache relatively often (ideally before anytime you take your computer out where it could be stolen or lost), and possibly more often if you access sensitive data on your computer regularly. diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index bdd9220..2a9d8b3 100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -96,7 +96,7 @@ See [here](https://jhudatascience.org/Informatics_Research_Leadership/promoting- ## Principles of Bioethics -Several general concepts for Healthcare ethics, and by extension medical research ethics have been described in several commonly used ways, including the four pillars and the seven guiding principles. In the wake of medical and scientific abuses during WWII and beyond, several ethical prinicples and codes emerged. The Belmont Report (1979) defines the core bioethical pillars that drive ethical analysis in healthcare and research even today. +Several general concepts for Healthcare ethics, and by extension medical research ethics have been described in several commonly used ways, including the four pillars and the seven guiding principles. In the wake of medical and scientific abuses during WWII and beyond, several ethical principles and codes emerged. The Belmont Report (1979) defines the core bioethical pillars that drive ethical analysis in healthcare and research even today. 
### The [four pillars](https://educationprojects.co.uk/medical-ethics-ethical-dilemmas-in-healthcare/) (this discussion is from @melvin_medical_2020): @@ -223,7 +223,7 @@ See [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2280818/pdf/canfamphys00 ### Declaration of Helsinki (1964) -The [Declaration of Helsinki](https://www.wma.net/what-we-do/medical-ethics/declaration-of-helsinki/) was published by the World Medical Association (WMA) and is considered "the world’s most widely recognised ethical principle for medical research involving humans" [@kurihara_declaration_2024]. It describes a set of principals for "medical research involving human subjects, including research on identifiable human material and data." It has been amended several times and the WMA aims to keep it up to date. +The [Declaration of Helsinki](https://www.wma.net/what-we-do/medical-ethics/declaration-of-helsinki/) was published by the World Medical Association (WMA) and is considered "the world’s most widely recognized ethical principle for medical research involving humans" [@kurihara_declaration_2024]. It describes a set of principals for "medical research involving human subjects, including research on identifiable human material and data." It has been amended several times and the WMA aims to keep it up to date. It outlines that research subjects welfare is the priority, that they have a right to self determination and the right to informed consent. Risks and benefits should be carefully considered and research should be discontinued if risks are determined to be to high [@wma_1964]. 
From 8667431a98dc115b3654dc0faf34ca943e3bde26 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Fri, 13 Sep 2024 10:57:47 -0400 Subject: [PATCH 17/19] trying to complete final line to help url checker --- 05-Data_Ethics.Rmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index 2a9d8b3..b31be53 100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -394,4 +394,4 @@ In summary, we have covered the following concepts in this chapter: - Key principles of data ethics include: beneficence, non-maleficence, justice, autonomy and informed consent. - Medical ethics has evolved over time with milestones like the Hippocratic Oath, AMA code of ethics, Nuremberg Code and establishment of laws like HIPAA, GINA and HITECH. - Research misconduct can happen due to personal, organizational and job related factors. It can negatively impact researchers, institutes, research fields, patients and public trust. -- Misconduct prevention involves prioritizing research quality over quantity and reducing pressures that promote misconduct. Providing mentoring, supervision and support can also help prevent misconduct. \ No newline at end of file +- Misconduct prevention involves prioritizing research quality over quantity and reducing pressures that promote misconduct. Providing mentoring, supervision and support can also help prevent misconduct. From 9aef451960425cdb1ba1c9248071dfafe90df8c2 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Fri, 13 Sep 2024 13:21:23 -0400 Subject: [PATCH 18/19] imrpoving the text a tad --- 03-Data_Security.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/03-Data_Security.Rmd b/03-Data_Security.Rmd index 08b88ae..bb57120 100644 --- a/03-Data_Security.Rmd +++ b/03-Data_Security.Rmd @@ -226,11 +226,11 @@ The process involves encoding or scrambling the data in a nonrandom format (we c There are different methods for encrypting data. 
One common method is called asymmetric, which involves **two** keys, a **public key** and a **private key** [@IBM_encryption]. This method is also sometimes simply called **public key**. -Users can get access to the public key to allow them to encrypt the data, while the private key remains private and is used to decrypt the data. This method is also called public-key encryption [@IBM_encryption]. +Users can get access to the public key to allow them to encrypt the data, while the private key remains private and is used to decrypt the data [@IBM_encryption]. Symmetric cryptography on the other hand uses **one** key for encryption and decryption. In systems that use this type of encryption, pairs of users will often be given their own key. The advantage of this system is that decryption is a bit faster, the keys are smaller, and it is generally less expensive to implement. If someone gains access to the key, however they can decrypt data or messages, and encrypt data or messages and appear as if they are the person that owns that key. So often the keys themselves are encrypted [@IBM_encryption]. -Since symmetric decryption is faster, it is often used for transferring data or for large datasets.Common symmetric algorithms are [AES-128, AES-192, and AES-256](https://www.trentonsystems.com/blog/symmetric-vs-asymmetric-encryption) [@cyware_social_encryption]. +Since symmetric decryption is faster, it is often used for transferring data or for large datasets. Common symmetric algorithms are [AES-128, AES-192, and AES-256](https://www.trentonsystems.com/blog/symmetric-vs-asymmetric-encryption) [@cyware_social_encryption]. Asymmetric encryption is regarded to be more secure, common algorithms included [RSA](https://simple.wikipedia.org/wiki/RSA_algorithm) and [DSA](https://en.wikipedia.org/wiki/Digital_Signature_Algorithm), and several [PKCS](https://en.wikipedia.org/wiki/PKCS) standards [@cyware_social_encryption]. 
From 4441e43f7cb05b7764992b804b5bfe41cdd10cf9 Mon Sep 17 00:00:00 2001 From: carriewright11 Date: Fri, 13 Sep 2024 16:24:57 -0400 Subject: [PATCH 19/19] adding some images --- 05-Data_Ethics.Rmd | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/05-Data_Ethics.Rmd b/05-Data_Ethics.Rmd index b31be53..744391a 100644 --- a/05-Data_Ethics.Rmd +++ b/05-Data_Ethics.Rmd @@ -17,21 +17,31 @@ ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN Data ethics involves the consideration of: - data collection -- data maintenance - data security - data privacy +- data maintenance - data sharing It also involves mindfulness about how our research can ultimately impact (or not impact as the case may be for research that lacks inclusivity and equity) research participants and other individuals. -Importantly, we do not yet have established societal norms or protocols for every aspect of medical research, particularly with respect to new types of data and new technologies, and many topics are still under debate especially when it comes to cutting edge research. However, general principles of ethics can be helpful and involve practices for research integrity, consideration for social justice, data security, and transparency. Health care and research ethics can also be helpful in evaluating practices for data management and use. +Importantly, we do not yet have established societal norms or protocols for every aspect of medical research, particularly with respect to new types of data and new technologies, and many topics are still under debate especially when it comes to cutting edge research. + +However, general principles of ethics can be helpful and involve practices for research integrity, consideration for social justice, and transparency. Health care and research ethics can also be helpful in evaluating practices for data management and use. 
### Before and after research Data ethics requires thoughtfulness *both* throughout the planning and research process to produce research that benefits society and does as little harm as possible, as well as mindfulness for what happens after the research is complete and published. -Researchers need to consider both how their work will resolve unanswered questions and who the research might help, as well as consider how others might use or misuse their data, code, and results in the future [@lipworth_ethics_2017; @teoli_informatics_2021]. +Researchers need to consider how their work will resolve unanswered questions and who the research might help, as well as consider how others might use or misuse their data, code, and results in the future [@lipworth_ethics_2017; @teoli_informatics_2021]. + + +```{r, fig.align='center', echo = FALSE, fig.alt= "Research requires ethical considerations before and after research. Beforehand one should consider how to reduce risks but enhance benefits for research participants and society, if the planned samples will be inclusive, if the data collection on those samples will be inclusive, and will the data be managed in a safe and private way. Afterwards, one should consider if people will be able to re-identify research participants, if they might be able to in the future and what that might mean for those patients, and if and how others might misuse the data, code or findings. ", out.width="100%"} +ottrpal::include_slide("https://docs.google.com/presentation/d/1SRokLaGAc2hiwJSN26FHE0ZEEhPr3KQdyMICic8kAcs/edit#slide=id.g3001a067199_0_0") +``` + + + ### Considerations before @@ -48,7 +58,7 @@ While data sharing can result in wonderful opportunities for secondary analysis, Overall there is a continuum of risk across the various types of data that we as researchers collect. 
Wile some forms of data, such as that derived from model organisms pose essentially no risk, intermediate forms of data such as summarized counts across a set of human samples pose more risk, while raw data and in particular data from individuals such as whole genome sequencing data, pose great risk for identification [@byrd_responsible_2020]. -Why does it mater that research subjects might be identifiable to others? +### Why does it matter that research subjects might be identifiable to others? In some cases open awareness about patients with certain types of cancers or diseases can be useful to allow other researchers and patients to find these individuals to encourage additional research and patient support group participation (especially for rare diseases or conditions). @@ -57,7 +67,7 @@ However, such information can put these individuals at risk for difficulty with Beyond the risk that data breaches pose to research participants, such breaches also cause harm to the research institutes where the breach occurred. Reputations and funding opportunities can be greatly compromised. Transparency and/or informed consent are discussed below as ways to mitigate these risks. -Why else does data protection matter at the individual level? +### Why else does data protection matter at the individual level? If data gets manipulated or corrupted, this can result in false research findings, altered treatment plans by physicians, and more @seh_breaches_2020.