% Options for packages loaded elsewhere
\PassOptionsToPackage{unicode}{hyperref}
\PassOptionsToPackage{hyphens}{url}
%
\documentclass[
]{book}
\usepackage{amsmath,amssymb}
\usepackage{iftex}
\ifPDFTeX
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
\usepackage{textcomp} % provide euro and other symbols
\else % if luatex or xetex
\usepackage{unicode-math} % this also loads fontspec
\defaultfontfeatures{Scale=MatchLowercase}
\defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
\fi
\usepackage{lmodern}
\ifPDFTeX\else
% xetex/luatex font selection
\fi
% Use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\IfFileExists{microtype.sty}{% use microtype if available
\usepackage[]{microtype}
\UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
}{}
\makeatletter
\@ifundefined{KOMAClassName}{% if non-KOMA class
\IfFileExists{parskip.sty}{%
\usepackage{parskip}
}{% else
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}}
}{% if KOMA class
\KOMAoptions{parskip=half}}
\makeatother
\usepackage{xcolor}
\usepackage{longtable,booktabs,array}
\usepackage{calc} % for calculating minipage widths
% Correct order of tables after \paragraph or \subparagraph
\usepackage{etoolbox}
\makeatletter
\patchcmd\longtable{\par}{\if@noskipsec\mbox{}\fi\par}{}{}
\makeatother
% Allow footnotes in longtable head/foot
\IfFileExists{footnotehyper.sty}{\usepackage{footnotehyper}}{\usepackage{footnote}}
\makesavenoteenv{longtable}
\usepackage{graphicx}
\makeatletter
\def\maxwidth{\ifdim\Gin@nat@width>\linewidth\linewidth\else\Gin@nat@width\fi}
\def\maxheight{\ifdim\Gin@nat@height>\textheight\textheight\else\Gin@nat@height\fi}
\makeatother
% Scale images if necessary, so that they will not overflow the page
% margins by default, and it is still possible to overwrite the defaults
% using explicit options in \includegraphics[width, height, ...]{}
\setkeys{Gin}{width=\maxwidth,height=\maxheight,keepaspectratio}
% Set default figure placement to htbp
\makeatletter
\def\fps@figure{htbp}
\makeatother
\setlength{\emergencystretch}{3em} % prevent overfull lines
\providecommand{\tightlist}{%
\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
\setcounter{secnumdepth}{5}
\usepackage{booktabs}
\ifLuaTeX
\usepackage{selnolig} % disable illegal ligatures
\fi
\usepackage[]{natbib}
\bibliographystyle{apalike}
\usepackage{bookmark}
\IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
\urlstyle{same}
\hypersetup{
pdftitle={AGAPE: An introductory course to open science for early career researchers},
pdfauthor={An Agape initiative},
hidelinks,
pdfcreator={LaTeX via pandoc}}
\title{AGAPE: An introductory course to open science for early career researchers}
\author{An Agape initiative}
\date{}
\begin{document}
\maketitle
{
\setcounter{tocdepth}{1}
\tableofcontents
}
\includegraphics[width=0.6\textwidth,height=\textheight]{images/agapecover.png}
\chapter*{Introduction}\label{introduction}
\addcontentsline{toc}{chapter}{Introduction}
Greetings, fellow early career researchers and open science-curious friends!
In this course, we will introduce you to the world of open science. Perhaps you are familiar with some of the concepts and ideas of open science or maybe the open science movement is completely new to you. Whatever your current understanding is, we believe that what you learn here will be interesting, thought-provoking and useful in your future research career.
We are a group of budding researchers and PhD students who first met during a course focusing on open and collaborative research designed under the project Opening Doors funded by the Horizon 2020 EU programme for research and innovation. We felt that what we learned was both fascinating and helpful and believe that other students should have an opportunity to get familiar with these concepts too. Hence, we decided to create Agape. Agape means wide open, as is the open science philosophy and practice we want to promote. The word \emph{agapē} originates from Greek and means love that is unconditional, such as our love for science (Okay, maybe not entirely unconditional, but you get the drift!). Under Agape we aim to share open science with fellow students and researchers, starting with this course and continuing with a series of workshops where we can learn, exchange our opinions and experiences, and together contribute to a better, more open future.
With this course, Agape would like to open doors for you into the world of open science and introduce various concepts that we think are crucial to high-quality research and that we were entirely unaware of before the Opening Doors project. Sure, we all heard about scientific integrity and open access publishing at some point in our studies, but the domain of open science encompasses a much larger set of ideas and concepts. Given its sheer extent, this course does not, and could not, cover the whole scope of open science. However, we will provide you with a scaffolding to understand the core concepts and signpost you to useful links and resources should you wish to delve deeper and start practising open science in your own work.
And now, without any further delay, let's quench that thirst for knowledge!
\begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}
\section*{Course structure and certificate of completion}\label{course-structure-and-certificate-of-completion}
\addcontentsline{toc}{section}{Course structure and certificate of completion}
The course is structured into chapters that are written to expand on various topics. We think that the order they follow is logical and the later chapters build on knowledge acquired in previous chapters. That said, you can go through them in whatever order you like by clicking on different chapters in the menu on the left, or return to some of them should you find something is not clear or require a brief refresher.
At the end of each chapter you will find activities to enhance your understanding of the concepts introduced in a particular chapter and to improve your practical knowledge. All chapters except one contain a short quiz consisting of five questions where you can test your freshly acquired knowledge. You will see your score immediately and you can save your high scores if you wish. These scores will be saved in cookies in your browser and you can delete them by deleting cookies. The quizzes have no time limit and you have as many attempts to practice as you want.
Once you read all chapters and practise each short quiz you can attempt the final quiz. If you score 90\% or higher a certificate of completion will be generated for you and you can download it directly. You have unlimited attempts to successfully complete the final quiz but only three minutes for each attempt.
We know we are by no means perfect. We would love you to share your opinions, concerns, or feedback about a specific chapter or the course as a whole. There are two ways to do this. You can either fill in the survey or use the Disqus widget that you can find at the end of each chapter to access the forum. In a truly open spirit we will discuss, collaborate and offer constructive criticism and helpful advice and ask for the same from you. We will do our best to address your comments or pass them on to the course admins. We value all suggestions and feedback, and together we will make this course awesome. We just ask for a little patience.
\begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}
\section*{Course evaluation}\label{course-evaluation}
\addcontentsline{toc}{section}{Course evaluation}
We have surveys at the beginning and end of the course to offer the best course experience possible.
\textbf{Pre-course survey}
You are invited to participate in a survey that will help the Agape team learn more about learners. We'd like to ask you to answer a few questions about yourself. Your honest feedback is very important for us to improve and serve all learners better. Whether you simply browsed or completed the course, your feedback is valuable.
Thank you for taking this survey. Please click the link below to take the survey.
\href{https://forms.gle/oBLgoyT6G7FvN9uR8}{Pre-course survey link}
\textbf{Bug Report/Feature request}
If you come across any glitches or bugs, or would like to see a different topic or feature in our MOOC, please open an issue on \href{https://github.com/agape-openscience/AgapeOpenscienceMOOC/issues}{GitHub} or use the \href{https://forms.gle/oBLgoyT6G7FvN9uR8}{Bug Report/Feature request form}.
\section*{Contacts}\label{contacts}
\addcontentsline{toc}{section}{Contacts}
Should you experience any technical problems or should you wish to share your ideas on how to improve this course, email us at {\href{mailto:agape.open.science@gmail.com}{\nolinkurl{agape.open.science@gmail.com}}}.
To share your thoughts and experiences either with this course or on open science in general, or to see what's new we will be delighted if you start following us on
Facebook {\href{https://facebook.com/AgapeOpen-Science}{Agape Open-Science}},
Twitter {\href{https://twitter.com/AgapeOpenSci}{@AgapeOpenSci}},
Instagram {\href{https://www.instagram.com/Agape.Open.Science}{Agape.Open.Science}},
or on LinkedIn {\href{https://www.linkedin.com/company/agape-open-science/}{Agape Open Science}}.
\begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}
\section*{Credits}\label{credits}
\addcontentsline{toc}{section}{Credits}
Aswathi Surendran : content creator, IT whisperer\\
\url{https://orcid.org/0000-0002-8709-6417}
Cassandra Murphy : content creator, social media wizard\\
\url{https://orcid.org/0000-0003-1332-359X}
Ciarán Purcell : proofreading and editing rockstar\\
\url{https://orcid.org/0000-0002-4376-599X}
Marco Prevedello : content creator, IT advisor\\
\url{https://orcid.org/0000-0002-8329-6294}
Mohammed Mahmoud : content creator\\
\url{https://orcid.org/0000-0002-1224-0381}
Nina Trubanová : content creator, deadline overlord, vision pusher\\
\url{https://orcid.org/0000-0001-8156-3304}
Philipp Junk : content creator\\
\url{https://orcid.org/0000-0002-5228-3896}
Rasaq Semiu Abolore : content creator\\
\url{https://orcid.org/0000-0001-6486-4754}
Tendai Mukande : content creator\\
\url{https://orcid.org/0000-0002-0654-7141}
Una Ruddock : proofreading and editing balladeer\\
\url{https://orcid.org/0000-0001-9118-4121}
Wei Qi Koh : infographics wizard\\
\url{https://orcid.org/0000-0001-8196-1628}
Yao Zhang : content creator, social media wizard\\
\url{https://orcid.org/0000-0003-0093-3882}
And a big thanks to our muse Dr Denise McGrath.
\begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}
\subsubsection*{How to cite this material}\label{how-to-cite-this-material}
\addcontentsline{toc}{subsubsection}{How to cite this material}
Agape Initiative. (2023). Agape Open Science MOOC. OSF. \url{https://doi.org/10.17605/OSF.IO/DTB7V}
\begin{center}\rule{0.5\linewidth}{0.5pt}\end{center}
\section*{Disclaimer}\label{disclaimer}
\addcontentsline{toc}{section}{Disclaimer}
Any views or opinions represented in this course belong solely to the Agape team and do not represent those of the people, institutions, or organizations that the authors may or may not be associated with in a professional or personal capacity, unless explicitly stated.
The information in this course is provided without warranty. The authors and Agape team have neither liability nor responsibility to any person or entity related to any loss or damages arising from the information contained in this course.
\chapter{Open science}\label{open-science}
Let's start at the beginning.
\textbf{Open science} is a movement to make scientific research, data and their dissemination available to any member of an inquiring society, from professionals to citizens.
Open science comprises several themes from conception to dissemination of knowledge. Based on principles of scientific growth and public access, open science includes practices such as open publishing and campaigning for open access, with the ultimate aim of making it easier to publish and share scientific knowledge.
\section{What is open science?}\label{what-is-open-science}
Open science refers to a vision to improve scientific practices for reproducibility, transparency, sharing and collaboration of knowledge. Multiple pathways to achieving this vision have developed since the concept emerged in 1985 (\href{https://journals.sagepub.com/doi/10.1177/016224398501000211}{Chubin, 1985}).
As part of the global open science community, we expand the term ``science'' beyond its common use, such as life sciences or engineering, to include the arts, humanities and any other scholarly activities. Open science, open research and open scholarship are often used interchangeably. For consistency and ease of understanding, we use open science in this course.
\subsection{Open science = open research = open scholarship}\label{open-science-open-research-open-scholarship}
Generally, the open science movement identifies increased openness of scientific content, tools and processes as the key means of action. However, a single definition cannot encompass the diversity of the open science movement. Therefore, we give some alternative definitions of the phrase \textbf{open science} throughout this chapter.
The first definition we review is from the European \href{https://www.fosteropenscience.eu/}{FOSTER project}. According to the FOSTER team, ``open science is the practice of science in such a way that others can collaborate and contribute, where research data, lab notes and other research processes are freely available, under terms that enable reuse, redistribution and reproduction of the research and its underlying data and methods.'' This definition highlights that open science is the act of improving access and contribution to all aspects of scientific practice: from research design, methodologies and tools, generated data, reporting and evaluation.
In an attempt to capture the complexity of open science, Fecher \& Friesike (\href{https://www.researchgate.net/publication/236607487_Open_Science_One_Term_Five_Schools_of_Thought}{2014}) define it as ``an umbrella term encompassing a multitude of assumptions about the future of knowledge creation and dissemination''. The authors summarise the movement's complexity by identifying five schools of thought: the democratic school (concerned with knowledge access), the public school (concerned with accessibility to knowledge creation), the measurement school (concerned with alternative impact measurement), the infrastructure school (concerned with the technological architecture of science) and the pragmatic school (concerned with collaborative research). This (somewhat arbitrary) separation highlights the various ways in which science can be ``opened''.
Based on a review of published definitions and information, Vicente-Saez and Martinez-Fuentes (\href{https://isiarticles.com/bundles/Article/pre/pdf/143111.pdf}{2018}) concluded that ``open science is transparent and accessible knowledge that is shared and developed through collaborative networks''. Here, knowledge includes code, data, ideas, information, scientific outputs, scientific publications and scientific results. Paic (\href{https://goingdigital.oecd.org/data/notes/No13_ToolkitNote_OpenScience.pdf}{2021}) identified emerging trends in open science such as alternative reputation systems, open notebooks, open lab books, science blogs, collaborative bibliographies, citizen science and open peer-review.
Another definition we would like to present is the United Nations Educational, Scientific and Cultural Organisation (\href{https://www.unesco.org/en}{UNESCO}) definition. Drafted at the 40th UNESCO General Conference in 2019 and officially published in 2021, the \href{https://en.unesco.org/science-sustainable-future/open-science/recommendation}{Recommendation on Open Science} contains the following statement: ``For the purpose of this Recommendation, open science is defined as an inclusive construct that combines various movements and practices, aiming to make multilingual scientific knowledge openly available, accessible and reusable for everyone. It also aims to increase scientific collaboration and sharing of information for the benefit of science and society and to open the processes of scientific knowledge creation, evaluation and communication to societal actors beyond the traditional scientific community. It comprises all scientific disciplines and aspects of scholarly practices, including basic and applied sciences, natural and social sciences and the humanities and it builds on the following key pillars: open scientific knowledge, open science infrastructures, science communication, open engagement of societal actors and open dialogue with other knowledge systems.''
Here, the UNESCO members clarify the meaning of the term science to include the humanities and the liberal arts. Furthermore, this definition highlights how the open science movement broadens its areas of influence beyond general research practices to the researcher community and society at large. The movement thus aims to better the inclusion of diverse ethnicities, cultures, languages, backgrounds and availability of resources across the scientific community.
Lastly, \href{https://the-turing-way.netlify.app/welcome}{The Turing Way Community} (2021) illustrates open research and its subcomponents as fitting under the umbrella of the broader concept of open scholarship in their handbook on reproducible, ethical and collaborative data science. These subcomponents are open data, open-source software, open-source hardware, open access, open notebooks, open educational resources, citizen science, equity, diversity and inclusion.
Many other definitions of open science can be found online or in written records. However, consolidating the sources mentioned above, we conclude that six unifying principles and core values characterise the open science movement. These are:
\begin{itemize}
\item
Transparency, scrutiny, critique and reproducibility
\item
Equality of opportunities
\item
Responsibility, respect and accountability
\item
Collaboration, participation and inclusion
\item
Flexibility
\item
Sustainability
\end{itemize}
\section{The history of open science}\label{the-history-of-open-science}
Previously, we mentioned the vision and principles the open science movement shares. To better understand the implications opening science might have, it is useful to appreciate how the contemporary values and principles of both science and open science came to be. As Watson (\href{https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0669-2}{2015}) suggests, science is easily perceived as already ``open'', as already belonging to everyone. However, contemporary science is much more recent than one may think and its organisation has greatly transformed over the last 500 years. Even the currently dominant practices of science are not immutable and they are likely to keep changing in the future, evolving within the broader context of society.
\subsection{17th to 20th century: The emergence of contemporary science}\label{th-to-20th-century-the-emergence-of-contemporary-science}
\begin{quote}
``For in the sciences the authority of thousands of opinions is not worth as much as one tiny spark of reason in an individual man.''
-- Galileo Galilei, ca. 1597
\end{quote}
Between the 17th and 20th centuries, science underwent multiple reformations. If you were to meet scientists from before the 17th century, their perception of science would probably shock you. For centuries, they had more or less accepted the authority of the Church and of monarchs, and claims and theories did not need to be backed up by either observable proof or the proof of reason. During these reformations, profound advancements in science took place and the scientific community gradually adopted scientific publications, formal review processes, scholarly associations, public grants and many more of the common modern practices.
The advent of printing and publishing companies in the early 17th century made public libraries more and more common. Knowledge started to accumulate in printed reports and encyclopaedias and, for the first time, it became accessible to the general public. At the time, this meant mostly privileged white male citizens with greater socioeconomic resources and greater perceived standing in the social hierarchy.
During the same period, scholars started to move from unstable aristocratic patronages to assembled academies of science, slowly adopting academic publishing. Merton (\href{https://www.cambridge.org/core/journals/european-journal-of-sociology-archives-europeennes-de-sociologie/article/abs/resistance-to-the-systematic-study-of-multiple-discoveries-in-science/8FEC108B3D8B0DAD60416B36BE342959}{1963}) tells us that between the 1650s and the 1850s, the proportion of simultaneous discoveries ending in disputes dropped from 92\% to 33\%. Cryptic monographs and academic duels slowly became a thing of the past and science gradually became more open.
\begin{quote}
``The assumption that peer review is as old as journal publishing {[}\ldots{]} is based on a misunderstanding of Philosophical Transactions' editorial practice. {[}\ldots{]} Indeed, for most of the history of scientific journals, it has been editors -- not referees -- who have been the key decision-makers and gatekeepers.''
-- Aileen Fyfe, 2015
\end{quote}
After the establishment of scientific publishing and academic societies, a formal review process developed between the 19th century and the first half of the 20th century. We might take for granted today's practice of \textbf{peer-review}, in which one or more people with competencies similar to the author's (peers) review a manuscript before publication on a voluntary basis. However, it took until the 1970s before peer-review became widespread (\href{https://web.archive.org/web/20220316123936/https:/www.timeshighereducation.com/features/peer-review-not-old-you-might-think}{Fyfe, 2015}).
The first recognized formal review practice was implemented by the British Royal Society in 1832. A special committee within the society was responsible for accepting or rejecting submissions for publication based on independently written evaluations. George Gabriel Stokes, secretary of the Royal Society from 1854 to 1885, further refined this practice by sharing the referees' suggestions with the authors and facilitating the discussion between authors and referees. Similar review processes started to become common practice in other academic societies across the world during the 19th century.
On the other hand, private publishers would only introduce formal review practices in the 20th century; until then, the journal editor(s) were the sole judges of acceptance or rejection of a submission. These simple selection practices allowed for a fast research-to-publication cycle, making private journals appealing for swift communication between scientists. At this stage, societies' journals gradually hosted fewer and more refined publications, while private journals were the preferred communication channel between scientists.
\subsection{Modern science and the fight for openness}\label{modern-science-and-the-fight-for-openness}
As submissions grew in volume, private journals started to implement review practices in the second half of the 20th century, gradually bringing us to the current state of research practices. However, with private publishers flourishing, new economic and structural barriers to scientific knowledge arose. For example, a yearly subscription to a scientific journal can nowadays cost between 3,000 and 7,000 USD (\href{https://www.libraryjournal.com/story/Are-We-There-Yet-Periodicals-Price-Survey-2022}{Bosch \emph{et al}., 2022}), whereas mainstream media subscriptions are much more affordable: the New York Times, for instance, offers a yearly subscription for 20 USD.
Much of today's scientific publication is in the hands of five large publishing companies: American Chemical Society (ACS), Elsevier, Springer-Nature, Taylor \& Francis and Wiley-Blackwell, which together account for 50 to 70\% of the scientific publishing market (\href{https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253226}{Puehringer et al., 2021}). A sentiment of anger is often apparent in scientific communities towards the private publishing sector. For example, Buranyi (\href{https://www.theguardian.com/science/2017/jun/27/profitable-business-scientific-publishing-bad-for-science}{2017}) explains that ``Scientists create work under their own direction -- funded largely by governments -- and give it to publishers for free. The publisher pays scientific editors who judge whether the work is worth publishing and check its grammar, but the bulk of the editorial burden is done by working scientists on a volunteer basis. The publishers then sell the product back to government-funded institutional and university libraries, to be read by scientists -- who, in a collective sense, created the product in the first place.''
The roots of open science lie in philosophical discussions ongoing in the 1970s and 1980s about what it means to have freely available scientific knowledge, rather than in the fight for more just publication practices (\href{https://journals.sagepub.com/doi/10.1177/016224398501000211}{Chubin, 1985}). However, we believe that the struggle for open access publication spurred the rise of the open science movement, which then drew together all of the elements previously mentioned.
A second stepping stone in the history of the open science movement was the advent of the Internet. From its infancy, the Internet shifted how information is shared and facilitated new methods of scientific exchange. Literature research moved from public libraries to online repositories such as the Directory of Open Access Journals (DOAJ), which was launched in 2003 with 300 open access journals. Scientists also began to share early versions of their work, or pre-prints, through servers such as \href{https://www.biorxiv.org/}{bioRxiv}. Furthermore, online collaboration facilitated easily accessible information resources such as \href{https://en.wikipedia.org/}{Wikipedia}, which became a household staple. A new reformation of science had begun.
\begin{quote}
``The question is no longer whether open science is happening, but how everyone can contribute to it and benefit from this transition.''
-- Audrey Azoulay, Director-General of UNESCO, 2021
\end{quote}
\begin{center}\includegraphics[width=0.7\linewidth]{images/slide01} \end{center}
\section{Test your understanding}\label{test-your-understanding}
\textbf{Activities}
In a recommended activities section like this one, we suggest activities to increase your understanding of the concepts and improve your practical knowledge.
\begin{itemize}
\item
When did you first hear about open science?
\item
Propose one change you could bring to your workflow to implement one of the core values of open science presented.
\item
Try to imagine how open science will change during your lifetime. And what about the 22nd century? What do you think the future of science will look like? Will it finally become fully open?
\item
Share anything interesting that you learned or found in this chapter with others on our social media.
\end{itemize}
\chapter{Open data and open access}\label{open-data-and-open-access}
Now, let's have a look at open data and open access. What exactly are they?
\section{Definitions and history}\label{definitions-and-history}
``Open data refers to data access and sharing arrangements, where data can be accessed and shared and reused by anyone without technical or legal restrictions, free of charge (to the greatest extent possible) and used by anyone for any purpose subject, at most, to requirements that preserve integrity, provenance, attribution, and openness'' (\href{https://www.oecd-ilibrary.org/science-and-technology/making-open-science-a-reality_5jrs2f963zs1-en}{OECD, 2015}). Access to data can occur along a spectrum, with ``different degrees of openness, depending on the community of stakeholders involved. `As open as possible, as closed as necessary' is often used to illustrate the fact that while opening up data can help advance the science, technology and innovation (STI) agenda, this needs to be balanced against issues of costs, privacy, security, intellectual property rights and preventing malevolent uses'' (\href{https://goingdigital.oecd.org/data/notes/No13_ToolkitNote_OpenScience.pdf}{Paic, 2021}).
According to the \href{https://ec.europa.eu/info/research-and-innovation/strategy/strategy-2020-2024/our-digital-future/open-science/open-science-monitor_en}{Open science monitor} of the European Commission, the open science scholarly community, open access to publications and open research data comprise the main pillars of open research and open science.
Without open data, scientific research would progress very slowly. The idea of open data emerged when large international consortia collaborated on complex projects and sharing data became a necessity. You may already be thinking of fields where this applies, such as oceanography, particle physics, molecular biology or genetics. The first significant initiative in data sharing is considered to be the \href{http://wdc.org.ua/}{World Data Center}, established in 1957 by the International Council for Science. Its original purpose was to serve the International Geophysical Year, a worldwide effort to study the Earth, oceans and atmosphere in a coordinated and synchronous way (\href{https://www.researchgate.net/publication/270166513_The_Origins_and_Principles_of_the_World_Data_Center_System}{Korsmo, 2010}). Data from many different fields can and should be made open. These include but are not limited to science, finance, business, government, culture, weather and the environment.
It makes sense that research data and scientific publications funded by taxpayers should be open and accessible to the public for free. Arguably, the positive societal impact of free access to scientific knowledge outweighs the investment of time and money. The \href{https://ec.europa.eu/info/research-and-innovation/strategy/strategy-2020-2024/our-digital-future/open-science_en}{European Commission} identifies open science and open access to data as the dominant driver for the future of science, with high expectations of improved scientific integrity, a better connection between science and society and making science more responsive to societal challenges. You can learn more about the benefits of open data and exceptions to this rule in the chapter ``Research data''. Whilst many of the ideas around open science and open access are compelling, they also come with some drawbacks. You can learn more about this topic in the chapter ``Pros and cons''.
But let's get back to the topic of this chapter. Where does open access come from and how can it be defined?
Machado (\href{https://goingdigital.oecd.org/data/notes/No13_ToolkitNote_OpenScience.pdf}{2015}) traced the origins of open access to research data back to anonymous file transfers over the File Transfer Protocol (FTP) within some private networks and to the exchange of physical media, such as tapes and disks. This gave rise in 1966 in the USA to the first free-access electronic databases, the \href{http://www.eric.ed.gov/}{Educational Resources Information Center} (ERIC) and \href{https://www.nlm.nih.gov/}{Medline}, the latter managed by the National Library of Medicine (NLM) of the National Institutes of Health and now searchable through \href{http://www.pubmed.gov/}{PubMed}. These were followed by other catalogues of scientific literature and books. Then everything changed in 1990 with the arrival of the World Wide Web. Its concept was developed at the \href{https://home.web.cern.ch/}{European Laboratory for Particle Physics} (CERN) as an answer to the ever-increasing needs of particle physicists to exchange large volumes of data (\href{https://www.academia.edu/2148655/World_Wide_Web_The_Information_Universe}{Berners-Lee} \href{https://www.academia.edu/2148655/World_Wide_Web_The_Information_Universe}{\emph{et al.}, 1992}). In the following year, the repository of physics, mathematics and computer science texts \href{http://arxiv.org/}{arXiv} was created, followed by the genetic research database \href{http://www.ncbi.nlm.nih.gov/genbank}{GenBank} in 1992. Since then, databases and repositories have played a key role in open access, making available articles, papers and research materials produced by universities and research centres.
What exactly do we understand by the term open access? According to the OECD (\href{https://www.oecd-ilibrary.org/science-and-technology/making-open-science-a-reality_5jrs2f963zs1-en}{2015}), open access is ``unrestricted online access to scientific articles, via a number of channels, such as institutional repositories, journal publishers' websites, researchers' webpages, etc.''
\section{Types of open access}\label{types-of-open-access}
Open access (OA) publication means making a publication freely accessible online in a digital format with no barriers to access. There are different types or levels of open access:
\begin{itemize}
\item
\textbf{Green open access} -- Articles where the author or institution provides access. This is often referred to as self-archiving. Usually, researchers submit their manuscript (published or sometimes unpublished) to an archive. Most institutions provide open access archives. This is usually where preprints are posted while undergoing review. Preprints are full drafts of research papers that are currently under peer review for publication. Researchers choose to share these publicly before the review is complete to allow for feedback and visibility of their results while they wait to hear back from the journal. Authors must ensure that they comply with the publisher's copyright policy. Some publishers have a standard embargo period before the work can be made openly accessible. Even in that scenario, metadata is often exempted from this restriction.
\item
\textbf{Diamond or platinum open access} -- Journals do not charge either readers or authors directly. Publishers then often require funding from the government or from not-for-profit, non-commercial organisations, associations or networks, or they rely on advertising. The peer-review process is performed by volunteers.
\item
\textbf{Gold open access} -- Immediate access to an article and data upon publication is provided by a publisher. Publishing costs can be recovered through fees, but more often, an article processing charge is covered by the author, the institution or the funding body sponsoring the research. Gold open access provides a rigorous peer review mechanism. More information on gold open access publishing can be found on the \href{https://www.phdontrack.net/open-science/open-access-publishing/\#toc3}{PhD on Track} website.
\item
\textbf{Bronze open access} -- The article is free to read only on the publisher's page. However, it lacks the open licence for reuse.
\item
\textbf{Hybrid open access} -- This is a controversial model in which an author or institution is required to pay the open access article-processing charge to make their paper available open access in a traditional journal which provides a subscription service. It is advised to stay away from this type of open-access publishing as many institutions and funders will not agree to pay a fee.
\item
\textbf{Black open access} -- Free access to publications behind a paywall, obtained when people with access share free copies. This is unauthorised, large-scale copyright infringement. Black open access can take the form of shadow libraries, such as \href{https://en.wikipedia.org/wiki/Sci-Hub}{Sci-Hub} or \href{https://en.wikipedia.org/wiki/Library_Genesis}{Library Genesis}. It may also be done by sharing the publication via social media, such as with the \href{https://en.wikipedia.org/wiki/ICanHazPDF}{\#ICanHazPDF} hashtag on Twitter.
\end{itemize}
The \href{https://www.budapestopenaccessinitiative.org/}{Budapest Open Access Initiative} defines the terms ``gratis'' and ``libre'' to distinguish between free to read and free to reuse. Gratis open access (\includegraphics[width=0.09375in,height=0.14583in]{images/paste-9AC8E881.png}) refers to online access to read an article free of charge. Libre open access (\includegraphics{images/image2.png}) likewise refers to online access to read an article free of charge, but it also includes some additional rights to reuse the article under specific \href{https://en.wikipedia.org/wiki/Creative_Commons_license}{Creative Commons licences}. Libre open access covers types of open access defined in the \href{https://en.wikipedia.org/wiki/Budapest_Open_Access_Initiative}{Budapest Open Access Initiative}, the \href{https://en.wikipedia.org/wiki/Bethesda_Statement_on_Open_Access_Publishing}{Bethesda Statement on Open Access Publishing} and the \href{https://en.wikipedia.org/wiki/Berlin_Declaration_on_Open_Access_to_Knowledge_in_the_Sciences_and_Humanities}{Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities}.
\section{Quid pro quo}\label{quid-pro-quo}
What are the benefits of publishing your studies in open access journals?
Citation counts are still considered an important metric of the influence and impact of academic work, and open-access publications have been reported to have a higher citation impact (\href{https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253129}{Langham-Putrow} \href{https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253129}{\emph{et al.}, 2021}). Open access can also help in other ways. For example, it makes publications and data available to researchers from countries and organisations that cannot afford to pay for access. This allows greater access to current scientific research, which can help connect people working on solutions to the same problems. For instance, increasing crop yields to feed growing populations while tackling climate change requires lower-income nations and regions to contribute and collaborate, and open access facilitates this.
For data to be made open, it must comply with legislation, such as the General Data Protection Regulation (GDPR) in the EU or cyber security legislation, and follow the \href{http://www.nature.com/articles/sdata201618}{FAIR Guiding Principles for scientific data management and stewardship} first published by Wilkinson \emph{et al.} (\href{https://www.nature.com/articles/sdata201618}{2016}). Processing personal data transparently makes publishing and re-use of open data easier and keeps patients or respondents more involved in the research. You will learn more about these principles in the chapter ``FAIR principles'' and about policies in the chapter ``Open science policy, scientific integrity and ethics''.
The number of funding agencies and research institutions adopting open science policies is increasing every day, with Europe being in the lead, followed by North America, Asia, Latin America, Oceania and Africa (\href{https://roarmap.eprints.org/}{ROARMAP, n.d.}). Nowadays, about half of published papers are open access and the number keeps increasing. However, it varies greatly based on scientific discipline. In a study of articles published between 2009 and 2015, more than 80\% of astronomy, astrophysics, embryology, tropical medicine and fertility papers were available in open access. In contrast, less than 10\% of those in pharmacy, applied, inorganic and nuclear chemistry and criminology were open access. This study also unearthed that the dominant category of open access is not green or gold open access, but articles made free to read on the publisher's website, without an explicit open licence (\href{https://peerj.com/articles/4375/?utm_source=TrendMD&utm_campaign=PeerJ_Tre\%20ndMD_0&utm_medium=TrendMD}{Piwowar} \href{https://peerj.com/articles/4375/?utm_source=TrendMD&utm_campaign=PeerJ_Tre\%20ndMD_0&utm_medium=TrendMD}{\emph{et al.}, 2018}).
We live in an era where data-driven innovation is transforming society, and the need for open science is widely recognised in the scientific research community. Governments, funding agencies and institutions recognise the benefits of open science and open research and are steadily developing strategies, policies and guidelines as groundwork for its existence. Since the original OECD Recommendation was adopted in 2006, many of these policies have been implemented at national and institutional levels and have contributed to significant advancement in this area, and open science and open data have become mainstream. At least 58 countries have adopted dedicated national strategies and policies for open data and publications (\href{https://stip.oecd.org/stip.html}{EC/OECD, 2018}). This has had a significant impact in areas such as the reproducibility of scientific results, diffusion of knowledge across society, cross-disciplinary co-operation, resource efficiency, productivity and scientific advancement. An updated OECD Recommendation on research data published in January 2021 (\href{https://www.oecd.org/sti/recommendation-access-to-research-data-from-public-funding.htm\#:~:text=On\%2020\%20January\%202021\%2C\%20the,shown\%20in\%20the\%20figure\%20below.}{OECD, 2021}) emphasises the relevance and importance of several key principles set out in 2006. These are openness, flexibility, transparency, legal conformity, protection of intellectual property, formal responsibility, professionalism, interoperability, quality, security, efficiency, accountability, and sustainability. It also expands the scope to cover not only research data but also related metadata, bespoke algorithms, workflows, models and software that are essential for their interpretation.
\begin{center}\includegraphics[width=0.7\linewidth]{images/slide02} \end{center}
\section{Test your understanding}\label{test-your-understanding-1}
\textbf{Activities}
In a recommended activities section like this one, we suggest activities to deepen your understanding of the concepts and improve your practical knowledge.
\begin{itemize}
\item
If you're looking for a tool that gives you simple, free and legal access to research articles, try the \href{https://openaccessbutton.org/}{Open Access Button}. Check it out!
\item
Explore a few databases, ideally those that are close to your area of research. You can also check some of those mentioned in this chapter. Look at some data and explore how these databases work.
\item
If you are an Ireland-based researcher, you can check out the list of open access journals that are close to your area on the Irish Open Access Publishers (\href{https://www.ioap.ie/irish-oa-publishers}{IOAP}) website.
\item
Check your national and university policies and guidelines about open access publishing. It is possible that your university also has an agreement in place that allows its corresponding authors to publish an agreed number of open access articles without paying the processing charge. Can you find their list and identify those journals where you could publish your research? Discuss this possibility with your supervisor/PI.
\item
Share anything interesting that you learned or found in this chapter with others on our social media.
\end{itemize}
\chapter{Open source, open licensing, scientific programming}\label{open-source-open-licencing-scientific-programming}
\section{Open-source}\label{open-source}
It is no secret that the open science movement got its inspiration from the open-source culture movement. Open-source software refers to source code that anyone can inspect, modify and enhance because the licence under which it is released grants permission to do so. You can find a more detailed definition on the website of the \href{https://opensource.org/osd}{Open Source Initiative}. Similarly, \href{https://www.oshwa.org/definition/}{Open Source Hardware Association} defines open-source hardware as ``hardware whose design is made publicly available so that anyone can study, modify, distribute, make, and sell the design or hardware based on that design.''
The main benefits and reasons for the adoption of open-source software and hardware (\href{https://www.researchgate.net/publication/228296692_Open_Standards_Open_Source_Adoption_in_the_Public_Sector_and_Their_Relationship_to_Microsoft's_Market_Dominance}{Casson and Ryan, 2006}) can be categorised as follows:
\begin{itemize}
\tightlist
\item
Security
\item
Affordability
\item
Transparency
\item
Perpetuity
\item
Interoperability
\item
Flexibility
\item
Localization
\end{itemize}
\section{Open licensing}\label{open-licensing}
Even when your data, software or hardware design is made freely available in the public domain, an explicit licence provides legal clarity on access to it and its re-use. You can license the data only if you are the rightful owner. Licensing helps to:
\begin{itemize}
\tightlist
\item
Remove the ambiguity on the re-use of data
\item
Exempt users from copyright infringement
\item
Ensure that the source author is credited rightfully
\item
Ensure that the re-used or re-distributed data remain open access
\item
Ensure that data is not misused or distorted
\end{itemize}
\subsection*{How to choose a licence}\label{how-to-choose-a-licence}
\addcontentsline{toc}{subsection}{How to choose a licence}
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
Ensure that the data is copyrightable. This may vary across domains, jurisdictions, funders, etc.
\item
Check the licensing obligation of the funder(s), institution(s), government, data centre or repository.
\item
If your work is derived from a third party's work, make sure you comply with the source data's licensing requirements.
\item
Select the data licence with the conditions that meet your criteria and that covers the content that you want to share.
\end{enumerate}
The most common conditions found in data licences are:
\begin{itemize}
\tightlist
\item
\textbf{Attribution} (BY): The source/author must be acknowledged when it is distributed, displayed, performed or used to derive a new work. If you are using data from multiple sources, each contributor needs to be acknowledged.
\item
\textbf{Copyleft} or \textbf{share alike} (SA): Any new work derived from the licensed data should be released under the same licence as the source data.
\item
\textbf{Non-commercial} (NC): This condition prevents the user from using the data for commercial purposes.
\end{itemize}
\section{Licence providers}\label{licence-providers}
\textbf{Prepared licence}: Research institutions or other data publishers can create their own licences. For example, the \href{https://www.data-archive.ac.uk/d/1QbEbEi0v_mnprVG2sfFdzJ-vYVqv4GijnIadOWkrbFE/edit}{UK Data Archive} requires that you sign a standard licence agreement that clarifies the rights and responsibilities of both parties and permits the UK Data Archive to perform its curatorial function.
\textbf{Bespoke licence}: If the existing licences don't meet the author's requirement or cater to special circumstances, they can make their own licence. In this scenario, it is mandatory to ensure that the custom licence complies with any existing legal bindings.
\href{https://creativecommons.org/}{\textbf{Creative Commons}}: One of the most popular and widely accepted licence providers for most content, with the exception of source code. Three of its licences (CC0, CC-BY, CC-BY-SA) are intended for open licensing. \href{https://choosealicense.com/licenses/}{Choose a licence} by \href{https://github.com/}{GitHub} provides a list of licences that are specific to software.
\href{https://opendatacommons.org/}{\textbf{Open data commons}}\textbf{:} Similar to Creative Commons, but these licences are specifically designed for databases.
\section{Scientific programming}\label{scientific-programming}
\subsection*{Documenting your code}\label{documenting-your-code}
\addcontentsline{toc}{subsection}{Documenting your code}
To make your research open, i.e.~transparent and reproducible, it is good practice to share not only your data but also the code used for the analysis, modelling, visualisation, etc. This code is then known as open-source research software. When sharing your code or software, it's good practice to also include documentation explaining how to use your code. So why do you need to prepare this documentation?
Benefits for you:
\begin{itemize}
\tightlist
\item
In six months' time or whenever you choose to work on it, you'll still be able to use your code.
\item
You want people to credit you when they use your code.
\item
You want to learn how to be self-reliant.
\item
You may attract others to contribute to your code.
\end{itemize}
Benefits for others:
\begin{itemize}
\tightlist
\item
Others can simply utilise and extend your code.
\end{itemize}
Benefits for science:
\begin{itemize}
\tightlist
\item
You are contributing to science.
\item
You are promoting open science.
\item
Documentation allows for clarity and reproducibility.
\end{itemize}
Here are \textbf{best practices} for writing documentation:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
Provide a README file with the following information:
\begin{itemize}
\tightlist
\item
A quick overview of the project
\item
Instructions for installation
\item
A brief example/tutorial
\end{itemize}
\item
Allow others to use the issue tracker.
\item
Create application programming interface (API) documentation.
\item
Comment your code.
\item
Use coding conventions, including file structure, comments, naming conventions, programming methods, etc.
\item
Include an introduction for contributors.
\item
Provide citation information.
\item
Include any licensing information.
\item
Include a link to your email address.
\item
List all the file versions and the fundamental changes you made.
\end{enumerate}
A helpful hint: When naming files, make sure their names are descriptive and consistent!
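Several of the practices above (API documentation, comments, naming conventions) can be illustrated with a short, documented function. The following is only a minimal sketch in Python: the function \texttt{celsius\_to\_kelvin} and the conversion task are hypothetical, chosen for illustration, and the docstring layout is one common convention rather than a requirement.

```python
"""Convert temperature readings for a small analysis script.

Hypothetical example module: the names and the conversion task are
illustrative assumptions, not taken from the course material.
"""


def celsius_to_kelvin(temperature_c: float) -> float:
    """Convert a temperature from degrees Celsius to kelvin.

    Parameters
    ----------
    temperature_c : float
        Temperature in degrees Celsius; must not be below absolute zero.

    Returns
    -------
    float
        Temperature in kelvin.

    Raises
    ------
    ValueError
        If the input is below absolute zero (-273.15 degrees Celsius).
    """
    absolute_zero_c = -273.15  # descriptive name instead of a bare number
    if temperature_c < absolute_zero_c:
        raise ValueError("temperature is below absolute zero")
    return temperature_c - absolute_zero_c


if __name__ == "__main__":
    # Brief usage example of the kind a README could also show.
    print(celsius_to_kelvin(25.0))
```

A docstring like this can be read directly with Python's built-in \texttt{help()} function, and documentation generators can typically turn it into API documentation.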
\subsection*{Importance of scientific programming}\label{importance-of-scientific-programming}
\addcontentsline{toc}{subsection}{Importance of scientific programming}
There are multiple ways in which scientists and researchers can benefit from scientific programming. Its significance spans a wide range of abilities and is not limited to any particular field. Generally, scientific programming can facilitate the following:
\begin{itemize}
\tightlist
\item
\textbf{Time-consuming tasks can be automated} -- Automating tasks using scientific programming can simplify lengthy tasks or those that are impossible to do by hand. Imagine, for example, that you want to find out how many tweets were posted about a recent natural disaster and you have to sift through tens of thousands of posts one by one. With code, a few minutes might be enough to complete this task.
\item
\textbf{Creating adaptable research} -- You can modify and rerun your code repeatedly if you write it correctly. Suppose you are researching the relationship between socio-economic data and air pollution in a particular location. With a properly structured and well-commented script, each year's updated socio-economic data can be easily incorporated.
\item
\textbf{Help to publicise the research and share the findings with other researchers} -- Because code is so easily accessible, research becomes more open and repeatable. It helps the researcher to convey their specific methodology to other experts as well as the community.
\item
\textbf{Documenting your thinking} -- You can quickly document your strategy with code. You may use comments to describe each stage of the process (to your future self or others), making it quick and easy to update or adjust things afterwards.
\item
\textbf{Research collaboration} -- Collaboration is facilitated by the use of code. Returning to the previous example, if you are researching air pollution in a particular location and a colleague is researching air pollution in another location, you may compare models, swap scripts and collaborate.
\end{itemize}
The five features above help researchers in many ways and are considered key tools for moving research forward. Their significance shows in the speed that large-scale computation brings to research and, most importantly, in the collaboration and modification it makes possible. Scientific programming has been, and continues to be, ground-breaking. From assisting biologists in sequencing the human genome to allowing social scientists to make better economic forecasts, the applications are limitless.
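The automation example from the list above, counting posts about a natural disaster, can be sketched in a few lines of Python. This is an illustration only: the sample posts, the keywords and the function name \texttt{count\_keyword\_posts} are hypothetical, and a real analysis would read the posts from a file or an API and would need more careful text matching.

```python
from collections import Counter

# Hypothetical sample data standing in for a large collection of posts.
posts = [
    "Flood waters rising near the river, stay safe",
    "Great coffee this morning",
    "Road closed due to FLOOD damage",
    "Wildfire smoke visible from downtown",
]

# Hypothetical keywords chosen for illustration.
keywords = ["flood", "wildfire"]


def count_keyword_posts(posts, keywords):
    """Count how many posts mention each keyword, ignoring case."""
    counts = Counter({keyword: 0 for keyword in keywords})
    for post in posts:
        text = post.lower()
        for keyword in keywords:
            if keyword in text:
                counts[keyword] += 1
    return counts


counts = count_keyword_posts(posts, keywords)
print(counts["flood"], counts["wildfire"])  # prints: 2 1
```

The same function works unchanged whether the list holds four posts or tens of thousands, which is the point of automating the task.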
\subsection*{What exactly is scientific programming?}\label{what-exactly-is-scientific-programming}
\addcontentsline{toc}{subsection}{What exactly is scientific programming?}
The definition of scientific programming is simple, yet it covers a vast array of applications and industries. Scientific programming refers to the use of computer programs for scientific research. It can be useful for most scientists and researchers, especially PhD researchers. The speed and reproducibility of a researcher's work can be greatly increased using scientific programming. Computers, designed for efficiency and scale, can perform massive calculations, store data and analyse results. By automating processes, scientists are able to save time and effort and make research more accurate, reliable and efficient.
It is important to note that computers carry out mathematical operations reliably. Mistakes can still happen, but they usually occur because people make errors when using computers. Computers follow instructions, so if a calculation is set up incorrectly, the computer will not recognise the problem on its own. However, a computer can do calculations within minutes that would take researchers months or even years to perform. Furthermore, the code will execute the calculations consistently on each run.
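The point about consistent execution can be made concrete with a small sketch. One caveat that is an addition here, not part of the text above: when an analysis uses random numbers, runs are only identical if the random number generator is seeded explicitly. The function \texttt{simulate\_mean} and the seed value are hypothetical and serve only as an illustration.

```python
import random
import statistics


def simulate_mean(seed: int, n_samples: int = 1000) -> float:
    """Return the mean of n_samples pseudo-random draws from a fixed seed."""
    rng = random.Random(seed)  # local generator, independent of global state
    samples = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    return statistics.fmean(samples)


first_run = simulate_mean(seed=42)
second_run = simulate_mean(seed=42)
print(first_run == second_run)  # prints: True
```

Recording the seed alongside the results is one simple way to let others rerun the same calculation.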
\subsection*{Respond to disasters: an example of the power of scientific programming}\label{respond-to-disasters-an-example-of-the-power-of-scientific-programming}
\addcontentsline{toc}{subsection}{Respond to disasters: an example of the power of scientific programming}
In addition to its many advantages, scientific programming can accomplish a great deal of work that would be impossible for one person or even a team of people to complete without it. Lise St.~Denis's research on climate change at Earth Lab demonstrates that power clearly. Lise uses \href{https://twitter.com/lastdenis}{Twitter} to notify first responders about emergencies that result from natural disasters as they develop or progress. Without this technology, vital time-sensitive information may be missed.
In the event of a disaster, the police are contacted alongside emergency services. Disaster survivors also reach out to their online communities, sometimes providing vital information. The call volume of hotlines during disasters can be overwhelming for authorities and the only way for people to communicate may be through social media. Thus, sites like Twitter can be a great source of information for disaster response teams. However, the volume of tweets makes it difficult for one person (or team) to vet all the information and still get it to emergency response teams in time. Lise St.~Denis witnessed this through her extensive experience of natural disasters. For instance, during the Carlton Complex Fire of 2014, she worked on a team that was tasked with sorting tweets and compiling a full report. Although Lise's team provided useful information, it was unable to keep up with the volume of tweets, and responders needed the information faster than the team was able to provide it. For Lise, the answer was obvious: they needed to deploy the superpowers of scientific computing.
Since the beginning of 2013, Lise had been developing a filtering algorithm to harvest data from Twitter and sort it by importance. Using this code, one person can automate the work of a whole team by analysing and categorising every single tweet as it arrives. The algorithm separates tweets into those that first responders need to know about and those that it deems less important. One of scientific programming's capabilities is the ability to ``look'' at enormous amounts of rapidly generated data and categorise it.
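To give a feel for what categorising messages as they arrive can look like, here is a deliberately simple keyword-based sketch in Python. It is not the algorithm described above, whose details are not given here. The keywords, the sample messages and the function name \texttt{triage} are all hypothetical, and a real system would need a far more robust approach.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical keywords; a real system would need a far richer model.
URGENT_KEYWORDS = ("trapped", "evacuate", "injured", "road closed")


def triage(messages: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Label each incoming message as 'urgent' or 'other' as it arrives."""
    for message in messages:
        text = message.lower()
        label = "urgent" if any(k in text for k in URGENT_KEYWORDS) else "other"
        yield label, message


# Hypothetical messages standing in for a live stream.
incoming = [
    "Family trapped on Hill Road, need help",
    "Thinking of everyone affected by the fire",
    "Road closed at the bridge, fire crossing the highway",
]

for label, message in triage(incoming):
    print(f"{label}: {message}")
```

Because \texttt{triage} is written as a generator, it handles one message at a time and could, in principle, be fed from a continuous stream rather than a fixed list.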
\subsection*{The future of scientific programming}\label{the-future-of-scientific-programming}
\addcontentsline{toc}{subsection}{The future of scientific programming}
Because there are so many fascinating possibilities, it is difficult to pick one field of scientific programming that is remarkably promising. Almost every discipline of study has a programming tool that could be considered as the ``future of programming'' and it would be difficult to discuss and list them all.
Modelling is a promising development that serves multiple professions. For decades, models have served as the foundation of science. There is a wide range of examples of modelling techniques in science, from Earth science (predicting wildfires) to medicine (analysing illnesses). No model is perfect. Thus, there is continuous development in the pursuit of better, more precise models whose algorithms produce reliable results.
In today's data-driven world, the term science is closely linked with scientific programming. Problems that have baffled scientists for decades are addressed in a matter of seconds by leveraging the power of large computers. The rise in efficiency and speed has completely transformed most sectors of modern science. Without question, scientific programming is our way forward. To be part of this movement, you can share your code with others and always appropriately cite the open-source software you use.
\begin{center}\includegraphics[width=0.7\linewidth]{images/slide3} \end{center}
\section{Test your understanding}\label{test-your-understanding-2}
\textbf{Activities}
In a recommended activities section like this one, we suggest activities to deepen your understanding of the concepts and improve your practical knowledge.
\begin{itemize}
\item
Learn more about open-source software and hardware on \href{https://open-science-training-handbook.gitbook.io/book/open-science-basics/open-research-software-and-open-source}{Open Science Training Handbook} or try to work on a \href{https://en.wikipedia.org/wiki/List_of_open-source_hardware_projects}{DIY open-source hardware project}.
\item
Have you ever checked licences on the types of software you are using? Now is the time to do that. Is the software you are using open-source software?
\item
Working on your research project can be exhausting at times. Why don't you try relaxing and practising your programming skills by playing some of the following games? Play and learn!
\begin{itemize}
\tightlist
\item
\href{https://codecombat.com/}{CodeCombat}: These games take you step by step through ideas, starting with basic computer science and gradually increasing in difficulty.
\item
\href{https://www.codingame.com/start}{CodinGame}: When you have a better grasp, this game is about solving challenges in specific languages.
\item
\href{https://www.codewars.com/}{CodeWars}: Get right into programming challenges and experience debugging your code.
\end{itemize}
\item
Have you ever shared your code, worked on an open-source hardware project or do you have any other experience related to this chapter? Share it with others on our social media.
\end{itemize}
\chapter{Pros and cons}\label{pros-and-cons}
\section{Misconceptions about open science}\label{misconceptions-about-open-science}
\begin{quote}
``Open science describes the practice of carrying out scientific research in a completely transparent manner, and making the results of that research available to everyone. Isn't that just `science'?''
-- \href{https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0669-2}{Watson, 2015}
\end{quote}
While the situation is improving, there are still many misconceptions about open science that give it a negative image. In 2014, a \href{https://group.springernature.com/gp/group/media/press-releases/archive-2015/perceptions-of-open-access-publishing-are-changing-for-the-bette/12000378}{survey} by the Nature Publishing Group and Palgrave Macmillan found that 40\% of scientists who had not published open access said ``I am concerned about the perceptions of the quality of open access publications''. Other commonly cited misconceptions include: ``There is no peer review for open access publications'', ``Open science leads to worse research'' and ``In open science, it is possible for others to steal your research''. We believe that such misconceptions arise from a lack of understanding of open science. Two of our motivations in creating this course are to demystify open science and to challenge these misconceptions.
Encouragingly, understanding and acceptance of open science are growing within the scientific community. In a 2015 follow-up \href{https://group.springernature.com/gp/group/media/press-releases/archive-2015/perceptions-of-open-access-publishing-are-changing-for-the-bette/12000378}{survey}, only 27\% of scientists expressed concern about the perceived quality of open-access publications, compared to 40\% in 2014.
\section{Cons}\label{cons}
While we have created this course because we believe in open science, there are currently some issues with open science that we feel obliged to discuss:
\begin{itemize}
\item
\textbf{Concerns about quality --} While we will talk about the improvements that open science brings to the world of science later, there are some legitimate concerns. \href{https://en.wikipedia.org/wiki/Preprint}{Preprints} accelerate open science but make science accessible before the peer review process. The reader has to be more careful when reading and interpreting preprints. However, most manuscripts on preprint servers are considered to be final drafts of manuscripts that are/will soon be submitted to journals for peer review and processing and should therefore meet high quality standards. Similarly, \href{https://en.wikipedia.org/wiki/Predatory_publishing}{predatory journals} undermine the quality standards expected from scientific publishing. You, the scientist, need to be aware of the phenomenon and pay attention to identify predatory journals when reading or publishing (see \href{https://beallslist.net/}{Beall's List}).
\item
\textbf{Time- and effort-consuming --} Many open science practices are both time- and effort-consuming. For example, after publishing open-source software, you might need to spend time on bug fixes and updates, or on interacting with potential users and updating your documentation, all at no further benefit to you. Properly annotating and publishing data sets takes a considerable amount of time. Sometimes, going the open science route can even be associated with additional financial costs, such as publishing under an open access licence. Sadly, there is no good way to sugar-coat this: participating in open science will take up some of your time. However, we will shortly discuss the incentives for spending your time (your most precious resource during your research journey).
\item
\textbf{Open science is not properly incentivised --} Currently, open science practices are not properly rewarded and incentivised. The hard currency of high impact publications is still one of the most valuable aspects of your CV. Devoting time and effort to open science aspects that do not contribute to this metric can seem like a waste of time. However, the situation is getting better. More and more funding agencies are beginning to require open science practices in their projects, and additional metrics apart from the number of publications and citations are becoming increasingly popular. That said, open science practices are currently not as highly valued as more traditional research outputs such as journal articles.
\end{itemize}
\section{Pros}\label{pros}
On the other hand, there are also plenty of positive aspects of open science.
\begin{itemize}
\item
\textbf{Individual benefits for your research --} There are collective benefits of open science that can also be useful for your research. As we have already established, practising open science requires a considerable effort from the individual for the benefit of the collective community. However, as part of the community, you can also access, and benefit from, the efforts of others. Maybe there is a piece of software that you can use because someone made it available open-source. Maybe there is an interesting dataset that is publicly available and relates to your research question. There are also many big collaborative resources that are produced under the umbrella of open science. Open science can produce opportunities for your research that ``closed'' science could not.
\item
\textbf{Investment in your career --} As we have already touched upon, the general trend in the scientific community is towards adopting open science. A good example of this is open access publishing, which is increasingly required by research institutions and funding agencies. Similar trends can be seen with regard to data deposition or code sharing. Incorporating these open science practices into your research as early as possible demonstrates your ability to adhere to them, which might give you an edge when applying for a position or funding. In addition, there are some more direct benefits to your research from open science. Open science, by definition, makes your research more accessible and visible. It has been shown that research articles published with an open access licence are on average cited more often (\href{https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0253129}{Langham-Putrow \emph{et al.}, 2021}). Other researchers can more easily engage with or try to reproduce your research, or use and cite your data. While not all these achievements are captured by traditional metrics, they can be useful investments in your own career.
\item
\textbf{Open science aims to make science better --} One of the reasons for the move towards open science in recent years has been the \href{https://en.wikipedia.org/wiki/Replication_crisis}{replication crisis}, in which it was found that the results of many scientific studies were not reproducible. One of the lessons learned was to move towards a more open, transparent and inclusive scientific environment. Open science was therefore designed to make science better. Full access to protocols, methods and analyses can make science more reproducible. Combating the publication bias that exists in scientific literature by making data and negative results available can reduce the number of experiments necessary or even help others to avoid pitfalls that have been encountered before. It is well known that studies finding that a treatment, drug or process has a positive effect on the target group or issue tend to get published more often than studies that find negative or no effects. This may be due to authors' motivation to publish, but often it is due to journals and publishers being less willing to publish research that might not grab the reader's attention and would therefore sell fewer copies. Preprints and open access publishing help to counteract this issue. Preprints can also help to accelerate science as well as improve manuscripts through open peer review. Here we can see the real-world impact of open science: both sides of the ``scientific story'' are shared, and end users of treatments, drugs, etc. are better informed about the overall body of evidence and can make more informed decisions.
\end{itemize}
\section{Conclusion}\label{conclusion}
After discussing the positive and \href{https://www.pygaze.org/2016/03/the-downsides-of-open-science-that-nobody-talks-about/}{negative aspects} of open science, we think it is fair to conclude that open science practices currently benefit science and the community in general, although the benefits to individual researchers may not be immediately visible and an investment of time and resources is required. However, we strongly believe that the scientific community as a whole will move towards an increasingly open science approach in the coming years and decades. While there might not currently be enough individual incentives, these will come, perhaps faster in some disciplines and countries than in others.
Finally, hearing about all the best practices in open science can be intimidating. We think it is very important to point out that, while all these practices are important, we believe that it is best to figure out what is possible and appropriate for your work in particular and start implementing these practices bit by bit.
\begin{center}\includegraphics[width=0.7\linewidth]{images/slide4} \end{center}
\textbf{Activities}
In a recommended activities section like this one, we suggest activities to increase your understanding of the concepts and improve your practical knowledge.
\begin{itemize}
\item
Sort out your thoughts about open science by filling in the \href{https://www.orion-openscience.eu/public/2019-01/ORION_Questionaire_RPFO-CRECIM.pdf}{ORION Questionnaire}.
\item
Think about how much time and effort you feel confident dedicating to open science.
\item
Have you come up with other pros and cons that you think are relevant and were not mentioned in this chapter or the ORION Questionnaire? Is there anything you would like to share with others? Feel free to do so on our social media.
\end{itemize}
\chapter{Research data}\label{research-data}
Research data are a fundamental part of the research process and open science.
But what is research data?
The University College Dublin \href{https://hub.ucd.ie/usis/!W_HU_MENU.P_PUBLISH?p_tag=GD-DOCLAND&ID=227}{(UCD) Research Data Management Policy} defines research data as ``information collected to be examined and considered and to serve as a basis for reasoning, discussion or calculation. It is used as a primary source to support technical or scientific enquiry, research, scholarship or artistic activity, is used as evidence in the research process and/or is commonly accepted in the research community as necessary to validate research findings and results.''
Research data exist in a specific research context and provide evidence or validation for claims and findings relevant to a specific research question. Data can be quantitative or qualitative and in physical, digital or analogue format. Quantitative data are measures of quantity and are recorded as numbers. In the research community, there are widely accepted standard units that allow specific quantities to be expressed in ways that are unambiguous and universally understandable. Qualitative data carry information about quality and do not take the form of numbers. Physical data refers to samples and specimens.
Examples of research data:
\begin{itemize}
\tightlist
\item
Documents, spreadsheets and notes.
\item
Laboratory notebooks and field notebooks.
\item
Laboratory protocols, methodologies and workflows.
\item
Questionnaires, surveys, interviews, transcripts, codebooks and test responses.
\item
Standard operating procedures and protocols.
\item
Photographs, films, digital images, audiotapes and videotapes.
\item
Protein or genetic sequences.
\item
Spectral data.
\item
Slides, artefacts, specimens and samples.
\item
Maps and geo-spatial data.
\item
Collection of digital objects acquired and generated during the process of research and results of computer simulations.
\item
Database contents (video, audio, text and images).
\item
Models, algorithms and scripts.
\item
Contents of an application (input, output, log files for analysis software, simulation software and schemas).
\end{itemize}
Research data can be categorised as:
\begin{itemize}
\item
\textbf{Observational} -- data captured in real time, usually unique and irreplaceable. It is collected using methods such as human observation, open-ended surveys or through the use of an instrument or sensor to monitor and record information. E.g., weather data, noise level and recordings.
\item
\textbf{Experimental} -- data collected through experiments or clinical trials by the researcher to measure change or to create differences when a variable is altered. It helps to determine a causal relationship and is typically projectable to a larger population. This type of data is often reproducible, but it can be expensive to do so. E.g., sequencing data and quantitative data recorded with laboratory equipment.
\item
\textbf{Simulation} -- data generated using computer test models that try to determine what would happen under certain conditions. The test model and metadata can be more important than the output. E.g., climate predictions, economic models and chemical reactions.
\item
\textbf{Derived or compiled} -- data originating from processing or combining existing data points, often from different data sources. It can be replaced if lost, but this can be very time-consuming and/or expensive. Typically used in secondary research. E.g., databases and population statistics.
\item
\textbf{Reference or canonical} -- collection of smaller datasets, usually published and curated. E.g., IUCN Red List of Threatened Species and NASA Earth science data.
\end{itemize}
Next, we can divide data into primary and secondary. \textbf{Primary data} are collected or generated first-hand to answer a specific research question. \textbf{Secondary data} are existing data that are being reused for a purpose other than the one they were collected for. They tend to be readily available, include large samples and be collected over a long period of time. Nowadays, high quality research and publications can be produced using only secondary data. Thanks to open science and the sharing of and access to data that it enables, this approach is becoming more popular. It can be, and often is, more cost-effective than primary research. One potential drawback of secondary research is a lack of control over the research question, the data collected and the methods used. The potential strengths and benefits of secondary research do not undermine the importance of primary data collected in well-designed experiments and studies that are necessary for improving our knowledge and understanding of the world around us.
\textbf{Metadata} is another important type of data. It is data that provides background or context information about other data. Put simply, metadata is data about data. In other words, metadata is structured reference data that helps to sort and identify attributes of the information it describes. Examples of metadata are author, type of data, file size, the date the document was created, HTML tags, geolocation, environmental conditions affecting the main variable and instruments used to collect or generate data. You can learn more about metadata in the chapter ``FAIR principles''.
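As a minimal sketch of what ``data about data'' can look like in practice, the Python snippet below assembles a small metadata record for a data file. The function name \texttt{describe\_file}, the field names, the example file and the example author are all illustrative assumptions and do not follow any particular metadata standard. The last-modified time is used as a stand-in for the creation date, because a true creation time is not available on every operating system.

```python
import json
import mimetypes
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path, author):
    """Build a small metadata record (hypothetical field names) for one file."""
    stats = os.stat(path)
    guessed_type, _ = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(stats.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "author": author,
        "data_type": guessed_type or "unknown",
        "file_size_bytes": stats.st_size,
        # Last-modified time in UTC, written as an ISO 8601 string.
        "last_modified": modified.isoformat(),
    }


# Usage: create a tiny example data file, then describe it.
with tempfile.TemporaryDirectory() as folder:
    data_path = os.path.join(folder, "temperature.csv")
    with open(data_path, "w", encoding="utf-8") as handle:
        handle.write("date,temp_c\n2024-01-01,3.5\n")
    record = describe_file(data_path, author="A. Researcher")

print(json.dumps(record, indent=2))
```

Saving such a record next to the data file (for example as a JSON file) is one simple way to keep the data and its description together. Discipline-specific metadata standards will usually prescribe their own field names.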
Other types of data exist that are not commonly shared because of the nature of the records themselves or because of ethical and privacy concerns. E.g., preliminary analysis results, drafts of scientific papers, peer reviews, communication with colleagues or stakeholders. Research data also does not include trade secrets, commercial information, materials necessary to be held confidential by a researcher until data are published or similar information which is protected under law. Personal and medical information that could be used to identify a particular person or culturally sensitive data are special types of data that come under specific legislation. In the EU the relevant regulation is the General Data Protection Regulation (\href{https://gdpr-info.eu/}{GDPR}).
\section{Research data life cycle}\label{research-data-life-cycle}
When working on your research, you are very much focused on carrying out the practical tasks your project requires. The challenge is to also think about data management. It is in your best interest because of the long-term value of the data you might be generating. You want to make sure that at the end of your research not only your data, but also your metadata and documentation are complete, preserved and made accessible, so that other people can use them and you get credit for all the hard work you put into collecting or generating the data.
Data management is the practice of collecting, preserving and sharing research data. Data management continues beyond the duration of a specific research project and covers all aspects of curating and caring for data. Different activities and stages of this process that can be schematised in the research data lifecycle model include:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\item
\textbf{Planning} \textbf{--} The first step is identifying data to be collected or generated in your research. It should include the nature, scope and scale of data. Resources and costs associated with data collection should be identified. This will depend on which methodologies or software will be used if new data are collected or produced. Data preservation should be planned before data are collected or generated.
\item
\textbf{Collecting or generating data} \textbf{--} For your research you can use either primary or secondary data. In the case of primary data, you are collecting, generating, storing and organising data and metadata. If you are reusing data made accessible by someone else, these are referred to as secondary data.
\item
\textbf{Processing data} \textbf{--} Processing means converting raw data to formats suitable for analysis or generating new variables. Necessary steps are also cleaning and standardising data and applying quality controls. All steps of data processing activities, including scripts and outputs, should be documented.
\item
\textbf{Analysing data} \textbf{--} Essential parts of your research are data analysis and interpretation. Statistical analysis, computational analysis and data visualisation are used to produce research outputs. All these steps must be reproducible. Therefore, it is important to document all steps of this process.
\item
\textbf{Preserving data} \textbf{--} Data of long-term value should be preserved and made available for others to reuse. This involves selecting data for preservation, converting data to other formats, creating supporting documentation and depositing data in data centres, data repositories or institutional data repositories for preservation. It is important to plan data preservation from the very beginning of the research project in order to collect all necessary metadata.
\item
\textbf{Making data accessible} \textbf{--} Creating online metadata records for data in a data centre/repository, obtaining a unique persistent identifier for data, licensing data for reuse, enabling access to data via a data centre/repository and citing and linking to data and code from research outputs all fall under making data accessible.
\item
\textbf{Re-using data} \textbf{--} For your research, you can use secondary data collected and made accessible by other researchers. Similarly, your primary data can be used as secondary data. It can be used by other researchers or by you to conduct secondary analysis or follow-up research, by policymakers to inform evidence-based policymaking or used by the scientific community in communication and engagement with the general public, industry, private sector and media.
\end{enumerate}
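To illustrate the advice that all processing steps should be documented, here is a minimal Python sketch of a processing step that cleans raw values, applies a quality control check and records what happened to every row. The sample readings, the column names, the plausible temperature range and the function name \texttt{process} are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical raw readings: stray whitespace, one missing value
# and one implausible value.
RAW_CSV = """site,temp_c
A, 12.5
B,
C,13.1
D,999
"""


def process(raw_text, valid_range=(-50.0, 60.0)):
    """Clean raw readings and return (clean_rows, log) so each step is documented."""
    clean_rows, log = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        site = row["site"].strip()
        value = (row["temp_c"] or "").strip()
        if not value:
            log.append(f"{site}: dropped, missing value")
            continue
        temp = float(value)
        # Quality control: reject values outside the assumed plausible range.
        if not valid_range[0] <= temp <= valid_range[1]:
            log.append(f"{site}: dropped, {temp} outside {valid_range}")
            continue
        clean_rows.append({"site": site, "temp_c": temp})
        log.append(f"{site}: kept")
    return clean_rows, log


clean_rows, log = process(RAW_CSV)
for entry in log:
    print(entry)
```

Keeping the script and its log alongside the raw and processed files means that someone else, or you in a few years, can see exactly how the processed data were derived.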
\section{Data management plan}\label{data-management-plan}
Developing a data management plan (DMP) can be invaluable to your research by ensuring efficient research data management and sharing. A DMP is a document that outlines how data are handled throughout the entire research process and once it is completed. It should consider all aspects of the research data life cycle. A DMP should be produced before you start working on your research project and you should continually update and edit it when needed. A DMP is not a static document but a living one. It is okay if you can't answer all the questions at the beginning. Although it requires time to create your DMP and to keep it updated, in the long term it saves a lot of time and effort. Also, public funding bodies often require at least an outline of a DMP. It is good to know that requirements for a DMP differ between institutions and funding bodies. In general, it may provide information to answer the following questions:
\begin{itemize}
\item
What is the nature of your research and your scientific hypothesis? What questions are you trying to answer?
\item
Who has what roles and responsibilities? Do you have any special requirements for hardware and software?
\item
How are you planning to generate or collect data? If working with physical samples, how will these be labelled and what system of unique identifiers will be used? If working with people or animals, how will ethical issues be handled?
\item
How will these data be processed?
\item
What quality assurance checks will be carried out and how will you deal with problems, missing values and errors, if found?
\item
How will data be stored and shared during the project (permission levels, version control and backups) and once it is completed (archiving)? How will intellectual property rights be handled?
\end{itemize}
Although not addressed in a DMP, other types of documents and records should be managed during and beyond the life of a project. E.g., correspondence (e-mail and paper-based), project files, grant and ethics applications and approvals, signed consent forms, research reports, technical reports, project reports and files and master lists.
\section{Data access}\label{data-access}
Data can be stored in data centres or repositories. Sometimes, your university or institution might require you to store your data in its repositories. Make sure to familiarise yourself with policies and requirements when planning your research. You can learn more about data centres and repositories in the chapter ``Data repositories and data centres''.
Not all data uploaded into data centres or repositories are necessarily accessible to everyone. Whether there are concerns that releasing some data can have negative consequences or you just need more time to publish results of analysis using those data, special restrictions can be applied on how others gain access to your data. Depending on the repository, access can be classified, for example, as:
\begin{itemize}
\item
\textbf{Fully open access} \textbf{--} Open data has no restrictions on access. Anyone can view and download it. This makes it more likely to be reused, for others to verify the results of your research and for you to get credit for generating and publishing that data.
\item
\textbf{Embargoed access} \textbf{--} Embargoed data are made available only after a specific period of time. Until then, only metadata is made public. At the end of the embargo period, data will become available by either open or mediated access, depending on the option that you've selected. This should give you enough time to publish your findings or to register a patent.
\item
\textbf{Mediated access} \textbf{--} Although metadata is made public, for someone to access your data you must approve an application that should meet conditions you have outlined. This may include requesting proof that a person asking for access to your data is a genuine researcher and that they have ethical approval from their own institution to undertake the research.
\item
\textbf{Restricted access} \textbf{--} Metadata are made public, but the full access is granted only to registered users.
\item
\textbf{Closed access} \textbf{--} Metadata is published, but data is inaccessible and there is no process in place to apply for access to it. This type of access is rarely used. Examples include a case when you worked on generating data, but you don't have the right to publish it. Alternatively, it might be classified as military research or another type of sensitive data.
\item
\textbf{Depositor access} \textbf{--} Data can be accessed solely by depositors.
\end{itemize}
To access data, various types of user agreements exist. The most common type is an open content licence, such as a Creative Commons (CC) or general public licence. You can learn more about different types of licences in the chapter ``Open-source, open licensing, scientific programming''.
\section{What exactly is open research data?}\label{what-exactly-is-open-research-data}
According to the \href{https://ec.europa.eu/info/research-and-innovation/strategy/strategy-2020-2024/our-digital-future/open-science/open-science-monitor/facts-and-figures-open-research-data_en}{European Commission}, ``open research data refers to data underpinning scientific research results that has no restrictions on its access, enabling anyone to access it.''
Making your data open brings a lot of advantages to you, as well as to the whole scientific community. It reduces cost and saves time for the government or private sector when reusing your data for further understanding and knowledge or for other researchers when they don't have to collect the same data again. Many collaborations with other scientists are born this way. By opening your research, you are making it not only more visible but also transparent and you are potentially increasing its impact. Your work can get potential recognition by the general public and in the scientific community. This also encourages scientific enquiry, debate and improvement and validation of research methods. It is possible that others identify errors in your data or methods. This gives you an opportunity to learn from your mistakes and grow as a scientist.
In fact, research integrity is an essential driver of reliable and trustworthy research and scientific discovery. A fundamental principle of the scientific method is reproducibility. The most often cited reasons for not meeting the criteria of reproducibility, also known as replicability or repeatability, are: pressure to publish, poor statistical analysis leading to conclusions not supported by results, poor reliability of results, selective reporting and lack of replication within the original environment. Environmental observations and measurements are unique. That is why choosing correct methods and tools and interpreting results carefully are important in this type of research. On the other side of the spectrum is experimental research, which is repeatable by nature. Whether your research is observational or experimental, good data management helps to support your own research integrity, as well as the validity and reproducibility of your research.
It is your responsibility as a researcher along with all other individuals involved in the research process to manage your data well and to make them open. Whether you've already generated data for your research project or are only planning how to do it, it's never too early (or too late) to start considering how to make your data open. Do not forget that you don't own data you collect or generate when working on your research project. Make sure to familiarise yourself with institutional policies and confirm with your PI/supervisor prior to making your research data open. You can learn more about research ethics in the chapter ``Open science policy, scientific integrity and ethics''.
To make data open is not enough. It must also be FAIR.
\begin{center}\includegraphics[width=0.7\linewidth]{images/slide5} \end{center}
\section{Test your understanding}\label{test-your-understanding-3}
\textbf{Activities}
In a recommended activities section like this one, we suggest activities to increase your understanding of the concepts and improve your practical knowledge.
\begin{itemize}
\item
Create metadata for one of your data files. Metadata standards used in different scientific disciplines, including specifications and various tools, can be found on the \href{https://www.dcc.ac.uk/guidance/standards/metadata}{Digital Curation Centre} website. Other useful schemas with overall structure of metadata are available on the \href{https://guides.lib.unc.edu/metadata/standards}{UNC University Libraries} website. Pick one that is the closest to your research and give it a go.
\item
Fill in a DMP template. You can do it online at \href{https://dmponline.dcc.ac.uk/}{DMPonline} after creating an account or you can download one from the EC \href{https://ec.europa.eu/research/participants/data/ref/h2020/gm/reporting/h2020-tpl-oa-data-mgt-plan_en.docx}{Horizon 2020 research and innovation funding programme} or the \href{https://guides.lib.umich.edu/c.php?g=283277&p=2138498}{University of Michigan Library}.
\item
Check out a free online \href{https://mantra.ed.ac.uk/}{MANTRA course} from the University of Edinburgh that was created for those who manage digital data as part of their research project.
\item
Check out the free \href{https://datatree.org.uk/}{Data Tree} research data management course from NERC.
\item
Do you have a data management plan? Or have you found during your research that your data could have been managed better and with much less effort had you had the relevant information when starting your project? Share your experience with others on our social media.
\end{itemize}
\chapter{FAIR principles}\label{fair-principles}
\section{What is FAIR?}\label{what-is-fair}
In a nutshell, FAIR is a set of guiding principles to make data \textbf{F}indable, \textbf{A}ccessible, \textbf{I}nteroperable and \textbf{R}eusable.
The \href{https://www.go-fair.org/fair-principles/}{FAIR principles} were first launched in 2014 at a \href{https://www.lorentzcenter.nl/}{Lorentz Workshop} and officially published in 2016 with the focus on the EU's goal of increasing sharing and reusing of research data. \href{https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=uriserv:OJ.L_.2019.172.01.0056.01.ENG}{The implementation of the FAIR principles} for research data is a requirement imposed by the EU, alongside the EU's request on Open Science \& Open Data. It is noteworthy that the FAIR principles are not a standard.
\subsection*{What is in it for you if you make your data FAIR?}\label{what-is-in-it-for-you-if-you-make-your-data-fair}
\addcontentsline{toc}{subsection}{What is in it for you if you make your data FAIR?}
The FAIR principles have multiple advantages for researchers. In general, by working in line with the FAIR principles, you can make your research more transparent, collaborative and sustainable and meanwhile facilitate your data management and protect your data's value for future use.
More specifically, you can expect the following by working with the FAIR principles:
\begin{itemize}
\item
Greater impact and visibility of your research.
\item
Opportunities for new research collaborations.
\item
More credit for yourself as a researcher.
\item
A more efficient data management plan.
\item
Possibilities for future research.
\end{itemize}
\subsection*{What is in it for science if you make your data FAIR?}\label{what-is-in-it-for-science-if-you-make-your-data-fair}
\addcontentsline{toc}{subsection}{What is in it for science if you make your data FAIR?}
The FAIR principles also bring great benefits to the research community and thereby a fulfilling sense of community commitment to you as a researcher. FAIR principles:
\begin{itemize}
\item
Enhance scientific enquiry and debate.
\item
Enable innovation and new data use.
\item
Increase the efficiency of research due to reusability and replication studies.
\item
Provide a valuable resource for education and training.
\item
Encourage the improvement and validation of research methods.
\item
Enable scrutiny of research results.
\item
Facilitate transparency and accountability.
\end{itemize}
\section{Key concepts to start with if you want to FAIRify your data}\label{key-concepts-to-start-with-if-you-want-to-fairify-your-data}
\textbf{PID (persistent identifier)}
A PID is a long-lasting reference to a document, a file, a web page or another object. It is usually used for digital objects that are accessible over the Internet but can also be used for physical objects. For example, the PID for a book can be its ISBN (International Standard Book Number). The use of a PID can effectively slow or prevent the damage of ``link rot'' in citations, which means that the cited URLs ``go dead'' because the contents are removed for different reasons.
You can encounter all kinds of PIDs in your research work. Here are two of the most frequently used types:
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
\textbf{DOI (digital object identifier)}
\end{enumerate}
DOIs are used to identify academic and professional information, such as research articles, reports, datasets and other publications, and in some cases government documents and commercial videos.
Archiving your data with a DOI as the PID helps you to comply with the FAIR principles and can enhance the impact of your research through increased visibility, which may lead to more citations.
You can read more about DOIs on the official website of the \href{https://www.doi.org/}{International DOI Foundation} (IDF).
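As a minimal sketch of how a DOI becomes a clickable, persistent link, the Python snippet below turns a DOI name into its \texttt{https://doi.org/} resolver address. The function name \texttt{doi\_to\_url}, the deliberately loose syntax check and the DOI \texttt{10.1234/example.dataset.v1} are assumptions made for illustration only; the example DOI is a placeholder, not a registered one.

```python
import re
from urllib.parse import quote

# Loose, illustrative pattern (an assumption, not the official grammar):
# "10.", a registrant code of 4 to 9 digits, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return the https://doi.org/ resolver URL for a DOI name."""
    doi = doi.strip()
    # Accept common prefixed spellings such as "doi:10.1234/abc".
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    # Placeholder DOI, used only to show the transformation.
    print(doi_to_url("doi:10.1234/example.dataset.v1"))
```

Citing the resolver address rather than the landing page of a repository is what protects the citation from link rot: if the data move, the DOI record is updated and the cited address stays the same.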
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\setcounter{enumi}{1}
\tightlist
\item
\textbf{ORCID (open researcher and contributor ID)}
\end{enumerate}
How can you find the work of one specific researcher when many researchers share the same or similar names? ORCID might be your answer. ORCID provides a persistent identifier for people, so that a particular author's contributions to the literature or publications in the humanities can be easily and clearly recognized.
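An ORCID iD is written as 16 characters in four groups, and its last character is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below shows how a typing error in an iD could be caught before it is stored; the function names are hypothetical, and \texttt{0000-0002-1825-0097} is the sample iD of a fictitious researcher used in ORCID's own documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, the digits and the check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # sample iD from ORCID's documentation
    print(is_valid_orcid("0000-0002-1825-0098"))  # same iD with the last digit mistyped
```

A check like this only confirms that the iD is well formed. It does not confirm that the iD has been registered or that it belongs to the person you have in mind.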
\textbf{Metadata}
In short, you can define metadata as ``data about data''.
There are multiple categories of metadata with different definitions; the following three are the most relevant to the FAIR principles.
\begin{itemize}
\item
\textbf{Descriptive metadata} are data that allow people to discover and identify a resource through its context or content, including title, author, abstract, keywords, etc.
\item
\textbf{Structural metadata} are data about the project's internal structure and relationships to other objects, including the unit of analysis, data collection method, sampling procedure, etc.
\item
\textbf{Administrative metadata} are data that are relevant for managing the project, including provenance, licence, creation date, file type, etc.
\end{itemize}
Metadata are not something you set out at the beginning of a project and then forget about. Instead, they are subject to changes and updates. Remember to add to or modify your metadata continuously throughout the project.
Good metadata help you to meet the FAIR principles: when metadata are recorded in a structured, machine-readable form, and especially when they have a PID, search engines can find them easily.
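To make the three categories concrete, here is a minimal sketch of a metadata record written as a Python dictionary and serialised to JSON, one common machine-readable format. The field names follow the examples given above, but the grouping, the function name \texttt{to\_json} and every value are placeholders invented for illustration; a real project would normally follow the metadata schema required by its repository or discipline.

```python
import json

# Hypothetical record: every value below is a placeholder.
metadata = {
    "identifier": "https://doi.org/10.1234/example.dataset.v1",
    "descriptive": {
        "title": "Example survey dataset",
        "author": "A. Researcher",
        "abstract": "Placeholder abstract describing the dataset.",
        "keywords": ["open science", "FAIR"],
    },
    "structural": {
        "unit_of_analysis": "individual",
        "data_collection_method": "online questionnaire",
        "sampling_procedure": "convenience sample",
    },
    "administrative": {
        "provenance": "collected by the project team",
        "licence": "CC-BY-4.0",
        "creation_date": "2024-01-15",
        "file_type": "text/csv",
    },
}


def to_json(record: dict) -> str:
    """Serialise a metadata record to human- and machine-readable JSON."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(metadata))
```

Keeping the record in a plain file next to the data makes it easy to update as the project evolves, and the \texttt{identifier} field is what ties the description to the dataset it describes.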
\section{Let's FAIR up!}\label{lets-fair-up}
\textbf{The principles}
The FAIR principles are quite straightforward. The guidelines are listed below, and you can read about each of them in detail on \href{https://www.go-fair.org/fair-principles/}{the FAIR principles} website.
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\tightlist
\item
\textbf{Findable}
\end{enumerate}
F1. Metadata and data are assigned a globally unique and persistent identifier.
F2. Data are described with rich metadata (defined by R1 below).
F3. Metadata clearly and explicitly include the identifier of the data they describe.
F4. Metadata and data are registered or indexed in a searchable resource.
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\setcounter{enumi}{1}
\tightlist
\item
\textbf{Accessible}
\end{enumerate}
A1. Metadata and data are retrievable by their identifier using a standardised communications protocol:
A1.1. The protocol is open, free and universally implementable.
A1.2. The protocol allows for an authentication and authorization procedure, where necessary.
A2. Metadata are accessible, even when the data are no longer available.
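Principle A1 can be illustrated with HTTPS, an open and free protocol. The sketch below builds, but does not send, a request that asks the DOI resolver for machine-readable metadata instead of the usual landing page. It assumes that the identifier is a DOI and that its registration agency supports content negotiation for the requested media type; the function name \texttt{metadata\_request} and the DOI are placeholders.

```python
from urllib.request import Request

# Media type for citation metadata in CSL JSON; whether a given DOI
# answers it depends on the registration agency (an assumption here).
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str, media_type: str = CSL_JSON) -> Request:
    """Build an HTTPS GET request for the metadata behind a DOI."""
    # The Accept header asks the resolver for metadata, not the web page.
    return Request("https://doi.org/" + doi, headers={"Accept": media_type})


if __name__ == "__main__":
    # Placeholder DOI: the request is only constructed, never sent.
    req = metadata_request("10.1234/example.dataset.v1")
    print(req.get_method(), req.full_url)
    print("Accept:", req.get_header("Accept"))
```

Passing the request to \texttt{urllib.request.urlopen} would actually send it. Where access is restricted, the same protocol can carry authentication details, which is the situation that A1.2 describes.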
\begin{enumerate}
\def\labelenumi{\arabic{enumi}.}
\setcounter{enumi}{2}
\tightlist
\item
\textbf{Interoperable}
\end{enumerate}
I1. Metadata and data use a formal, accessible, shared and broadly applicable language for knowledge representation.
I2. (Meta)data use vocabularies that follow FAIR principles:
○ Ontologies
○ Vocabularies
○ Taxonomies