Chapter 6 Open Science
Opening science does not only mean that scientific products should be openly accessible, or that the scientific process should be open to scrutiny: it also means that science should be open to anybody to participate. Historically¹, this has not been the case.
6.3 Free/Libre and Open Source Software
For science to be open, it is important that the infrastructure that is built is free to use, and will remain free in perpetuity. This infrastructure should be owned by the community, not by one or more commercial organisations. Therefore, it is important to build this infrastructure using Free/Libre and Open Source Software.
Software can be free in two ways: free as in beer and free as in speech. Free/Libre and Open Source Software (FLOSS) is free in both ways, the Free signifying the first way, and the Libre signifying the second way.
Although choosing to use FLOSS packages does not completely eliminate so-called ‘vendor lock-in’, it does eliminate many forms. For example, consider these six types of vendor lock-in (see https://twitter.com/jeroenbosman/status/1194618057181794306 for the origin):
A. Disincentives to combine offerings from various vendors
   1. User interface and technical compatibility
   2. Sales combinations and package deals
B. Disincentives to switch to another vendor
   1. Knowledge investments
   2. Data/procedure adaption
   3. Data applicability
   4. Collaboration opportunity
Of these, A2, B3 and B4 are eliminated by using FLOSS solutions.
An additional benefit of FLOSS is that it is generally more secure (suggested reasons are that “developers are usually also users of the software, developers are members of a community of developers, public availability of the source code and fast bug removal practices since thousands of independent programmers testing and fixing bugs of the software”; Pandey and Tiwari 2011).
6.4 Open Data
6.4.1 Types of data participants provide
There are three types of data participants can provide: personal data, creations, and facts.
6.4.1.1 Personal data
Personal data are data about a person, and are that person’s property as established in the General Data Protection Regulation (Crutzen, Peters, and Mondschein 2019). Unless a person decides to release their personal data under a license or into the public domain, these data can never be owned by another person or organisation: at most, others can temporarily control those personal data.
Creative works are copyrighted by their creator, as established in intellectual property law. Qualitative data are usually creations of the data provider, and as such, the data providers (participants) hold the copyright to those data.
Data that are not about persons and that are not creative works are facts, which intellectual property law defines as existing in the public domain. Anonymized quantitative data in psychological research usually fall within this category.
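The three-way distinction above can be read as a simple decision rule. The following is a minimal sketch in Python, not a legal tool: the names `DataType`, `DEFAULT_STATUS`, and `classify` are hypothetical, and the choice to check "about a person" before "creative work" is an assumption made here for illustration, since the text does not say which category takes precedence when both apply.

```python
from enum import Enum


class DataType(Enum):
    """The three types of data participants can provide."""
    PERSONAL = "personal data"
    CREATION = "creation"
    FACT = "fact"


# Default status per category, paraphrasing the text above.
DEFAULT_STATUS = {
    DataType.PERSONAL: "property of the person the data are about",
    DataType.CREATION: "copyrighted by the creator (the participant)",
    DataType.FACT: "public domain",
}


def classify(about_a_person: bool, creative_work: bool) -> DataType:
    """Classify a piece of data using the decision rule sketched above.

    Assumption: data about a person are treated as personal data even if
    they are also a creative work; the source text does not specify this.
    """
    if about_a_person:
        return DataType.PERSONAL
    if creative_work:
        return DataType.CREATION
    # Neither about a person nor a creative work: a fact.
    return DataType.FACT


if __name__ == "__main__":
    # Anonymized questionnaire scores: not about a person, not creative.
    kind = classify(about_a_person=False, creative_work=False)
    print(kind.value, "->", DEFAULT_STATUS[kind])
```

In practice, deciding whether data are "about a person" or "creative" requires judgment that two boolean flags cannot capture; the sketch only makes the order of the questions explicit.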
6.4.2 Raw data
It has been argued that “[Without] raw data, [there is] no science” (Miyakawa 2020). The availability of raw data has many benefits. One is that it enables the close scrutiny required when one aims to engage in an exercise as complex as the scientific endeavour. The analyses that raw data are subjected to that ultimately lead a researcher to their conclusions comprise many decisions. To err is human, and each of these decisions is therefore subject to some probability of error. When engaging in complex endeavours, therefore, some errors are inevitably made. Making raw data available increases the opportunities to correct these.
Second, because most decisions a researcher makes are subjective, re-analyses of the same raw data can yield different conclusions - and in fact, this has been shown to be the case (Silberzahn et al. 2018). This means that the original researcher’s results and conclusions are to a degree arbitrary: consequences of the route the researcher ended up taking in the garden of forking paths (Gelman and Loken 2014). Re-analysis of the same data can yield insights into which forks play a particularly large role in determining the final destination, and as such, which decisions require especially comprehensive justifications.
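To make the idea of forking paths concrete, the sketch below runs the same toy data through every combination of two analytic decisions: whether to exclude extreme values, and whether to summarise each group by its mean or its median. The data, the cutoff of 8.0, and the function names are all invented for illustration; they are not taken from Silberzahn et al. (2018) or Gelman and Loken (2014).

```python
from itertools import product
from statistics import mean, median

# Hypothetical toy data, invented for illustration only.
group_a = [4.1, 5.0, 5.2, 4.8, 9.5]
group_b = [4.9, 5.1, 5.3, 5.6, 5.0]

CUTOFF = 8.0  # Assumed exclusion threshold; any such rule is a decision.


def drop_extremes(values, cutoff=CUTOFF):
    """Keep only values at or below the cutoff."""
    return [v for v in values if v <= cutoff]


def analyse(exclude_outliers, summary):
    """Return summary(group A) - summary(group B) for one analytic path."""
    a = drop_extremes(group_a) if exclude_outliers else group_a
    b = drop_extremes(group_b) if exclude_outliers else group_b
    return summary(a) - summary(b)


def multiverse():
    """Run every combination of the two decisions and collect the results."""
    summaries = [("mean", mean), ("median", median)]
    results = {}
    for exclude, (name, summary) in product([False, True], summaries):
        results[(exclude, name)] = analyse(exclude, summary)
    return results


if __name__ == "__main__":
    for (exclude, name), diff in multiverse().items():
        print(f"exclude_outliers={exclude!s:5} summary={name:6} "
              f"difference={diff:+.2f}")
```

With these invented numbers, the difference in means is positive when all values are kept and negative once the extreme value is excluded, so two defensible paths through the same raw data point in opposite directions. Enumerating the paths in this way is only possible when the raw data are available.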
Third, many research questions can be answered with existing data. That requires the data to be available.
6.4.3 Processed data
6.5 Open Materials
Crutzen, Rik, Gjalt-Jorn Ygram Peters, and Christopher Mondschein. 2019. “Why and How We Should Care About the General Data Protection Regulation.” Psychology & Health 34 (11): 1347–57. https://doi.org/10.1080/08870446.2019.1606222.
Gelman, Andrew, and Eric Loken. 2014. “The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No ‘Fishing Expedition’ or ‘P-Hacking’ and the Research Hypothesis Was Posited Ahead of Time.” Psychological Bulletin 140 (5): 1272–80. https://doi.org/10.1037/a0037714.
Miyakawa, Tsuyoshi. 2020. “No Raw Data, No Science: Another Possible Source of the Reproducibility Crisis.” Molecular Brain 13 (1): 24. https://doi.org/10.1186/s13041-020-0552-2.
Pandey, R. K., and Vinay Tiwari. 2011. “Reliability Issues in Open Source Software.” International Journal of Computer Applications 34 (1): 34–38. https://doi.org/10.5120/4065-5849.
Silberzahn, R., E. L. Uhlmann, D. P. Martin, P. Anselmi, F. Aust, E. Awtrey, Š. Bahník, et al. 2018. “Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results.” Advances in Methods and Practices in Psychological Science, August. https://doi.org/10.1177/2515245917747646.
¹ Well, in recent history.