Chapter 10 Git

Git is an open source version control system, enabling minimum-effort documentation of changes over time. It can be downloaded and installed from https://git-scm.com/. A collection of directories and files that Git keeps track of is called a repository, or repo for short.

Although Git is decentralized, in simple projects, it is usually used in a centralized fashion. This means that one of the Git repositories is treated as the central repository, and all collaborators sync their local clones of that repository with that central repo.

Unlike cloud synchronization services such as Dropbox and Sync, Git does not automatically synchronize changes. This may seem tedious but it is in fact an important feature: it facilitates a workflow where documenting what you do and why eventually becomes an automatic process.

Also unlike cloud synchronization services, Git keeps a full history of all changes in the project. This means that it’s relatively easy to find out when a paragraph in a manuscript was last changed, who made that change, and how it looked before that change. It is also easy to see how many lines were added or removed and by whom, and what the reasons were for those changes. This also makes it possible to rewind the project to a specific point in time, and even to let collaborators work in parallel versions of a project that are merged together at a later time.

In other words, whereas cloud synchronization services were in principle designed as convenient tools for, well, file synchronization, Git was designed for collaboration and version control. If you collaborate intensively on a set of files using Sync, Dropbox, Google Drive, OneDrive, or iCloud, it is easy to end up with conflicting versions of files; Git avoids this as much as possible. With Git, if two people edit the same file, instead of just saving one copy as a ‘conflicting version’, Git merges their changes in a line-by-line fashion. If it turns out that both people edited the same lines in the file, Git presents a so-called ‘merge conflict’, allowing you to choose which changes to retain.

Note that this Chapter introduces Git as a tool, providing the necessary information to start using it from the Git Bash command line. The ‘how-to’ guides relating to workflow are located in Part III of this book. Also note that the author is only a novice user of Git. This means that this Chapter probably oversimplifies Git. For a more thorough explanation, just search the internet - there are many excellent free resources, tutorials, books, and videos. If you are familiar with R already, Jenny Bryan wrote an awesome Open Access book called Happy Git With R, hosted at https://happygitwithr.com. Danielle Navarro has a great concise slide deck that in itself pretty much explains the basics, available at https://djnavarro.github.io/chdss2018/day2/git-slides.html (use the space bar or the arrow keys to navigate).

10.1 Preliminaries

10.1.1 Git Bash

Git Bash is a Windows application that provides an environment that’s very similar to what you’d get if you ran Git in the Bash environment provided with the *nix operating systems (such as macOS and Linux). Depending on the options you choose when you install Git, you can also access Git from the standard Windows command line interface, but that usually doesn’t come with the pretty colors that Git Bash has available.

Also, depending on the options you choose during install, you can normally right-click a directory and choose the option “Git Bash Here” from the context menu that pops up. That’s very convenient, because it allows you to quickly interact with Git for a given repository.

And finally, RStudio offers you direct access to Git Bash through the Terminal tab in the bottom-left pane. Note that RStudio also adds a dedicated Git tab to the top-right pane if it detects that the project you opened is a Git repository. How to interact with Git from RStudio is discussed in Chapter 15.

10.1.2 Rights

Before you can interact with a Git repository, you need to have the authorization to do so. Some Git repositories are public (most of mine are; see https://gitlab.com/matherion), which means everybody is authorized to clone the repository. However, of course that doesn’t mean that everybody can also change my files. In a Git repository manager such as GitLab (see Chapter 11), the repository’s owner can add other users who have the right to interact with the Git repository. Git projects that are not set to public aren’t even visible unless you’ve been added to that list of members. Once somebody has access to a Git repository, what they can do depends on the rights they have been given.

10.1.3 Introducing yourself

Git will first need to know who you are. You can specify this using Git Bash. Note that it’s important to use the email address that you also used for your Git repo manager account, because when Git talks to that server, that email address is how you’ll be identified. To set your name and email address, use:

git config --global user.name "Gjalt-Jorn Peters"
git config --global user.email "gjalt-jorn@behaviorchange.eu"

Note that you should probably use your own name and email address, though.
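To check what Git has stored, you can read the same settings back. The following is a minimal sketch, not a required step: it uses a throwaway home directory and a hypothetical identity ("Jane Doe") so that running it leaves your real settings untouched. The helper name `demo_git` is made up for this illustration.

```shell
# Throwaway home directory, so your real Git settings are not modified.
demo_home="$(mktemp -d)"

# Hypothetical helper: runs git with the throwaway home directory.
demo_git() { HOME="$demo_home" XDG_CONFIG_HOME="$demo_home/.config" git "$@"; }

# Store a hypothetical identity (use your own name and address in practice).
demo_git config --global user.name "Jane Doe"
demo_git config --global user.email "jane.doe@example.com"

# Read the values back; without a new value, git config prints the stored one.
stored_name="$(demo_git config --global user.name)"
stored_email="$(demo_git config --global user.email)"
echo "Commits will be labelled: $stored_name <$stored_email>"
```

On your own machine, running `git config --global user.name` and `git config --global user.email` directly in Git Bash prints the values you set.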

The first time you ask Git to do something that requires it to authenticate with the Git server hosting the remote repository, Git will ask for your password. Usually, Git will store this in its credential manager so you’ll only have to provide it once.

10.2 Getting started: cloning

To get started with a repository, you clone it to your local computer. Cloning a repository not only downloads all current versions of the files, but it downloads the entire repository - and therefore, also the complete history of the project. With large projects, this may become quite a large download. However, the advantage is that you can then locally inspect the entire history.

After you have cloned the repository, you can make changes to the files and synchronize those with the repo. Note that sometimes, you may want to create a new repository. Although you can do this directly with Git, unless you know what you’re doing, you’ll probably want to initiate new repositories using a central repository suite such as GitLab. Therefore, see Chapter @ref{managing-a-gitlab-project} for instructions on how to initiate a new Git repo. Once you have created a new repository at a GitLab server, you will still have to clone it to your local PC before you can start working with it.

Once you have cloned a repository to your local machine, you can start working on the files it contains. If you want to try it out, you can open a Git Bash session in a directory where you’d like to clone a repository, and clone the repository containing this book by typing:

git clone https://gitlab.com/psy-ops/psy-ops-guide.git

For more information about cloning, see https://git-scm.com/book/en/v2/Git-Basics-Getting-a-Git-Repository.
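That a clone carries the complete history can be checked without a network connection. The sketch below is an illustration under assumptions: it builds a small stand-in repository in a temporary directory (the names `origin-project`, `my-clone` and `notes.txt` are made up), clones it, and then inspects the history from inside the clone.

```shell
# Work in a temporary directory so nothing else is touched.
work="$(mktemp -d)"
cd "$work"

# Stand-in for a project on a server: a repository with two commits.
git init --quiet origin-project
cd origin-project
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first draft" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes"
echo "second draft" > notes.txt
git commit --quiet -am "Revise notes"
cd ..

# Clone it; cloning from a URL on a Git server works the same way.
git clone --quiet origin-project my-clone
cd my-clone

# The clone contains the full history, not just the latest files.
commit_count="$(git rev-list --count HEAD)"
git log --oneline
echo "Commits available locally: $commit_count"
```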

10.3 Working on files in a repo: pulling, committing, and pushing

Working on files in a Git repository is no different from working on files that are not in a Git repository. Just use whichever software you prefer to make your edits. However, once you have made one or more changes (that are more or less coherent), you will want to package them up and synchronize them.

In Git, this is called committing the changes to the repository. When you create a so-called commit, you tell Git to take all (or some) of the changes made to the files that Git tracks in the repository, and bundle them together in a package (i.e. a commit). You can specify which files Git shouldn’t track, or in other words, which files Git should ignore: see section 10.4. Each commit has its own commit message, where you can (and have to; commit messages are not optional) explain what you did (and maybe why you did that).

To create a commit you have to first stage one or more files. If you work on a project that is very important and/or collaborate heavily using Git and/or have the discipline to keep fine-grained documentation of what changes you make and why, you will probably not commit too many changes at once. However, in most cases, I just stage and then commit all changes I made in one go.

To stage all changes in the current directory and everything below it (that includes created and deleted files and directories), use the following command:

git add .

To then specify a commit message, use:

git commit -m "This is a message"

This wraps all those changes up into one commit with that message associated with it.

Commits are created locally. In other words, you don’t need an internet connection - which also means that creating the commit, or a whole bunch of commits if you’re on a roll, does nothing to synchronize your local repository with the central one (and therefore, with the repos on everybody else’s machine).
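For the more fine-grained approach mentioned above, you can stage a single file instead of everything. The following sketch runs in a temporary directory with made-up file names (`manuscript.txt`, `analysis.R`); it first commits one file on its own and then commits the rest in one go.

```shell
# Work in a temporary directory with a fresh repository.
work="$(mktemp -d)"
cd "$work"
git init --quiet demo-repo
cd demo-repo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Two new files (hypothetical content).
echo "Introduction" > manuscript.txt
echo "x <- 1" > analysis.R

# Show what Git sees; both files are listed as untracked.
git status --short

# Stage and commit only the manuscript.
git add manuscript.txt
git commit --quiet -m "Start manuscript with an introduction"

# Stage and commit whatever is left in one go.
git add .
git commit --quiet -m "Add first analysis script"

# List the commits, newest first.
git log --oneline
commit_count="$(git rev-list --count HEAD)"
```

Running `git status` before staging is a cheap way to see which changes would end up in the commit.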

To actually push the commits to the central repository, you have to literally tell Git to do that:

git push

Note that to push to a repository, you need to be authorized to do so (see the Rights section above).

If there’s pushing, there’s also pulling. Where pushing sends all your commits to the remote repository, pulling downloads all changes from the remote repository that you didn’t yet have locally. In other words, if somebody else changed one or more files since the last time you told Git to pull, pulling will download those changes to your computer and update your local files accordingly.

You can pull with:

git pull

Because you will want to avoid working on an outdated version of a file, it is wise to always pull all recent changes into your local repository before you start working on anything. So, normally, you pull, work on files and bundle your changes into one or more commits, and push either after each commit or set of commits, or when you’re done working in the repository for the day.

If you try to push commits to the remote repository, but the remote repository has been updated in the meantime, Git will refuse. It will tell you that the remote repository contains work you don’t have locally, and that you have to pull first. So, in that case you pull first. Sometimes, Git also refuses pulling: that happens if you made changes to one or more files that you have not yet committed, and that were also changed in the remote repository. In that case, you first have to make sure you create one or more commits with all changes, before you can pull the changes to your local machine. If the same parts of one or more files were changed, you have a merge conflict: you’ll have to tell Git which bits to use. This is discussed in section 10.5.

Commit messages have a useful extra functionality. If you use a Git repo management suite, such as GitLab (or GitHub), these often parse the commit messages and allow you to specify actions that they can then take. For example, in more complicated projects, in projects where an overview and structure are very important, or in projects that just involve people who really like an overview and structure, you may want to use issues, a feature of Git repo management suites. Issues allow you to keep track of tasks relating to the project, and GitLab recognizes specific phrases in a commit message that allow you to specify that a commit closes an issue. If you use such a phrase in your commit message, GitLab will close the issue with a message that links to that commit. This is a very efficient way to keep track of progress in the project.
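The refusal described above can be reproduced locally. The sketch below is an illustration under assumptions: a bare repository in a temporary directory stands in for the central server, and two clones stand in for two collaborators (the names `alice` and `bob` are made up). Note that recent Git versions may ask how to combine your commits with the remote ones when both sides have new commits; the sketch therefore passes `--no-rebase` to `git pull`, which requests a merge, and `--no-edit` to accept the default merge message.

```shell
work="$(mktemp -d)"
cd "$work"

# Hypothetical helper: set a local identity inside the current clone.
set_identity() { git config user.name "$1"; git config user.email "$1@example.com"; }

# A bare repository plays the role of the central server.
git init --quiet --bare central.git

# Alice clones the (still empty) central repo and pushes the first commit.
git clone --quiet central.git alice
cd alice
set_identity "alice"
echo "line 1" > notes.txt
git add .
git commit --quiet -m "Add notes"
git push --quiet -u origin HEAD
cd ..

# Bob clones, adds a file, and pushes.
git clone --quiet central.git bob
cd bob
set_identity "bob"
echo "Bob's file" > bob.txt
git add .
git commit --quiet -m "Add Bob's file"
git push --quiet
cd ..

# Alice commits without pulling first, so her push is refused.
cd alice
echo "line 2" >> notes.txt
git commit --quiet -am "Extend notes"
if git push --quiet 2>/dev/null; then first_push="accepted"; else first_push="rejected"; fi
echo "First push attempt: $first_push"

# Pulling first (as a merge) brings in Bob's work; then the push goes through.
git pull --quiet --no-rebase --no-edit
if git push --quiet; then second_push="accepted"; else second_push="rejected"; fi
echo "Second push attempt: $second_push"
```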
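As an illustration, a commit message that closes an issue could look like the one below. This is a sketch under assumptions: it presumes the project uses GitLab's default closing pattern (phrases such as "Closes #12"), and the issue number 12 is hypothetical. The commit is created in a temporary repository; the issue itself would only be closed by GitLab after the commit is pushed to the project's default branch.

```shell
work="$(mktemp -d)"
cd "$work"
git init --quiet issue-demo
cd issue-demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# A first version containing a typo.
echo "Teh introduction" > manuscript.txt
git add .
git commit --quiet -m "Add introduction"

# Fix the typo and mention the (hypothetical) issue in the commit message.
echo "The introduction" > manuscript.txt
git add .
git commit --quiet -m "Fix typo in introduction. Closes #12"

# Show the message of the most recent commit.
last_subject="$(git log -1 --pretty=%s)"
echo "$last_subject"
```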

10.4 Preventing files from syncing

You will usually not want to synchronize all files in a directory: projects may contain files including personal data (e.g. raw data) or secrets (e.g. passwords). Git allows you to specify the files to ignore in a file called “.gitignore”. Each line in the .gitignore file specifies a pattern, and all directories and files matching the pattern are ignored. Details are described in the relevant page of the Git manual.

_PRIVATE_
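To see how patterns in a .gitignore file work in practice, the following sketch creates one in a temporary repository and asks Git which files it ignores. The directory and file names (`raw-data/`, `api.secret`, `plan.txt`) are made up for this illustration.

```shell
work="$(mktemp -d)"
cd "$work"
git init --quiet ignore-demo
cd ignore-demo

# One pattern per line; lines starting with # are comments.
cat > .gitignore <<'EOF'
# Ignore everything in the raw-data directory
raw-data/
# Ignore any file ending in .secret
*.secret
EOF

# Some files: two that should stay private, one that should be tracked.
mkdir raw-data
echo "participant,score" > raw-data/responses.csv
echo "not-a-real-password" > api.secret
echo "Analysis plan" > plan.txt

# git check-ignore prints the paths that match an ignore pattern.
ignored="$(git check-ignore raw-data/responses.csv api.secret plan.txt)"
echo "$ignored"

# git status only lists files that are not ignored.
untracked="$(git status --porcelain)"
echo "$untracked"
```

The .gitignore file itself is normally committed, so that all collaborators ignore the same files.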

10.5 Merge conflicts

Normally, if you collaborate in a project, there will be some division of labour, making it unlikely that two people work on the same file at the same time. Still, you may sometimes run into situations where you edited one or more lines in a file that somebody else edited at the same time.

In that case, probably just after Git forced you to pull recent changes from the remote repository, Git will present you with a merge conflict. In the files that were simultaneously edited locally (by you) and remotely (by somebody else who had already pushed their changes to the server), Git will insert both fragments (i.e. both versions of the conflicting lines), delimited by three lines produced by Git.

Your job is now to manually merge both versions and by doing so, resolve each conflict. You do this simply by editing the file until it has the state you want it to have. You have three alternatives. First, you can select your version of the file content on those conflicting lines, in which case you remove the version produced by somebody else, as well as the three delimiting lines Git added. Second, you can retain the changes that the other person made, and remove your own version (as well as the three delimiting lines Git added). Or, third, you can really merge both versions into one ‘best of both worlds’ version, and then remove the three delimiting lines.

After you have resolved all merge conflicts, you bundle those changes into a commit (or into several commits), and push them to the server.
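The whole cycle can be tried out safely in a temporary directory. The sketch below is an illustration under assumptions: a bare repository stands in for the server, the clones `you` and `colleague` and the file `methods.txt` are made up, and `git pull` is run with `--no-rebase` to request a merge. It shows the three delimiting lines Git inserts (starting with `<<<<<<<`, `=======` and `>>>>>>>`) and then resolves the conflict using the third alternative, merging both versions.

```shell
work="$(mktemp -d)"
cd "$work"

# Hypothetical helper: set a local identity inside the current clone.
set_identity() { git config user.name "$1"; git config user.email "$1@example.com"; }

# A bare repository plays the role of the server.
git init --quiet --bare central.git

# You create the first version and push it.
git clone --quiet central.git you
cd you
set_identity "you"
echo "The sample size was 100." > methods.txt
git add .
git commit --quiet -m "Add methods"
git push --quiet -u origin HEAD
cd ..

# A colleague edits the line and pushes first.
git clone --quiet central.git colleague
cd colleague
set_identity "colleague"
echo "The sample size was 120." > methods.txt
git commit --quiet -am "Update sample size"
git push --quiet
cd ..

# Meanwhile you edited the same line; pulling now reports a conflict.
cd you
echo "The sample size was 100 participants." > methods.txt
git commit --quiet -am "Clarify unit"
git pull --no-rebase --no-edit || true

# The file now holds both versions, delimited by three lines from Git.
conflicted="$(cat methods.txt)"
echo "$conflicted"

# Resolve by writing the merged text, then stage, commit, and push.
echo "The sample size was 120 participants." > methods.txt
git add methods.txt
git commit --quiet -m "Merge sample size update with unit clarification"
git push --quiet
```

In practice you would resolve the conflict by editing the file in your usual editor rather than overwriting it with `echo`; the staging, committing, and pushing steps stay the same.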

10.6 Advanced: branching