What are Git and GitHub, and why use them as a researcher?
What are Git and GitHub?
Git is the de facto standard version control system (git-scm.com). Think of it as a tool that tracks every change you make to your files. It creates a complete history of your project, allowing you to revert to previous versions, compare changes, and work on different features without affecting the main code.
Git runs on your own computer and works entirely offline, so you never have to connect to a remote platform. Most Git users nevertheless add a remote connection, for example to a platform such as GitHub.
Git is free and open source. Its trademark is held by the Software Freedom Conservancy, a US non-profit organization that supports free and open-source software projects.
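To illustrate the purely local use of Git described above, here is a minimal command-line sketch. It assumes Git is installed; the folder name, file name, and user details are hypothetical placeholders, and nothing in it contacts a remote platform.

```shell
# Create a throwaway project folder (hypothetical name) in a temporary location
project="$(mktemp -d)/my-analysis"
mkdir -p "$project"
cd "$project"

# Turn the folder into a Git repository; everything stays on this computer
git init --quiet

# Identify yourself for this repository only (placeholder values)
git config user.name "Example Researcher"
git config user.email "researcher@example.org"

# Create a small R script, stage it, and record a first snapshot (a commit)
echo 'x <- c(1, 2, 3)' > analysis.R
git add analysis.R
git commit --quiet -m "Add first version of analysis script"

# Show the project history: one commit so far
git log --oneline
```

The same steps are available through the Git pane in RStudio; the commands are shown here only to make explicit what happens behind the buttons.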
GitHub is the largest cloud-based service that hosts Git repositories. Think of it as a large library for code, where your project folders (called repositories, or repos for short) can be kept private or made public and downloadable. It provides a central place to store your project online, enabling you to share your work, collaborate with others, and have an off-site backup. While Git provides the protocol and machinery, GitHub is the platform that hosts the work and provides tools for collaboration.
GitHub is owned by Microsoft. There are alternatives, e.g. the non-profit Codeberg or Atlassian's Bitbucket, but GitHub is the most widely used platform.
As a further primer after this workshop, this 2016 PLOS paper with accompanying files on GitHub might be helpful (note that it uses only the command line).
Why an Academic Researcher Needs It 🔬
Even if you’re not a data analyst or software developer, Git and GitHub are invaluable for several reasons:
Look at the first part of the current URL: this page was prepared entirely in RStudio as a Quarto website project and is published via GitHub.com and GitHub Pages.
Reproducible Research: By tracking every change to your analysis scripts, you create a transparent and verifiable history. A reviewer, collaborator, or even your future self can see exactly how a figure or result was generated from the raw data. This is a core characteristic of transparent and reproducible science.
Showcase your own techniques: GitHub is a good platform for making scripts or R packages publicly available.
Backup and Disaster Recovery: Your research code is valuable, and probably has to be published together with your research paper. If your computer fails, everything you have pushed to GitHub remains safe. It’s a free, off-site backup.
Collaboration with Students: As a principal investigator, you can use GitHub to work on projects with your students. You can review their code, suggest changes, and merge their contributions into the main project. You can see who made which changes and when, which helps with accountability and streamlines the collaboration process.
Access Anywhere: You can access your code from any computer with an internet connection. While working remotely is also possible with other cloud solutions (e.g. LJMU’s Microsoft, or Google), they lack Git’s version control functionality.
Without Git, you will probably end up with many copies of your code files, perhaps with the date in each file name. Tracking changes across such files is possible, but it is harder and not part of an enforced workflow. In Git, by contrast, every recorded change (commit) to a single file or a group of files requires, by default, a message describing it.
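As a minimal sketch of how a commit history can replace dated file copies, the following commands record two versions of one script, compare them, and bring back the earlier one. The file name, its contents, the commit messages, and the user details are hypothetical.

```shell
# Set up a throwaway repository in a temporary folder
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Researcher"
git config user.email "researcher@example.org"

# Version 1 of the script, recorded with a descriptive message
echo 'threshold <- 0.05' > analysis.R
git add analysis.R
git commit --quiet -m "Set significance threshold to 0.05"

# Version 2: same file name, no dated copy needed
echo 'threshold <- 0.01' > analysis.R
git commit --quiet -am "Lower threshold to 0.01"

# The log lists both versions together with their messages
git log --oneline

# Compare the previous version with the current one
git diff HEAD~1 HEAD -- analysis.R

# Bring back the earlier version of the file from the history
git checkout HEAD~1 -- analysis.R
```

After the last command, analysis.R again contains the first version. The restored file is an ordinary staged change that can be committed like any other, so the history keeps growing and nothing is lost.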
A Simple GitHub Workflow for RStudio
This diagram illustrates a basic workflow for a single user using Git and GitHub for an R project in RStudio. The text in the red arrows shows the Git commands used at each step. Note: you can also use Git purely locally, without pushing (publishing) to a remote or cloud location such as GitHub.
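For readers who prefer the command line, a single-user cycle of this kind can be sketched as follows, assuming the usual clone, commit, push, and pull steps. So that the example runs without a GitHub account, a local bare repository stands in for the GitHub repository; with a real project you would clone the URL of your own repository instead. All folder names, file names, and user details are hypothetical.

```shell
# Stand-in for a GitHub repository: a bare repository in a temporary folder
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"

# Clone: copy the remote repository to your computer
# (Git warns that the repository is empty, which is expected here)
git clone --quiet "$work/remote.git" "$work/local"
cd "$work/local"
git config user.name "Example Researcher"
git config user.email "researcher@example.org"

# Edit, stage, and commit: record a change in the local history
echo 'summary(cars)' > analysis.R
git add analysis.R
git commit --quiet -m "Add summary of the cars data"

# Push: upload the local commits to the remote
# (-u remembers the remote branch for later pushes and pulls)
git push --quiet -u origin HEAD

# Pull: download any newer commits from the remote before the next session
git pull --quiet --ff-only
```

In RStudio, the Commit, Push, and Pull buttons of the Git pane run the corresponding commands for you.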
