Beyond Basic R - Version Control with Git
Brief introduction to version control, Git, and GitHub; plus, resources for learning more.
Depending on how new you are to software development and/or R programming, you may have heard people mention version control, Git, or GitHub. Version control refers to tracking changes to files over time and across contributors. Git is an example of a version control tool, and GitHub is a popular website for hosting Git repositories that provides a web interface for working with them. That’s it. Easy!
But wait, I would need to learn an additional tool?
Yes, but don’t panic! Git is a tool with various commands that you can use to help track your changes. Luckily, you don’t need to know too many commands in Git to use the basic functionality. As an added bonus, using Git with RStudio takes away some of the burden of knowing Git commands by including buttons for common actions.
As with any tool that you pick up to help your scientific workflows, there is some upfront work before you can start seeing the benefits. Don’t let that deter you. Git can be very easy once you get the gist. Think about the benefits of being able to track changes: you can make a change, keep a record of what changed and who made it, and tie that change to a specific reported problem or feature request.
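For readers curious what those few commands look like, here is a minimal sketch of recording one change from the command line. It assumes Git is installed; the directory, file, author details, and issue number are hypothetical placeholders. RStudio’s Git pane offers buttons for the same staging and commit steps.

```shell
#!/bin/sh
# Minimal sketch: record a change and see who made it.
# Assumes Git is installed; the directory, file, author, and issue
# number below are hypothetical placeholders.
set -e

mkdir -p my-analysis
cd my-analysis
git init

# Identify yourself so each commit records who made the change
# (set for this repository only)
git config user.name "Jane Hydrologist"
git config user.email "jane@example.com"

# Make a change and stage it for the next commit
printf 'flow <- read.csv("flow.csv")\n' > analysis.R
git add analysis.R

# Record the change, with a message tying it to a (hypothetical) reported issue
git commit -m "Read flow data from CSV (see issue #1)"

# Review the history: who made each change, and why
git log --format="%an: %s"
```

Each commit stores the author and message alongside the change itself. On GitHub, a reference such as `#1` in a commit message is linked to the matching issue in that repository.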
Tracking changes is not the only benefit of using Git and GitHub. If you work on your code with anyone other than yourself, Git and GitHub are a great way to facilitate collaboration and simultaneous code development. Multiple people can work on the same body of code at one time and request that their edits, additions, and deletions be combined into the main code after a peer review. You can “freeze” your body of code at a specific version so that you can reference it in that state in the future (perfect for citing your scripts in a paper). You can also develop separate features at the same time without one depending on the other, so you can easily keep one and discard the other.
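Two Git concepts sit behind the last two points: tags, which mark a specific version, and branches, which let features develop independently. The sketch below is a local-only illustration, assuming Git is installed; the directory, files, tag, and branch names are hypothetical. It does not show GitHub pull requests, where the merge step would normally go through peer review.

```shell
#!/bin/sh
# Minimal sketch: freeze a version with a tag, then develop two features
# on separate branches and keep only one. Assumes Git is installed;
# all names below are hypothetical placeholders.
set -e

mkdir -p paper-scripts
cd paper-scripts
git init
git config user.name "Jane Hydrologist"
git config user.email "jane@example.com"

printf 'summary(flow)\n' > analysis.R
git add analysis.R
git commit -m "Initial analysis"

# "Freeze" this state so it can be cited (e.g., in a paper) and revisited later
git tag -a v1.0 -m "Version used in the manuscript"

# Remember the starting branch (its default name varies by Git setup)
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Feature A on its own branch
git checkout -b feature-plot
printf 'plot(flow)\n' > plot.R
git add plot.R
git commit -m "Add flow plot"

# Feature B, started from the main line, independent of feature A
git checkout "$main_branch"
git checkout -b feature-stats
printf 'mean(flow)\n' > stats.R
git add stats.R
git commit -m "Add mean flow"

# Keep feature A, discard feature B (-D deletes an unmerged branch)
git checkout "$main_branch"
git merge feature-plot
git branch -D feature-stats

# The tagged version is still available exactly as it was
git show v1.0:analysis.R
```

Because feature B lived on its own branch, deleting it leaves feature A and the tagged version untouched.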
Need more convincing? Check these out:
- The first three sections (“What is version control?”, “What is Git?”, and “What is GitHub?”) of An introduction to Git and how to use it with RStudio (François Michonneau)
- Excuse me, do you have a moment to talk about version control? (Jenny Bryan)
Let’s try it! Where do I start?
Ease into using Git and GitHub. There is no need to jump into the deep end with a full-blown Git project to start learning about Git. We suggest you start by creating a GitHub account and following (“watching”) a repository for a package or project available on USGS-R. You’ll start to get notifications for the repository and can check in periodically to see how others are using Git and GitHub to track their code changes, manage project features and bugs, and collaborate effectively as a team. It will also help you get comfortable with the terminology.
If you’d rather start by getting a project set up in Git and GitHub, you can follow our lesson on version control in the R Package Development curriculum. Please note that you don’t need to be creating an R package to use Git or GitHub. Here is an example of a basic R project that has R code, is not a package, and uses Git and GitHub.
There are many existing blogs and websites that dive into how to use these tools. There are also many different ways to apply Git and GitHub to create an effective workflow. We suggest you follow the typical workflow used by the USGS-R community; see our lesson on version control in the R Package Development curriculum. Here are some other resources for mastering Git and GitHub:
- Happy Git and GitHub for the useR (Jenny Bryan)
- Git guides (Mara Averick)
- Git and GitHub from R Packages (Hadley Wickham)
- Version control with Git (Roel M. Hogervorst)
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.