8  Git

For full documentation of using git from the command line you can read the Git Pro Book.

8.1 Common Git Commands

Note: These commands will only work if you are in a git repo.

8.1.1 Workflow

  • git status shows any files updated locally
  • git add [filename] add file to staging area
  • git add . adds all modified files to staging area
  • git commit create commit of staged files - this will prompt you to enter commit message. You can also use the -m flag to write your message directly with commit command (git commit -m "updates xyz").
  • git commit -am "commit message" will add all files and commit them in one step. However,
  • git push pushes your commits to github
  • git pull pulls updates from github (it is good practice to pull before commiting)
  • git mv <oldname> <newname> to rename and/or move a file without losing its link to its earlier history. Read more
  • git stash to stash changes; git stash pop to reapply stashed changes

8.1.2 Repos

  • git clone [repo_url] clones git repo.
  • git init [repo_name] creates a new empty repo

8.1.3 Branching

  • git branch shows local branches (git branch -a will show you all branches local and on github)
  • git checkout [branch_name] switches branches
  • git checkout -b [new_branch_name] switches to new branch (git branch [new_branch_name] creates new branch but doesn’t switch to it).
  • git branch -d [brand_name] deletes a local branch. This must be done from another branch or main.
  • git push origin -d [brach_name] deletes branch_name on remote/github (assumes remote is called origin, which it is for us).

8.1.4 Log and Versioning

  • git log displays the most recent commits
  • git log --stat -[N] displays stats on the most recent N commits. Cleaner than log
  • git log [filename] displays commit history for a file. May have problems if the filename was ever changed. In that case consider, git log --follow -- [filename]. Can add–stat` for more info on commit

8.1.5 Diffs

  • we use meld merge for file diffs

  • meld merge can do diffs on two files saved locally. For this, you can use the gui and select the files. For example, between a qmd file for a paper in a repo and another copy of that file with edits from a lab member saved outside the repo.

  • meld merge can also be set up to be used as the difftool for git. In Linux, add the following to your .gitconfig file

[diff]
    tool = meld
[difftool]
    prompt = false
[difftool "meld"]
cmd = meld "$LOCAL" "$REMOTE"

# The order of the Meld GUI window panes can be controlled by the order of $LOCAL and $REMOTE in cmd.  JJC still not certain about this?

Once you have done this, git difftool will call meld. You can now use meld for diffs:

  • git difftool filename to compare local copy (not committed) of filename to head on repo
  • git difftool <COMMIT_HASH> filename to compare previous version of filename to head version on repo.
  • git difftool HEAD HEAD~2 filename to compare head version of filename on repo with version of filename two commits back.
  • git difftool <COMMIT_HASH_1> <COMMIT_HASH_2> filename to compare two versions of a file across two separate commits that are not the head on the repo.

8.2 Git Command Line for Windows

The recommended terminal for using the Git command line on Windows is Git Bash. You can have Git command line and Git Desktop installed on the same machine and they mostly do not affect each other (see notes below).

8.2.1 Installation

During installation, you will be given several options:

  1. Select the default text editor. It defaults to Vim, but you can select any text editor you have installed on your system.
  2. Override branch name for new repos created with git init. It defaults to “master” but you can set your own. This does not affect existing repos. I selected to override and use “main” which is more familiar to me.
  3. Adjust path environment. The most cautious choice is to use Git soley from Git Bash; I selected this as I do not plan on using any other 3rd party software.
  4. Change SSH executable & backend. The default is the bundled OpenSSH software and crt file.
  5. Configure line endings. As Windows and Unix use different line-end characters, I recommend using the default which is to check out as Windows and commit at Unix. This will prevent issues for other users not on Windows.
  6. Select terminal emulator. I selected the default MinTTY option to make sure I didn’t mistake a cmd terminal window for my git terminal.
  7. Choose behavior of git pull. Leave it on the default unless you know what effects the other options may have.
  8. Choose credential manager. Choose the default which is the bundled Git Credentials Manager. You will then be able to authenticate once and stay logged in.
  9. Extra options. By default, “Enable file system caching” is checked and “Enable symbolic links” is unchecked. Leave as is unless you know what each option affects.
  10. Experimental options. Leave these unchecked unless you know what you’re doing!

8.2.2 Windows Notes

  1. When first attempting to interact with a repo (either clone/pull/push) Windows will pop up a Git Credentials Window for you to authenticate. Select “Use website” and it will open a browser window the same as happens with Github Desktop. It takes a moment for the connection to be made before you will be able to click the green button to authenticate.
  2. When navigating between directories, if a directory has a space in the name, you must enclose it in single quotes (not backtics)
  3. Occasionally I have typed unequal quotes, ie cd 'My Documents" and although GitBash acts as if the command was accepted (showing a new command prompt), nothing happens. The only solution I have found is to close and re-open GitBash.
  4. pwd is a shortcut to print the working directory.

8.3 git filter-repo

git filter-repo is a tool that can be used to remove large or sensitive files from a repository. It removes the file from all caches, commits, and revision histories without affecting the rest of the repo.

One note: If you commit a large file and make no other changes to the repo in that commit, then you use this tool to remove all traces of that file, the entire commit WILL disappear from the revision history. The commit will remain as long as ANY other file was also touched by the commit.

8.3.1 Installation

  1. You must first install Python. In GitBash run python3and it will install it from the Windows Store.
  2. Find the location of your git installation by running git --exec-path in GitBash.
  3. Save this file into your Downloads folder if your OS won’t allow you to save directly to the git installation location.
  4. Using File Explorer, change the file name to remove the .txt extension (in Windows 11, at least, it will not download without adding that file extension.)
  5. Copy the renamed file into your git installation folder if you weren’t already able to save it there. You will be prompted to provide administrative authorization to permit the copy.
  6. cd into a repo and type git filter-repo to test. If the installation was successful it will say no arguments specified

8.3.2 Usage

First you will need to know the full repo path & name of the large or sensitive files you wish to delete. To do this, you can browse the previous commits in the repo to identify where they were first added. (Technically, you can get the path/name of the large file from any commit touching that file, but getting it from the initial commit makes sure you don’t miss any renames or moves).

You could also run the following command to find XX largest objects in repo:

git rev-list --objects --all \  
  | grep "$(git verify-pack -v .git/objects/pack/*.idx \  
           | sort -k 3 -n \  
           | tail -XX \  
           | awk '{print$1}')"

This outputs the path names in a format which is easy to copy.

Once you have the path/file names, follow the commands here to remove each file and add them to gitignore so they don’t accidentally get re-committed. You can run an entire list of git filter-repo commands to remove many files at once.

8.4 Github Website Tools

8.4.1 Create Personal Access Token

On the website, go to Account Settings > Developer Settings > Personal Access Tokens > Token (Classic). There is a Beta version that allows more fine-grained access but I haven’t experimented with that yet.

Hit “Generate new token (Classic)”. You can specify what this token will be used for and select various permissions to allow for this token.

When the token is generated, you must save it somewhere - Github won’t let you see it again. You can then use this token for certain privileged access actions, such as importing another repo into a clean repo repo.

8.4.2 Import a repo

This action is used when you want to copy another repo into a new repo. It copies ALL past fils, commits, and history, but any actions taken to the new repo will not affect the old repo in the way a branch would.

  1. Create a new repo via the website “New repo” button. Give it whatever name and privacy settings are desired.
  2. On the next page, to populate your repository, choose “Import code from another repository”.
  3. It will then ask for “Your old repository’s clone URL”. If you are importing another github repo, the plain repo URL is all that’s needed i.e. “https://github.com/jjcurtin/analysis_risk” will import an exact copy of the Risk1 analysis repo. Click “Begin Import”.
  4. You will be asked to provide your Login (your github email, NOT the password) and Private Access Token (see above).