Git HOW-TO document for the GISS GCM

Git

Git is a revision control system, as CVS is. The main difference between the two is that Git is a "distributed" revision control system: each checked-out copy of the code ("cloned" in Git terms) contains the entire history of the project and can, in principle, serve as a new repository. This allows many operations to be performed locally, without accessing the central repository: looking into the history, switching between branches, comparing modified code in the local directory to the original code, and so on. Local operations are much faster and need no network access.

Particularly useful are local commits. "git commit -a" commits changes to the local repository; nobody else sees them until they are sent to the central repository with "git push". So while working on one's own code, the developer can make many commits to record the various stages of the local code, without interfering with other people's work.

One should remember, though, that until the code is pushed to the central repository it is the developer's responsibility to back it up. Since by its nature Git doesn't force the developer to send changes to the central repository as often as CVS does, it is advisable to keep the local copy of the code on a filesystem which is regularly backed up.
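
As an illustration of local commits, here is a minimal sketch that can be run in a scratch directory. The directory, the file name notes.txt and the identity "John Smith" are made up for the example and have nothing to do with modelE; no remote repository is involved at any point.

```shell
# Throwaway demonstration of local commits; nothing here touches a remote.
# The identity, directory and file name are placeholders for this example.
export GIT_AUTHOR_NAME="John Smith" GIT_AUTHOR_EMAIL="john.smith@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q .

# First commit: a new file has to be added explicitly.
echo "first version" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Second commit: "-a" picks up changes to files Git already tracks.
echo "second version" > notes.txt
git commit -q -a -m "Revise notes"

# Both commits exist only in this local repository.
git log --oneline
n_commits=$(git log --oneline | wc -l | tr -d ' ')
echo "local commits: $n_commits"
```

Until "git push" is run against a real remote, these commits live only in the .git directory of the scratch repository, which is why the backup advice above matters.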

Setting Up

To work with Git you need to have it installed on your computer. We are planning to use some functionality which was added in version 1.7.0 of Git, so preferably you should install a version no older than 1.7.0. Older versions will work for most operations, but you may see some problems in the future. On Discover the latest version of Git is available as a module. You can load it with:

  module load other/git-1.7.3.4
If you want to install it on your own workstation or laptop, Git is available through MacPorts on Mac and as an RPM package on Linux (use the EPEL repository on Red Hat 5). Of course, you can always compile it from source (it is really easy), which you can get from
  http://git-scm.com/download
You also have to set the following environment variables:
  export GIT_AUTHOR_NAME="your_name"
  export GIT_AUTHOR_EMAIL=your_email
  export GIT_COMMITTER_NAME="your_name"
  export GIT_COMMITTER_EMAIL=your_email
where your_name is the name Git will use to identify you (in commit info etc.), so preferably use your full name, like "John Smith", to avoid confusion. your_email is the email address which will be stored together with your name and which people can use to contact you. If you are going to use Git on more than one computer, make sure that these variables are set to identical values on all computers.
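
To check a setup against the requirements above, something like the following sketch may help. The function version_at_least is a hypothetical helper written for this example (it is not part of Git), and it assumes the version components are purely numeric, which may not hold for every vendor build of Git.

```shell
# version_at_least HAVE WANT: succeeds if dotted version HAVE >= WANT.
# Hypothetical helper for this example; assumes numeric components only.
version_at_least() {
    have=$1 want=$2
    while [ -n "$have" ] || [ -n "$want" ]; do
        h=${have%%.*}; w=${want%%.*}
        h=${h:-0}; w=${w:-0}          # a missing component counts as 0
        [ "$h" -gt "$w" ] && return 0
        [ "$h" -lt "$w" ] && return 1
        # Drop the leading component and compare the next one.
        case $have in *.*) have=${have#*.} ;; *) have= ;; esac
        case $want in *.*) want=${want#*.} ;; *) want= ;; esac
    done
    return 0
}

# Compare the installed Git (if any) with the recommended minimum.
if command -v git >/dev/null 2>&1; then
    git_version=$(git --version | awk '{print $3}')
    if version_at_least "$git_version" 1.7.0; then
        echo "git $git_version is recent enough"
    else
        echo "git $git_version is older than 1.7.0"
    fi
else
    echo "git not found in PATH"
fi

# Report any identity variable that is unset or empty.
for var in GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL; do
    eval "value=\${$var:-}"
    [ -n "$value" ] || echo "$var is not set"
done
```

Putting the four export lines into your shell startup file on every machine you use is one simple way to keep the values identical everywhere.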

Useful Git commands

    (apart from the initial "git clone ...", all commands are executed from inside the modelE directory tree)

  1. to check out the main branch of modelE

    (equivalent of "cvs checkout modelE")

        git clone username@simplex.giss.nasa.gov:/giss/gitrepo/modelE.git
    
    where username is your username on simplex. This will create a directory modelE with all the model code in it. It will also create a hidden directory modelE/.git with Git version control information in it (among other things, this includes the entire history of the code, so that many Git operations can be performed without access to the main server).

  2. to switch to a branch

    (after you downloaded the code with "git clone...")

        git checkout branch_name
    
    this will switch the entire directory tree to the branch branch_name. To see the list of available branches type
        git branch -a
    
    If you want to switch to a remote branch (prefixed with "remotes"), use the "short" branch_name for this branch (omitting "remotes/origin"). For example, if you want to work with the branch remotes/origin/AR5_branch just do
        git checkout AR5_branch
    
    The first time you execute this command Git will say that it has created a local branch AR5_branch which is "tracking" a remote branch. Next time it will just switch the branch.

  3. to update the code in your working directory to the latest code in the repository on simplex

    (equivalent of "cvs update"):

        git pull
    
    This will "pull" the latest changes from the repository you cloned from (simplex.giss.nasa.gov:/giss/gitrepo/modelE.git in our case). Typically one should always use this simple command, unless one needs to do something fancy, like pulling from the private repository of another user. In that case one can use the more explicit command
        git pull username@host_name:/path_to_modelE_dir branch_name
    
    But be careful when using this explicit command: for example, if you omit branch_name you will be pulling from the master branch, even if locally you are on a different branch.

  4. to commit your code to the central repository on simplex

    you have to execute two commands:

        git commit -a
        git push
    
    The first one commits the code "locally", i.e. it stores this information in the .git subdirectory. The second command "pushes" this information to the central repository on simplex. Once the information is successfully pushed to simplex, Git will send a message to the list with the commit info (similar to the commit messages we are getting now from CVS). So if you don't receive such a message you may want to check whether your "commit" and "push" went through correctly.

    If you are working on more than one branch, then instead of "git push" it is safer to execute

        git push origin HEAD
    
    This will push only your current local branch to the remote branch with the same name, while "git push" (with the default settings of Git versions before 2.0) will push changes on all your branches which were committed but not pushed yet (and that may not be what you want).

    It is possible that when you try to "push", Git will complain about possible conflicts and refuse to push. This situation is similar to trying to do "cvs commit" when your code is not up-to-date. In this case you have to do "git pull", resolve conflicts in your local directory and then repeat "git commit", "git push".

    Sometimes "git pull" will request that you do a local "git commit" first. This is to prevent your local code from being corrupted by conflicts with imported code. In this case do "git commit -a" as advised by Git and repeat the "pull". Typically Git produces useful messages when executing commands. If something doesn't work as expected, read them and most likely you will know what to do.
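
The whole cycle from steps 3 and 4 can be tried safely with the sketch below. It assumes nothing about simplex: a local bare repository in a temporary directory stands in for the central repository, and the file README, the commit messages and the identity are placeholders for the example.

```shell
# Placeholder identity for this example.
export GIT_AUTHOR_NAME="John Smith" GIT_AUTHOR_EMAIL="john.smith@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
# A local bare repository stands in for the central repository on simplex.
git init -q --bare "$work/central.git"

# Seed the stand-in repository with one commit so there is something to clone.
# (Cloning an empty repository prints a harmless warning, silenced here.)
git clone -q "$work/central.git" "$work/seed" 2>/dev/null
cd "$work/seed"
echo "initial" > README
git add README
git commit -q -m "Initial commit"
git push -q origin HEAD

# A second clone plays the role of your working copy.
git clone -q "$work/central.git" "$work/modelE"
cd "$work/modelE"
git pull -q                        # update to the latest code
echo "my change" >> README
git commit -q -a -m "Describe the change"
git push -q origin HEAD            # push only the current branch

# Compare the tip of the local branch with the same branch in the stand-in.
branch=$(git symbolic-ref HEAD)    # full name of the current branch
local_tip=$(git rev-parse HEAD)
central_tip=$(git --git-dir="$work/central.git" rev-parse "$branch")
echo "local:   $local_tip"
echo "central: $central_tip"
```

If the two hashes printed at the end agree, the push went through. Against the real repository, the commit message sent to the list serves the same purpose.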

This small set of commands should get you started. Eventually we will post a more complete list here. You can also read comprehensive Git manuals and tutorials at

  http://git-scm.com/documentation
Also, typing
    git command --help
will produce the manual page for the particular Git command. If you have questions related to Git send me an email or, better, post them to the Modeling Guru forum:
  https://modelingguru.nasa.gov/thread/4743?tstart=0
so that others could profit from the answers. You also may get your answers quicker since other people familiar with Git may read it. As with CVS we will have a Git repository viewer installed at
   http://simplex.giss.nasa.gov/cgi-bin/gitweb.cgi

Working with branches

Git treats branches as local objects, which means that by default information about a new branch is not pushed to the parent repository. Also, git clone ... doesn't create local branches for the branches in the remote repository. To see all local branches one can execute

    git branch
Typically, for a fresh clone this will show just a master branch. One can, though, see the branches in a remote repository with
    git branch -r
But if one wants to work with a remote branch, one has to set up a local branch which is "tracking" the remote branch. For example, to work with the origin/AR5_branch branch one can create a local branch with
    git checkout --track -b AR5_branch origin/AR5_branch
Starting with Git version 1.7 this command can be shortened to
    git checkout AR5_branch
Once a local branch has been created one can always switch to it with
    git checkout branch_name
The fact that the AR5_branch we have just created is tracking a remote branch means that git pull will update your local branch from the remote repository and git push will send your local changes to the remote branch. Note that if we have chosen a name for the local branch which is different from the name of the remote branch, then git pull will still update the local branch from the remote one, but git push will have no effect. One can use this functionality to create one's own branch which one wants to update periodically from a public remote branch.

To create a simple local branch (which will start from the current checked out state) typically one does

    git checkout -b branch_name
If one then wants to commit it to the central repository (to make it available to other users), one can do it with
    git push origin branch_name
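
The branch commands in this section can be rehearsed with the following sketch. As before, a local bare repository in a temporary directory stands in for the central repository; the branch name my_feature, the clone named colleague and the file README are made up for the example.

```shell
# Placeholder identity for this example.
export GIT_AUTHOR_NAME="John Smith" GIT_AUTHOR_EMAIL="john.smith@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

work=$(mktemp -d)
# Stand-in for the central repository, seeded with one commit.
git init -q --bare "$work/central.git"
git clone -q "$work/central.git" "$work/modelE" 2>/dev/null
cd "$work/modelE"
echo "initial" > README
git add README
git commit -q -m "Initial commit"
git push -q origin HEAD

# Create a local branch from the current state and publish it.
git checkout -q -b my_feature
echo "experimental" >> README
git commit -q -a -m "Start my_feature"
git push -q origin my_feature

# The new branch now shows up among the remote branches.
remote_branches=$(git branch -r)
echo "$remote_branches"

# Another user's clone: set up a local branch tracking the published one.
git clone -q "$work/central.git" "$work/colleague"
cd "$work/colleague"
git checkout -q --track -b my_feature origin/my_feature
current=$(git symbolic-ref HEAD)
echo "colleague is on $current"
```

The explicit "--track -b" form is used in the second clone because it also works with Git versions older than 1.7; with a newer Git, "git checkout my_feature" would do the same.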

Using Git as a CVS server

Git is capable of simulating the behavior of a CVS server, which means that one can access the central Git repository using "cvs" (instead of "git") on the local machine. The use of this method is not encouraged and should be avoided if at all possible, but there may be circumstances when one can't use "git" (you have no control over the local machine and can't install Git, the connection is too slow to download the entire repository, you use a regression script which was written for CVS and has not yet been converted to Git, etc.).

If you decide to use this method, keep in mind that only a limited set of CVS operations is supported (pretty much just simple "checkout", "update" and "commit"). The execution may be very slow (you may have to wait for several minutes) since Git has to build a special database for it. Set the following environment variable:

  export CVS_SERVER="git cvsserver"
then you can check out the main branch of the model with:
  cvs -d username@simplex.giss.nasa.gov:/giss/gitrepo/modelE.git \
      checkout -d modelE master
To check out a particular branch "branch_name" do:
  cvs -d username@simplex.giss.nasa.gov:/giss/gitrepo/modelE.git \
      checkout -d branch_name branch_name
Keep in mind that "cvs status" will always show that you are on the main trunk even if you have checked out a branch. It is up to you to remember this info. Also, this "simulated" CVS repository has nothing to do with the original CVS repository we were using before the switch to Git. Don't try to use these commands on the modelE directory checked out from the old CVS repository - it will destroy your code. This method should be treated just as a temporary hack.