Overview

Some Apache Commons components use the Git source code management system instead of Subversion.

Both systems allow collaborative development and both maintain a history of file changes. There are, however, several differences.

Distributed Version Control

Git is a distributed version control system. Instead of a single central repository that holds the full history of the project files, with numerous clients connecting to it to check out particular versions, Git uses a symmetrical model in which everyone who connects to a repository clones its full history and from then on could (at least theoretically) act as a new server that other users could clone in turn. Each repository is created by cloning an origin repository. Once cloned, the origin repository is the first remote repository known to the clone. Further remotes can be added later, so a complete web of repositories can be created. Of course, collaborative development requires some policy to be agreed upon, and modifications made by users in their own cloned repositories must be pushed back to a public repository.
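The cloning model described above can be tried entirely on one machine. The following is a minimal sketch, assuming Git is installed; the directory names (origin.git, first-clone, second-clone), the remote name upstream and the demo identity are illustrative choices, not part of any Apache setup.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for a public origin repository; a real one would live on a server.
git init -q --bare origin.git

# First clone: its only remote is "origin", the repository it was cloned from.
git clone -q origin.git first-clone
git -C first-clone -c user.name="Demo User" -c user.email=demo@example.org \
    commit -q --allow-empty -m "First commit"

# Any clone can itself be cloned, so first-clone now acts as a server.
git clone -q first-clone second-clone

# Add the original repository as a second remote, forming a small web.
git -C second-clone remote add upstream "$work/origin.git"

remotes=$(git -C second-clone remote | sort | xargs)
history=$(git -C second-clone log --format=%s)
echo "remotes of second-clone: $remotes"
echo "history copied into second-clone: $history"
```

The second clone receives the full history of the repository it was cloned from and ends up with two remotes, which is the smallest possible "web" of repositories.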

At Apache, the policy is that the official reference is the repository held on Apache servers (for example https://git-wip-us.apache.org/repos/asf/commons-math.git). Therefore, all users who want the latest version know that this is where they should point to retrieve it, and developers who have commit access must push their modifications back to this repository for official publication.

Distributed version control allows some additional features.

A first use case is a user who does not have commit access but would like to contribute something to the project. This user would clone the Apache origin repository on a publicly accessible computer where he has commit access, then he would commit his changes there. Once the features are complete, the user would propose to the project that they import his changes back to the official Apache repository. In order to do so, he would make his repository available (even read-only). Then an Apache committer willing to review the work would declare this repository as a remote for his own working clone and would pull the proposed changes. He could review everything on his computer and, if satisfied, could push them to the Apache origin repository, as he has write access to it. In a way, the Apache committer acts here as a proxy for the contributor and makes sure everything is good to include. (Consequently, a Git commit can have an author and a committer who are different people. If the change is pulled in as a Git patch or a pull request, the author is preserved.)
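The review workflow of this first use case can be sketched with local stand-in repositories. This is only an illustration under stated assumptions: official.git plays the role of the Apache repository, the directories committer and contributor, the remote name contributor and the example.org identities are all hypothetical.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Local stand-in (assumption): official.git plays the Apache repository.
git init -q --bare official.git
git clone -q official.git committer
cd committer
git config user.name "Apache Committer"
git config user.email committer@example.org
git commit -q --allow-empty -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# The contributor clones the official repository and commits in that clone.
cd "$work"
git clone -q official.git contributor
cd contributor
git config user.name "Outside Contributor"
git config user.email contributor@example.org
echo "proposed change" > feature.txt
git add feature.txt
git commit -q -m "Add proposed feature"

# The committer declares the contributor's repository as a remote,
# fetches it, reviews the new commits, then merges and pushes them.
cd "$work/committer"
git remote add contributor "$work/contributor"
git fetch -q contributor
git log --oneline "$branch..contributor/$branch"
git merge -q --ff-only "contributor/$branch"
git push -q origin "$branch"

author=$(git -C "$work/official.git" log -1 --format=%an "$branch")
echo "author recorded in the official repository: $author"
```

Fetching and merging separately, rather than pulling in one step, leaves room for the review between the two commands. The commit that reaches the official repository still names the contributor as its author.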

A second use case is an Apache committer who is either working on a long experiment that is not yet ready for publication or working without internet access for some time (typically during a business trip). In both cases, the committer would simply commit his work on his laptop, using the full features of the source code management system (branches, version comparisons, commits, ...). Once the experiment is completed or internet access is recovered, the committer would push his work from the past few hours, days or weeks back to the official repository, with all independent commits preserved, instead of being forced to push one big blob representing a tremendous amount of work all at once, which would be impossible for his peer committers to review.

A third use case is a user who does not have commit access (and does not want it), but needs to maintain some local changes. This user would clone the Apache origin repository on a private computer, use Git on this computer to manage his local changes, and from time to time would merge changes from origin into his clone. This user would never push anything back.
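The third use case, keeping private changes while following origin, might look like the sketch below. It assumes Git is installed and uses local stand-ins: origin.git, the directories upstream-dev and private, the file names and the example.org identities are all made up for the demonstration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in origin with one published commit.
git init -q --bare origin.git
git clone -q origin.git upstream-dev
cd upstream-dev
git config user.name "Upstream Developer"
git config user.email upstream@example.org
echo "v1" > core.txt
git add core.txt
git commit -q -m "Upstream v1"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# The user clones origin and commits a change that stays private.
cd "$work"
git clone -q origin.git private
cd private
git config user.name "Local User"
git config user.email local@example.org
echo "site-specific tweak" > local.txt
git add local.txt
git commit -q -m "Local change"

# Meanwhile the project moves on.
cd "$work/upstream-dev"
echo "v2" > core.txt
git commit -q -am "Upstream v2"
git push -q origin "$branch"

# From time to time: fetch and merge origin into the private clone.
cd "$work/private"
git fetch -q origin
git merge -q --no-edit "origin/$branch"
merged_core=$(cat core.txt)
echo "core.txt is now at: $merged_core"
```

After the merge the private clone contains both the upstream update and the local change, and nothing is ever pushed back.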

Git References

There are numerous references available online for Git. The first one is the official Pro Git book.

A quick Git reference: Git Reference.

An Apache specific page is here.

There is also a wiki at kernel.org: Git Wiki Homepage.

There is also a quick tutorial on Git for SVN users. Eclipse users could have a look at Git version control with Eclipse (EGit) - Tutorial.

Git configuration

The configuration for the local Git client is stored at two different levels. There is a global configuration (typically in your home directory) where you can put everything that will remain the same across all repositories you clone, and there is a local configuration in each repository. You can modify and list configuration keys and values using the git config command. Before you try anything else (even before you clone your first repository), you should configure at least one parameter: the core.autocrlf setting. This setting adapts the line-ending conversion done between the Apache repository and your workspace.

If you are using macOS or Linux, you should run:

  git config --global core.autocrlf input

If you are using Windows, you should run:

  git config --global core.autocrlf true

The first setting makes Git convert CR/LF line endings to LF when committing into the repository, which strips any accidental CR/LF, but never convert anything when checking out files from the repository. The second setting makes Git convert to CR/LF line endings in the workspace while maintaining LF-only line endings in the repository.

If for some reason some specific files need to be checked in with a specific conversion (or without any conversion at all), they can be specified in a `.gitattributes` file. For example, you can force files to be handled as text, or on the contrary to never be considered as text and therefore never converted:
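The effect of the input setting can be observed with a small experiment. The sketch below is an illustration only: it points HOME at a temporary directory so that the --global setting is written to a throwaway file rather than to your real configuration, and the file name notes.txt is arbitrary.

```shell
#!/bin/sh
set -eu
# Throwaway HOME so the demo never touches your real global configuration.
sandbox=$(mktemp -d)
export HOME="$sandbox" XDG_CONFIG_HOME="$sandbox/.config"

git config --global core.autocrlf input
git init -q "$sandbox/repo"
cd "$sandbox/repo"

# A file that accidentally has CR/LF line endings.
printf 'first line\r\nsecond line\r\n' > notes.txt
git add notes.txt 2>/dev/null   # Git may warn that CRLF will be replaced by LF

# Count carriage returns in the workspace file and in the staged copy.
cr_in_workspace=$(tr -cd '\r' < notes.txt | wc -c | tr -d ' ')
cr_in_repository=$(git cat-file blob :notes.txt | tr -cd '\r' | wc -c | tr -d ' ')
echo "CR in workspace: $cr_in_workspace, CR in repository: $cr_in_repository"
```

The workspace file keeps its two carriage returns, while the copy staged for the repository has none.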

  # general pattern for files known to be text: End Of Line transformations will be done
  *.apt                               text
  *.html                              text
  *.java                              text
  *.properties                        text
  *.puml                              text
  *.svg                               text
  *.txt                               text
  *.xml                               text
  *.fml                               text

  # general pattern for files known to not be text: no End Of Line transformations will be done
  *.gz                               -text
  *.zip                              -text
  *.ico                              -text
  *.xcf                              -text
  *.jpg                              -text
  *.odg                              -text
  *.png                              -text

  # specific data files known to be text: End Of Line transformations will be done
  .gitattributes                      text
  .gitignore                          text
  .checkstyle                         text

  # specific data files known to not be text: no End Of Line transformations will be done
  src/*/resources/*/weird-*-binary-file  -text
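To check how such patterns apply to a given path, git check-attr can be asked directly. The sketch below assumes a fresh temporary repository with a two-line `.gitattributes`; the paths src/Example.java and images/logo.png are hypothetical and do not need to exist.

```shell
#!/bin/sh
set -eu
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# A reduced version of the attributes file shown above.
printf '%s\n' '*.java text' '*.png -text' > .gitattributes

# Ask Git which value the "text" attribute takes for each path.
java_attr=$(git check-attr text -- src/Example.java)
png_attr=$(git check-attr text -- images/logo.png)
echo "$java_attr"
echo "$png_attr"
```

A pattern without a slash, such as `*.java`, matches files in any directory, so the Java file is reported with text set and the image with text unset.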

Two other important parameters should be set, but they may need to be set on a per-repository basis if you use different user names and mail addresses for Apache and non-Apache projects. In this case, the following commands must be run after you have cloned the repository, and they should be run from inside the cloned workspace:

  git config --local user.name "Your Name"
  git config --local user.email apacheID@apache.org
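The interplay between the two configuration levels can be checked as follows. This is a sketch under assumptions: HOME is redirected to a temporary directory so that your real global configuration is left alone, personal@example.org stands for a hypothetical non-Apache address, and apache-clone stands for a cloned Apache workspace.

```shell
#!/bin/sh
set -eu
# Throwaway HOME so the demo never touches your real global configuration.
sandbox=$(mktemp -d)
export HOME="$sandbox" XDG_CONFIG_HOME="$sandbox/.config"

# Global level: the identity used for non-Apache projects (hypothetical).
git config --global user.email personal@example.org

# Local level: set inside one workspace only.
git init -q "$sandbox/apache-clone"
cd "$sandbox/apache-clone"
git config --local user.name "Your Name"
git config --local user.email apacheID@apache.org

# Inside this workspace the local value wins over the global one.
effective_email=$(git config --get user.email)
global_email=$(git config --global --get user.email)
echo "effective here: $effective_email (global: $global_email)"

# List everything stored at the local level.
git config --local --list
```

Commits made in this workspace would use the Apache address, while other repositories keep using the global one.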

Comparison with Subversion commands

One of the most important differences from a user's point of view is that, since there is always one local repository and one or several remote repositories, Git distinguishes between saving some work only locally on a private computer and making it available to other people, who see only public remote computers. The first action is called commit, and it is therefore completely different from a Subversion commit. The second action is called push. The equivalent of svn commit is therefore a pair of commands: git commit followed by git push. It is possible to perform several git commits without doing any git push, which is impossible with Subversion. Note that most commands (including log and status) only work on the local copy of a remote repository. You should therefore use git fetch regularly to refresh your copy (if you do not want to pull).
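The difference between commit, push and fetch can be made visible by counting commits on each side. The following sketch uses local stand-ins only: origin.git, the clones alice and bob and the example.org identity are assumptions made for the demonstration.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git

git clone -q origin.git alice
cd alice
git config user.name "Alice"
git config user.email alice@example.org
git commit -q --allow-empty -m "Initial commit"
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# A second clone, taken before Alice's next push.
git clone -q "$work/origin.git" "$work/bob"

# Two local commits: nothing has reached origin yet.
git commit -q --allow-empty -m "Local work 1"
git commit -q --allow-empty -m "Local work 2"
unpushed=$(git rev-list --count "origin/$branch..$branch")
git push -q origin "$branch"

# Bob's local copy of origin is stale until he fetches.
cd "$work/bob"
before_fetch=$(git rev-list --count "$branch..origin/$branch")
git fetch -q origin
after_fetch=$(git rev-list --count "$branch..origin/$branch")
echo "unpushed=$unpushed before_fetch=$before_fetch after_fetch=$after_fetch"
```

Alice has two commits that exist only locally until she pushes, and Bob sees no incoming commits until git fetch refreshes his copy of origin.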

For each Subversion command listed below, the equivalent Git command follows (we use the Apache Commons Math Git repository as an example; adapt it to your project).

  • svn checkout
    git clone https://git-wip-us.apache.org/repos/asf/commons-math.git (read-only access)

  • svn diff
    git diff (shows only unstaged changes; git diff --cached shows the staged changes prepared for the next commit)
  • svn add
    git add (used to stage changes for commit)
  • svn update
    git pull
  • svn commit
    git commit, followed by git push. You first need to stage (that is, git add) all files that should be committed.
  • svn status
    git status (optionally run git fetch first to refresh the local state)
  • svn revert path
    git checkout -- path
  • svn info
    git remote -v and git remote show origin
  • svn cp https://svn.apache.org/.../trunk https://svn.apache.org/.../tags/my-tag -m message
    git tag -m message my-tag (or better, add also -s or -u option to create a cryptographically signed tag)

If you want the tag to also be on Apache servers, run git push origin my-tag to push it to the origin remote repository.

If you want a branch to also be on Apache servers, run git push origin my-branch to push it to the origin remote repository.

  • svn help sub-command
    git sub-command --help
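The tag and branch commands from the list can be exercised against a local stand-in for the origin repository. This is a sketch only: origin.git, the clone directory and the demo identity are assumptions, and the tag is left unsigned because signing with -s or -u requires a GPG key.

```shell
#!/bin/sh
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare origin.git
git clone -q origin.git clone
cd clone
git config user.name "Demo User"
git config user.email demo@example.org
git commit -q --allow-empty -m "Release candidate"
git push -q origin "$(git symbolic-ref --short HEAD)"

# Annotated tag; add -s (or -u <key-id>) to sign it if you have a GPG key.
git tag -m "message" my-tag
git branch my-branch

# Tags and branches stay local until they are pushed explicitly.
tags_before=$(git ls-remote --tags origin | wc -l | tr -d ' ')
git push -q origin my-tag
git push -q origin my-branch

remote_tag_type=$(git -C "$work/origin.git" cat-file -t my-tag)
remote_branch=$(git -C "$work/origin.git" branch --list my-branch | tr -d ' *')
echo "tags on origin before push: $tags_before"
echo "origin now has tag object type '$remote_tag_type' and branch '$remote_branch'"
```

Unlike svn cp on a server URL, creating the tag or the branch changes nothing on origin; only the explicit pushes publish them.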