Posts Tagged Subversion

Subversion Migration Tools

If you are planning to migrate existing repositories from another SCM to Subversion, here are a couple of open source migration tools that I found useful:

  • cvs2svn: as the name indicates, migrates CVS repositories to SVN.
  • Importer for SVN: This tool works on a wide variety of version control systems.


cvs2svn

The tool's stated main design goals are robustness and 100% data preservation. My experience tends to confirm that claim, and it also agrees with the following description:

cvs2svn infers what happened in the history of your CVS repository and replicates that history as accurately as possible in the target SCM. All revisions, branches, tags, log messages, author names, and commit dates are converted. cvs2svn deduces what CVS modifications were made at the same time, and outputs these modifications grouped together as changesets in the target SCM. cvs2svn also deals with many CVS quirks and is highly configurable.

As far as the setup goes:

  • download cvs2svn tar from
  • if CVS and SVN are on different machines, copy the CVS data onto the target machine where SVN is installed
  • make sure python is on the PATH
  • run cvs2svn. There are a lot of command line options, and they are well documented. Example:
    python cvs2svn --svnrepos /svn/repo/path --fs-type=fsfs \
    --encoding=utf_8  --fallback-encoding=windows-1252 \
    --default-eol=CRLF --cvs-revnums /path/to/cvsdata

Make sure you read up on end-of-line translation before migrating. The process can take hours to days depending on the amount of data you are migrating. You can narrow down what gets converted with command line options; for example, --trunk-only converts only the trunk and skips branches and tags.
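Before committing to a run that may take hours, a trial pass can be useful. The sketch below wraps a trunk-only dry run using cvs2svn's --trunk-only and --dry-run options. The function name and the CVS2SVN override variable are my own inventions for illustration, not part of cvs2svn.

```shell
# Hypothetical helper: trial conversion of the trunk only.
# --dry-run makes cvs2svn report what it would do without writing a repository.
# CVS2SVN is an assumed override for the executable (e.g. "python cvs2svn").
cvs2svn_trial() {
    # $1: SVN repository path that a real run would create
    # $2: path to the CVS data
    ${CVS2SVN:-cvs2svn} --dry-run --trunk-only --svnrepos "$1" "$2"
}
```

With the placeholder paths from the example above, the call would be cvs2svn_trial /svn/repo/path /path/to/cvsdata.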

Importer for SVN

This tool can import from a wide variety of version control systems, such as CVS, PVCS, VSS, ClearCase, MKS, and StarTeam. I used it to migrate a PVCS repository.

Setting this up and running it is trivial. There is some decent documentation available on the site. The steps:

  • download and extract the binary archive.
  • edit the configuration file, specifying the SVN details and the details of the source repository. The file is well commented, explaining what each property means. (The one issue I had was with the log.dateformat property; I had to change the date format.)
  • run like
    run full

Again, depending on the amount of data, the process can take hours to days. The conversion happens in three steps. First, the tool reads the metadata of the source repository and generates an SVN model; here it determines how many commits and revisions are needed, which file paths each commit touches, and so on. Second, it checks out the data for each revision of the source repository (you can constrain it to the most recent revision only if you don't need the history) and writes the checked-out data in SVN dump format. Third, the dumps are loaded into the target SVN repository. There are a couple of options for this last step: the tool can hand you the dumps to load manually, or, if the process has access to the target machine, you can configure the tool to load them for you.
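If you go the manual route, loading comes down to feeding each dump to svnadmin load. Here is a rough sketch, assuming the dumps end in .dump and that their lexical filename order matches the revision order. Both are my assumptions, so check what the tool actually produced. The function name and the SVNADMIN override variable are hypothetical too.

```shell
# Hypothetical helper: load every *.dump file in a directory into an
# existing repository, in lexical filename order.
# SVNADMIN is an assumed override for the svnadmin executable.
load_dumps() {
    repo=$1        # path to the target SVN repository
    dump_dir=$2    # directory holding the dump files
    for dump in "$dump_dir"/*.dump; do
        [ -e "$dump" ] || continue     # glob matched nothing
        echo "Loading $dump"
        # svnadmin load reads the dump stream from standard input;
        # stop at the first failure so later dumps are not applied out of order
        ${SVNADMIN:-svnadmin} load "$repo" < "$dump" || return 1
    done
}
```

A call would look like load_dumps /svn/repo/path /path/to/dumps, where the second path is a placeholder.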

The only complaint I have about this tool is that there is no easy way to tell what percentage of the migration is complete. However, you can make a decent estimate by looking at the current dump file and the process log file (svnimporter).
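One way to turn the dump file into a rough progress figure: every revision record in the SVN dump format starts with a Revision-number: header, so counting those lines tells you how many revisions have been written so far. The sketch below does that. The function names are mine, and you have to supply the total revision count yourself (for instance from what the tool logged while building its model). File contents that happen to start a line with the same text would inflate the count, so treat the result as an estimate.

```shell
# Count revision records written so far to an SVN dump file.
# -a treats the file as text, since dumps can contain binary file data.
dump_revisions() {
    grep -a -c '^Revision-number: ' "$1"
}

# Rough percentage complete.
# $1: dump file, $2: total number of revisions expected (supplied by you)
dump_progress() {
    written=$(dump_revisions "$1")
    echo $(( written * 100 / $2 ))
}
```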


Subversion: LDAP Integration using Apache


This is one of those how-to posts where I had to try different methods to achieve something and, having achieved it, am logging my experience in case it helps somebody else. The task at hand: I have Subversion set up fine with an Apache server, but now need LDAP support. That means two new Apache modules, mod_ldap and mod_authnz_ldap. This post explains how to build those modules and compile Apache with LDAP enabled.


All the good advice I have received says to compile Apache from source rather than rely on third-party RPMs. This is an installation on Red Hat Linux 5 using Apache 2.2. I haven't found good RPMs for the purpose; even the ones I did find depend on specific versions of other binaries. Although it sounded like a daunting task at first, it can be done in a few steps once you have figured out what is needed (surprise!).

Install OpenLDAP

OpenLDAP has a dependency on Berkeley DB, so install that first.

Berkeley database

Here are the steps to install Berkeley DB:

$ tar -xzvf db-version.tar.gz
$ cd db-version/build_unix
$ ../dist/configure
$ make
$ make install


1. Replace version with the version of the source archive that you are working with.
2. Execute configure from the build_unix directory, as shown above.

Now install OpenLDAP

Download the OpenLDAP source. I'm using OpenLDAP 2.4.16 and Berkeley DB 4.7, and the steps below reflect those versions (needless to say, you need to modify the paths for your environment).

# cd to openldap source directory

$ CPPFLAGS="-I/usr/local/BerkeleyDB.4.7/include"
$ export CPPFLAGS

$ LDFLAGS="-L/usr/local/lib -L/usr/local/BerkeleyDB.4.7/lib -R/usr/local/BerkeleyDB.4.7/lib"
$ export LDFLAGS

$ LD_LIBRARY_PATH="/usr/local/BerkeleyDB.4.7/lib"
$ export LD_LIBRARY_PATH

$ ./configure

# build dependencies first
$ make depend

$ make
$ make install

Note: The most important point above is to set the environment variables CPPFLAGS, LDFLAGS, and LD_LIBRARY_PATH to the appropriate Berkeley DB paths.
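Since a wrong path in those variables only shows up later as a confusing configure failure, a quick check before running configure can save an iteration. This is a minimal sketch with a function name I made up. It only confirms that the prefix contains the db.h header and a lib directory.

```shell
# Hypothetical pre-flight check: confirm the Berkeley DB prefix contains
# the header and library directory that CPPFLAGS and LDFLAGS point at.
check_bdb_prefix() {
    prefix=$1
    [ -f "$prefix/include/db.h" ] || { echo "missing $prefix/include/db.h" >&2; return 1; }
    [ -d "$prefix/lib" ]          || { echo "missing $prefix/lib" >&2; return 1; }
    echo "Berkeley DB found under $prefix"
}
```

For the paths used above, that would be check_bdb_prefix /usr/local/BerkeleyDB.4.7.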

Apache Portable Runtime Utilities (APR-Util)

Building the Apache Portable Runtime (APR) is more straightforward and doesn't need any change for LDAP. Navigate to the apr directory and execute configure (with a prefix, if you need one), then make clean, make, and make install. APR-Util, however, has to be built with the --with-ldap flag. This is one of the things that took me a few iterations to understand.

So for APR-Util the following steps work:

# cd to apr-util

# prefix points to where you want to install apr-util and with-apr points to APR installation directory
$ ./configure --prefix=/opt/apache2/apr-util --with-apr=/opt/apache2/apr --with-ldap

$ make clean

$ make
$ make install

Build Apache with LDAP modules

Now compile Apache with the LDAP modules. Here are the changes you need for configure.

# cd to Apache source home directory

# Before running configure, make sure that you clear all the environment variables that were set above.
$ unset CPPFLAGS LDFLAGS LD_LIBRARY_PATH

$ ./configure --prefix=/opt/apache2  --enable-dav --enable-dav-fs \
--with-included-apr --with-ldap --enable-ldap --enable-authnz-ldap \
--with-ldap-include=/opt/openldap-2.4.16/include

# make clean is optional
$ make clean
$ make
$ make install


1. prefix points to where you want to install Apache, so change it to reflect your environment.
2. If you need to enable more modules, feel free to do so.
3. with-ldap-include and with-ldap-lib point to OpenLDAP's include and library directories respectively; pass with-ldap-lib the same way if configure cannot find the libraries.
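After make install, it is worth confirming that both LDAP modules actually made it into the binary. With a static build like the one configured here, httpd -l lists the compiled-in modules. The sketch below reads that listing on standard input and checks for both. The function name is mine. If you build the modules as shared objects instead, they will not appear in httpd -l and this check does not apply.

```shell
# Reads the output of "httpd -l" on stdin and succeeds only if both
# LDAP modules are listed as compiled in.
has_ldap_modules() {
    modules=$(cat)
    for m in mod_ldap.c mod_authnz_ldap.c; do
        # -F matches the module name literally, -q suppresses output
        printf '%s\n' "$modules" | grep -qF "$m" || { echo "missing $m" >&2; return 1; }
    done
}
```

With the prefix used above, usage would be /opt/apache2/bin/httpd -l | has_ldap_modules.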


CollabNet's blog post explains the changes needed in Apache's config file (httpd.conf), and those instructions worked perfectly for me.
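For orientation only, here is roughly what an LDAP-protected Subversion location tends to look like with the Apache 2.2 mod_authnz_ldap directives. This is my own generic sketch, not a copy of those instructions. The URL path, repository path, LDAP host, and base DN are all placeholders, and the function that prints the fragment is just a convenient way to write it to a file.

```shell
# Prints a hypothetical <Location> block for a Subversion repository
# authenticated against LDAP (Apache 2.2 directives).
# Every path, host name, and DN below is a placeholder.
print_svn_ldap_location() {
    cat <<'EOF'
<Location /svn>
    DAV svn
    SVNPath /svn/repo/path
    AuthType Basic
    AuthName "Subversion repository"
    AuthBasicProvider ldap
    AuthzLDAPAuthoritative off
    AuthLDAPURL "ldap://ldap.example.com:389/ou=people,dc=example,dc=com?uid?sub"
    Require valid-user
</Location>
EOF
}
```

You could redirect the output into a file that httpd.conf includes, then adjust the values for your directory. If your LDAP server does not allow anonymous searches, you would also need bind credentials (AuthLDAPBindDN and AuthLDAPBindPassword).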


Sventon – A Nice Subversion Repository Browser

Build vs. buy (free!) decision: I was looking for an application that searches an SVN repository without checking out files onto the local disk, and wrote earlier about what I found. Recently I heard about Sventon, a nice repository browser with search capabilities, although the search is limited to file/directory names and logs. That's good enough for my needs, but if you really need to search within files, then go for SupoSE or write your own (using something like Lucene).

Sventon can do a lot more than searching; it is a good alternative to ViewSVN. In my opinion it has a somewhat better interface, plus search (there is no search in ViewSVN). FishEye is my favorite, with a great interface and a lot more features, including some cool reporting. FishEye is reasonably priced, but you can't beat free, and most importantly the features in Sventon are good enough for me to get started.

What I liked so far

  • It is a web application, so installing the WAR is a breeze.
  • Easy configuration.
  • Support for multiple repositories.
  • Files/directories can be downloaded as compressed archives (zip).
  • Diff is good, with three modes: inline, side-by-side, and unified.
  • If you are using Hudson for build management, Sventon is supported.
  • If you are using Jira with the Subversion plugin, configuration is relatively easy.

I will report back after extensive use; meanwhile, if you are using Sventon, feel free to leave your feedback in the comments section.


Search Subversion Repository using SupoSE

The problem is simple: all I wanted was to search a Subversion repository without checking out files onto the local machine. Just as I was weighing building something against finding an open source solution, I stumbled on SupoSE via a Google search (it was also suggested by Dhananjay Nene on Twitter). Here are my observations so far:

SupoSE is a Subversion repository search engine, built in Java. Indexing and search are backed by Lucene, and SVN interactions are carried out using SVNKit. SVNKit, a Java library, is a pretty good alternative for anybody who accesses SVN repositories through the command line interface.


To scan and index the repository, the following command is used:

    scan --create --url repository_url --index index_directory \
    --username username --password password

Note: the --create option is only needed if you are scanning and indexing from scratch.

This command took a long time to index my repository. I guess it depends on the repository size, but my feeling is that there is some scope for improvement in speed. Another thing I observed: when I terminated the indexing process half-way through and resumed it later, it did not pick up as smoothly as I expected. But these are minor issues and can be improved upon.

During the scanning process a specialized document handler is used based on the type of the file. A document handler indexes the parts of the document that it determines fit to index. For example, for Java files method names and comments are indexed.


Searching from the command line works something like the following:

    search --index indexes_directory --query query

If you are only interested in the authors and revisions of the files, the --fields option can be used:

    search --index indexes_directory --query query \
    --fields author revision

When I started working on this, my primary reason for using it was to find files by their names. For that the following command works:

    search --index indexes_directory \
    --query "+filename:/*"

There are quite a few other useful queries you can perform (see some examples), although a lot of these can be achieved from an IDE. Again, my goal was to find this information without checking out the files from the repository.

There are other features, like index merging and scan scheduling, which I haven't tried yet. For version 0.50, SupoSE seems to do a lot of things right. I'd still like to see an API-based approach rather than a command line one, so that client programs can interact nicely with it.
