The problem is simple: all I wanted was to search a Subversion repository without checking out files onto the local machine. Just as I was weighing building something myself against using an open-source solution, I stumbled on SupoSE via a Google search (it was also suggested by Dhananjay Nene on Twitter). Here are my observations so far:
SupoSE is a Subversion repository search engine written in Java. Indexing and search are backed by Lucene, and SVN interactions are carried out using SVNKit. SVNKit, a Java library, is a pretty good alternative for anybody who would otherwise use the command-line interface to access an SVN repository.
To scan and index the repository, use the following command:
supose.sh scan --create --url repository_url --index index_directory --username username --password password
Note: the --create option is only needed if you are scanning and indexing from scratch.
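The note about --create can be sketched as a small wrapper. This is only an illustration under stated assumptions: it treats a missing or empty index directory as "indexing from scratch", the function name scan_repo and the SUPOSE variable are hypothetical, and the credential options are left out for brevity (they can be appended as in the command above).

```shell
#!/bin/sh
# Sketch: run a SupoSE scan, passing --create only for a fresh index.
# SUPOSE is a hypothetical override; set SUPOSE=echo to print the arguments
# instead of running the real script.
SUPOSE=${SUPOSE:-supose.sh}

scan_repo() {
    url=$1
    index_dir=$2
    # Assumption: no directory, or a directory with no entries, means a first scan.
    if [ ! -d "$index_dir" ] || [ -z "$(ls -A "$index_dir" 2>/dev/null)" ]; then
        "$SUPOSE" scan --create --url "$url" --index "$index_dir"
    else
        "$SUPOSE" scan --url "$url" --index "$index_dir"
    fi
}
```

Calling scan_repo repository_url index_directory would then pick the right form of the command on each run.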
This command took a long time to index my repository. I guess it depends on the repository size, but my feeling is that there is room for improvement in speed. Another thing I observed: when I terminated the indexing process halfway through and resumed it later, it did not pick up as smoothly as I expected. But these are minor issues that can be improved upon.
During the scanning process, a specialized document handler is used based on the file type. A document handler indexes the parts of the document that it deems fit to index. For example, for Java files, method names and comments are indexed.
Searching from the command line works something like this:
supose.sh search --index indexes_directory --query query
If you are only interested in the authors and revisions of the files, the --fields option can be used, like this:
supose.sh search --index indexes_directory --query query --fields author revision
When I started on this, my primary reason for using it was to find files by name. For that, the following command works:
supose.sh search --index indexes_directory --query "+filename:/*JavaClassName.java"
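Since this was the query I ran most often, it can be wrapped in a small helper. The following is a rough sketch, not part of SupoSE: the function name search_by_filename and the SUPOSE variable are hypothetical, and it uses only the search options shown above.

```shell
#!/bin/sh
# Sketch: search a SupoSE index for a file by its name.
# SUPOSE is a hypothetical override; set SUPOSE=echo to print the arguments
# instead of running the real script.
SUPOSE=${SUPOSE:-supose.sh}

search_by_filename() {
    index_dir=$1
    file_name=$2
    # The double quotes keep the shell from expanding the * wildcard as a glob,
    # so the query reaches SupoSE unchanged.
    "$SUPOSE" search --index "$index_dir" --query "+filename:/*$file_name"
}
```

For example, search_by_filename indexes_directory JavaClassName.java runs the same search as the command above.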
There are quite a few other useful queries you can perform (see some examples), although a lot of these can be achieved from an IDE. Again, my goal was to find this information without checking out the files from the repository.
There are other features, like index merging and scan scheduling, which I haven't tried yet. For version 0.50, SupoSE seems to do a lot of things right. I'd still like to see an API-based approach rather than a command-line one, so that client programs can interact nicely with it.