March 2011 – Promit's Ventspace

I’ve used Subversion for a long time, and CVS before that. Recently there’s been a lot of momentum behind moving away from Subversion to a distributed system like git or Mercurial. I wrote a series of posts on the subject myself, but I skipped over the reasons WHY you might want to switch away from Subversion. This post is motivated in part by Richard Fine’s post, but it’s a response to a general trend and not his entry specifically.

SVN is a longtime stalwart as version control systems go, created to patch up the idiocies of CVS. It’s a mature, well understood system that has been and continues to be used in a vast variety of production projects, open and closed source, across widely divergent team sizes and workflows. Never mind the hyperbole; SVN is good by practically any real world measure. And like any real world production system, it has flaws in nearly every respect. A perfect product is a product no one uses, after all. It’s important to understand what the flaws are, and in particular I want to discuss them without advocating for any alternative. I don’t want to compare to git or explain why it fixes the problems, because that distorts the actual problems and also implies that distributed version control is the solution. It can be a solution, but the problems reach a bit beyond that.

Committing vs publishing
Fundamentally, a commit creates a revision, and a revision is something we want as part of the permanent record of a file. However, a lot of those revisions are not really meant for public consumption. When I’m working on something complex, there are a lot of points where I want to freeze frame without actually telling the world about my work. Subversion understands this perfectly well, and the mechanism for doing so is branches. The caveat is that this always requires server round-trips. That’s fine as long as you’re in the office on a high availability network with a fast server, but it fails the moment you’re traveling or your connection to the server drops for whatever reason. Subversion cannot queue up revisions locally. It has exactly two reference points: the revision you started with and the working copy.
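To make the commit/publish distinction concrete, here’s a minimal Python sketch of a client that can record revisions locally and send them later. This is not how Subversion works (that’s the point), and the `Server`, `WorkingCopy`, `commit` and `publish` names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Stands in for the shared repository that everyone can see."""
    revisions: list = field(default_factory=list)


@dataclass
class WorkingCopy:
    """Hypothetical client that can record revisions without publishing them."""
    pending: list = field(default_factory=list)

    def commit(self, message: str) -> None:
        # Freeze-frame locally: no server round trip required.
        self.pending.append(message)

    def publish(self, server: Server) -> int:
        # Send every queued revision to the server in order, then clear the queue.
        count = len(self.pending)
        server.revisions.extend(self.pending)
        self.pending.clear()
        return count


server = Server()
wc = WorkingCopy()
wc.commit("half-finished parser")
wc.commit("parser passes tests")
unpublished = len(server.revisions)  # nothing has left the machine yet
published = wc.publish(server)
```

In this toy model the two commits exist before the server ever hears about them. Subversion has no equivalent of the `pending` list.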

In general, though, we are working in high availability environments and making a round trip to the server is not a big deal. Private branches are supposed to be the solution to this problem of work-in-progress revisions. Do everything you need, with as many revisions as you want, and then merge to trunk. Simple as that! If only merges actually worked.

SVN merges are broken
Yes, they’re broken. Everybody knows merges are broken in Subversion and that they work great in distributed systems. What tends to happen is that people gloss over why they’re broken. There are essentially two problems in merges: the actual merge process, and the metadata about the merge. Neither works in SVN. The fatal mistake in the merge process is one I didn’t fully understand until reading HgInit (several times). Subversion’s world revolves around revisions, which are snapshots of the whole project. Merges basically take diffs from the common root and smash the results together. But the merged files didn’t magically drop from the sky. We made a whole series of changes to get them where they are, and there’s a lot of contextual information in those changes which SVN has completely and utterly forgotten. Not only that, but the new revision it spits out necessarily has to jam a potentially complicated history into a property field, and naturally it doesn’t work.
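Here’s a toy Python sketch of what “take diffs from the common root and smash the results together” looks like when all you have are snapshots. It is a deliberately simplified model and not Subversion’s actual merge algorithm; the `three_way_merge` function and the file names are invented for the example. Each snapshot is just a mapping from path to content.

```python
def three_way_merge(base: dict, ours: dict, theirs: dict):
    """Merge two snapshots (path -> content) using only their common root.

    Returns (merged, conflicts). Intermediate revisions are never consulted.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (including both deleted)
            result = o
        elif o == b:      # only their side changed
            result = t
        elif t == b:      # only our side changed
            result = o
        else:             # both changed differently: no context to decide
            conflicts.append(path)
            result = o
        if result is not None:
            merged[path] = result
    return merged, conflicts


base = {"util.c": "int add();", "main.c": "v1"}
# On the branch, util.c was renamed to math.c and then edited in a later revision.
ours = {"math.c": "int add(); int sub();", "main.c": "v1"}
# On trunk, someone fixed a bug in util.c.
theirs = {"util.c": "int add(); /* fixed */", "main.c": "v1"}

merged, conflicts = three_way_merge(base, ours, theirs)
```

Looking only at the endpoints, the rename on the branch is indistinguishable from “deleted util.c, added an unrelated math.c”, so the trunk fix ends up as a conflict on a file that no longer exists. The series of changes that would have explained what happened is exactly the context that got thrown away.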

For added impact, this context problem shows up without branches if two people happen to make more than trivial unrelated changes to the same trunk file. So not only does the branch approach not work, you get hit by the same bug even if you eschew it entirely! And invariably the reason this shows up is that you don’t want to make small changes to trunk. Damned if you do, damned if you don’t.

Newer version control systems are typically designed around changes rather than revisions. (Critically, this has nothing at all to do with decentralization.) By defining a particular ‘version’ of a file as a directed graph of changes leading to a particular result, they keep a ton of context about where things came from and how they got there. Unfortunately the complex history tends to make assignment of revision numbers complicated (and in fact impossible in distributed systems), so you are no longer able to point people to r3359 for their bug fix. Instead it’s a graph node, probably assigned some arcane unique identifier like a GUID or hash.
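As a rough illustration of the graph idea, here’s a small Python sketch where each change points at its parents and is named by a hash instead of a sequential number. It doesn’t mirror the storage format of any particular tool; the `Change` class, the `ancestors` helper and the choice of SHA-1 over a description string are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One node in the history graph: a description plus links to its parents."""
    description: str
    parents: tuple = ()

    @property
    def node_id(self) -> str:
        # Name the node by hashing its content and its parents' ids,
        # so no central counter is needed to identify it.
        payload = self.description + "|" + ",".join(p.node_id for p in self.parents)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


def ancestors(change: Change) -> set:
    """Collect the ids of a change and everything it was built on."""
    seen, stack = set(), [change]
    while stack:
        node = stack.pop()
        if node.node_id not in seen:
            seen.add(node.node_id)
            stack.extend(node.parents)
    return seen


root = Change("initial import")
fix = Change("fix crash in loader", (root,))
feature = Change("add exporter", (root,))
merge = Change("merge exporter into mainline", (fix, feature))

history = ancestors(merge)
```

The merge node has two parents, so walking back from it recovers both lines of work and their common root. The price is the identifier: a 40-character hex string instead of something like r3359.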

File system headaches
.svn. This stupid little folder is the cause of so many headaches. Essentially it contains all of the metadata from the repository about whatever you synced, including the unmodified versions of files. But if you forget to copy it (because it’s hidden), Subversion suddenly forgets all about what you were doing. You just lost its tracking information, after all. Now you get to do a clean update and a hand merge. Overwrite it by accident, and now Subversion will get confused. And here’s the one that gets me every time with externals like boost: copy folders from a different repository, and all of a sudden Subversion sees folders from something else entirely and will refuse to touch them at all until you go through and nuke the .svn folders by hand. Nope, you were supposed to svn export it, never mind that the offending files are marked hidden.

And of course because there’s no understanding of the native file system, move/copy/delete operations are all deeply confusing to Subversion unless it’s the one who handles those changes. If you’re working with an IDE that isn’t integrated into source control, you have another headache coming because IDEs are usually built for rearranging files. (In fact I think this is probably the only good reason to install IDE integration.)

It’s not clear to me whether there’s any productive way to handle this particular issue, especially cross platform. I can imagine a particular set of rules: copying or moving files within a working copy does the same to the version control, and moving them out is equivalent to delete. (What happens if they come back?) This tends to suggest integration at the filesystem layer, and our best bet for that is probably a FUSE implementation for the client. FUSE isn’t available on Windows, though apparently a similar tool called Dokan is. Its maturity level is unclear.
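The rule set I have in mind could be sketched in Python roughly like this. It’s purely hypothetical: no existing client exposes a `classify` function like this one, it only covers moves (a copy inside the working copy would be handled the same way as a move, minus the delete), and it says nothing about how the events would be captured in the first place.

```python
from pathlib import PurePosixPath
from typing import Optional


def inside(path: Optional[str], root: str) -> bool:
    """True if path lies within the working copy rooted at root."""
    if path is None:
        return False
    try:
        PurePosixPath(path).relative_to(root)
        return True
    except ValueError:
        return False


def classify(src: Optional[str], dst: Optional[str], root: str) -> str:
    """Map a file system move to a (hypothetical) version control action."""
    src_in, dst_in = inside(src, root), inside(dst, root)
    if src_in and dst_in:
        return "move"    # rearranged within the working copy
    if src_in:
        return "delete"  # moved out of the working copy, or removed
    if dst_in:
        return "add"     # arrived from outside; prior history is unknown
    return "ignore"      # nothing to do with this working copy
```

For example, `classify("/wc/a.c", "/wc/src/a.c", "/wc")` gives `"move"` and `classify("/wc/a.c", "/tmp/a.c", "/wc")` gives `"delete"`. The “what if they come back?” case shows up as a plain `"add"`, which is exactly the problem: the file’s earlier history is gone unless something else remembers it.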

Changelists are missing
Okay, this one is straight out of Perforce. There’s a client side and a server side to this, and I actually have the client side via my favorite client SmartSVN. The idea on the client is that you group changed files together into changelists, and send them off all at once. It’s basically a queued commit you can use to stage. Perforce adds a server side, where pending changelists actually exist on the server, so you can see what people are working on (and a description of what they’re doing!), and so forth. Subversion has no idea of anything except when files are different from their copies living in the .svn shadow directory, and that’s only on the client. If you have a couple different live streams of work, separating them out is a bit of a hassle. Branches are no solution at all, since it isn’t always clear upfront what goes in which branch. Changelists are much more flexible.
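The client side of the idea fits in a few lines of Python. This is only a sketch of the concept, not Perforce’s or SmartSVN’s implementation; the `Changelists` class and its method names are invented, and the rule that a file belongs to at most one changelist is an assumption of the sketch.

```python
from collections import defaultdict


class Changelists:
    """Hypothetical client-side grouping of modified files into named lists."""

    def __init__(self):
        self._lists = defaultdict(set)

    def assign(self, name: str, path: str) -> None:
        # A file belongs to at most one changelist, so drop it from any other.
        for files in self._lists.values():
            files.discard(path)
        self._lists[name].add(path)

    def take(self, name: str) -> list:
        """Remove a changelist and return its files, ready to commit together."""
        return sorted(self._lists.pop(name, set()))

    def pending(self) -> dict:
        """Show the non-empty changelists that are still waiting."""
        return {name: sorted(files) for name, files in self._lists.items() if files}


cl = Changelists()
cl.assign("fix-loader", "loader.c")
cl.assign("fix-loader", "loader.h")
cl.assign("new-exporter", "export.c")
cl.assign("new-exporter", "loader.h")  # reassigned: it turned out to belong here

to_commit = cl.take("fix-loader")
```

Note that `loader.h` changed its mind halfway through. That’s the flexibility branches don’t give you: you can decide which stream of work a file belongs to after you’ve already edited it.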

Locking is useless
The point of a lock in version control systems is to signal that it’s not safe to change a file. The most common use is for binary files that can’t be merged, but there are other useful situations too. Here’s the catch: Subversion checks locks when you attempt to commit. That’s how it has to work. In other words, by the time you find out there’s a lock on a file, you’ve already gone and started working on it, unless you obsessively check repository status for files. There’s also no way to know if you’re putting a lock on a file somebody has pending edits to.
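Here’s a toy Python model of that timing problem. It is not Subversion’s real protocol; the `Server` class, `LockedError` and the user and file names are all made up, and the only behavior it models is that locks are consulted when a commit arrives and not before.

```python
class LockedError(Exception):
    """Raised when a commit touches a file locked by someone else."""


class Server:
    """Toy server that only consults locks when a commit arrives."""

    def __init__(self):
        self.locks = {}  # path -> user holding the lock
        self.files = {}  # path -> committed content

    def lock(self, user: str, path: str) -> None:
        # The server cannot see uncommitted edits in anyone's working copy,
        # so it has no way to warn the locker about them.
        self.locks[path] = user

    def commit(self, user: str, path: str, content: str) -> None:
        owner = self.locks.get(path)
        if owner is not None and owner != user:
            raise LockedError(f"{path} is locked by {owner}")
        self.files[path] = content


server = Server()
server.lock("alice", "level1.map")

# Bob edits his local copy for hours; nothing stops him, because editing
# never talks to the server.
bobs_working_copy = {"level1.map": "bob's new layout"}

try:
    server.commit("bob", "level1.map", bobs_working_copy["level1.map"])
    outcome = "committed"
except LockedError as err:
    outcome = str(err)
```

Bob’s work is rejected only at the very end, after the time has already been spent. Going the other way, `lock` succeeds without any idea that Bob has edits pending.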

The long and short of it is that if you’re going to use a server, really use it. Perforce does. There’s no need to have the drawbacks of both centralized and distributed systems at once.

I think that’s everything that bothers me about Subversion. What about you?

Understanding Subversion’s Problems