File versioning should be easy, but it quickly gets complicated. Let me start with the easy scenario and then add the complications. The easy scenario is a file (it could be any discrete artifact, but let’s talk about files for now) that you create one day, change the next day, change again the day after, and so on. Not too bad: that’s version 1 on the first day, version 2 on the second day, and version 3 on the third day. On day 3, you decide the file is ready to show to your boss, so you want to mark it with a special label like “ready to show to my boss”. This is good – we have a simple versioning scheme, a monotonically increasing number, plus a labelling method. Your boss says, “Love the file, but I need a few changes”. So the next day you change the file to create version 4, and you label it “after boss’ feedback”, which, since you know all about punctuation, requires the apostrophe after the s in boss.
The next day, your boss having enthusiastically extolled the merits of your file to his friend in another division, you get a visit from that friend. He asks for a copy of the file that you gave to your boss on day 3, the one labelled “ready to show to my boss”, but he wants a few of his own changes. Not the ones your boss asked for, of course – that would be easy! No, he wants a different set of changes. So you “branch” the file, which just means you make an identical copy of version 3 and edit it to create a new version for your boss’ friend. This newly edited file can’t be version 4, because you already have a version 4. It’s also a good idea to indicate somehow that it was branched off version 3, so its number should look like 3.xxx. So let’s call the branched file 3.1, and after the first edit you can call it 3.2.
Now let’s repeat the exact same scenario: Your boss, who really loves you very much and has more than one friend, has convinced yet another colleague that your file is worth much more than the sum of its bytes. So friend #2 makes the same request. This time when you branch version 3, you can’t call it version 3.1 because there already is a version 3.1. So let’s reconsider our previous decision to name the first branch 3.1. What we really want to call it is the starting point (version 1) of the first branch off version 3. Let’s call that 3.1.1. And the other one should be called the starting point of the second branch off version 3 – let’s call it 3.2.1. This makes more sense now:
The version of the file your boss has is 4. His first friend got 3.1.2 (the second revision of the first branch off version 3) and his second friend got 3.2.2 (the second revision of the second branch off version 3).
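The numbering rules so far can be sketched in a few lines of Python. This is only an illustration: the function names `next_revision` and `branch_from` are made up for this post, not taken from any real version control tool, and I am assuming revision ids are plain dotted strings.

```python
def next_revision(revision: str) -> str:
    """Id of the next revision on the same line: bump the last component."""
    head, _, last = revision.rpartition(".")
    bumped = str(int(last) + 1)
    return f"{head}.{bumped}" if head else bumped


def branch_from(revision: str, existing: set[str]) -> str:
    """Id of the first revision on a new branch off `revision`.

    Branch numbers count up from 1, so take the first one not yet in use.
    """
    n = 1
    while f"{revision}.{n}.1" in existing:
        n += 1
    return f"{revision}.{n}.1"


# Hypothetical archive matching the story: four revisions on the main line.
archive = {"1", "2", "3", "4"}

first_friend = branch_from("3", archive)    # "3.1.1"
archive.add(first_friend)
archive.add(next_revision(first_friend))    # "3.1.2"

second_friend = branch_from("3", archive)   # "3.2.1"
archive.add(second_friend)
archive.add(next_revision(second_friend))   # "3.2.2"
```

Note that the first revision on a branch (3.1.1) is the identical copy, and the first edit is what produces 3.1.2.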
It can quickly get quite busy if, for instance, you branch off 3.2.1 like this:
Here we’ve created a new branch 3.2.1.1.1, which is the first revision of the first branch off 3.2.1.
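One nice property of this scheme is that an id carries its own ancestry. As a rough sketch (again with invented helper names, and assuming every branched id has the shape branch point, branch number, revision), you can peel an id apart like this:

```python
from typing import Optional


def branch_point(revision: str) -> Optional[str]:
    """Revision this one's branch was taken from, or None on the main line."""
    parts = revision.split(".")
    if len(parts) < 3:
        return None
    # Drop the trailing <branch number>.<revision> pair.
    return ".".join(parts[:-2])


def ancestry(revision: str) -> list[str]:
    """Chain of branch points from `revision` back to the main line."""
    chain = []
    point = branch_point(revision)
    while point is not None:
        chain.append(point)
        point = branch_point(point)
    return chain


print(ancestry("3.2.1.1.1"))  # ['3.2.1', '3']
```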
Any revision in any sequence can be labelled, but in this view of the world each label must be unique.
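The uniqueness rule is easy to enforce if labels are kept in a single lookup table for the whole archive. Here is a minimal sketch; the `LabelRegistry` class is hypothetical and only meant to show the idea.

```python
class LabelRegistry:
    """Archive-wide labels: any revision can be labelled, each label used once."""

    def __init__(self) -> None:
        self._revision_by_label: dict[str, str] = {}

    def label(self, revision: str, text: str) -> None:
        if text in self._revision_by_label:
            taken_by = self._revision_by_label[text]
            raise ValueError(f"label {text!r} already marks revision {taken_by}")
        self._revision_by_label[text] = revision

    def lookup(self, text: str) -> str:
        return self._revision_by_label[text]


labels = LabelRegistry()
labels.label("3", "ready to show to my boss")
labels.label("4", "after boss' feedback")
```

Trying to reuse “ready to show to my boss” on, say, 3.1.2 raises a `ValueError` instead of silently moving the label.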
What do I mean by “this view of the world”? Well, there’s another way to see these revisions. The above view tells you about all the branches of this file from inception. But let’s say I am looking at revision 4. In many situations I don’t really care about version 3.2.1.1.1. All I really care about is that there were 3 prior revisions, a couple of labels, and, oh by the way, a couple of branches off version 3, which I am free to investigate further if I want to.
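To make that concrete, here is a small sketch of such a summary. The `view_from` function and the shape of its result are my own assumptions; it only looks at the line the revision sits on and the branches taken directly off that line.

```python
def view_from(revision: str, archive: set[str], labels: dict[str, str]) -> dict:
    """Summarize the archive as seen from `revision`, on its own line only.

    `labels` maps each unique label text to the revision it marks.
    """
    head, _, last = revision.rpartition(".")
    prefix = f"{head}." if head else ""
    prior = [f"{prefix}{i}" for i in range(1, int(last))]
    line = set(prior) | {revision}

    branches = set()
    for other in archive:
        parts = other.split(".")
        # A branched id looks like <branch point>.<branch number>.<revision>.
        if len(parts) >= 3 and ".".join(parts[:-2]) in line:
            branches.add(".".join(parts[:-1]))

    return {
        "prior": prior,
        "labels": sorted(text for text, rev in labels.items() if rev in line),
        "branches": sorted(branches),
    }


archive = {"1", "2", "3", "4", "3.1.1", "3.1.2", "3.2.1", "3.2.2", "3.2.1.1.1"}
labels = {"ready to show to my boss": "3", "after boss' feedback": "4"}
view = view_from("4", archive, labels)
```

For revision 4, `view["prior"]` is the three earlier revisions, `view["labels"]` holds both labels, and `view["branches"]` lists only 3.1 and 3.2. The nested branch off 3.2.1 does not show up unless you go looking for it.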

Most of the time, this is all I care about. If I am my boss’ first friend, my view on this file is like this:
What happened to the 3.1.1 business? Well, it is mostly irrelevant. My view of the world is that I have a slightly modified version of a file that was based on some original file, and if I really cared about the history of evolution of the original I could go find out. So I have versions 1, 2, and so on.
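One way to picture this is as a simple renumbering. The sketch below assumes a stream is just one branch of the archive with its revisions counted from 1; that is my simplification for illustration, and `stream_view` is an invented name.

```python
def stream_view(branch: str, archive: set[str]) -> dict[int, str]:
    """Map stream-local versions 1, 2, ... to archive ids on `branch`."""
    prefix = branch + "."
    # Keep ids directly on this branch; deeper ids belong to nested branches.
    on_branch = [
        rev for rev in archive
        if rev.startswith(prefix) and rev[len(prefix):].isdigit()
    ]
    on_branch.sort(key=lambda rev: int(rev[len(prefix):]))
    return {number: rev for number, rev in enumerate(on_branch, start=1)}


archive = {"1", "2", "3", "4", "3.1.1", "3.1.2", "3.2.1", "3.2.2", "3.2.1.1.1"}
first_friend_stream = stream_view("3.1", archive)
# {1: '3.1.1', 2: '3.1.2'}
```

The friend works with versions 1 and 2. The archive ids are still there underneath for anyone who wants to trace the history back to version 3 of the original.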
Most legacy source-code control systems support only the archive-centric view of a file’s evolution. The term archive refers to the unique item that started off as version 1 and evolved and branched over time, but which, regardless of version, can always be identified as that specific entity in the repository. Some people still care greatly about that view of the world. But people whose job is simply to contribute to the content of the file usually don’t care about the intricacies of its evolution. They want the simplified, stream-centric view that is starting to become more common in the world of version control systems.
Unfortunately the stream-centric view is a little too easy – I’ll beef it up a bit in the next post to address the needs of all stakeholders.
Coming up soon: Merging, roles, version identifiers, and stream identifiers.


