Software Development – Doug Wheeler

Programmer, Developer, Engineer – What’s the Difference?

Recently, my manager mentioned he was trying to decide out how to properly differentiate between software engineers and programmers. We didn’t have much time to discuss it, but another nearby manager suggested that engineers designed software that could then be implemented by programmers. The implication being that the programmer was just following directions, whereas the engineer was actually doing the “thinking” behind the design. This view is very common among larger, “old-school” companies with a traditional waterfall process history, where product development is divided into discrete steps: plan, design, implement, test, release, maintenance.

After 35+ years of professional programming, I knew this wasn’t an accurate description about how real software development works, but I hadn’t really sat down and thought about it enough to figure out why. It was usually just a question of what title to put on someone’s business card and didn’t have any real-world impact on anyone’s actual work. This recent conversation, however, caught my attention and led me to think about this issue in more detail.

I’m going to jump straight to the end and state up front that I have concluded that there really isn’t any distinction between a programmer, software developer, and software engineer. Now, let me explain my rationale.

Several years ago, I came across an article written by Jack W. Reeves called What Is Software Design? This article was a revelation for me. The basic premise of the article is that software is design. This simple concept resolves so many issues with the way many companies have treated the software development process. There is more information in the article then can be explained here, but the basic idea is that programming, testing, and debugging are all integral parts of the design process. Interestingly, the agile practice of Scrum include aspects of this without really discussing it in the same terms. In its theoretically ideal implementation, Scrum teams consist of several skilled generalists, who are each capable of design, implementation, testing, debugging, etc. These are all part of the same process.

Anyone who’s actually tried to produce a complete written detailed design suitable to be given to an entry-level “programmer” to implement without them having to really understand the design knows this doesn’t work. It has never worked, but companies keep thinking they can do it. To produce a design that can be implemented in a “paint-by-numbers” fashion would require that almost every line of code be described in advance, at which point you have essentially written the software. This is analogous to asking Stephen King to write detailed enough instructions for someone else to write one of his books. As soon as you start abstracting the ideas even a little, the book would become the writer’s book, not a Stephen King book. The same thing happens with software—at any practical level of up-front design, the resulting implementation is actually designed by the individual programmers and the resulting source code is effectively the final detailed design.

It is still important to do high-level architecture and design up front, but the software itself is the ultimate detailed design. I equate software source code to blueprints or schematics. The actual manufacturing process is performed by the compiler and linker, following the detailed design in the source code.

This leads to the obvious conclusion that anyone involved in the writing of software, whether they are called programmers, developers, or engineers, are actually software designers. For those who insist that there be some distinction, it would be reasonable to consider these titles as different points along a continuum of software development. A “programmer” is typically a junior developer, who focuses on designing smaller, self-contained portions of the software. An “engineer” would be at the other extreme, and is responsible for larger component or product-wide design. Additionally, there many be a separate “architect” who focuses on system-level design and the interactions between major components. Each of these individuals is designing parts of the product, but the scope of their design is limited based on their experience and knowledge.

My First Published Program

I just wrote up a brief article about the first program I ever had published, The Catacombs of Your Mind. Check it out here. Over time, I’ll add some more of my published or well-known projects on my new Projects page. I also have some current projects that I’ll start documenting as I find the time/inclination.

Getting Git

Moving from a traditional source control system like Perforce or Subversion to Git seems like it shouldn’t be that hard. In fact, Git is supposed to be very easy to use, so why is it that when I first started using Git, I kept getting tripped up? I had read Joel Spolsky’s Mercurial tutorial (not exactly like Git, but close enough) and thought I understood it. Joel also made it very clear that there was an unlearning process that had to take place, but I didn’t realize how much I needed to unlearn.

Git tries to use familiar concepts in an effort to make the transition easier, but I think it really does more harm than good. You end up with a situation where familiar terms are used to describe things that are similar only at a very superficial level. It doesn’t take long for the differences to trip you up.

I’m not going to focus on the differences between centralized vs. distributed version control systems (that is covered in other places and is fairly straightforward). I’m going to concentrate on which “old” concepts you need to unlearn, specifically branch, changeset, head, and checkout. Whatever you think these terms mean is very likely wrong in the world of Git. Read on and be enlightened.

COMMITs

The fundamental unit of data in Git is the commit. In most systems, when you commit changes, those changes are packaged into a changeset or something similar. Whenever you commit changes to Git, Git effectively creates a snapshot of your entire workspace in its current state. To save space, these snapshots only contain the delta between a “parent” commit and the new commit. Each commit contains a link to its parent commit. In the case of a merge, the commit contains a link to each of the two parent commits used to create this one. Other than the very first commit, each commit has one or two parents.

So, a Git repository is essentially just a collection of commits connected by backward links to their parent commits. Here’s an example with 12 commits and the backward links from each to its parent:

Just to orient yourself, the oldest commits are at the bottom and the newest are at the top (some documentation/software shows this inverted, or horizontally).

Perhaps you’ll be surprised to learn that from Git’s perspective, this diagram does not show any branches. Your first reaction may be “of course it does!” In fact, this figure shows three commit history paths, but in Git parlance, these are not called branches.

Branches

So, what is a Git branch? A branch is simply a named pointer to a commit. Let’s add a few to our example:

We now have two branches, “main” and “feature”. The hardest part to understand is that these branches are just pointers to a single commit. They don’t refer to the entire line of changes. For example, although there are two paths leading to “main”, there is no record of which of those was the main branch. Branches can be created, destroyed, and moved without affecting any of the commits. You can also have several branches pointing to the same commit. These combine to allow for easy changes to the branch names, the ability to “retroactively” create a new branch, and more.

You can also see in this diagram that there is not a branch corresponding to commit 8. Once you merge two branches, you can safely delete the extra branch without affecting the repository (other than the branch “label” disappearing). You can even recreate the branch at the same point in the future if you want. The only thing to be aware of, is that unreferenced commits (those without a branch label or a child commit) will eventually be deleted (Git has a built-in garbage collector). So, if we move the “feature” branch to commit 11, then commit 12 would eventually be garbage collected. Normally Git waits some time before deleting these orphaned commits to allow for situations where the user is simply moving branches around and has only temporarily orphaned a commit.

Let’s take a look at a very common situation. Let’s say you are developing a new feature but forgot to create a new branch for your work and accidentally commit your changes to the main branch. After your commit, the repository looks like this:

where your new changes were committed as commit 13. This can easily be fixed by creating a new branch at commit 13, then moving the “main” branch back to commit 10.

Head

The head is a pointer to your “current” branch. So, the head is a pointer to a branch, and the branch is a pointer to a commit:

The head is used primarily when you commit changes. When you commit changes, Git follows the head pointer to the branch, then follows that to a commit. It then creates a new commit as a child of that one and moves the branch to the new commit:

Since the head points to the branch, it effectively moves with it. If you want to change the head explicitly, you can do that by doing a checkout.

Checkout

Performing a checkout on a branch does two things: it moves the head pointer to that branch and updates your local workspace files to match the corresponding commit snapshot. If you wanted to switch to the “feature” branch, you can do a checkout of that branch to move the head pointer there and update your local workspace files to match the associated commit snapshot:

Summary

The key points are:

Each commit creates a snapshot that effectively captures the state of your local workspace at the time of the commit.
A branch is simply a pointer to a commit.
The head is a pointer to the current branch.
Checkout moves the head to a new branch and updates your workspaces to match the corresponding commit snapshot.

It is also important to be aware that some Git tools introduce some complications into this. For example, the process of moving a branch pointer (using a reset command) may cause a checkout at the same time. There are usually command options to control these interactions. Another concept that you’ll have to deal with (which isn’t covered here) is how uncommitted changes to local files are handled during these operations.

The best part of Git is that you can try out commands on your local repository before pushing your changes to the server. Even if you completely mess up your local repository, you can simply throw it out and get a fresh copy from the server.