Understanding software and its development

Technical health - why it matters

Software codebases start small, but can blossom into incredibly complex systems that are difficult for developers to understand and change confidently. This can result in lost productivity, reduced agility, and an inability to react to market demands. Code can also be unreliable or brittle, leading to defects, accidents, and disasters.

Book by Silverthread co-Founder & Harvard Professor Carliss Baldwin

Much like a human body, codebases can be healthy or they can be sick. Their ‘technical health’ is a key driver of the long-term effectiveness of the development organization and the success of the product. Keeping a software codebase healthy from the outset will allow it to grow, evolve, and be worked on by many developers in concert even as the codebase and the development organization scale. An unhealthy codebase will ultimately fail to thrive, as developers become hamstrung and customers are left unhappy and waiting.

 

Our team’s research at MIT and Harvard Business School has established a strong link between technical health and ‘software economics’. Over 20 years, we measured different aspects of technical health in enterprise, government, DoD, and open source software. We correlated technical health with development productivity, defects, staff turnover, revenue generation, and other measures of business value. This experience lets us describe what makes code healthy and what makes it sick on the basis of quantitative evidence.

Things to know about code and the people who make it

  • Large software systems are the most complex and complicated engineered artifacts in existence.

  • A useful metaphor is that software development is akin to community poetry writing at massive scale.

  • Software is complex because a large codebase can contain billions of parts, with each part being completely different, and each part having the potential to freely interact with all the others.

  • Software is complicated because there is no way for a single human to grasp all of what is going on inside a large codebase. Developing code is an intensely analytical activity that taxes the mind and butts up against the limits of human cognitive capacity. To illustrate, a Ford F-150 contains 150 million lines of code (LOC). If all that code were printed on notebook paper, the resulting stack of paper would rise 2.25 miles in the air. Each developer might only have mastery of 10 feet in that stack of paper.

 

  • In 2015, Google’s 25,000 engineers modified 15 million LOC - equivalent to Linux - every week. (Source) The majority of LOC produced on a software project may be changed at some point, but are not deleted. Software systems rarely die. We build on legacy code. Hence today’s designers inherit past design decisions.

  • Stuart Feldman, of the IBM Institute for Advanced Commerce (2001), says that “Writing code is like writing poetry: every word, each placement counts. Except that software is harder, because digital poems can have millions of lines which are all somehow interconnected… So far, nobody has found a silver bullet to kill the beast of complexity.”

  • A software codebase is a collection of source code files stored in a directory or folder structure. These files are edited by developers working as a team. Developers use an editor or an Integrated Development Environment (IDE) to navigate this folder structure and edit code. This lens into the code can give developers a myopic view. They see the ‘trees’ they are working on, and have more difficulty seeing the forest.

  • Files can be written in different languages, generally indicated by the file extension. For example, Java files will end in ‘.java’ and JavaScript files will end in ‘.js’.

  • Each of these files contains different types of ‘entities.’ Each entity in code can be thought of as a software ‘part’… in the same way you would think of a carburetor as a car part. A developer will use a file editor to create or modify ‘entities’ inside source code files. Functions, classes, and data structures are different kinds of entities.

  • These different entities have relationships with each other. They call each other, pass data to each other, and modify each others' behavior. When a program runs, there is a flow of control that is defined by these relationships. Different functions execute as they pass control back and forth. There is also a flow of data. As programs run, data is moved and transformed.

  • Developers tend to work on and gain mastery of the parts of a complex codebase they are responsible for, each having a view that is relatively - and necessarily - myopic. Developers must work together to get a large system to do its job and behave well even though no one really has a great idea how the whole thing works. This can be incredibly challenging. It’s sometimes impossible for developers to anticipate the bugs or flaws they are introducing. It’s sometimes impossible to get an existing system to do new things without major upheaval.

  • The problem lies both in humans and in the code itself. Humans are limited in their ability to conceptualize what is going on when making changes. The geometric properties of the code itself (the structural layout of entities and their interactions) may also make it brittle or difficult to change. When these issues collide, major problems ensue.

  • Developers and software architects try to overcome the challenges posed by their individually myopic views by thinking architecturally. Sometimes they do this by creating architecture diagrams showing the top-down structure that is often hidden from them. They try to partition a system into distinct parts (called modules, subsystems, or components) and understand the relationships between them. However, if large ‘geometric’ structures inside a codebase are ‘monolithic’ (i.e. can’t be broken down geometrically or conceptually) and they contain more code than can fit in a developer’s head at once, then things can go downhill quickly.

  • In our research, we call these nasty geometric structures ‘cores.’ When being less formal we call them spaghetti code, fur-balls, and other epithets. We have also called them ‘tumors’, an analogy that actually makes sense both practically and theoretically when thinking about code as a complex and living (even if not organic) system.

  • Our research has shown that architectural degradation (and other types of technical health problems) significantly impact developer productivity, quality, agility, and organizational effectiveness in almost every imaginable way.
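
To make the ideas of entities, relationships, and cyclic structures above more concrete, here is a minimal Python sketch. It uses the standard `ast` module to list the entities in a small piece of source code, record which functions call which, and report groups of functions that can all reach each other. The sample source and every name in it (`Invoice`, `load`, `parse`, `validate`) are hypothetical. Treating a ‘core’ as a group of mutually reachable entities is an assumption made for illustration, not a description of the exact method used in the research.

```python
import ast

# Hypothetical file contents, used only for illustration.
SOURCE = '''
class Invoice:
    pass

def load(path):
    return parse(path)

def parse(text):
    if not text:
        return load("default")  # parse calls load back: a cycle
    return validate(text)

def validate(data):
    return bool(data)
'''

def list_entities(tree):
    """Return (kind, name) for each top-level function and class."""
    kinds = {ast.FunctionDef: "function", ast.ClassDef: "class"}
    return [(kinds[type(node)], node.name)
            for node in tree.body if type(node) in kinds]

def call_graph(tree):
    """Map each top-level function to the top-level functions it calls by name."""
    functions = {n.name: n for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph = {name: set() for name in functions}
    for name, node in functions.items():
        for sub in ast.walk(node):
            if (isinstance(sub, ast.Call)
                    and isinstance(sub.func, ast.Name)
                    and sub.func.id in functions):
                graph[name].add(sub.func.id)
    return graph

def reachable(graph, start):
    """All nodes reachable from start by following one or more edges."""
    seen, stack = set(), [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def cyclic_groups(graph):
    """Groups of two or more nodes that can all reach each other."""
    reach = {n: reachable(graph, n) for n in graph}
    groups = set()
    for a in graph:
        group = frozenset(b for b in graph if b in reach[a] and a in reach[b])
        if len(group) > 1:
            groups.add(group)
    return groups

tree = ast.parse(SOURCE)
entities = list_entities(tree)
graph = call_graph(tree)
cores = cyclic_groups(graph)

print(entities)
print(graph)
print(cores)
```

In this toy example `load` and `parse` form a cycle, so neither can be understood or changed without considering the other. A real analysis would work across many files and languages and would track more kinds of relationships than direct calls by name.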

Why does technical health drive organizational effectiveness?

  • Understandability: The humans enhancing the code can understand it, make changes confidently, and not break things. Even if the overall system is enormously complex, healthy code can be conceptually broken down so one thing can be focused on at a time.

  • Simplicity: The ‘geometry’ of the code itself makes it easy to evolve or adapt, in the same way an organism might. The code geometry limits the amount of upheaval required to make a change and the ripple effects of that change. The codebase - thought of as an organism - is capable of adapting to take on new valuable functionality in response to evolutionary pressures in the business marketplace or strategic environment.

  • Organizational effectiveness: As a result, the developers remain productive and happy. They spend the majority of their time on ‘offense’ (delivering new capabilities) instead of on ‘defense’ (diagnosing bugs, or being bogged down in unending refactoring). The developers are capable of communicating and coordinating effectively across boundaries - both team boundaries and code/conceptual boundaries (which often mirror each other to an extent). Overall, technical health gives the development organization a chance to be productive, avoid constant fire-fighting due to nasty surprises, and maintain high morale and high retention.

  • Other factors: leaders, teams, and processes: Of course, organizational effectiveness is not solely driven by code health. It is also driven by leadership quality, team quality, and effective processes and tools. We have observed poor organizational effectiveness in teams with good technical health for these other reasons. However, technical health is necessary, even if not sufficient. In 20 years of academic research and commercial activity, we have never encountered a team that was operating effectively while maintaining a codebase with significant health issues. In reality, there is a dynamic interplay between team issues, process issues, and technical health, as all evolve over time. In a challenged environment, you are likely to encounter multiple issues. However, technical health - especially once a codebase gets larger or has been around for longer - is often a dominating force. The best Agile processes don’t do anything to guarantee that healthy code emerges, but bad code health guarantees that processes will be ineffective. Similarly, if you drop a great team into an unhealthy legacy codebase, it will be hard for the team to improve code health, and it will be easier for the code to break the team if the situation is not properly managed.
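
The ‘ripple effects’ mentioned under Simplicity can be illustrated with a small sketch. Assuming we already know which file depends on which, the Python below computes the set of files that could be affected when one file changes, first for a layered design and then for the same files with one extra dependency that closes a cycle. The file names and both dependency maps are hypothetical, and counting transitive dependents is only one simple proxy for ripple effects.

```python
def affected_by(depends_on, changed):
    """Return the files that directly or indirectly depend on `changed`."""
    # Invert the edges so we can ask "who depends on this file?"
    dependents = {f: set() for f in depends_on}
    for f, deps in depends_on.items():
        for d in deps:
            dependents[d].add(f)
    seen, stack = set(), [changed]
    while stack:
        for nxt in dependents[stack.pop()]:
            if nxt != changed and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical layered design: each file depends only on lower layers.
LAYERED = {
    "ui.py": {"service.py"},
    "service.py": {"storage.py"},
    "storage.py": set(),
    "report.py": {"storage.py"},
}

# Same files, but storage.py now also depends on ui.py, closing a cycle.
TANGLED = {**LAYERED, "storage.py": {"ui.py"}}

layered_ripple = affected_by(LAYERED, "ui.py")
tangled_ripple = affected_by(TANGLED, "ui.py")

print(sorted(layered_ripple))
print(sorted(tangled_ripple))
```

In the layered version nothing depends on `ui.py`, so a change there stays local. In the tangled version the same change can reach every other file, which is the kind of upheaval a healthy code geometry is meant to limit.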