Measuring technical health
Properties of a healthy codebase
A large codebase is a complex system that can be healthy or unhealthy for many reasons, in the same way a human body can be. Regular checkups and a focus on fitness and prevention are the best approach in both cases. That being said, it's (almost) never too late to get back on track by working with doctors and going to the gym.
Code health can be viewed along many dimensions, including:
| Architecture / Design Quality | Code Quality | Reuse & Commonality | Test Quality |
|---|---|---|---|
| Health of the skeleton and ligaments of a codebase | Health of each of the microscopic cells in a codebase | Fitness of a codebase: is it lean or is it obese? | Immune system of a codebase |
| The structural layout of the "body" from a holistic, top-down perspective, focused on the interaction between parts | The health of individual components in the "body" from a reductionist, bottom-up perspective, focused on each part in isolation | The health of the "body" from the perspective that the amount of code required should be minimized | The health of the "body" from the perspective that defects should be prevented from being introduced, festering, and growing |
| Applies to the system as a whole at multiple levels | Applies to individual lines of code, functions, files, etc. | Applies at the system level and traceable to individual parts | Applies at the system level and to individual files, functions, etc. |
| Measured using network / graph theory | Measured using code quality checkers | Measured using code duplication checkers | Measured using test coverage tools |
Architecture / Design Quality
Architecture Quality Principles
Design quality can be significantly improved by adhering to certain well-understood principles.
Design quality impact
- Poor design quality leads to difficulty understanding the system, lost productivity, and increased bugs and defects
- Fixing design quality problems is relatively difficult – "malignant" problems propagate through dependencies across the system
Thinking about a codebase as a network
Seeing systems through a network lens can be useful. For example, the DC Metro subway map can be used in navigation. It shows high-level information about stops (nodes) and routes (arcs), while throwing out detail not needed to understand its "skeleton".
Networks are established tools for representing and analyzing code because they are a natural means of capturing hierarchical relationships, modularity, coupling, cohesion, cyclicality, and other important patterns. When networks are used for this purpose, nodes designate the "parts" at possibly multiple levels - functions, files, components. Arcs or lines between nodes designate relationships between those parts - such as function calls, data flow, or other use. If these arcs are directed (as they often are in software, but not in the subway map), they can represent one-way dependencies between parts, which is useful because software dependencies typically flow in one direction.
Imagine you have a simple codebase with only 2 files, in which code in "rectangle_functions" calls code in "math_functions", forming a small network of parts and interactions:
File name: rectangle_functions

```
procedure area = rectangle_area(length, height)
    area = multiply(length, height)
end

procedure perimeter = rectangle_perimeter(length, height)
    perimeter = add(multiply(length, 2), multiply(height, 2))
end
```

File name: math_functions

```
procedure sum = add(num1, num2)
    sum = num1 + num2
end

procedure multiple = multiply(num1, num2)
    multiple = num1 * num2
end
```
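As a concreteness check, the two pseudocode files above translate directly into runnable Python (keeping the procedure names from the example):

```python
# math_functions: generic utilities that depend on nothing else
def add(num1, num2):
    return num1 + num2

def multiply(num1, num2):
    return num1 * num2

# rectangle_functions: depends on the math_functions above
def rectangle_area(length, height):
    return multiply(length, height)

def rectangle_perimeter(length, height):
    return add(multiply(length, 2), multiply(height, 2))

print(rectangle_area(3, 4))       # 12
print(rectangle_perimeter(3, 4))  # 14
```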
If you draw a network of the codebase above, it would be simple, but real software systems can have millions of entities and billions of interconnections.
[Figures: the dependency structure of the example above, and the dependency structure of Linux]
Direct and indirect dependencies
A codebase is a collection of individual entities - functions, files, classes, data structures, and so on. These entities are connected by relationships, which may be a call, subclass, data typing, etc. If the dependencies between files or entities are traced out, you can see that some dependencies are direct, while others are indirect.
Dependency count metrics - FI, FO, VFI, VFO
If a file is used a lot, we can think of it as a shared utility. If a file calls out a lot, we might think of it as a control element that directs the actions of many other files.
This way of thinking allows us to introduce some architecture metrics for each file:
| Metric | Meaning |
|---|---|
| Fan In (FI) | How many other nodes depend upon it directly? Computed by counting the number of arrows pointing into that node |
| Fan Out (FO) | How many other nodes does it depend upon directly? Computed by counting the number of arrows pointing out from that node |
| Visibility Fan In (VFI) | How many other nodes depend upon it directly or indirectly? |
| Visibility Fan Out (VFO) | How many other nodes does it depend upon directly or indirectly? |
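These four metrics are easy to compute once a dependency graph is in hand. A minimal sketch in Python, using a small hypothetical graph (the file names and edges here are made up for illustration, and this sketch excludes a node from its own VFI/VFO counts, which some tools do not):

```python
# deps[x] = the set of nodes x depends on directly (arrows point out of x)
deps = {
    "L": {"F", "K"},   # L directs F and K: a control element
    "F": {"A"},
    "K": {"A"},
    "A": set(),        # A depends on nothing: a utility
}

def fan_out(node):
    """FO: how many nodes this one depends on directly."""
    return len(deps[node])

def fan_in(node):
    """FI: how many nodes depend on this one directly."""
    return sum(1 for targets in deps.values() if node in targets)

def reachable(node):
    """All nodes reachable from `node` by following dependency arrows."""
    seen, stack = set(), list(deps[node])
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(deps[n])
    return seen

def vfo(node):
    """VFO: how many nodes this one depends on directly or indirectly."""
    return len(reachable(node))

def vfi(node):
    """VFI: how many nodes depend on this one directly or indirectly."""
    return sum(1 for other in deps if other != node and node in reachable(other))

print(fan_in("A"), vfi("A"))   # 2 3  (F and K use A directly; L only indirectly)
print(fan_out("L"), vfo("L"))  # 2 3
```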
For example: [Figure: a small dependency network in which file L is a control element, file F is in the middle, and file A is a utility]
God Files / God Classes
A file with a very high FO score directly depends on many other things in the codebase. This can sometimes (but not always) indicate a problem. In object-oriented languages, these files are said to contain 'god classes'. Our statistical analysis has shown that god files tend to have elevated defect rates. This may have to do with the quality of the file's contents. However, it is also likely due to the fact that if a file depends on many other files, then changes or problems in those other files impact the god file because of the dependency.
Modularity
As a general principle, codebases should be modular. A modular system:
- Is composed of distinct modules, subsystems, or components (whatever you want to call them)
- Has modules containing highly cohesive elements with strong interconnections inside the module
- Has modules loosely coupled to other modules, with weaker interconnections between them
- Routes interconnections between modules through simple interfaces or APIs that hide the complexity inside
- Keeps each module small and simple enough that a human being is capable of understanding and modifying it
[Table: attributes of modularity, benefits of modularity, and managing modules]
If you scan a codebase without any knowledge of its architecture and find the interlinkages, sometimes the natural boundaries of modules suggest themselves. If a code scan reveals a structure such as this, then you may conclude that four natural modules might be present:
Hierarchies
As a general principle, a software system should be hierarchical. The example above is a hierarchy. From a graph theory perspective, hierarchies can come in several flavors, including trees and layers. The common feature of a hierarchy is that dependencies flow in one direction from top, through middle, to bottom, without upward facing links.
Here are some other pictures of hierarchies:
[Figures: a tree hierarchy and a layered hierarchy]
Hierarchy incorporates linear (non-circular) dependencies between modules, significantly reducing perceived complexity
[Table: attributes of hierarchy, benefits of hierarchy, and managing hierarchy]
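Whether a dependency graph actually is a hierarchy can be checked mechanically: a graph is hierarchical exactly when it has no cycles, i.e. when a topological order exists. A sketch using Kahn's algorithm (the module names are hypothetical):

```python
from collections import deque

def is_hierarchy(deps):
    """True if the dependency graph has no cycles (Kahn's algorithm):
    repeatedly peel off nodes that nothing depends on."""
    # in-degree = number of arrows pointing into each node
    indeg = {n: 0 for n in deps}
    for targets in deps.values():
        for t in targets:
            indeg[t] += 1
    queue = deque(n for n, d in indeg.items() if d == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for t in deps[n]:
            indeg[t] -= 1
            if indeg[t] == 0:
                queue.append(t)
    # nodes left unvisited are part of, or blocked behind, a cycle
    return visited == len(deps)

layered = {"ui": {"logic"}, "logic": {"storage"}, "storage": set()}
tangled = {"a": {"b"}, "b": {"c"}, "c": {"a"}}
print(is_hierarchy(layered))  # True
print(is_hierarchy(tangled))  # False
```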
Reuse & Utilities
Reuse is the gift that keeps on giving. Reuse refers to shared utilities that are widely used by other modules, but do not rely upon other modules themselves.
A file with a very high FI or VFI score (provided it has a low VFO score) is likely a utility. It contains generic, abstract functionality that is heavily reused. Utilities are likely to be well tested, battle-hardened, and to have stood the test of time. Our statistical analysis has shown that utility files have low defect rates.
[Table: attributes of reuse, benefits of reuse, and managing reuse]
Layers, Platforms, & Plug-in architectures
Some systems are layered, but not all are or need to be. Layers combine modularity and hierarchy to create a well-structured, easy-to-understand, and maintainable codebase.
[Table: attributes of layers, benefits of layers, and managing layers]
In some cases, a system will contain a generic engine that is used for common or generic tasks, while individual plug-ins that depend on it specialize its behavior for specific uses. One example might be a video game engine, with each game that depends on it considered a plug-in. Another might be tax and accounting software with common code in an engine, but specialized code to help you do your taxes for your individual state in one of 50 plug-ins.
The breakdown of architecture health: Cores and cyclic groups
When a developer violates Design Quality principles, large cycles called cores emerge and radically increase complexity.
By its nature, healthy code ‘wants’ to be a hierarchy of modules. That being said, most systems are not - in the same way that humans want to be healthy, but most don’t go to the gym every day.
When we scan code, we often find ‘cycles’ at different levels. For example, we may look at the contents of each file and discover that calls between some of them form a ring. (A contrived example to be sure.)
The codebase above contains a 'core', otherwise known as a 'cyclic group'. A core is a collection of entities - in this case files - in which every entity is 'reachable' from every other entity in a circular fashion. By following arrows, we can visually trace how A calls B, B calls C, and via some path, C calls back to the original file A. File cores can be detected by looking at only the source code, without any formal architecture description.
Cores represent a breakdown of hierarchy, because hierarchies must flow in one direction by definition. They also represent a breakdown of modularity if they become too big (more than a handful of files) because they may encapsulate code that should be distinct, but is instead coupled in a manner that is non-obvious and very difficult to manage.
Cores or cyclic groups can exist at any level of abstraction. For example, we may think of modules as collections of files, and then look at the dependencies that cross module boundaries to find component cores.
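Cores can also be found mechanically: they are the strongly connected components of the dependency graph, and any component with more than one entity is a cyclic group. A sketch using Kosaraju's two-pass algorithm (the file names and edges are illustrative):

```python
def cyclic_groups(deps):
    """Find cores: strongly connected components with more than one
    entity, via Kosaraju's two-pass algorithm."""
    order, seen = [], set()

    def dfs(node, graph, out):
        # Iterative depth-first search; appends nodes in finish order.
        stack = [(node, iter(graph[node]))]
        seen.add(node)
        while stack:
            n, it = stack[-1]
            advanced = False
            for m in it:
                if m not in seen:
                    seen.add(m)
                    stack.append((m, iter(graph[m])))
                    advanced = True
                    break
            if not advanced:
                out.append(n)
                stack.pop()

    # Pass 1: record finish order on the original graph.
    for n in deps:
        if n not in seen:
            dfs(n, deps, order)

    # Reverse every arrow.
    rev = {n: set() for n in deps}
    for n, targets in deps.items():
        for t in targets:
            rev[t].add(n)

    # Pass 2: DFS the reversed graph in reverse finish order;
    # each tree found is one strongly connected component.
    seen.clear()
    cores = []
    for n in reversed(order):
        if n not in seen:
            comp = []
            dfs(n, rev, comp)
            if len(comp) > 1:
                cores.append(sorted(comp))
    return cores

# A, B, and C call each other in a ring; D sits outside the cycle
deps = {"A": {"B"}, "B": {"C"}, "C": {"A", "D"}, "D": set()}
print(cyclic_groups(deps))  # [['A', 'B', 'C']]
```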
[Table: attributes of cores, detriments of cores, and managing cores]
Architecture Integrity
In some codebases, the modular structure is explicitly defined. In this picture of the open-source Axis2 codebase, each module is shown as a box and dependencies between modules are shown as arrows.
Code Quality
Code quality principles
Code quality is a concept that helps you think about the health of each 'tree' in the 'forest':
- Code quality measures apply to individual entities (procedures, classes, methods, files, data structures, specific lines, etc.)
- Large codebases can contain millions of entities
- Two procedures with the same function can have widely varying code quality
To illustrate, look at these two procedures that do the same thing. The first is the classic 'hello world' program - the first program written by every student in Computer Science 101. The second is also 'hello world', but the implementation is from a submission to the 'Obfuscated Coding Competition'.
| Procedure 1 - Great! |
|---|
| print("Hello World!") |

| Procedure 2 - Not good - Does the same thing! |
|---|
| [obfuscated implementation not shown] |
It is obvious from this example that identical functionality can be provided by good or bad code. One hopes that a codebase is filled with code that looks like Procedure 1, with little that looks like Procedure 2. To some extent, however, every codebase has some bad code.
Note also that two pieces of code with very different ‘form’ can perform identical ‘function.' Procedure 1 contains only essential complexity, while Procedure 2 contains lots of non-essential complexity. For that reason, be somewhat skeptical when a developer says their code must be overly complex because they wrote it to solve a hard problem.
Managing Code Quality
- Poor code quality leads to difficulty understanding, waste, and defects
- Contractors may be delivering on functionality, but at the cost of complexity
- Fixing code quality problems is relatively simple – "benign" problems can be locally addressed within the entity
Code quality metrics
McCabe Cyclomatic Complexity
McCabe assigns a number to a "structured program" or block of executable code based on a static analysis of the number of linearly independent execution paths that can be followed as a program executes. In modern programming languages, McCabe scores typically apply to procedures (called functions in C) or class methods. Alternative paths through a procedure result from conditional branching statements (if statements, switch/case statements, while loops, etc.). The following is a four-step recipe for computing the original version of McCabe's metric:
1. Increment one for every IF, CASE, or other alternate execution construct
2. Increment one for every iterative DO, DO-WHILE, or other repetitive construct
3. Add two less than the number of logical alternatives in a CASE
4. Add one for each logical operator (AND, OR) in an IF
McCabe asserted that his number could be used to estimate the effort required in test coverage. He also suggested that cyclomatic complexity for procedures or methods should be kept below the value 10 so that they remain understandable and testable. A classification scheme has been devised to bin procedures into four general types based on their McCabe scores.
| Definition | Example calculation | What is good? |
|---|---|---|
| The McCabe cyclomatic complexity is the number of linearly independent execution paths through a program | M = E - N + 2, where E is the number of edges and N is the number of nodes in the control-flow graph. The system above has a McCabe score of 10 - 8 + 2 = 4 | According to NIST, procedure scores should generally be kept at or below 10; higher scores fall into bins of increasing risk and testing difficulty |
McCabe’s metric has been positively related to defect density and the productivity of developers doing maintenance on previously shipped code. Many firms now use McCabe’s scores as a means of identifying problematic code.
A common variant (the one used in CodeMRI) excludes switch/case statements from consideration in the McCabe score. This is often referred to as “Modified McCabe cyclomatic complexity.” McCabe Cyclomatic Complexity is commonly used as a Code Quality metric for executable blocks of code. In modern languages, it gives complexity scores to functions or methods.
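For modern languages, the counting recipe can be approximated mechanically. A rough sketch using Python's ast module (a simplification: it counts if/for/while/except branches plus boolean operators, and ignores constructs such as ternaries and comprehensions):

```python
import ast

def mccabe(source, func_name):
    """Rough cyclomatic-complexity estimate for one function:
    start at 1, add 1 per decision point (if/for/while/except),
    and add 1 per AND/OR operator in boolean expressions."""
    tree = ast.parse(source)
    func = next(n for n in ast.walk(tree)
                if isinstance(n, ast.FunctionDef) and n.name == func_name)
    score = 1
    for node in ast.walk(func):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.ExceptHandler)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            score += len(node.values) - 1  # one per AND/OR operator
    return score

code = """
def classify(n):
    if n < 0 and n % 2:
        return "negative odd"
    for i in range(n):
        if i == 3:
            return "has three"
    return "other"
"""
print(mccabe(code, "classify"))  # 5
```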
Code Comments
Source code should contain comments describing what the code does and the reasoning behind decisions that might be tricky to understand. Appropriate commenting and documentation (along with good use of naming conventions) are important for teaching future developers what the code does. This is critical because 'design intent' does not flow easily from code alone. If engineers are given system requirements, it is often easy to use them to redesign or understand a system. However, given only a system, it is almost impossible to use code inspection to reverse engineer its requirements. It is very important to tell future developers what something is supposed to do and why; otherwise, this information will likely be lost to time.
Commonality
A codebase is beneficial because it provides capabilities, not because it is big. In fact, if two codebases deliver the same capabilities but one is significantly smaller, the smaller one will be more valuable. This is for a very simple reason:
Value ≈ Benefits − Costs
Value ≈ Capabilities − Cost of development & maintenance
Value ≈ Capabilities − Volume of code that must be developed and maintained

For this reason, you want to have a codebase that follows the Don't Repeat Yourself (DRY) principle.
Code Duplication
Code duplication checks can compare blocks of code in your codebase against other blocks in the same file, or against blocks of code in other files. They can identify places where a developer copied code from one place to another. Good duplication checkers can find duplication that is similar but not necessarily identical. This is important because copies may be slightly modified when initially copied or they may drift apart.
For a system, we can find all the similarities and then compute the percent of duplicative code in the system. We can also compute the amount of code that would remain if duplications were eliminated. Duplication metrics can be given for individual files.
Exact information about the location of each duplication, the differences, and the amount of drift can be used by developers when planning efforts to eliminate it.
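The core mechanism can be sketched in a few lines: hash fixed-size chunks of normalized code and report chunks that appear in more than one place. This toy version only catches copies whose lines match after whitespace stripping; real checkers work at the token level so they also tolerate renamed variables and drift:

```python
from collections import defaultdict

def duplicate_blocks(files, window=4):
    """Toy duplicate detector: index every `window`-line chunk of
    whitespace-normalized code, then return chunks seen in 2+ places."""
    seen = defaultdict(list)
    for name, text in files.items():
        lines = [l.strip() for l in text.splitlines() if l.strip()]
        for i in range(len(lines) - window + 1):
            chunk = "\n".join(lines[i:i + window])
            seen[chunk].append((name, i))
    return {chunk: locs for chunk, locs in seen.items() if len(locs) > 1}

files = {
    "a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
    "b.py": "# copied\nx = 1\ny = 2\nz = x + y\nprint(z)\n",
}
dupes = duplicate_blocks(files)
print(len(dupes))  # 1 duplicated 4-line block, found in both files
```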
Test Quality
Tests are the immune system of a codebase. They should ideally be written before or at the same time as the code being developed. If you have poor testing, the best time to start to improve is now. Our statistical studies have shown a critical relationship between good tests and quality, efficiency, and effectiveness.
[Table: healthy testing vs. unhealthy testing]
The goal of unit tests is to ensure that parts work individually. The goal of system tests is to exercise the combined behavior of several parts or a fully integrated system.
Test coverage metrics
Test coverage can be measured by running your suite of automated tests and identifying which code is executed by tests and which is not. The simplest test coverage metric is a count of the lines of code tested (at least once) versus not. Other, more complicated metrics make sense as well: you might care whether specific branches are followed, specific types of data are passed in, and so on. All test coverage metrics are ultimately a ratio of code or conditions exercised versus not.
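The mechanics behind line coverage can be sketched with Python's sys.settrace hook (a toy version of what real coverage tools do; the function and variable names are my own):

```python
import sys

def traced_lines(func, test_cases):
    """Toy coverage tracer: record which line numbers of `func`
    execute while running the given test cases."""
    executed = set()

    def tracer(frame, event, arg):
        if event == "line" and frame.f_code is func.__code__:
            executed.add(frame.f_lineno)
        return tracer  # keep tracing inside called frames

    sys.settrace(tracer)
    try:
        for args in test_cases:
            func(*args)
    finally:
        sys.settrace(None)  # always uninstall the tracer
    return executed

def absval(n):
    if n < 0:
        return -n
    return n

partial = traced_lines(absval, [(-3,)])          # never reaches `return n`
full = traced_lines(absval, [(-3,), (3,)])       # exercises both branches
print(len(full) > len(partial))  # True
```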
Prioritizing test development
Test development is most important:
- When code is changing rapidly or you anticipate many future changes
- When introducing new employees into a codebase
- When the code is complex or degraded along one or more quality dimensions
- When refactoring or rearchitecting