Measuring technical health

Properties of a healthy codebase

A large codebase is a complex system that can be healthy or unhealthy for many reasons in the same way a human body can be. Regular checkups and a focus on fitness and prevention are the best way to go in both cases. That being said, it’s (almost) never too late to get back on track by working with doctors and going to the gym.

Code health can be viewed along many dimensions, including:

Architecture / Design Quality - the health of the skeleton and ligaments of a codebase. The structural layout of the “body” from a holistic top-down perspective, focused on the interaction between parts.

  • Modularity
  • Hierarchy
  • Layering
  • APIs

Applies to the system as a whole at multiple levels. Measured using network / graph theory.

Code Quality - the health of each of the microscopic cells in a codebase. The health of individual components in the “body” from a reductionist bottom-up perspective, focused on each part in isolation.

  • Simplicity
  • Readability
  • Parsimony

Applies to individual lines of code, functions, files, etc. Measured using code quality checkers.

Reuse & Commonality - the fitness of a codebase: is it lean or is it obese? The health of the “body” from the perspective that the amount of code required should be minimized.

  • Share common solutions
  • Don’t repeat yourself

Applies at the system level and is traceable to individual parts. Measured using code duplication checkers.

Test Quality - the immune system of a codebase. The health of the “body” from the perspective that defects should be prevented from being introduced, festering, and growing.

  • Lock down behavior you want
  • Find and fix problems ASAP

Applies at the system level and to individual files, functions, etc. Measured using test coverage tools.

Architecture / Design Quality

Architecture Quality Principles

Design Quality can be significantly improved by adhering to certain well-understood principles.

Design quality impact

  • Poor Design Quality leads to difficulty understanding the system, lost productivity, and increased bugs and defects

  • Fixing Design Quality problems is relatively difficult – “malignant” problems propagate through dependencies across the system

Thinking about a codebase as a network

Seeing systems through a network lens can be useful. For example, the DC Metro subway map is used for navigation: it shows high-level information about stops (nodes) and routes (arcs) while throwing out detail not needed to understand the system’s “skeleton”.

Networks are established tools for representing and analyzing code because they are a natural means of capturing hierarchy, modularity, coupling, cohesion, cyclicality, and other important patterns. When networks are used for this purpose, nodes designate the “parts” at possibly multiple levels - functions, files, components. Arcs (the lines between nodes) designate relationships between those parts, such as function calls, data flow, or other use. If the arcs are directed (as they often are in software, but not in the subway map), they represent one-way dependencies between parts, which makes directed networks a natural fit for unidirectional flows and dependencies.

Imagine you have a simple codebase with only two files, where code in “rectangle_functions” calls code in “math_functions”:

File name: rectangle_functions

procedure area = rectangle_area(length, height)
    area = multiply(length, height)
end

procedure perimeter = rectangle_perimeter(length, height)
    perimeter = add(multiply(length, 2), multiply(height, 2))
end

File name: math_functions

procedure sum = add(num1, num2)
    sum = num1 + num2
end

procedure multiple = multiply(num1, num2)
    multiple = num1 * num2
end

If you draw a network of the codebase above, it would be simple, but real software systems can have millions of entities and billions of interconnections.
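In network terms, the example reduces to a handful of nodes and arcs. As a minimal sketch (in Python, with names taken from the example itself), the function-level arcs can be written as (caller, callee) pairs:

# Function-level dependency arcs from the two files above
edges = [
    ("rectangle_area", "multiply"),
    ("rectangle_perimeter", "multiply"),
    ("rectangle_perimeter", "add"),
]
# At the file level, all three arcs collapse into a single
# directed arc: rectangle_functions -> math_functions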

Dependency structure from example

Dependency structure of Linux

Direct and indirect dependencies

A codebase is a collection of individual entities - functions, files, components, and so on. These entities are connected by relationships, which may be a call, a subclass relationship, data typing, etc. If the dependencies between files or entities are traced out, you can see that some dependencies are direct, while others are indirect.

Dependency count metrics - FI, FO, VFI, VFO

If a file is used a lot, we can think of it as a shared utility. If a file calls out a lot, we might think of it as a control element that directs the actions of many other files.

This way of thinking allows us to introduce some architecture metrics for each file:

Fan In (FI)

How many other nodes depend upon it directly? Computed by counting the number of arrows pointing into that node.

Fan Out (FO)

How many other nodes does it depend upon directly? Computed by counting the number of arrows pointing out from that node.

Visibility Fan In (VFI)

How many other nodes depend upon it directly or indirectly?

Visibility Fan Out (VFO)

How many other nodes does it depend upon directly or indirectly?

For example:

File L is a control element
  • It might contain GUI code
  • It depends on 10 files directly, and on 11 files directly or indirectly
  • Nothing depends on it
  • FI = 0, FO = 10, VFI = 0, VFO = 11

File F is in the middle
  • It might contain application-specific business logic
  • It depends on one file, and one file depends on it
  • FI = 1, FO = 1, VFI = 1, VFO = 1

File A is a utility
  • It might contain OS, database driver, or logging code
  • 10 files depend on it directly; 11 depend on it directly or indirectly
  • It depends on nothing
  • FI = 10, FO = 0, VFI = 11, VFO = 0
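These counts are straightforward to compute once the dependency network is in hand. Here is a minimal sketch in Python (the edge-list representation and function names are illustrative, not taken from any particular tool); note that some tools also count a node in its own visibility set, which would raise each VFI and VFO by one:

from collections import defaultdict

def dependency_metrics(edges):
    # edges: list of (a, b) pairs meaning "a depends on b"
    direct = defaultdict(set)
    nodes = set()
    for a, b in edges:
        direct[a].add(b)
        nodes.update((a, b))

    def reachable(start):
        # All nodes reachable from start by following dependencies
        seen, stack = set(), [start]
        while stack:
            for m in direct[stack.pop()]:
                if m not in seen:
                    seen.add(m)
                    stack.append(m)
        return seen

    closure = {n: reachable(n) for n in nodes}
    return {
        n: {
            "FI": sum(1 for m in nodes if n in direct[m]),
            "FO": len(direct[n]),
            "VFI": sum(1 for m in nodes if n in closure[m]),
            "VFO": len(closure[n]),
        }
        for n in nodes
    }

# Toy chain: L depends on F, F depends on A
print(dependency_metrics([("L", "F"), ("F", "A")]))
# L: FI 0, FO 1, VFI 0, VFO 2; F: FI 1, FO 1, VFI 1, VFO 1;
# A: FI 1, FO 0, VFI 2, VFO 0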

God Files / God Classes

A file with a very high FO score directly depends on many other things in the codebase. This can sometimes (but not always) indicate a problem. In object-oriented languages, these files are said to contain ‘god classes’. Our statistical analysis has shown that god files tend to have elevated defect rates. This may have to do with the quality of the file’s contents. However, it is also likely due to the fact that when a file depends on many other files, changes or problems in those files propagate to the god file through its dependencies.

Modularity

As a general principle, codebases should be modular. A modular system:

  • Is composed of distinct modules, subsystems, or components (whatever you want to call them)
  • Keeps the elements inside each module highly cohesive, with strong interconnections inside the module
  • Keeps modules loosely coupled, with weaker interconnections between them
  • Routes interconnections between modules through simple interfaces or APIs that hide the complexity inside
  • Keeps each module small and simple enough that a human being can understand and modify it

Attributes of modularity
  • Low coupling
  • High cohesion
  • Private internals
  • Well-defined interfaces
  • Similar functionality kept in one location

Benefits of modularity
  • Bounds cognitive burden
  • Bounds communication requirements
  • Makes the system more evolvable

Managing modules
  • Changes in one module should not affect others (unless there are changes made to the interface between modules)
  • If a module gets too big, it should be split into smaller modules

If you scan a codebase without any knowledge of its architecture and trace out the interlinkages, the natural boundaries of modules sometimes suggest themselves. If a code scan reveals such a structure, you may conclude that four natural modules might be present.

Hierarchies

As a general principle, a software system should be hierarchical. The example above is a hierarchy. From a graph theory perspective, hierarchies come in several flavors, including trees and layers. The common feature of a hierarchy is that dependencies flow in one direction - from top, through middle, to bottom - without upward-facing links.

Here are some other pictures of hierarchies:

Tree Hierarchy

Layered Hierarchy

Hierarchy incorporates linear (non-circular) dependencies between modules, significantly reducing perceived complexity

Attributes of hierarchy
  • Dependencies between modules flow linearly in one direction

Benefits of hierarchy
  • Facilitates top-down control
  • Reduces cognitive burden
  • Lets code scale almost without limit
  • Prevents non-linear feedback loops
  • Greatly reduces system complexity

Managing hierarchy
  • Hierarchies should not contain cyclic connections between modules, in order to prevent feedback loops and non-linear dynamic behavior
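Whether a dependency graph actually forms a hierarchy can be checked mechanically. Here is a minimal sketch in Python using Kahn’s topological-sort algorithm (the dictionary-of-lists graph representation is an assumption, not any particular tool’s format):

from collections import deque

def is_hierarchy(graph):
    # graph: {node: [nodes it depends on]}
    # Kahn's algorithm: repeatedly remove nodes nothing depends on;
    # if anything remains at the end, the graph contains a cycle.
    indegree = {n: 0 for n in graph}
    for deps in graph.values():
        for d in deps:
            indegree[d] = indegree.get(d, 0) + 1
    queue = deque(n for n, deg in indegree.items() if deg == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for d in graph.get(n, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                queue.append(d)
    return removed == len(indegree)

print(is_hierarchy({"ui": ["logic"], "logic": ["util"], "util": []}))  # True
print(is_hierarchy({"a": ["b"], "b": ["a"]}))                          # False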

Reuse & Utilities

Reuse is the gift that keeps on giving. Reuse refers to shared utilities that are widely used by other modules, but do not rely upon other modules themselves.

A file with a very high FI or VFI score (provided it has a low VFO score) is likely a utility. It contains generic, abstract functionality that is heavily reused. Utilities are likely to be well tested, battle hardened, and to have stood the test of time. Our statistical analysis has shown that utility files have low defect rates.

Attributes of reuse
  • Reuse refers to utilities that are “called” often by other modules and entities
  • An example is the printf() function in C

Benefits of reuse
  • Prevents wasted effort through redundant functionality
  • Allows capabilities to mature over time

Managing reuse
  • Reuse can break down when multiple branches emerge, replicating similar functionality across multiple entities
  • Utilities should never call other modules, unless they are other utilities

Layers, Platforms, & Plug-in architectures

Some systems are layered, but not all are or need to be. Layers combine modularity and hierarchy to create a well-structured, easy-to-understand, and maintainable codebase.

Attributes of layers
  • Rigid layers with defined interfaces partition the system
  • Interaction happens between adjacent layers, without skipping

Benefits of layers
  • Provide a powerful ability to abstract away complexity in lower layers
  • Make the system more flexible
  • Insulate people working in different domains

Managing layers
  • Well-defined APIs should provide access to lower layers
  • Dependencies should flow in one direction

In some cases, a system will contain a generic engine used for common or generic tasks, while individual plug-ins that depend on it specialize its behavior for specific uses. One example might be a video game engine, with each game that depends on it being considered a plug-in. Another might be tax and accounting software with common code in an engine, but specialized code to help you do your taxes for your individual state in one of 50 plug-ins.
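Here is a minimal sketch of the engine/plug-in split in Python (the class and method names are hypothetical, and the tax rate is purely illustrative):

class TaxEngine:
    # The generic engine: it defines the interface plug-ins must
    # satisfy and never depends on any specific plug-in.
    def __init__(self):
        self.plugins = {}

    def register(self, state, plugin):
        self.plugins[state] = plugin

    def compute_tax(self, state, income):
        # Generic flow lives here; state-specific rules live in the
        # plug-in, which depends on the engine, not the reverse.
        return self.plugins[state].tax(income)

class CaliforniaRules:
    # One of many hypothetical state plug-ins
    def tax(self, income):
        return income * 0.093  # illustrative rate only

engine = TaxEngine()
engine.register("CA", CaliforniaRules())
print(engine.compute_tax("CA", 100_000))

The key property is the direction of dependency: plug-ins know about the engine, never the other way around.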

The breakdown of architecture health: Cores and cyclic groups

When developers violate Design Quality principles, large cycles called cores can emerge and radically increase complexity.

By its nature, healthy code ‘wants’ to be a hierarchy of modules. That being said, most systems are not - in the same way that humans want to be healthy, but most don’t go to the gym every day.

When we scan code, we often find ‘cycles’ at different levels. For example, we may look at the contents of each file and discover that calls between some of them form a ring. (A contrived example to be sure.)

The codebase above contains a ‘core’, otherwise known as a ‘cyclic group’. A core is a collection of entities - in this case files - in which every entity is ‘reachable’ from every other entity in a circular fashion. By following arrows, we can visually trace how A calls B, B calls C, and, via some path, C calls back to the original file A. File cores can be detected by looking only at the source code, without any formal architecture description.

Cores represent a breakdown of hierarchy, because hierarchies must flow in one direction by definition. They also represent a breakdown of modularity if they become too big (more than a handful of files) because they may encapsulate code that should be distinct, but is instead coupled in a manner that is non-obvious and very difficult to manage.

Cores or cyclic groups can exist at any level of abstraction. For example, we may think of modules as collections of files, and then look at the dependencies that cross module boundaries to find component cores.
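Cores are exactly the strongly connected components of the dependency graph that contain more than one entity, so they can be found mechanically. Here is a minimal sketch in Python using Tarjan’s algorithm (the graph representation is an assumption, as before):

def find_cores(graph):
    # graph: {node: [nodes it depends on]}
    # Tarjan's algorithm for strongly connected components;
    # any component with more than one member is a core.
    index, lowlink, on_stack = {}, {}, set()
    stack, components, counter = [], [], [0]

    def visit(v):
        index[v] = lowlink[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, []):
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        if lowlink[v] == index[v]:
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            components.append(component)

    for v in list(graph):
        if v not in index:
            visit(v)
    return [c for c in components if len(c) > 1]

# A -> B -> C -> A forms a core; D sits above it hierarchically
print(find_cores({"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}))
# [['C', 'B', 'A']]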

Attributes of cores
  • Integral clusters with high levels of connectedness due to large cycles
  • Modules and/or entities are directly or indirectly co-dependent

Detriments of cores
  • Extremely architecturally complex
  • Incredibly difficult to cognitively understand
  • Increased maintenance costs
  • Reduced developer productivity
  • Increased bugs

Managing cores
  • Problems grow as cores grow
  • Cores become unmanageable if they get too large
  • These regions are difficult to decompose into smaller parts

Architecture Integrity

In some codebases, the modular structure is explicitly defined. In this picture of the open-source Axis2 codebase, each module is shown as a box and dependencies between modules are shown as arrows.

Code Quality

Code quality principles

  • Code quality is a concept that helps you think about the health of each ‘tree’ in the ‘forest’
  • Code quality measures apply to individual entities (procedures, classes, methods, files, data structures, specific lines, etc.)
  • Large codebases can contain millions of entities
  • Two procedures with the same function can have widely varying code quality

To illustrate, look at these two procedures that do the same thing. The first is the classic ‘hello world’ program - the first program written by every student in Computer Science 101. The second is also ‘hello world’, but the implementation is from a submission to the ‘Obfuscated Coding Competition’.

Procedure 1 - Great!

print("Hello World!")

Procedure 2 - Not good - Does the same thing!
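The competition entry itself is not reproduced here; as a hypothetical stand-in, even mild obfuscation makes the point:

# Prints "Hello World!" by shifting every character down by one -
# correct, but needlessly hard to read (illustrative stand-in only)
print("".join(chr(ord(c) - 1) for c in 'Ifmmp!Xpsme"'))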

It is obvious from this example that identical functionality can be provided by good or bad code. One hopes that a codebase is filled with code that looks like Procedure 1, with little that looks like Procedure 2. To some extent, however, every codebase has some bad code.

Note also that two pieces of code with very different ‘form’ can perform identical ‘function.’ Procedure 1 contains only essential complexity, while Procedure 2 contains lots of non-essential complexity. For that reason, be somewhat skeptical when a developer says their code must be overly complex because they wrote it to solve a hard problem.

Managing Code Quality

  • Poor code quality leads to difficulty understanding, waste, and defects

  • Contractors may be delivering on functionality, but at the cost of complexity

  • Fixing code quality problems is relatively simple – “benign” problems can be locally addressed within the entity

Code quality metrics

McCabe Cyclomatic Complexity

McCabe assigns a number to a “structured program” or block of executable code based on a static analysis of the number of linearly independent execution paths that can be followed as the program executes. In modern programming languages, McCabe scores typically apply to procedures (called functions in C) or class methods. Alternative paths through a procedure result from conditional branching statements (if statements, switch/case statements, while loops, etc.). The following is a four-step recipe for computing the original version of McCabe’s metric, starting from a base score of one:

  1. Increment one for every IF, CASE or other alternate execution construct

  2. Increment one for every Iterative DO, DO-WHILE or other repetitive construct

  3. Add two less than the number of logical alternatives in a CASE

  4. Add one for each logical operator (AND, OR) in an IF
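As a worked illustration (a hypothetical Python function, not drawn from the source), applying the recipe above:

# Base score: 1 (the single straight-line path)
def summarize(scores, drop_lowest):
    total = 0
    for s in scores:            # +1: repetitive construct
        if s < 0 or s > 100:    # +1: IF; +1: logical OR
            continue
        total += s
    if drop_lowest and scores:  # +1: IF; +1: logical AND
        total -= min(scores)
    return total
# McCabe cyclomatic complexity M = 1 + 5 = 6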

McCabe asserted that his number could be used to estimate the effort required in test coverage.  He also suggested that cyclomatic complexity for procedures or methods should be kept below the value 10 so that they remain understandable and testable. A classification scheme has been devised to bin procedures into four general types based on their McCabe scores.

Definition

The McCabe Cyclomatic Complexity is the number of linearly independent execution paths through a program.

Example calculation

McCabe Complexity can be calculated as M = E - N + 2, where:

  • E = number of edges in the control-flow graph
  • N = number of nodes

A control-flow graph with 10 edges and 8 nodes therefore has a McCabe score of 10 - 8 + 2 = 4.

What is good?

According to NIST:

  • 0-10: Low complexity, in compliance with NIST recommendations
  • 11-20: Medium complexity
  • 21-50: High complexity
  • 51+: Very high complexity
  • NOTE: Some safety-critical systems require maximum scores of 7

McCabe’s metric has been positively related to defect density and the productivity of developers doing maintenance on previously shipped code. Many firms now use McCabe’s scores as a means of identifying problematic code.

A common variant (the one used in CodeMRI) excludes switch/case statements from consideration in the McCabe score.  This is often referred to as “Modified McCabe cyclomatic complexity.” McCabe Cyclomatic Complexity is commonly used as a Code Quality metric for executable blocks of code. In modern languages, it gives complexity scores to functions or methods.

Code Comments

Source code should contain comments describing what the code does and the reasoning behind decisions that might be tricky to understand. Appropriate commenting and documentation (along with good use of naming conventions) are important for teaching developers what the code does. This is critical because ‘design intent’ does not flow easily. If engineers are given system requirements, it is often easy to use them to design or understand a system; given only the system, however, it is almost impossible to reverse engineer its requirements through code inspection. It is very important to tell future developers what something is supposed to do and why. Otherwise, this information will likely be lost to time.
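As a hypothetical illustration, a comment that records intent rather than restating the code:

def retry_delay(attempt):
    # Exponential backoff capped at 30 seconds. The cap is deliberate:
    # the upstream service rate-limits aggressive retries, so we back
    # off rather than hammer it. (Illustrative example only.)
    return min(2 ** attempt, 30)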

Commonality

A codebase is beneficial because it provides capabilities, not because it is big. In fact, if two codebases deliver the same capabilities but one is significantly smaller, the smaller one will be more valuable. This is for a very simple reason:

  • Value = Benefits - Costs
  • Value = Capabilities - Cost of development & maintenance
  • Value = Capabilities - Volume of code that must be developed and maintained

For this reason, you want a codebase that follows the Don’t Repeat Yourself (DRY) principle.

Code Duplication

Code duplication checks can compare blocks of code in your codebase against other blocks in the same file, or against blocks of code in other files. They can identify places where a developer copied code from one place to another. Good duplication checkers can find duplication that is similar but not necessarily identical. This is important because copies may be slightly modified when initially copied or they may drift apart.

For a system, we can find all the similarities and then compute the percent of duplicative code in the system. We can also compute the amount of code that would remain if duplications were eliminated. Duplication metrics can be given for individual files.

Exact information about the location of each duplication, the differences, and the amount of drift can be used by developers when planning efforts to eliminate it.
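Here is a minimal sketch of the exact-match core of such a checker in Python (the window size and normalization choices are illustrative; real tools also detect near-duplicates that have drifted apart):

import hashlib
from collections import defaultdict

def find_duplicate_blocks(files, window=6):
    # files: {file name: source text}
    # Slide a window-line frame over each file and hash the
    # whitespace-normalized contents; any hash seen in two or
    # more places marks a duplicated block.
    seen = defaultdict(list)
    for name, text in files.items():
        lines = [ln.strip() for ln in text.splitlines()]
        for i in range(len(lines) - window + 1):
            block = "\n".join(lines[i:i + window])
            digest = hashlib.sha1(block.encode()).hexdigest()
            seen[digest].append((name, i + 1))  # 1-based line number
    return [locs for locs in seen.values() if len(locs) > 1]

# Each returned entry lists every location of one duplicated block,
# e.g. [[('a.py', 10), ('b.py', 52)], ...]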

Test Quality

Tests are the immune system of a codebase. They should ideally be written before or at the same time as the code being developed. If you have poor testing, the best time to start to improve is now. Our statistical studies have shown a critical relationship between good tests and quality, efficiency, and effectiveness.

Healthy testing

Unhealthy testing

The goal of unit tests is to ensure that parts work individually. The goal of system tests is to exercise the combined behavior of several parts or a fully integrated system.

Test coverage metrics

Test coverage can be measured by running your suite of automated tests and identifying which code is executed by tests and which is not. The simplest test coverage metric is simply a count of the lines of code tested (at least once) vs not. Other more complicated metrics make sense as well. You might care about whether specific branches are followed, specific types of data are passed in, etc. All test coverage metrics are ultimately a ratio of code/conditions exercised vs not.
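All such metrics reduce to the same shape: a ratio of what was exercised to what could have been. A minimal sketch for line coverage (the sets of line numbers are assumed to come from an instrumentation tool):

def line_coverage(executed, executable):
    # Fraction of executable lines hit at least once by the tests
    return len(executed & executable) / len(executable)

# Hypothetical line numbers reported by a coverage tool:
print(f"{line_coverage({1, 2, 5}, {1, 2, 3, 5}):.0%}")  # 75%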

Prioritizing test development

Test development is most important:

  • When code is changing rapidly or you anticipate many future changes
  • When introducing new employees to a codebase
  • When the code is complex or degraded along one or more quality dimensions
  • When refactoring or rearchitecting