...

Seeing systems through a network lens can be useful. For example, the DC Metro subway map helps riders navigate: it shows high-level information about stops (nodes) and routes (arcs) while throwing out detail not needed to understand the system's “skeleton”.

Networks are established tools for representing and analyzing code because they naturally capture hierarchical relationships, modularity, coupling, cohesion, cyclicality, and other important patterns. When networks are used for this purpose, nodes designate the “parts”, possibly at multiple levels: functions, files, components. Arcs (lines between nodes) designate relationships between those parts, such as function calls, data flow, or other use. If these arcs are directed (as they often are in software, but not in the subway map), they can represent one-way flows and dependencies between parts.

Imagine a simple codebase with only two files, in which code in “rectangle_functions” calls code implemented in “math_functions”:

File name: rectangle_functions

Code Block
procedure area = rectangle_area(length, height)
	area = multiply(length, height)
end

procedure perimeter = rectangle_perimeter(length, height)
	perimeter = add(multiply(length, 2), multiply(height, 2))
end

File name: math_functions

Code Block
procedure sum = add(num1, num2)
	sum = num1 + num2
end

procedure multiple = multiply(num1, num2)
	multiple = num1 * num2
end

If you draw a network of the codebase above, it would be simple, but real software systems can have millions of entities and billions of interconnections.
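As a minimal sketch of what drawing that network involves, the Python below (an assumption, since the example itself is pseudocode) records each procedure's file and direct calls in a hypothetical PROCEDURES table, derives the procedure-level arcs, and then collapses them into file-level arcs.

```python
# Hypothetical encoding of the two-file example: each procedure is listed
# with the file that defines it and the procedures it calls directly.
PROCEDURES = {
    "rectangle_area": ("rectangle_functions", ["multiply"]),
    "rectangle_perimeter": ("rectangle_functions", ["add", "multiply", "multiply"]),
    "add": ("math_functions", []),
    "multiply": ("math_functions", []),
}


def procedure_arcs(procs):
    """Directed (caller, callee) arcs; repeated calls collapse into one arc."""
    return sorted({(caller, callee)
                   for caller, (_, callees) in procs.items()
                   for callee in callees})


def file_arcs(procs):
    """Lift procedure arcs to file level, dropping calls within the same file."""
    return sorted({(procs[caller][0], procs[callee][0])
                   for caller, callee in procedure_arcs(procs)
                   if procs[caller][0] != procs[callee][0]})
```

For this example, procedure_arcs yields three arcs (rectangle_area to multiply, rectangle_perimeter to add, rectangle_perimeter to multiply), and file_arcs yields a single directed arc from rectangle_functions to math_functions.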

Dependency structure from example

Dependency structure of Linux


Direct and indirect dependencies

A codebase is a collection of individual entities -

...

These entities are connected by relationships such as calls, subclassing, or data typing. If you trace out these dependencies between files or entities, you can see that some dependencies are direct, while others are indirect: one entity depends on another only through a chain of intermediate entities.
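One way to separate direct from indirect dependencies is a depth-first traversal: everything reachable from an entity is a dependency, and whatever is reachable but not directly linked is indirect. The sketch assumes the dependency graph is stored as a dict mapping each entity to the entities it uses directly; the entity names in EXAMPLE_GRAPH are hypothetical.

```python
def all_dependencies(graph, start):
    """Every entity that start depends on, directly or through a chain."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def split_dependencies(graph, start):
    """Return (direct, indirect) dependency sets for one entity."""
    direct = set(graph.get(start, ()))
    indirect = all_dependencies(graph, start) - direct
    return direct, indirect


# Hypothetical graph: "ui" uses "service", which uses "db" and "log".
EXAMPLE_GRAPH = {"ui": ["service"], "service": ["db", "log"], "db": ["log"]}
```

Here split_dependencies(EXAMPLE_GRAPH, "ui") gives {"service"} as the direct dependency and {"db", "log"} as indirect ones. If an entity sits on a cycle, it shows up among its own dependencies, which is one quick way to spot cyclicality.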

...

Dependency count metrics - FI, FO, VFI, VFO

...
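A sketch of the four counts named in this section's heading, assuming the common definitions: fan-in (FI) and fan-out (FO) count an entity's direct dependents and direct dependencies, while visibility fan-in (VFI) and visibility fan-out (VFO) also count indirect ones. Dependencies are stored as a square boolean matrix (a design structure matrix, where dsm[i][j] is true when entity i depends on entity j), and Warshall's transitive-closure algorithm finds the indirect dependencies. Conventions differ on whether an entity counts itself in VFI/VFO; this version excludes it.

```python
def visibility_matrix(dsm):
    """Warshall's algorithm: v[i][j] is True if i depends on j directly or indirectly."""
    n = len(dsm)
    v = [list(row) for row in dsm]  # copy so the input matrix is untouched
    for k in range(n):
        for i in range(n):
            if v[i][k]:
                for j in range(n):
                    if v[k][j]:
                        v[i][j] = True
    return v


def dependency_counts(dsm):
    """FI, FO, VFI and VFO for each entity, excluding the entity itself."""
    n = len(dsm)
    vis = visibility_matrix(dsm)
    counts = []
    for e in range(n):
        others = [x for x in range(n) if x != e]
        counts.append({
            "FI": sum(1 for i in others if dsm[i][e]),
            "FO": sum(1 for j in others if dsm[e][j]),
            "VFI": sum(1 for i in others if vis[i][e]),
            "VFO": sum(1 for j in others if vis[e][j]),
        })
    return counts


# Hypothetical chain of three entities: 0 depends on 1, and 1 depends on 2.
CHAIN = [[False, True, False],
         [False, False, True],
         [False, False, False]]
```

In the chain, entity 0 has a fan-out of 1 but a visibility fan-out of 2, because it also depends on entity 2 through entity 1.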

If you scan a codebase without any knowledge of its architecture and find the interlinkages, the natural boundaries of modules sometimes suggest themselves. If a code scan reveals a structure such as this, you may conclude that four natural modules are present:

...
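In the simplest case, where groups of entities have no links between them at all, such natural modules are just the connected components of the dependency network once arc direction is ignored. The sketch below finds them with a breadth-first search over a hypothetical dict-based graph. Real codebases usually have some links between clusters, so practical tools rely on clustering algorithms that this sketch does not attempt.

```python
from collections import deque


def natural_modules(graph):
    """Group entities into connected components, ignoring arc direction."""
    undirected = {}
    for src, dsts in graph.items():
        undirected.setdefault(src, set())
        for dst in dsts:
            undirected[src].add(dst)
            undirected.setdefault(dst, set()).add(src)

    modules, seen = [], set()
    for start in sorted(undirected):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in undirected[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        modules.append(component)
    return modules
```

For example, a graph where a uses b, b uses c, and x uses y splits into two candidate modules: {a, b, c} and {x, y}.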

Attributes of hierarchy

Benefits of hierarchy

Managing hierarchy

  • Dependencies between modules flow linearly in one direction

  • Facilitates top-down control

  • Reduces cognitive burden

  • Code can scale to very large sizes

  • Prevents non-linear feedback loops

  • Greatly reduces system complexity

  • Hierarchies should not contain cyclic connections between modules in order to prevent feedback loops and non-linear dynamic behavior
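Whether module dependencies actually form a hierarchy (no cycles) can be checked with a topological sort. This sketch uses Kahn's algorithm over a hypothetical dict mapping each module to the modules it depends on. It returns an order in which every module comes after its dependencies, or None when a cycle (including a module that depends on itself) makes such an order impossible.

```python
from collections import deque


def layered_order(deps):
    """Kahn's algorithm: modules ordered after their dependencies, or None on a cycle."""
    modules = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {m: set(deps.get(m, ())) for m in modules}  # unmet dependencies
    users = {m: set() for m in modules}                     # reverse arcs
    for m, ds in remaining.items():
        for d in ds:
            users[d].add(m)

    ready = deque(sorted(m for m, ds in remaining.items() if not ds))
    order = []
    while ready:
        m = ready.popleft()
        order.append(m)
        for u in sorted(users[m]):
            remaining[u].discard(m)
            if not remaining[u]:
                ready.append(u)
    # Modules left over are stuck on a cycle, so the structure is not a hierarchy.
    return order if len(order) == len(modules) else None
```

With app depending on core and util, and core depending on util, the order is util, core, app; adding an arc from util back to app would make the function return None.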

...

In some codebases, the modular structure is explicitly defined. In this picture of the open-source Axis2 codebase, each module is shown as a blue box and dependencies between modules are shown as arrows.

...

  • Code quality is a concept that helps you think about the health of each ‘tree’ in the ‘forest.’

  • Code quality measures apply to individual entities (procedures, classes, methods, files, data structures, specific lines, etc.)

  • Large codebases can contain millions of entities

  • Two procedures with the same function can have widely varying code quality

...

McCabe assigns a number to a “structured program” or block of executable code, based on a static analysis of the number of linearly independent execution paths that can be followed as the program executes. In modern programming languages, McCabe scores typically apply to procedures (called functions in C) or class methods. Alternative paths through a procedure result from conditional branching statements (if statements, switch/case statements, while loops, etc.). The following is a four-step recipe for computing the original version of McCabe’s metric:

...
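McCabe's graph-theoretic definition of the metric is M = E - N + 2P, where E and N are the numbers of edges and nodes in a control-flow graph and P is the number of connected components (1 for a single procedure). The sketch below simply evaluates that formula. The node names in the if/else graph are illustrative, and building a control-flow graph from real source code is left out.

```python
def cyclomatic_complexity(nodes, edges, components=1):
    """McCabe's graph formula M = E - N + 2P for a control-flow graph."""
    return len(set(edges)) - len(set(nodes)) + 2 * components


# Control-flow graph of a procedure containing one if/else (illustrative names).
IF_ELSE_NODES = ["test", "then", "else", "exit"]
IF_ELSE_EDGES = [("test", "then"), ("test", "else"),
                 ("then", "exit"), ("else", "exit")]
```

The if/else graph has 4 edges and 4 nodes, so M = 4 - 4 + 2 = 2, matching its two independent paths; straight-line code with a single node and no edges scores 1.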

A common variant (the one used in CodeMRI) excludes switch/case statements from the McCabe score; it is often referred to as “Modified McCabe cyclomatic complexity.” McCabe cyclomatic complexity is commonly used as a code quality metric for executable blocks of code, which in modern languages means functions or methods.
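A rough way to approximate McCabe scores, and to see what the modified variant changes, is to count decision points in a syntax tree: the score is one plus the number of extra branches. The Python sketch below is an assumption, not CodeMRI's implementation. It treats Python's match statement as the stand-in for switch/case, so count_match_cases=False mimics the modified variant. It only recognizes if/elif, conditional expressions, for and while loops, except clauses, and match cases, and it treats the whole source string as one function.

```python
import ast

# Each of these node types adds exactly one alternative path.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)


def _match_branches(match_node):
    """Paths out of a match statement: one per case, plus a fall-through
    path unless the last case is a bare, unguarded `case _:`."""
    last = match_node.cases[-1]
    pattern = last.pattern
    wildcard = (type(pattern).__name__ == "MatchAs" and pattern.pattern is None
                and pattern.name is None and last.guard is None)
    return len(match_node.cases) + (0 if wildcard else 1)


def mccabe(source, count_match_cases=True):
    """Approximate cyclomatic complexity: 1 + decision points in the source."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            decisions += 1  # elif parses as a nested If, so it counts too
        elif count_match_cases and type(node).__name__ == "Match":
            decisions += _match_branches(node) - 1
    return decisions + 1
```

A function containing one if, one elif, and one while loop scores 4 either way; the flag only matters for code that uses match (Python 3.10 or later).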

Code Comments

Source code should contain comments describing what the code does and the reasoning behind decisions that might be tricky to understand. Appropriate commenting and documentation are important (along with good naming conventions) for explaining to developers what the code does. This is critical because ‘design intent’ is not easily recovered from code alone. Given a system's requirements, engineers can often use them to redesign or understand the system. Given only the system, however, it is almost impossible to reverse-engineer its requirements through code inspection. It is very important to tell future developers what something is supposed to do and why; otherwise, this information will likely be lost to time.

...

Tests are the immune system of a codebase. Ideally, they should be written before or at the same time as the code being developed. If your testing is poor, the best time to start improving it is now. Our statistical studies have shown a critical relationship between good tests and quality, efficiency, and effectiveness.

Healthy testing

Unhealthy testing


The goal of unit tests is to ensure that parts work individually. The goal of system tests is to exercise the combined behavior of several parts or of a fully integrated system.
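To make the distinction concrete, here is a sketch using Python's standard unittest module, with the rectangle pseudocode translated into Python (the translation and the test values are illustrative). The unit tests check add and multiply in isolation; the test of rectangle_perimeter exercises several parts together, a small-scale stand-in for a system test.

```python
import unittest


# Python stand-ins for the pseudocode procedures (hypothetical translation).
def add(num1, num2):
    return num1 + num2


def multiply(num1, num2):
    return num1 * num2


def rectangle_perimeter(length, height):
    return add(multiply(length, 2), multiply(height, 2))


class UnitTests(unittest.TestCase):
    """Each part on its own."""

    def test_add(self):
        self.assertEqual(add(2, 3), 5)

    def test_multiply(self):
        self.assertEqual(multiply(4, 2), 8)


class CombinedTests(unittest.TestCase):
    """Parts working together: rectangle_perimeter relies on add and multiply."""

    def test_perimeter(self):
        self.assertEqual(rectangle_perimeter(3, 4), 14)
```

Saved to a file, the tests can be run with python -m unittest followed by the module name. If multiply were broken, both a unit test and the combined test would fail, and the unit test would point directly at the faulty part.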

...

Test coverage metrics

Test coverage can be measured by running your suite of automated tests and identifying which code is executed by the tests and which is not. The simplest test coverage metric is a count of the lines of code tested (at least once) versus not tested. More sophisticated metrics make sense as well: you might care about whether specific branches are followed, whether specific types of data are passed in, and so on. All test coverage metrics are ultimately a ratio of the code or conditions exercised to the total.
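A minimal statement-coverage sketch in Python, assuming a small self-contained source string (SAMPLE is hypothetical): it finds every statement line with the ast module, records which lines actually run through the sys.settrace hook, and reports covered versus total statements. Dedicated coverage tools handle many details this ignores, such as multiple files, branch coverage, and excluded lines.

```python
import ast
import sys

SAMPLE = '''\
def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

classify(5)
'''


def line_coverage(source, filename="<sample>"):
    """Return (covered statements, total statements) for one run of source."""
    statement_lines = {node.lineno for node in ast.walk(ast.parse(source))
                       if isinstance(node, ast.stmt)}
    executed = set()

    def tracer(frame, event, arg):
        # Record line events only for code compiled from `source`.
        if event == "line" and frame.f_code.co_filename == filename:
            executed.add(frame.f_lineno)
        return tracer

    code = compile(source, filename, "exec")
    sys.settrace(tracer)
    try:
        exec(code, {})
    finally:
        sys.settrace(None)
    return len(statement_lines & executed), len(statement_lines)
```

Because classify(5) never takes the negative branch, the return on line 3 never runs, so 4 of the 5 statements are covered (80% statement coverage).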

...

  • When code is changing rapidly or you anticipate lots of future changes

  • When introducing new employees into a codebase

  • When the code is complex or degraded along one or more quality dimensions

  • When refactoring or rearchitecting

...