...
Seeing systems through a network lens can be useful. For example, the DC Metro subway map can be used in navigation. It shows high-level information about stops (nodes) and routes (arcs), while throwing out detail not needed to understand its “skeleton”.
Networks are established tools for representing and analyzing code because they are a natural means of capturing hierarchical relationships, modularity, coupling, cohesion, cyclicality, and other important patterns. When networks are used for this purpose, nodes designate the “parts” at possibly multiple levels - functions, files, components. Arcs or lines between nodes designate relationships between those parts, such as function calls, data flow, or other use. If these arcs are directed (as they often are in software, but not in the subway map), they can represent one-way flows and dependencies between parts, which is what makes directed networks useful for software.
Imagine you have a simple codebase with only 2 files, with code in “rectangle_functions” calling code implemented in “math_functions”:
File name: rectangle_functions | File name: math_functions |
---|---|
|
If you draw a network of the codebase above, it would be simple, but real software systems can have millions of entities and billions of interconnections.
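As a minimal sketch of what such a network looks like in code, the two-file example can be written down as a directed graph. Only the two file names come from the example; the function names (`rectangle_area`, `rectangle_perimeter`, `multiply`, `add`) are hypothetical stand-ins, since the file contents are not reproduced here.

```python
from collections import defaultdict

# Hypothetical function-level call arcs, written as (caller, callee).
# Each node is named "file:function"; the function names are assumptions.
CALLS = [
    ("rectangle_functions:rectangle_area", "math_functions:multiply"),
    ("rectangle_functions:rectangle_perimeter", "math_functions:add"),
    ("rectangle_functions:rectangle_perimeter", "math_functions:multiply"),
]


def file_of(node):
    """Return the file part of a 'file:function' node name."""
    return node.split(":", 1)[0]


def file_level_arcs(calls):
    """Collapse function-level arcs into directed file-level arcs."""
    arcs = set()
    for caller, callee in calls:
        src, dst = file_of(caller), file_of(callee)
        if src != dst:  # calls that stay inside one file add no file-level arc
            arcs.add((src, dst))
    return arcs


def adjacency(arcs):
    """Build a mapping from each node to the set of nodes it depends on."""
    graph = defaultdict(set)
    for src, dst in arcs:
        graph[src].add(dst)
        graph.setdefault(dst, set())  # keep nodes that have no outgoing arcs
    return dict(graph)


FILE_GRAPH = adjacency(file_level_arcs(CALLS))
print(FILE_GRAPH)
```

The same data can be viewed at either level: three arcs between functions, or a single directed arc from `rectangle_functions` to `math_functions` between files.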
Dependency structure from example | Dependency structure of Linux |
---|---|
|
Direct and indirect dependencies
A codebase is a collection of individual entities -
...
These entities are connected by relationships, which may be a call, subclass, data typing, etc. If these dependencies between files or entities are traced out, you can see that some dependencies are direct, while others are indirect.
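The distinction can be sketched with a small graph traversal. The four-node chain below is a hypothetical example, not taken from any real codebase. Direct dependencies are a node's immediate neighbors, and indirect dependencies are everything else reachable by following arcs.

```python
def direct_dependencies(graph, node):
    """Nodes that `node` points to with a single arc."""
    return set(graph.get(node, ()))


def all_dependencies(graph, node):
    """Every node reachable from `node` by following one or more arcs."""
    seen = set()
    stack = list(graph.get(node, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also keeps cycles from looping forever
        seen.add(current)
        stack.extend(graph.get(current, ()))
    return seen


def indirect_dependencies(graph, node):
    """Reachable nodes that are not direct dependencies.

    A node reached both directly and through a longer path counts as direct.
    """
    return all_dependencies(graph, node) - direct_dependencies(graph, node)


# Hypothetical chain: A uses B, B uses C, C uses D.
EXAMPLE = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "D": set()}

print(direct_dependencies(EXAMPLE, "A"))
print(indirect_dependencies(EXAMPLE, "A"))
```

In this chain, a change to `D` can affect `A` even though `A` never refers to `D` by name, which is why indirect dependencies matter when estimating the reach of a change.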
...
Dependency count metrics - FI, FO, VFI, VFO
...
If you scan a codebase without any knowledge of its architecture and trace the interlinkages, the natural boundaries of modules sometimes suggest themselves. If a code scan reveals a structure such as this, then you may conclude that four natural modules might be present:
...
Attributes of hierarchy | Benefits of hierarchy | Managing hierarchy |
---|---|---|
|
|
|
...
In some codebases, the modular structure is explicitly defined. In this picture of the open-source Axis2 codebase, each module is shown as a blue box and dependencies between modules are shown as arrows.
...
Code quality is a concept that helps you think about the health of each ‘tree’ in the ‘forest.’
Code quality measures apply to individual entities (procedures, classes, methods, files, data structures, specific lines, etc.)
Large codebases could contain millions of entities
Two procedures with the same function can have widely varying code quality
...
McCabe assigns a number to a “structured program” or block of executable code based on a static analysis of the number of linearly independent execution paths that can be followed as a program executes. In modern programming languages, McCabe scores typically apply to procedures (called functions in C) or class methods. Alternative paths through a procedure result from conditional branching statements (if statements, switch/case statements, while loops, etc.). The following is a four-step recipe for computing the original version of McCabe’s metric:
...
A common variant (the one used in CodeMRI) excludes switch/case statements from consideration in the McCabe score. This is often referred to as “Modified McCabe cyclomatic complexity.” McCabe Cyclomatic Complexity is commonly used as a Code Quality metric for executable blocks of code. In modern languages, it gives complexity scores to functions or methods.
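To make the idea concrete, here is a rough sketch that approximates a McCabe score for Python source using the standard `ast` module. It uses the common shortcut of counting decision points and adding one. This is an illustrative approximation under stated assumptions, not CodeMRI's implementation: it treats the whole snippet as one block, counts boolean operators as extra paths (tools differ on this), and ignores `match`/`case` entirely. The `classify` function is a made-up sample.

```python
import ast

# Node types treated as decision points in this sketch.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)


def cyclomatic_complexity(source):
    """Approximate McCabe score of a snippet: decision points + 1."""
    tree = ast.parse(source)
    decisions = 0
    for node in ast.walk(tree):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' has three operands and adds two extra paths.
            decisions += len(node.values) - 1
    return decisions + 1


# Made-up sample: two ifs, one for loop, one 'and'.
SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))  # 4 decision points + 1 = 5
```

Straight-line code with no branching scores 1, and each additional branch raises the score by one.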
Code Comments
Source code should contain comments describing what the code does, and the reasoning behind decisions that might be tricky to understand. Appropriate commenting and documentation (along with good use of naming conventions) are important for teaching developers what the code does. This is critical because ‘design intent’ does not flow easily. If engineers are given system requirements, it is often easy to use them to redesign or understand a system. However, given only a system, it is almost impossible to use code inspection to reverse engineer its requirements. It is very important to tell future developers what something is supposed to do and why. Otherwise, this information will likely be lost to time.
...
Tests are the immune system of a codebase. They should ideally be written before or at the same time as the code being developed. If you have poor testing, the best time to start to improve is now. Our statistical studies have shown a critical relationship between good tests and quality, efficiency, and effectiveness.
Healthy testing | Unhealthy testing |
---|---|
|
|
The goal of unit tests is to ensure that parts work individually. The goal of system tests is to exercise the combined behavior of several parts or of a fully integrated system.
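The difference can be illustrated with Python's standard `unittest` module. The functions below (`multiply`, `add`, `rectangle_report`) are hypothetical. The last test is only a small stand-in for a system test, since a real one would exercise a deployed or fully assembled system.

```python
import unittest


# Hypothetical parts under test.
def multiply(a, b):
    return a * b


def add(a, b):
    return a + b


def rectangle_report(width, height):
    """Combine the parts above into one user-facing result."""
    area = multiply(width, height)
    perimeter = multiply(2, add(width, height))
    return f"area={area} perimeter={perimeter}"


class UnitTests(unittest.TestCase):
    """Each test checks one part in isolation."""

    def test_multiply(self):
        self.assertEqual(multiply(3, 4), 12)

    def test_add(self):
        self.assertEqual(add(3, 4), 7)


class SystemTests(unittest.TestCase):
    """Checks the combined behavior of several parts working together."""

    def test_report(self):
        self.assertEqual(rectangle_report(3, 4), "area=12 perimeter=14")


def run_all():
    """Run both test classes and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(UnitTests))
    suite.addTests(loader.loadTestsFromTestCase(SystemTests))
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If `test_report` fails while both unit tests pass, the defect is likely in how the parts are combined rather than in the parts themselves.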
...
Test coverage metrics
Test coverage can be measured by running your suite of automated tests and identifying which code is executed by tests and which is not. The simplest test coverage metric is a count of the lines of code tested (at least once) vs. not. Other, more complicated metrics make sense as well. You might care about whether specific branches are followed, specific types of data are passed in, etc. All test coverage metrics are ultimately a ratio of code/conditions exercised vs. not.
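The simplest line-based metric can be sketched as follows. The sketch assumes a coverage tool has already produced a map from executable line numbers to hit counts; the `REPORT` data is invented for illustration and does not come from any real tool's output format.

```python
def line_coverage(hit_counts):
    """Fraction of executable lines that tests ran at least once.

    `hit_counts` maps a line number to how many times tests executed it.
    """
    if not hit_counts:
        return 0.0  # nothing executable to cover
    covered = sum(1 for hits in hit_counts.values() if hits > 0)
    return covered / len(hit_counts)


def uncovered_lines(hit_counts):
    """Line numbers that no test executed, in ascending order."""
    return sorted(line for line, hits in hit_counts.items() if hits == 0)


# Invented report: five executable lines, two of them never run.
REPORT = {10: 3, 11: 3, 12: 0, 14: 1, 15: 0}

print(f"{line_coverage(REPORT):.0%}")  # 3 of 5 lines covered
print(uncovered_lines(REPORT))
```

Branch or condition coverage follows the same pattern, with branches or conditions in place of lines as the things being counted.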
...
When code is changing rapidly or you anticipate lots of future changes
When introducing new employees into a codebase
When the code is complex or degraded along one or multiple quality dimensions
When refactoring or rearchitecting
...