Rules and Groups Explication

User interaction and design

Overview

A common pattern of user input in the system is defining views of a network and describing how the network should look through those views. Here are some examples:

File property values are a view of the codebase network where files are grouped by distinct values of a given file property.
Components are a view of a network where files are grouped by a special property named "component".
Rules are guidelines about how nodes are intended to connect.
Extra dependencies are associations between anonymous groups of entities.

We can then reduce the common elements into:

Subjects
Groups
Associations
Rules

Figure 1: User Generated Content Basic Model

Subjects

A subject is a class of thing in the system. For example, a specific file property is one subject, the files in the system are another, and the entities in the system are yet another.

This document defines 3 subjects:

property, all values of a specific named property.
file, all files in the codebase.
entity, all entities in the codebase.

At a later date, more subjects may be added.

Groups

Groups are collections of elements of a given subject. For example, a specific collection of files matching the mask "src/main/java/*" would be a group within the "file" subject.

In order to allow the most flexibility possible, we define groups as a series of "inclusions" and "exclusions", which are processed in order.

To evaluate this in terms of set theory:

Begin with an empty set.
Inclusions are treated as unions.
Exclusions are treated as subtractions.

A group may consist of only a single exclusion entry, but this is effectively a no-op because there is nothing to exclude from. Validation sanity checks should flag this case.
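The evaluation order above can be sketched in Python. This is a minimal illustration, assuming the elements selected by each step have already been computed as a set. The function name resolve_steps and the (kind, elements) tuple shape are hypothetical and not part of the configuration format.

```python
def resolve_steps(steps):
    """Evaluate ordered (kind, elements) steps into a final set.

    kind is "inclusion" or "exclusion"; elements is the set of items
    that the step's matchers selected (computed elsewhere).
    """
    result = set()  # begin with an empty set
    for kind, elements in steps:
        if kind == "inclusion":
            result |= elements  # inclusions are unions
        elif kind == "exclusion":
            result -= elements  # exclusions are subtractions
        else:
            raise ValueError(f"unknown step type: {kind!r}")
    return result


# Order matters: an exclusion only removes what earlier steps included.
include_then_exclude = resolve_steps([
    ("inclusion", {"a.c", "b.c"}),
    ("exclusion", {"b.c"}),
])
exclude_then_include = resolve_steps([
    ("exclusion", {"b.c"}),
    ("inclusion", {"a.c", "b.c"}),
])
```

In this sketch a group consisting of a lone exclusion resolves to the empty set, which is the no-op case a validator would want to report.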

Inclusions and exclusions are processed in order, and are structured as:

{
  "type": "inclusion|exclusion",
  "matchers": {
    "field": {
      "operator": ["operand"],
      "_comment": "Operand types vary by operator."
    }
  }
}

At the time of writing, one operator is defined: match, which takes a list of glob expressions as defined by Python's fnmatch module (https://docs.python.org/3.4/library/fnmatch.html). The expressions in the list are combined with OR semantics.

Fields are combined using AND semantics. For example:

{
  "type": "inclusion",
  "matchers": {
    "name": { "match": ["clientSocket.send"] },
    "language": { "match": ["Java"] }
  }
}

would resolve to, in English, "Everything named clientSocket.send whose language is Java."

If a C++ entity were named clientSocket.send, it would not be included.
Similarly, if a Java entity were named clientSocket.receive, it would not be included.

Now say we defined an exclusion afterward:

{
  "type": "exclusion",
  "matchers": {
    "type": {"match": ["Variable"]}
  }
}

combined with the inclusion above, this would read: "Everything named clientSocket.send whose language is Java, except those whose type is Variable."
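The inclusion and exclusion above can be traced with a short Python sketch. It assumes entities are plain dictionaries of string fields and uses fnmatch.fnmatchcase for case-sensitive glob matching. The helper names matches and resolve_group, the sample entities, and the "Method" type value are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(element, matchers):
    """True if the element satisfies every field matcher (AND across fields)."""
    for field, operators in matchers.items():
        if field == "_comment":
            continue
        if field not in element:
            return False
        for operator, operands in operators.items():
            if operator == "_comment":
                continue
            if operator != "match":
                raise ValueError(f"unsupported operator: {operator!r}")
            # OR across the glob expressions in the operand list
            if not any(fnmatchcase(str(element[field]), p) for p in operands):
                return False
    return True


def resolve_group(steps, universe):
    """Apply inclusion and exclusion steps in order, starting from nothing."""
    selected = []
    for step in steps:
        hits = [e for e in universe if matches(e, step["matchers"])]
        if step["type"] == "inclusion":
            selected += [e for e in hits if e not in selected]
        elif step["type"] == "exclusion":
            selected = [e for e in selected if e not in hits]
        else:
            raise ValueError(f"unknown step type: {step['type']!r}")
    return selected


entities = [
    {"name": "clientSocket.send", "language": "Java", "type": "Method"},
    {"name": "clientSocket.send", "language": "Java", "type": "Variable"},
    {"name": "clientSocket.send", "language": "C++", "type": "Method"},
    {"name": "clientSocket.receive", "language": "Java", "type": "Method"},
]
steps = [
    {"type": "inclusion", "matchers": {
        "name": {"match": ["clientSocket.send"]},
        "language": {"match": ["Java"]},
    }},
    {"type": "exclusion", "matchers": {"type": {"match": ["Variable"]}}},
]
group = resolve_group(steps, entities)
```

Only the first sample entity remains in group: the second is removed by the exclusion, and the last two never satisfy the inclusion.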

Associations

Figure 2: Expanded View of Associations with different types & fields.

Associations are any relationship between subjects.

Multiple associations may be mixed in the same collection, provided that there are no overlapping groups. If two dependency associations have disjoint groups and have the same type & handling, they are treated as a union. Inclusions and exclusions are scoped only to a given association's group, and do not apply across associations. For example, if two mixed associations exist, where one excludes "*.c" and the other includes a "test.c" file, the test.c file will be included.
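The scoping rule can be illustrated with a small Python sketch. It assumes, for brevity, that each association's group is a list of (kind, glob) pairs over file names. The resolve_files helper and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def resolve_files(steps, files):
    """Resolve one association's group; steps are ordered (kind, glob) pairs."""
    selected = set()
    for kind, pattern in steps:
        hits = {f for f in files if fnmatchcase(f, pattern)}
        if kind == "inclusion":
            selected = selected | hits
        else:
            selected = selected - hits
    return selected


files = {"main.c", "test.c", "util.py"}

# Two associations in the same collection, each with its own group.
association_a = [("inclusion", "*"), ("exclusion", "*.c")]
association_b = [("inclusion", "test.c")]

# Each group is resolved on its own; the results are then unioned.
combined = resolve_files(association_a, files) | resolve_files(association_b, files)
```

The exclusion of "*.c" in the first association has no effect on the second, so test.c ends up in the combined result while main.c does not.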

Assignment

Assignments state that one group is a child of a given element. For example, an assignment would be used when assigning property values to a group of files, or when assigning sub-components to a parent component.

Dependency

Dependencies indicate actual or expected communication between two groups, or between an element and a group. Dependencies always point in the direction of usage. The dependency association has two optional fields:

dependency_type, which indicates the type of communication.
- compile means that the dependency is required at compile time. These are static dependencies such as a specific method invocation or class instantiation.
- runtime indicates a dependency that is required, and may only be seen, at runtime. For example, socket communication, or the use of certain inversion of control frameworks.
- documentation indicates a dependency that is required for the purpose of documentation.
- test indicates that a dependency is required for testing purposes, but is not used at runtime.
dependency_handling, which indicates whether the dependency is added to the network (inclusion) or removed from it (exclusion).

Rules

Rules define which associations are allowed to be made within the system, and how. For example, take the rule deny components *Plugin *Plugin. This is expressed as "There may be no association of type "dependency" between any component ending in the word "Plugin" and any other component ending in the word "Plugin"." Rules can affect both the user configuration (i.e. it is possible for a configuration to violate its own rules), and the actual system (i.e. when there is a dependency that is not declared, but explicitly disallowed).
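A check for the deny rule above might look like the following Python sketch. It assumes dependencies are available as (source, target) pairs of component names and that the rule's two masks are glob expressions. The violations function and the component names are hypothetical.

```python
from fnmatch import fnmatchcase


def violations(deny_from, deny_to, dependencies):
    """Return the (source, target) component pairs that a deny rule forbids."""
    return [
        (source, target)
        for source, target in dependencies
        if source != target  # the rule concerns any *other* component
        and fnmatchcase(source, deny_from)
        and fnmatchcase(target, deny_to)
    ]


dependencies = [
    ("AudioPlugin", "VideoPlugin"),
    ("AudioPlugin", "Core"),
    ("Core", "VideoPlugin"),
]
found = violations("*Plugin", "*Plugin", dependencies)
```

The same check could be run against declared dependencies (to catch a configuration that violates its own rules) and against dependencies observed in the actual system.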

Anonymous Groups

Anonymous Groups are unnamed collections, typically used to define dependencies to other anonymous groups en masse. For example, explicit entity dependencies would be specified using anonymous groups, as would entity & file dependency exclusions.

Comments

At any level, if we find a "_comment" field, we will ignore that field's contents. This allows users to specify single-line comments in the form of strings and multi-line comments in the form of an array. Here is an example of a multi-line comment array:
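As a minimal sketch, the hypothetical fragment below carries a multi-line "_comment" array at the top level and a single-line "_comment" string inside a matcher. The strip_comments helper is an assumption about how a loader might discard these fields after parsing with Python's json module. It is not taken from the system itself.

```python
import json


def strip_comments(node):
    """Return a copy of parsed JSON with every "_comment" field removed."""
    if isinstance(node, dict):
        return {k: strip_comments(v) for k, v in node.items() if k != "_comment"}
    if isinstance(node, list):
        return [strip_comments(item) for item in node]
    return node


raw = """
{
  "_comment": ["First line of a multi-line comment.",
               "Second line of the same comment."],
  "type": "inclusion",
  "matchers": {
    "name": {
      "_comment": "A single-line comment.",
      "match": ["clientSocket.*"]
    }
  }
}
"""
config = strip_comments(json.loads(raw))
```

Because the comments are ordinary JSON values, the input stays valid for any standard JSON tool.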

JSON has no standard mechanism for comments. While there is a JSON-C superset that allows for comments, it is not widely supported among JSON tools and libraries. In order to make things easier for our customers, we'll want to conform to standard JSON.

Matchers

A matcher consists of a comparison operation (the operator) and its operands, applied to a named field.

Basic structure of a matcher:

This allows us to add different match types (e.g. regexp, numerical range) later without breaking compatibility.

Expressions within match are combined using OR semantics.
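One way to keep matchers open to new operators is a lookup table keyed by operator name, sketched below in Python. Only "match" is defined by this document. The "regexp" entry is a hypothetical later addition, and combining several operators on one field with AND is likewise an assumption.

```python
from fnmatch import fnmatchcase
import re

# Operator name -> function(value, operands) -> bool.
# "match" follows the glob semantics described above; "regexp" is a
# hypothetical future operator included only to show extensibility.
OPERATORS = {
    "match": lambda value, globs: any(fnmatchcase(value, g) for g in globs),
    "regexp": lambda value, exprs: any(re.search(e, value) for e in exprs),
}


def evaluate(value, matcher):
    """Apply every operator in a matcher such as {"match": ["*.java"]}."""
    results = []
    for operator, operands in matcher.items():
        if operator == "_comment":
            continue
        if operator not in OPERATORS:
            raise ValueError(f"unknown operator: {operator!r}")
        results.append(OPERATORS[operator](value, operands))
    return all(results)
```

Existing configurations that use only "match" keep working when a new entry is added to the table, which is the compatibility property described above.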

File Property and Component Definitions

Overview

File properties and components are very similar. Components are a special case of file properties that are currently handled in a different way by our system. For convenience, we will have components specified as a "component" file property.

Figure 4: Expanded view of properties.

Property

A Property is a named subject that contains one or more values. Property values can then be associated with files via an assignment to mark files with a given file property.

Value

Values are specific names under the property that can be assigned to groups. Property values can have dependencies on each other, as well as contain other property values as children. For example, a component of "graphics" can depend on the component "utilities".

Here is a sample component definition for a single component. This is fully pretty-printed with each element on 1 line to show nesting. In practice, some of these would likely be combined onto a single line for brevity.

Note that a child value may only have one parent of the same property. The same goes for files: only one value of a given property may be assigned to a given file. It is an error for a file to have two conflicting property values.
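The single-value constraint could be validated roughly as follows. This Python sketch assumes assignments have already been resolved to (file, property, value) triples. The find_conflicts function and the sample "license" property are hypothetical.

```python
def find_conflicts(assignments):
    """assignments: iterable of (file, property_name, value) triples.

    Returns {(file, property_name): sorted values} for every file that
    was assigned more than one value of the same property.
    """
    seen = {}
    for file, prop, value in assignments:
        seen.setdefault((file, prop), set()).add(value)
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


conflicts = find_conflicts([
    ("src/draw/line.c", "component", "graphics"),
    ("src/draw/line.c", "component", "utilities"),
    ("src/draw/line.c", "license", "MIT"),
    ("src/buffer/ring.c", "component", "graphics"),
])
```

A file may carry values of several different properties. Only two values of the same property on one file are reported.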

Component Definitions

A component is defined by declaring a "component" property and a specific value for the component. Dependencies & file matches are then defined as associations:

This property component has a value graphics.

The value graphics is assigned to files under the folders src/opengl, src/buffer, src/draw. Files matching */test_* are excluded.

Property values can also contain arbitrary metadata that may be interpreted by our system. For example, a property value that is a component may want to specify that it is a test component. This would be done by including a "data" field in the value definition:

Graphics has a dependency on other "component" values matching the literal string "utilities". Using this scheme, we are free to add other component fields, and free to add matching on any component field without changing fundamentally how this configuration is expressed.

Component Hierarchy

The same mechanism can be used to assign defined components to a parent component. Note that whereas in the past CodeMRI would create components that do not exist when scanning relationships, this new configuration type requires that the component exist before using it in a relationship.
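The existence requirement could be enforced with a check along these lines. This is a sketch only: it assumes component names and relationships have already been extracted from the configuration, and the undefined_references function and the "platform" component are hypothetical.

```python
def undefined_references(defined, relationships):
    """Return component names used in relationships but never defined."""
    used = {name for pair in relationships for name in pair}
    return sorted(used - set(defined))


missing = undefined_references(
    defined=["graphics", "utilities"],
    relationships=[("graphics", "utilities"), ("graphics", "platform")],
)
```

A non-empty result would be reported as a validation error rather than silently creating the missing component.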

Rules

Rules define how a system ought to be structured, in contrast to component or property definitions, which describe the current understanding of the system. The user defining rules would typically be an architect, or a similar person who lays out the overarching "big picture" of the system. For example, a plugin-based architecture may stipulate that plugins cannot communicate with each other.

Figure 5: Expanded View of Rules

Here is an example rule definition for a deny & allow rule in the proposed model. Note that JSON is fully expanded in order to better visualize nesting:

Mark Rules Against CSVs

Authors may specify mark rules pointing to a set of relationships in CSV or TSV form. Generally users doing so are supplying mass-relationships returned from our query engine in order to, for example, mark existing architectural errors as warnings. In the previous rule format, such a definition would look like this:

In the User Defined Architecture format, authors would write:

As with the previous incarnation of mark rules, it is illegal to combine input and "from"/"to" matchers in the same rule. Attempts to do so will result in a validation error.

This also assumes that the input file will only contain relationships between the provided subjects. If the input format changes to accommodate mixed from/to subjects in the same file, we will need to extend or modify the input object. This would be a minimal change, and would not need to break existing configurations.

Multiple input files will require multiple rules.
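The restriction on combining input with "from"/"to" matchers can be sketched as a validation step in Python. The key names "input", "from" and "to" follow the wording above, but their exact spelling in the configuration is an assumption, as are the validate_mark_rule function and the sample file name.

```python
def validate_mark_rule(rule):
    """Raise ValueError if a rule mixes "input" with "from"/"to" matchers."""
    has_input = "input" in rule
    has_matchers = "from" in rule or "to" in rule
    if has_input and has_matchers:
        raise ValueError('a rule may not combine "input" with "from"/"to"')


# Each of these is acceptable on its own.
validate_mark_rule({"input": "violations.csv"})
validate_mark_rule({"from": {}, "to": {}})
```

Since each rule accepts at most one input in this sketch, several CSV or TSV files would be expressed as several rules.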

Anonymous Groups

Figure 6: Expanded View of Anonymous Groups

Anonymous groups are collections of elements that can depend on other anonymous groups of elements. These groups are the mechanism by which users manually enter "extra dependencies" or "runtime entity dependencies" and specify network exclusions.

Defining Extra Runtime Dependencies

Entities are uniquely identified by:

Name
Containing File
Line Number
Column
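The four identifying fields could be modeled as an immutable key, as in this Python sketch. The class name EntityKey and the attribute names are hypothetical. Only the four fields themselves come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityKey:
    """Identity of an entity: equal only if all four fields are equal."""
    name: str
    containing_file: str
    line_number: int
    column: int


first = EntityKey("send", "src/ClientSocket.java", 10, 4)
overload = EntityKey("send", "src/ClientSocket.java", 27, 4)
same_as_first = EntityKey("send", "src/ClientSocket.java", 10, 4)
```

Because the dataclass is frozen it is hashable, so keys can be used in sets or as dictionary keys when matching declared dependencies against entities.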

Here is an example extra dependency:

Defining Exclusions

On dependencies, there are two optional parameters:

dependency_type, which defaults to compile.
dependency_handling, which defaults to inclusion.

If dependency_handling is given the value of "exclusion", then CodeMRI will remove the dependencies from the network.

We could read this as: "Exclude all dependencies coming from anything in Java to a Java "Package" entity." If this collides with an explicit dependency declaration, we would raise an error, as the user must explicitly exclude their declaration from this "exclusion" mask.
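The defaults and the collision rule can be sketched in Python. For brevity this sketch assumes exclusions have already been expanded from group masks into concrete (source, target) pairs. The Dependency class, the apply_to_network function, and the sample names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    source: str
    target: str
    dependency_type: str = "compile"        # default stated above
    dependency_handling: str = "inclusion"  # default stated above


def apply_to_network(edges, declared):
    """Return the network's (source, target) edges after applying declarations."""
    included = {(d.source, d.target) for d in declared
                if d.dependency_handling == "inclusion"}
    excluded = {(d.source, d.target) for d in declared
                if d.dependency_handling == "exclusion"}
    collisions = included & excluded
    if collisions:
        # An explicit declaration collides with an exclusion.
        raise ValueError(f"declared and excluded: {sorted(collisions)}")
    return (set(edges) | included) - excluded


edges = {("app.Main", "java.util"), ("app.Main", "app.Util")}
declared = [
    Dependency("app.Main", "java.util", dependency_handling="exclusion"),
    Dependency("app.Main", "plugin.Loader", dependency_type="runtime"),
]
network = apply_to_network(edges, declared)
```

Here the excluded edge is removed from the network, the declared runtime dependency is added, and declaring the same pair as both inclusion and exclusion raises an error.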

Defining Common Component Dependencies

Anonymous groups can also be used to define component definitions en masse. This can be useful for projects that have a number of very similar components, such as plugin architectures. Associations always combine additively. Conflicts in dependency_handling/dependency_type are errors.

Modular Configurations

For very large codebases, or codebases that span multiple repositories, editing configurations inside of distinct modules that are later recombined may be desirable, if only to avoid merge conflicts caused by two people editing the same file. This presents us with a few design challenges:

How do we recombine these modules into a unified configuration?
Assuming that we do, how do we treat apparent conflicts across modules?

Some apparent "conflicts" are not conflicts at all. For example, it's reasonable to assume that two modules may want to mark different sets of files as being "third party". There may also be a large component that spans multiple areas in the code. For example, a test component, or a large UI component owned by multiple people.

For this reason, we will follow these rules when combining module definitions:

Associations are additive when they contain disjoint groups. For example, module A defines files as "third party" if they are in its local "src/third_party" folder. Module B defines files as "third party" if they are in its local "vendor" folder. The final configuration will have both the files in "module_a/src/third_party" and "module_b/vendor" as "third party". This is a logical assumption for a user to make, and will not change depending upon order.
Data fields cannot disagree across modules. For example, it is an error if Module A says that the TEST component's component_type is test and then Module B says that the TEST component's component_type is documentation.
Rules cannot be defined across modules as order matters in rule application.
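The first two combination rules can be sketched in Python. The sketch assumes each module has been reduced to a mapping from a property value name to its file globs and data fields. That shape, the "files" and "data" keys, and the merge_modules function are hypothetical simplifications.

```python
def merge_modules(modules):
    """Merge per-module definitions of property values.

    Each module maps a value name to {"files": [globs], "data": {...}}.
    File globs are combined additively; data fields must agree.
    """
    merged = {}
    for module in modules:
        for name, definition in module.items():
            target = merged.setdefault(name, {"files": [], "data": {}})
            for glob in definition.get("files", []):
                if glob not in target["files"]:
                    target["files"].append(glob)
            for key, value in definition.get("data", {}).items():
                if key in target["data"] and target["data"][key] != value:
                    raise ValueError(
                        f"{name}.{key}: {target['data'][key]!r} conflicts with {value!r}"
                    )
                target["data"][key] = value
    return merged


module_a = {
    "third party": {"files": ["module_a/src/third_party/*"]},
    "TEST": {"data": {"component_type": "test"}},
}
module_b = {"third party": {"files": ["module_b/vendor/*"]}}
merged = merge_modules([module_a, module_b])
```

Merging the modules in the opposite order yields the same set of globs for "third party". A third module declaring the TEST component's component_type as documentation would raise an error.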

There also needs to be a way to "discover" modules. Apache Maven has a system for this that consists of a "parent" file that explicitly references one or more "modules". We can apply this system to our configuration:

Rules must be defined in the "parent" module. These are "big-picture" definitions that affect the codebase as a whole, versus isolated things such as the addition of components or the application of a given file property value to a set of files.
Properties & components can either be defined in the parent (as would be the case for a large cross-module component), or in individual modules (as would be the case for smaller isolated components).
Anonymous Groups would almost certainly be defined in module files.
Module files may cascade, i.e. modules can import their own sub-modules.
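Cascading discovery from a parent file can be sketched as a depth-first walk in Python. The "modules" key, the discover function, and the file names are hypothetical. Path resolution relative to each module is left out, and an in-memory dictionary stands in for files on disk.

```python
def discover(path, load, seen=None):
    """Return config paths reachable from a parent file, parent first.

    load(path) returns the parsed configuration; a hypothetical "modules"
    key lists the paths of sub-modules to import.
    """
    seen = [] if seen is None else seen
    if path in seen:
        return seen  # already visited; this also guards against cycles
    seen.append(path)
    for child in load(path).get("modules", []):
        discover(child, load, seen)
    return seen


# In-memory stand-in for configuration files on disk.
files = {
    "root.json": {"modules": ["ui/config.json", "core/config.json"]},
    "ui/config.json": {"modules": ["ui/widgets/config.json"]},
    "ui/widgets/config.json": {},
    "core/config.json": {},
}
order = discover("root.json", files.__getitem__)
```

The parent is visited first, which fits the requirement that rules live in the parent module, and each sub-module is loaded once even if it is referenced from several places.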

The proposed configuration for modules would be a top-level object that specifies a set of paths to include:

This structure is logically consistent with the "top-down" structure we employ for relationships. Note that we should have consistent naming for our JSON configuration file. I chose to leave out suggestions for the name of the configuration file from this document in order to keep the focus on the structure of the configuration.