CodeMRI Administration Examples: Moving Data and Managing DataVaults

Send information in your DataVault to another vault or to someone else

Purpose

There are many reasons to move data from one vault to another:

  • You want to put it in a new vault that has more disk space

  • You want to send it to a centralized location where portfolio analysis can be done

  • You want to send it to Silverthread for help with analysis, report generation, or debugging

What we will do in this worked example

  • Create 2 DataVaults: vault1 and vault2

  • Analyze codebases in vault1

  • Pack data contained in vault1

  • Send that data to a new location, or possibly another person

  • Import that data into vault2

Example

Creating DataVaults

Choose locations for your DataVaults

cd $HOME
mkdir vault1
mkdir vault2

Create those vaults

cmri vault create --vault $HOME/vault1
cmri vault create --vault $HOME/vault2

Setting up a project and 3 systems for processing

Add a project and 3 systems to vault1

cmri system add --selection Linux_Kernel --name Linux_Kernel --version 0.01 --origin /home/dan/Documents/test_sourcecode/Linux_Kernel/linux-0.01/ --vault $HOME/vault1
cmri system add --selection Linux_Kernel --name Linux_Kernel --version 0.11 --origin /home/dan/Documents/test_sourcecode/Linux_Kernel/linux-0.11/ --vault $HOME/vault1
cmri system add --selection Linux_Kernel --name Linux_Kernel --version 0.12 --origin /home/dan/Documents/test_sourcecode/Linux_Kernel/linux-0.12/ --vault $HOME/vault1

Case 1: Produce a silverthread database for systems in vault1

The job to produce_silverthread_database kicks off a job chain with many stages that you can watch stream by on the terminal. Each job extracts information from the code, puts that information into databases, and runs math operations. You can watch as each job starts and passes. At the end you can see that the databases have been produced.

You do not need a licensed version of CodeMRI to run produce_silverthread_database. This allows you to create metadata at the location where the code resides, port that metadata to a computer with a licensed version of CodeMRI, and then produce reports for that system on the licensed computer. It also allows you to produce portfolio-level reports on that central system when metadata is collected from many remote locations.

Case 2: Generate reports for systems in vault1

The job to produce_reports kicks off a job chain with many stages that you can watch stream by on the terminal. produce_reports runs all the jobs that produce_silverthread_database did, and then goes on to create Excel and other user-facing outputs. You can watch as each job starts and passes:

NOTE: You don’t need a licensed version of CodeMRI to capture portable metadata

Towards the end of the job chain you will see an ERROR because vault1 has not been set up with a CodeMRI license. This is not a problem for this example. In fact, this allows you to process code anywhere and then port metadata to a computer with a licensed version of CodeMRI. All jobs required to produce portable metadata have already been completed:

Pack the metadata for a system into a portable file for transport

Make a ‘pack’ for one of the systems in the vault

Now let’s look at the vault

Note that you have a ‘packs’ directory containing a 1.3-megabyte file called Linux_Kernel.zip.

Move that pack somewhere else. In this example, I’ll put it in a temporary directory and give it a new name.
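The move-and-rename step can also be scripted. The following Python sketch is only an illustration: the ‘packs’ subdirectory and the pack name Linux_Kernel.zip come from the text above, while the function names, the destination directory, and the new file name are hypothetical.

```python
import shutil
from pathlib import Path


def stage_pack(vault: Path, pack_name: str, dest_dir: Path, new_name: str) -> Path:
    """Move a pack out of <vault>/packs into dest_dir under a new name."""
    source = vault / "packs" / pack_name
    if not source.is_file():
        raise FileNotFoundError(f"no such pack: {source}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / new_name
    # shutil.move also works across filesystems (it falls back to copy + delete).
    shutil.move(str(source), str(target))
    return target


def size_in_megabytes(path: Path) -> float:
    """Return the file size in megabytes (10**6 bytes), rounded to one decimal."""
    return round(path.stat().st_size / 1_000_000, 1)
```

A call such as stage_pack(Path.home() / "vault1", "Linux_Kernel.zip", Path("/tmp/cmri_packs"), "first_system.zip") would move the pack, and size_in_megabytes on the returned path gives a quick check that the file arrived intact. Both paths in that call are placeholders.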

Let’s make another pack containing metadata for the other two systems. We’ll do it using the cmri interactive shell instead of running the commands from the command line.

Get the list of systems in the vault

Select those two other systems. Note the use of '*' (the glob operator) to match multiple system names.
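To see how a '*' pattern can select several systems at once, here is a small Python sketch using the standard-library fnmatch module. It assumes ordinary shell-style glob semantics. The system identifiers and the pattern are hypothetical stand-ins, since the exact names that cmri lists for the vault are not shown here.

```python
from fnmatch import fnmatchcase

# Hypothetical identifiers for the three systems added to vault1.
systems = ["Linux_Kernel-0.01", "Linux_Kernel-0.11", "Linux_Kernel-0.12"]


def select_systems(names, pattern):
    """Return the names matching a shell-style glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# '*' matches any run of characters, so this pattern picks versions 0.11
# and 0.12 but not 0.01.
selected = select_systems(systems, "Linux_Kernel-0.1*")
print(selected)
```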

The second pack, which contains 2 systems, is a little bigger: 3.7 megabytes.

Move this second pack to the temporary directory as well:

Transport packed data to another computer with CodeMRI

Doing this is easy! Because packs are ordinary ZIP files, you can send them via email, an ftp site, a shared drive, a document storage system, or a secure copying application such as scp.

Import packed data into a second CodeMRI DataVault

Let’s start a CodeMRI interactive shell that uses the second DataVault: vault2

Note that vault2 is configured but empty

In the interactive CodeMRI shell, let’s unpack the zip file containing the first system

Now see that vault2 has data in it for the first system

Let’s unpack the zip file containing the other systems

Here is the result

Configure a license for vault2

If vault2 is not licensed, this command will fail in the same way that it did when you ran it in vault1 (above, in ‘Case 2’).

Let’s set up the license for vault2 now.

For online licenses (typical)

Silverthread will create an account for you at http://codemri.com. Your login will be your email address. Set your password.

In the cmri interactive shell, log in to your account.

Result:

For offline licenses (classified systems without internet access)

Run cmri on your machine, connected to the DataVault you want to license

Then generate a machine ID file

  • Forward the ID file that gets generated to Silverthread

  • Receive a license file back

  • Store the file in C:\Program Files\Silverthread

Install the license

Produce CodeMRI Diagnostic reports in the second vault

Resulting Excel files can now be found in vault2

Evaluate software developed by third parties without access to their source code

The produce_reports command run in vault2 (example above) kicks off the same job chain we saw earlier in vault1. Note, however, that this time many of the early jobs are skipped, because they were already executed in vault1 and their results travelled with the pack.

In fact:

  • All jobs that scan source code were executed before the portable ‘pack’ was created

  • No source code is contained inside the ‘pack’

  • Producing reports in vault2 requires no access to the source code

  • This enables report generation, analysis, and evaluation to be done without direct access to the source code
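If you want to confirm for yourself that a pack carries no source files, one simple check is to list the ZIP members and flag any whose extension looks like source code. The Python sketch below does that with the standard-library zipfile module. The extension list and the function name are illustrative assumptions, not part of CodeMRI, and a name-based check says nothing about what is stored inside the members.

```python
import zipfile
from pathlib import PurePosixPath

# Illustrative list only; extend it for the languages in your codebase.
SOURCE_SUFFIXES = {".c", ".h", ".cpp", ".java", ".py", ".s"}


def source_like_members(pack_path):
    """Return ZIP member names whose extension looks like source code."""
    with zipfile.ZipFile(pack_path) as pack:
        return [
            name
            for name in pack.namelist()
            if PurePosixPath(name).suffix.lower() in SOURCE_SUFFIXES
        ]
```

An empty list from source_like_members means that no member name matched the extension list.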

Analyzing code distributed across an organization and centralizing CodeMRI data to produce Portfolio Reports

Go through the example above.

Imagine that the ‘packs’ or ZIP files came from multiple teams distributed across your organization or outside third parties. They could have come from hundreds of organizations.

Then, run the following command to produce Excel versions of the CodeMRI Portfolio reports

When this job is complete, your vault will have a new reports/Portfolio directory containing a portfolio report

Sanitizing data about your system before sharing with others

Most data in your packs (the ZIP files) and in the reports generated about your system is not highly sensitive. Neither contains source code. The packs contain information about the geometry (network structure) of your codebase and metrics associated with the files and other entities it contains.

Nevertheless, in some situations entity names, such as the names of source code files themselves, might be too sensitive to transmit. In this event, it is possible to mangle the names of identifiers in the packs and in the reports they generate. Outsiders will only be able to see reports with de-identified names.
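The general idea of name mangling can be illustrated with a few lines of Python. This is a conceptual sketch only: the scheme CodeMRI actually uses is not described here, and the labels, function names, file names, and metric values below are all hypothetical. Each real name is replaced by an opaque label, the metrics are left untouched, and the owner keeps the mapping needed to translate back.

```python
def build_decoder(names, prefix):
    """Map each distinct real name to an opaque label such as 'file_0001'."""
    decoder = {}
    for name in sorted(set(names)):
        decoder[name] = f"{prefix}_{len(decoder) + 1:04d}"
    return decoder


def anonymize(rows, decoder):
    """Replace the name in each (name, metric) row; metrics stay unchanged."""
    return [(decoder[name], metric) for name, metric in rows]


# Hypothetical file names and line counts, for illustration only.
rows = [("kernel/sched.c", 254), ("mm/memory.c", 271), ("kernel/fork.c", 136)]
decoder = build_decoder([name for name, _ in rows], "file")
print(anonymize(rows, decoder))
```

Because the labels carry no trace of the original names, someone who sees only the anonymized rows can still compare metrics across files but cannot tell which file is which without the mapping.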

Let’s continue with our example from above.

CodeMRI is run by the code owner, who is using vault1

CodeMRI is run by a person who needs to understand system health, but who cannot have access to sensitive filename information. This person is using vault2

Assume we have already completed steps above in earlier examples

What’s in the vault before anonymization?

Let’s look at the contents of vault1 to see where portable metadata is stored. The directory contains a database file called silverthread_system.sqlite. This database is a very important component of the portable metadata that gets shipped in a pack:

This SQLite file contains tables that store the names of source code files and code-specific entities (names of classes, functions, etc.), as well as information about the system itself, such as its name (Linux) and version (0.01). The database also stores metrics for these files and entities.
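Because the portable metadata is an ordinary SQLite file, you can inspect it with any SQLite client. The Python sketch below lists every table and its row count using the standard-library sqlite3 module. It makes no assumption about CodeMRI's table names and simply reads them from SQLite's built-in sqlite_master catalog; the function name is hypothetical.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for table in tables:
            # Table names cannot be bound as parameters, so quote them instead.
            quoted = '"' + table.replace('"', '""') + '"'
            counts[table] = connection.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        connection.close()
```

Point table_row_counts at a copy of silverthread_system.sqlite rather than the live file in the vault, so that inspection cannot interfere with a running job.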

Create an anonymous version of the silverthread database

The data sender connects to vault1 and runs the cmri interactive shell

Inside cmri attached to vault1, sender should do the following to anonymize all three codebases

What’s in the vault after anonymization?

Mapping between real and anonymous names

Open the ‘decoder’ Excel file to see the mapping between normal and anonymous names. The file contains 4 tabs - for files, directories, entities, and system metadata.

File names

Directory names

Entity names

Portable database contents

File information - plaintext

File information - anonymous

Entity information - plaintext

Entity information - anonymous

Pack anonymized portable metadata to send to a third party

Once you have anonymized the database, simply run the pack command as you did before. This time the resulting ZIP file will have desensitized data.

Produce reports with anonymized names to send to a third party

Once the database is anonymized, just run the usual command to produce reports. The resulting Excel files will have anonymized names.

For example, one of the CodeMRI Diagnostics reports for Linux-0.01 now looks like the following

Reset your vault to use normal names again

To reset a system in your DataVault so that it again exports ZIP files and produces reports with normal names, undo the job that produced the anonymized database:

Review CodeMRI data before release in sensitive or classified environments

In some situations, security practices must be put in place to ensure that sensitive information about a codebase is not released publicly. Examples of such systems include banking software, control code for nuclear power plants, and Top Secret software developed by the Department of Defense.

Silverthread’s portable metadata formats were designed in partnership with the United States Air Force. A process was devised to audit and authorize the release of CodeMRI’s ‘pack’ data in ZIP files.

  • Top Secret code can be scanned in a protected environment

  • ZIP files can be reviewed to ensure that they do not contain sensitive data

  • Those ZIP files have been sent to Silverthread and others.
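As one possible first pass in such a review, a reviewer could search a pack for terms that must never leave the protected environment. The Python sketch below is an assumption about how that might be automated, not a description of the authorized audit process. It performs a case-insensitive search of member names and raw member bytes, so it will miss terms stored in compressed or encoded form inside a member. The function name and the term list are hypothetical.

```python
import zipfile


def find_sensitive_terms(pack_path, terms):
    """Return sorted (member, term) pairs where a term occurs in a member's
    name or in its raw bytes (case-insensitive for ASCII letters)."""
    hits = set()
    with zipfile.ZipFile(pack_path) as pack:
        for info in pack.infolist():
            if info.is_dir():
                continue
            data = pack.read(info).lower()
            name = info.filename.lower()
            for term in terms:
                needle = term.lower()
                if needle in name or needle.encode("utf-8") in data:
                    hits.add((info.filename, term))
    return sorted(hits)
```

An empty result is not proof that a pack is safe to release. It only shows that none of the listed terms appear in plain form.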

If you are in the DoD community and want to know more about this authorization and the process involved, please contact Silverthread.

If you are in a sensitive or highly regulated industry and want to know how we can meet your security requirements, please contact Silverthread.