
Send information in your DataVault to another vault or to someone else

Purpose

There are many reasons to move data from one vault to another:

  • You want to put it in a new vault that has more disk space

  • You want to send it to a centralized location where portfolio analysis can be done

  • You want to send it to Silverthread for help with analysis, report generation, or debugging

What we will do in this worked example

  • Create 2 DataVaults: vault1 and vault2

  • Analyze codebases in vault1

  • Pack data contained in vault1

  • Send that data to a new location or to another person

  • Import that data into vault2
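
At a glance, the end-to-end flow looks like this (a condensed sketch; the exact flags and arguments for each step are shown in the sections below):

cmri vault create --vault $HOME/vault1                      # create a vault
cmri system add ... --vault $HOME/vault1                    # register a codebase
cmri job run produce_silverthread_database ...              # analyze the code
cmri job run pack ...                                       # pack portable metadata into a ZIP
cmri shell --vault $HOME/vault2                             # open the second vault
job run unpack /tmp/CodeMRI_Pack1.zip                       # import the pack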

Example

Creating DataVaults

Choose locations for your DataVaults

cd $HOME
mkdir vault1
mkdir vault2

Create those vaults

cmri vault create --vault $HOME/vault1
cmri vault create --vault $HOME/vault2
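
If you want to confirm that a vault initialized correctly, a freshly created vault contains just a few bookkeeping entries (the same layout you will see for vault2 later in this example):

[dan@fedora ~]$ ls $HOME/vault1
config.json  locks  logs  projects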

Setting up a project and 3 systems for processing

Add a project and 3 systems to vault1

cmri system add --selection Linux_Kernel --name Linux_Kernel --version 0.01 --origin /home/dan/Documents/test_sourcecode/Linux_Kernel/linux-0.01/ --vault $HOME/vault1
cmri system add --selection Linux_Kernel --name Linux_Kernel --version 0.11 --origin /home/dan/Documents/test_sourcecode/Linux_Kernel/linux-0.11/ --vault $HOME/vault1
cmri system add --selection Linux_Kernel --name Linux_Kernel --version 0.12 --origin /home/dan/Documents/test_sourcecode/Linux_Kernel/linux-0.12/ --vault $HOME/vault1
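
To double-check that all three systems registered, you can list them from the interactive shell (the vault list command is demonstrated later in this walkthrough):

cmri shell --vault $HOME/vault1
vault list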

Case 1: Produce a Silverthread database for systems in vault1

cmri job run produce_silverthread_database --selection 'Linux_Kernel/*' --vault $HOME/vault1

The produce_silverthread_database job kicks off a job chain with many stages that stream by in the terminal. Each job extracts information from the code, puts that information into databases, and runs math operations. You can watch as each job starts and passes. At the end you can see that the databases have been produced.

You do not need a licensed version of CodeMRI to run produce_silverthread_database. This allows you to create metadata at the location where the code resides, port that metadata to a computer with a licensed version of CodeMRI, and then produce reports for that system on the licensed computer. It also allows you to produce portfolio-level reports on that central system when metadata is being collected from many remote locations.
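
Putting that together, a minimal sketch of the remote-capture workflow looks like this (the hostname and destination path are illustrative):

# On the unlicensed machine where the code lives
cmri job run produce_silverthread_database --selection 'Linux_Kernel/*' --vault $HOME/vault1
cmri job run pack --selection 'Linux_Kernel/*' --vault $HOME/vault1
scp $HOME/vault1/packs/Linux_Kernel.zip analyst@central-host:/tmp/

# On the licensed central machine
cmri shell --vault $HOME/vault2
job run unpack /tmp/Linux_Kernel.zip
job run produce_reports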

Case 2: Generate reports for systems in vault1

cmri job run produce_reports --selection 'Linux_Kernel/*' --vault $HOME/vault1

The produce_reports job kicks off a job chain with many stages that stream by in the terminal. produce_reports runs all the jobs that produce_silverthread_database did, and then goes on to create Excel and other user-facing outputs. You can watch as each job starts and passes.

NOTE: You don’t need a licensed version of CodeMRI to capture portable metadata

Towards the end of the job chain you will see an ERROR because vault1 has not been set up with a CodeMRI license. This is not a problem for this example. In fact, this is what allows you to process code anywhere and then port the metadata to a computer with a licensed version of CodeMRI. All jobs required to produce portable metadata have already been completed.

Pack the metadata for a system into a portable file for transport

Make a ‘pack’ for one of the systems in the vault

cmri job run pack --selection 'Linux_Kernel/Linux_Kernel-0.01' --vault $HOME/vault1

Now let’s look at the vault

[dan@fedora ~]$ ls -l $HOME/vault1
total 40
-rw-rw-r--. 1 dan dan   612 Dec  1 14:42 config.json
drwxrwxr-x. 1 dan dan    54 Dec  1 13:20 events
-rw-rw-r--. 1 dan dan   440 Dec  1 14:43 job_history.csv
-rw-rw-r--. 1 dan dan 32684 Dec  1 14:43 job_log.csv
drwxrwxr-x. 1 dan dan  1020 Dec  1 13:03 locks
drwxrwxr-x. 1 dan dan  1018 Dec  1 14:43 logs
drwxrwxr-x. 1 dan dan    32 Dec  1 14:40 packs
drwxrwxr-x. 1 dan dan    24 Dec  1 12:57 projects
drwxrwxr-x. 1 dan dan    24 Dec  1 13:18 reports
-rwxrwxr-x. 1 dan dan     0 Dec  1 13:03 update_tracker

Note that you now have a ‘packs’ directory containing a 1.3 MB file called Linux_Kernel.zip:

[dan@fedora ~]$ ls -lh $HOME/vault1/packs
total 1.3M
-rw-rw-r--. 1 dan dan 1.3M Dec  1 14:40 Linux_Kernel.zip

Move that pack somewhere else. In this example, I’ll put it in a temporary directory and give it a new name:

mv $HOME/vault1/packs/Linux_Kernel.zip /tmp/CodeMRI_Pack1.zip

Let’s make another pack containing metadata for the other two systems. This time we’ll use the cmri interactive shell instead of running one-shot commands from the command line.

cmri shell --vault $HOME/vault1

Get the list of systems in the vault

vault list

Select those two other systems. Note the use of '*' (the glob operator) to match multiple system names.

select Linux_Kernel/Linux_Kernel-0.1*
vault list
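
Now pack the selected systems. (The original walkthrough implies this step; from the interactive shell, the pack job presumably operates on the current selection.)

job run pack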

The second pack, which contains two systems, is a little bigger: 3.7 MB.

Move this second pack to the temporary directory as well:

mv $HOME/vault1/packs/Linux_Kernel.zip /tmp/CodeMRI_Pack2.zip

Transport packed data to another computer with CodeMRI

Doing this is easy! Because packs are ordinary ZIP files, you can send them via email, an FTP site, a shared drive, a document storage system, or a secure copying application such as scp.
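
For example, a secure copy to a hypothetical analysis host might look like this:

scp /tmp/CodeMRI_Pack1.zip /tmp/CodeMRI_Pack2.zip analyst@analysis-host:/tmp/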

Import packed data into a second CodeMRI DataVault

Let’s start a CodeMRI interactive shell that uses the second DataVault: vault2

cmri shell --vault $HOME/vault2

Note that vault2 is configured but empty

[dan@fedora ~]$ ls $HOME/vault2
config.json  locks  logs  projects

[dan@fedora ~]$ ls $HOME/vault2/projects
... empty ...

In the interactive CodeMRI shell, let’s unpack the zip file containing the first system

job run unpack /tmp/CodeMRI_Pack1.zip

Now see that vault2 has data in it for the first system

[dan@fedora ~]$ ls $HOME/vault2/projects
Linux_Kernel

[dan@fedora ~]$ ls $HOME/vault2/projects/Linux_Kernel/systems
Linux_Kernel-0.01

[dan@fedora ~]$ ls $HOME/vault2/projects/Linux_Kernel/systems/Linux_Kernel-0.01/
config.json  data  job_history.csv  locks  logs

Let’s unpack the zip file containing the other systems

job run unpack /tmp/CodeMRI_Pack2.zip

Here is the result

[dan@fedora ~]$ ls $HOME/vault2/projects/Linux_Kernel/systems
Linux_Kernel-0.01  Linux_Kernel-0.11  Linux_Kernel-0.12
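
If you prefer not to use the interactive shell, the same job can presumably be run in one-shot form, matching the other cmri job run commands in this example (an assumption about unpack's command-line form, not verified here):

cmri job run unpack /tmp/CodeMRI_Pack2.zip --vault $HOME/vault2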

Configure a license for vault2

If vault2 is not licensed, produce_reports will fail in the same way it did when you ran it in vault1 (above, in Case 2).

Let’s set up the license for vault2 now.

For online licenses (typical)

Silverthread will create an account for you at http://codemri.com. Your login will be your email address. Set your password.

In the cmri interactive shell, log in to your account.

account login --email dan@silverthreadinc.com --password MYPASSWORD


For offline licenses (classified systems without internet access)

Run cmri on your machine, connected to the DataVault you want to license:

cmri shell --vault $HOME/vault2

Then generate a machine ID file:

machine-id generate

  • Forward the ID file that gets generated to Silverthread

  • Receive a license file back

  • Store the license file in C:\Program Files\Silverthread (or another path; you will point to it in the next step)

Install the license

license add -f /path/to/license_file_name.lic

Produce CodeMRI Diagnostic reports in the second vault

cmri job run produce_reports --selection '*/*' --vault $HOME/vault2

Resulting Excel files can now be found in vault2

[dan@fedora ~]$ ls $HOME/vault2/reports
Linux_Kernel

[dan@fedora ~]$ ls $HOME/vault2/reports/Linux_Kernel/
Linux_Kernel-0.01
Linux_Kernel-0.11
Linux_Kernel-0.12

[dan@fedora ~]$ ls $HOME/vault2/reports/Linux_Kernel/Linux_Kernel-0.01/CPP
Code-Duplication-Linux_Kernel-0.01-CPP.xlsx
CodeMRI-Refactoring-ROI-Linux_Kernel-0.01-CPP.xlsx
Schedule-Estimator-Linux_Kernel-0.01-CPP.xlsx
CodeMRI-Linux_Kernel-0.01-CPP.xlsx
Detail-Worksheet-Linux_Kernel-0.01-CPP.xlsx

Evaluate software developed by third parties without access to their source code

The produce_reports command run in vault2 (in the example above) kicks off the same job chain we saw earlier in vault1. Note, however, that this time many of the early jobs are skipped, because those jobs were already executed in vault1.

In fact:

  • All jobs that scan source code were executed before the portable ‘pack’ was created

  • No source code is contained inside the ‘pack’ (you can verify this yourself; see the check after this list)

  • Producing reports in vault2 requires no access to the source code

  • This enables report generation, analysis, and evaluation to be done without direct access to the source code
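
If you want to verify the no-source-code point yourself, you can list a pack’s contents with any standard ZIP tool; you should see database and metadata files rather than source files (the exact entries depend on your systems):

unzip -l /tmp/CodeMRI_Pack1.zip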

Analyzing code distributed across an organization and centralizing CodeMRI data to produce Portfolio Reports

Go through the example above.

Imagine that the ‘packs’ or ZIP files came from multiple teams distributed across your organization or outside third parties. They could have come from hundreds of organizations.

Then, run the following command to produce Excel versions of the CodeMRI Portfolio reports

cmri job run produce_portfolio --selection '*/*' --vault $HOME/vault2

When this job is complete, your vault will have a new reports/Portfolio directory containing a portfolio report

[dan@fedora ~]$ ls $HOME/vault2/reports/
Linux_Kernel
Portfolio

[dan@fedora ~]$ ls $HOME/vault2/reports/Portfolio
Portfolio.xlsx

Sanitizing data about your system before sharing with others

Most of the data in your packs (the ZIP files) and in the reports generated about your system is not highly sensitive. Neither contains source code. The packs contain information about the geometry (network structure) of your codebase and metrics associated with files and other entities it contains.

Nevertheless, in some situations entity names, such as the names of source code files themselves, might be too sensitive to transmit. In this event, it is possible to mangle the names of identifiers in the packs and in the reports they generate. Outsiders will only be able to see reports with de-identified names.

Let’s continue with our example from above.

CodeMRI is run by the code owner, who is using vault1.

CodeMRI is also run by a person who needs to understand system health, but who cannot have access to sensitive filename information. This person is using vault2.

Assume we have already completed the steps in the earlier examples.

What’s in the vault before anonymization?

Let’s look at the contents of vault1 to see where portable metadata is stored. The directory contains a database file called silverthread_system.sqlite. This database is a very important component of the portable metadata that gets shipped in a pack:

$ ls -l vault1/projects/Linux_Kernel/systems/Linux_Kernel-0.01/data/
...
-rw-rw-r--. 1 dan dan  2265088 Dec  1 16:58 silverthread_system.sqlite
...

This SQLite file contains tables that store the names of source code files, code-specific entities (names of classes, functions, etc.), and information about the system itself, such as its name (Linux) and version (0.01). Other tables in the database store metrics for these files and entities.
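
If you are curious, you can browse this database read-only with the standard sqlite3 tool (the table names belong to Silverthread’s schema, so expect names not shown in this walkthrough):

sqlite3 $HOME/vault1/projects/Linux_Kernel/systems/Linux_Kernel-0.01/data/silverthread_system.sqlite '.tables'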

Create an anonymous version of the silverthread database

The data sender connects to vault1 and runs the cmri interactive shell:

[sender@fedora ~]$ cmri shell --vault $HOME/vault1

Inside the cmri shell attached to vault1, the sender should run the following to anonymize all three codebases:

select */*
job run anonymize_database

What’s in the vault after anonymization?

$ ls -l vault1/projects/Linux_Kernel/systems/Linux_Kernel-0.01/data/
...
-rw-rw-r--. 1 dan dan    44052 Dec  1 18:27 silverthread_system_decoder.xlsx
-rw-rw-r--. 1 dan dan  2265088 Dec  1 18:27 silverthread_system_known.sqlite
-rw-rw-r--. 1 dan dan  2265088 Dec  1 18:27 silverthread_system.sqlite
...

Mapping between real and anonymous names

Open the ‘decoder’ Excel file to see the mapping between normal and anonymous names. The file contains 4 tabs - for files, directories, entities, and system metadata.

The decoder maps file names, directory names, and entity names between their real and anonymous forms. In the portable database itself, the file and entity information that appears in plaintext before anonymization is replaced with anonymous identifiers afterward.

Pack anonymized portable metadata to send to a third party

Once you have anonymized the database, simply run the pack command as you did before. This time the resulting ZIP file will contain desensitized data.

cmri job run pack --selection 'Linux_Kernel/Linux_Kernel-0.01' --vault $HOME/vault1

Produce reports with anonymized names to send to a third party

Once the database is anonymized, just run the usual command to produce reports. The resulting Excel files will use the anonymized names:

job run produce_reports

For example, the CodeMRI Diagnostics reports for Linux-0.01 will now show only the anonymized names.

Reset your vault to use normal names again

To reset a system in your DataVault so that it again exports ZIP files and produces reports with normal names, undo the job that produced the anonymized database:

job clean anonymize_database
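
After the clean, re-running the earlier jobs regenerates packs and reports with the real names:

job run pack
job run produce_reports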

Review CodeMRI data before release in sensitive or classified environments

In some situations, security practices must be put in place to ensure that sensitive information about a codebase is not released publicly. Examples include banking software, control code for nuclear power plants, and Top Secret software developed by the Department of Defense.

Silverthread’s portable metadata formats were designed in partnership with the United States Air Force. A process was devised to audit and authorize the release of CodeMRI’s ‘pack’ data in ZIP files.

  • Top Secret code can be scanned in a protected environment

  • ZIP files can be reviewed to ensure that they do not contain sensitive data

  • Those ZIP files have been sent to Silverthread and others.

If you are in the DoD community and want to know more about this authorization and the process involved, please contact Silverthread.

If you are in a sensitive or highly regulated industry and want to know how we can meet your security requirements, please contact Silverthread.
