Managing a research project with Sumatra

Before reading this, we recommend reading Getting started for a quick introduction to Sumatra. This document expands on the information presented there, giving a fuller picture of the available options.

Setting up your project

To create a Sumatra project, run:

$ smt init ProjectName

in your working directory. smt init has many options (see smt command reference for a full list). Most of them are also accepted by smt configure, so they can be changed later. To see the current configuration of your project, run smt info.

Telling Sumatra about your code

Sumatra expects to find a version control repository in your working directory or one of its parent directories. If you haven’t yet cloned your repository, Sumatra will do it for you if you give the --repository argument:

$ smt init --repository=https://github.com/someuser/MyCode

If you usually run the same main program, you can store this as a default using the --executable option:

$ smt init --executable=python

Here you can give a full path, or just the name; in the latter case Sumatra will use the PATH environment variable to search for an executable with that name, and use the first one it finds.
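The Python standard library's `shutil.which` performs the same kind of PATH search, so it shows what "use the first one it finds" means in practice. The helper name `resolve_executable` is hypothetical. Treating any name with a directory part as a path, rather than searching PATH for it, is also an assumption. This illustrates the lookup rule and is not Sumatra's own code:

```python
import os
import shutil


def resolve_executable(name):
    """Illustrate how an executable name could be resolved.

    Hypothetical helper, not Sumatra's code. Assumption: a name with a
    directory part is taken as a path. A bare name is looked up in each
    directory on PATH, in order, and the first match wins.
    """
    if os.path.dirname(name):
        return os.path.abspath(name)
    return shutil.which(name)  # None if no directory on PATH contains it
```

For example, `resolve_executable("python")` returns the first `python` found on your PATH. Which one that is depends on the order of directories in the PATH variable.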

For interpreted languages, it is also possible to set a default script file using the --main option:

$ smt init --main=myscript.py

In this case you must give the path to the file, either relative to the working directory or absolute. Unlike the executable, the script is not searched for on the PATH.

Any time you run some code with Sumatra, it checks whether the code has changed (using your version control system). If it has, then by default Sumatra will refuse to run until you have committed your changes. This is so that the exact version of the code you run is always recorded. As an alternative, Sumatra can store the difference between the code in version control and the code you run. To enable this option, run:

$ smt init --on-changed=store-diff

You can always change back to the “strict” setting with:

$ smt configure --on-changed=error
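The store-diff setting records the kind of information shown below. The sketch uses Python's `difflib` to produce a unified diff between a committed version of a file and the modified copy that is actually run. The file contents, the file name and the helper `working_copy_diff` are made up for illustration. Sumatra obtains the difference from your version control system, not from `difflib`.

```python
import difflib


def working_copy_diff(committed, working, filename="myscript.py"):
    """Return a unified diff (as one string) from the committed text to the working text.

    Hypothetical helper: it only illustrates the kind of record the
    store-diff option keeps. An empty string means nothing has changed.
    """
    lines = difflib.unified_diff(
        committed.splitlines(keepends=True),
        working.splitlines(keepends=True),
        fromfile="a/" + filename,
        tofile="b/" + filename,
    )
    return "".join(lines)


# Made-up example: one parameter edited after the last commit.
committed = "cutoff = 100\nrun(cutoff)\n"
working = "cutoff = 50\nrun(cutoff)\n"
diff = working_copy_diff(committed, working)
```

With a diff like this stored alongside the version identifier, the code that actually ran can be reconstructed later even though it was never committed.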

Handling input and output data

Sumatra tracks both input and output data files. For output data files it is important to tell Sumatra in which directory the files will be created, to avoid it having to search the entire disk:

$ smt init --datapath=results

Now Sumatra will recursively search the results directory and all of its sub-directories for any new files created during the computation. If your computations overlap in time, there is a risk that Sumatra will mix up the files. A solution to this problem is given in the Frequently asked questions. If you don’t specify the output data directory, Sumatra will assume it is a directory called “Data”.
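The idea behind the output-file search can be sketched in a few lines of Python:

1. Take a snapshot of the files under the data directory before the computation.
2. Take another snapshot afterwards.
3. Report the files that are new or whose modification time changed.

This is a simplified model with hypothetical function names, not Sumatra's implementation. It also shows why overlapping runs are a problem. A file written by a second, concurrent computation falls between the same two snapshots, so it would be attributed to both runs.

```python
import os


def snapshot(root):
    """Map every file under `root` (searched recursively) to its modification time.

    Simplified model; a missing `root` simply gives an empty snapshot.
    """
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.path.getmtime(path)
    return state


def new_or_modified(before, after):
    """Return the sorted paths that are in `after` but not `before`, or whose mtime changed."""
    return sorted(
        path for path, mtime in after.items()
        if path not in before or before[path] != mtime
    )
```

Typical use would be `before = snapshot("results")`, then run the computation, then `new_or_modified(before, snapshot("results"))`.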

The default directory for input data files is the filesystem root; this can also be changed with the --input option. Note that input data files are only tracked if they are passed to your program as command-line arguments. This limitation will be removed in future.
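Only command-line arguments are considered. A file that your program opens by a name hard-coded in its source is therefore not recorded as input. The hypothetical helper below models that rule by keeping only the arguments that name existing files. It illustrates the limitation and is not Sumatra's code.

```python
import os


def candidate_input_files(args):
    """Return the command-line arguments that name existing files.

    Toy model of the rule described above: anything not passed on the
    command line (for example a path hard-coded in the script) is never seen.
    """
    return [arg for arg in args if os.path.isfile(arg)]
```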

For more on data handling, see Input and output data.

Storing Sumatra records

Sumatra supports multiple back-end databases for storing records. More information about how to choose a storage back-end is given in Record stores.

Running your code

To track a computation with Sumatra, you can either use the smt tool or write your own Python scripts using the Sumatra API (see Using the Sumatra API within your own scripts).

A typical way to run a computation with smt is:

$ smt run --executable=matlab --main=myscript.m input_file1 input_file2

or:

$ smt run -e matlab -m myscript.m input_file1 input_file2

using the short versions of the arguments. Note that input_file1 and input_file2 may be parameter/configuration files or data files. Parameter files are treated specially; see Parameter files.

Note that if you are not using an interpreted language, only the --executable argument is needed. If you have set default values for the executable and/or main script in the program configuration, the smt run command can be simplified, e.g.:

$ smt configure -e matlab -m myscript.m
$ smt run input_file1 input_file2
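Conceptually, a run launches the executable with the main script (if any) followed by the remaining arguments. Defaults stored with smt configure fill in whatever is not given on the command line. The sketch below models that assembly. The function name `build_command` and the exact argument order are assumptions for illustration. Sumatra's real launch code may add executable-specific options (MATLAB, for instance, is not necessarily invoked as `matlab myscript.m`).

```python
def build_command(args, executable=None, main=None, defaults=None):
    """Assemble the argument list for a run, falling back to configured defaults.

    Hypothetical model: explicit -e/-m values override the stored defaults,
    and the main script (if any) comes before the remaining arguments.
    """
    defaults = defaults or {}
    executable = executable or defaults.get("executable")
    main = main or defaults.get("main")
    if executable is None:
        raise ValueError("no executable given and no default configured")
    command = [executable]
    if main is not None:  # compiled programs need no main script
        command.append(main)
    return command + list(args)


# Made-up stored defaults, as if set with "smt configure -e python -m myscript.py".
stored = {"executable": "python", "main": "myscript.py"}
```

With these defaults, `build_command(["input_file1", "input_file2"], defaults=stored)` gives `["python", "myscript.py", "input_file1", "input_file2"]`.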

Running different versions

If you want to run a previous version of your code, rather than the currently checked-out version, use the --version option:

$ smt run --version=3e6f02a

Note that this will not overwrite any uncommitted changes; instead, Sumatra will refuse to run until you have committed, stashed, reverted or otherwise dealt with your changes. Sumatra also does not switch back to the most recent version after the run: future runs with no version specified will continue to use the older version.

Labels

To identify your computation, a unique label is required. Sumatra can generate this for you automatically, or this can be specified using the --label option:

$ smt run --label=test0237

Two formats are available for automatically generated labels: timestamp-based (the default) and UUID-based. To select one:

$ smt configure --labelgenerator=uuid
$ smt configure --labelgenerator=timestamp --timestamp_format=%Y%m%d-%H%M%S
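The Python standard library can produce both label styles:

- `datetime.strftime` with the format string above gives a timestamp label such as `20150423-235351`.
- `uuid.uuid4` gives a random UUID.

This sketch shows the formats only. The helper names are hypothetical. Sumatra may shorten or format UUID labels differently. Whether it does anything extra to keep timestamp labels unique is not modelled here.

```python
import uuid
from datetime import datetime


def timestamp_label(when=None, fmt="%Y%m%d-%H%M%S"):
    """Format a datetime (default: now) as a timestamp-style label."""
    return (when or datetime.now()).strftime(fmt)


def uuid_label():
    """Return a random UUID as a string (illustrative; Sumatra's format may differ)."""
    return str(uuid.uuid4())
```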

Command-line options

If your program has its own command-line options of the form --option=value, smt run will try to interpret these as Sumatra command-line parameters (options of the form --option value, without the equals sign, are fine). To avoid this, use the --plain configuration option:

$ smt configure --plain

(--no-plain turns this off).
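Consider a simplified classifier that treats any argument containing an equals sign as a Sumatra parameter setting, as smt run does without --plain. It shows how `--option=value` gets swallowed while `--option value` passes through to your program. The function and its rules are hypothetical and much simpler than Sumatra's real argument handling.

```python
def split_arguments(args, plain=False):
    """Split arguments into (program_args, parameter_settings).

    Toy model: without `plain`, every argument containing "=" is taken as a
    name=value parameter setting instead of being passed to the program.
    """
    if plain:
        return list(args), {}
    program_args, settings = [], {}
    for arg in args:
        if "=" in arg:
            name, value = arg.split("=", 1)
            settings[name] = value
        else:
            program_args.append(arg)
    return program_args, settings
```

For example, `split_arguments(["--tol=0.1", "data.txt"])` returns `(["data.txt"], {"--tol": "0.1"})`: the option never reaches your program unless `plain=True`.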

Reading from stdin, writing to stdout

If your program reads from stdin and/or writes to stdout, i.e. you would normally run it using:

$ myprog < input.txt > output.txt

then you can tell Sumatra to run it the same way, and also track the input and output files, using:

$ smt run -e myprog -i input.txt -o output.txt
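The shell redirection `myprog < input.txt > output.txt` is equivalent to opening the two files and passing them to the child process as its standard input and output. In Python this can be written with `subprocess.run`. The sketch uses a one-line Python program as a hypothetical stand-in for `myprog`. It is not how Sumatra itself is implemented.

```python
import subprocess
import sys


def run_redirected(command, input_path, output_path):
    """Run `command` with stdin read from input_path and stdout written to output_path.

    Equivalent to the shell form: command < input_path > output_path
    Returns the process exit code.
    """
    with open(input_path, "rb") as stdin, open(output_path, "wb") as stdout:
        result = subprocess.run(command, stdin=stdin, stdout=stdout)
    return result.returncode


# Hypothetical stand-in for "myprog": upper-cases whatever it reads on stdin.
UPPERCASE = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
```

Calling `run_redirected(UPPERCASE, "input.txt", "output.txt")` leaves an upper-cased copy of `input.txt` in `output.txt`.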

Commenting

Sumatra offers two ways to attach comments to your computations. When you launch a computation, you can give the reason for running it, e.g. what hypothesis you are testing:

$ smt run --reason="Test the effect of using a low-pass filter"

Once the computation is finished, you can comment on the outcome:

$ smt comment "Doesn't seem to make much of a difference"

By default, the comment is attached to the most recent computation. You can also comment on an older record, by giving its label:

$ smt comment 20150423-235351 "Didn't work due to a bug"

You can comment multiple times on the same record. By default, the new comment will be appended to the old one. To overwrite the old comment, use the --replace flag. If you would like to attach a longer comment than will fit on one line, or a more structured comment, you can write your comment in a temporary text file and then attach that to the record:

$ smt comment --file comment.txt
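A small function can model the difference between appending and replacing a comment. The function name and the separator used when appending are assumptions. Check a stored record to see exactly how Sumatra joins successive comments.

```python
def update_comment(existing, new, replace=False, separator="\n\n"):
    """Return the record's comment after adding `new`.

    Hypothetical model of smt comment: by default the new text is appended to
    any existing comment; with replace=True (like --replace) it overwrites it.
    The separator is an assumption, not taken from Sumatra.
    """
    if replace or not existing:
        return new
    return existing + separator + new
```

Attaching a comment from a file then amounts to passing the file's contents as `new`.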

Both the “reason” and “outcome” fields can be edited in the web browser interface. To add headings, sub-headings, hyperlinks, emphasis, etc. in a comment, you can use reStructuredText markup, which will be rendered as HTML.

Tagging

To structure your project, and make it easier to find the most interesting results, you can add tags to your records, either through the web browser interface or on the command line, e.g.:

$ smt tag "Figure 5" 20141203-093401

If you omit the record label, the most recent computation will be tagged.

Tags may contain spaces, but such tags must be enclosed in quotes. You can tag multiple records at the same time:

$ smt tag modelA 20141203-093401 20141203-122344 20150109-194344

You can also remove tags:

$ smt tag --remove modelA 20141203-122344
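Tagging, untagging and tagging several records at once amount to set operations on a mapping from record labels to tags. In the sketch below, the in-memory dictionary and the `tag` function are hypothetical stand-ins for the record store. They are purely illustrative.

```python
def tag(records, tag_name, labels, remove=False):
    """Add (or, with remove=True, remove) `tag_name` on each record in `labels`.

    `records` maps a record label to its set of tags; it stands in for the
    real record store. Unknown labels raise KeyError.
    """
    for label in labels:
        if remove:
            records[label].discard(tag_name)
        else:
            records[label].add(tag_name)


# Mirror the two commands above on a made-up set of records.
records = {
    "20141203-093401": set(),
    "20141203-122344": set(),
    "20150109-194344": set(),
}
tag(records, "modelA", ["20141203-093401", "20141203-122344", "20150109-194344"])
tag(records, "modelA", ["20141203-122344"], remove=True)
```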

Viewing and searching results

The easiest way to review your project is to use the web browser interface; see Using the web interface for more information. It is also possible, however, to view computation records on the command line, or to export the information to other formats such as HTML and LaTeX.

$ smt list

lists the labels of all records. When you have a lot of records, it will probably be more useful to filter by tags:

$ smt list tag1 tag2 tag3

will only show records that have been tagged with all the tags in the list. To show fuller information about each record, use the --long/-l option:

$ smt list -l

The order of records can be reversed using the --reverse/-r flag.
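The filtering rule for `smt list tag1 tag2 tag3` can be written as a short function over a label-to-tags mapping: a record is shown only if it carries every listed tag. The same function can reverse the order, as --reverse does. This is a hypothetical model, not Sumatra's query code. It assumes the mapping is already in the default listing order.

```python
def list_labels(records, required_tags=(), reverse=False):
    """Return labels whose tag set contains all of `required_tags`.

    `records` maps label -> set of tags, assumed to be in the default
    listing order. With no required tags, every label is returned.
    """
    wanted = set(required_tags)
    labels = [label for label, tags in records.items() if wanted <= tags]
    return labels[::-1] if reverse else labels


# Made-up records for illustration.
records = {
    "20141203-093401": {"modelA", "Figure 5"},
    "20141203-122344": {"modelA"},
    "20150109-194344": {"modelB"},
}
```

Here `list_labels(records, ["modelA", "Figure 5"])` returns only `["20141203-093401"]`, because the other records lack at least one of the two tags.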

By default, the output is formatted for the console. Several other output formats are also available, for example LaTeX:

$ smt list --long --format=latex > myproject.tex
$ pdflatex myproject

You can customize the LaTeX output by copying the default template from sumatra/formatting/latex_template.tex to the .smt subdirectory of your project, and then modifying it. The template uses the Jinja2 templating language.

Comparing records

The web browser interface allows side-by-side comparison of pairs of records. A more limited comparison is available on the command-line with smt diff.

Deleting records

If your last computation failed because of a silly bug:

$ smt delete

will remove it. Older records can be deleted by giving a list of labels:

$ smt delete 20141203-093401 20141203-122344

or a list of tags:

$ smt delete --tag tag1

If you also want to delete the data files generated by the computations, add the --data flag. Records can also be deleted through the web browser interface.

Relocating a project

If you need to move a Sumatra project to a new directory or a new computer, first copy all the files, ensuring that the .smt directory and its contents are also copied. We strongly suggest you also take a backup, and check carefully that everything is working correctly in the new location before deleting the original.

You will next need to update the project configuration to reflect the new location. Assuming you created your project with the default settings, so that your record store is in .smt/records and your output data is stored under the Data subdirectory, run the following in the new location:

$ smt configure --store .smt/records --repository . --datapath Data

Alternatively, you can manually edit .smt/project.

If you have also moved the data associated with your project, you will need to update the paths stored in the record store:

$ smt migrate --datapath Data
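Updating stored paths after a move can be pictured as replacing the old data-directory prefix with the new one in each stored path. The sketch below does that rewrite with `pathlib`. The function name and the flat list of paths are hypothetical, and smt migrate operates on the record store rather than on a Python list.

```python
from pathlib import PurePosixPath


def migrate_paths(paths, old_root, new_root):
    """Rewrite paths under `old_root` so they sit under `new_root`; leave others unchanged.

    Hypothetical illustration of a path migration, not Sumatra's code.
    """
    old, new = PurePosixPath(old_root), PurePosixPath(new_root)
    migrated = []
    for path in map(PurePosixPath, paths):
        try:
            migrated.append(str(new / path.relative_to(old)))
        except ValueError:  # path is not under old_root
            migrated.append(str(path))
    return migrated
```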

If you are using the archive or mirror options for your data (see Input and output data) you may also need to migrate those paths/URLs.