Using the Sumatra API within your own scripts
Using the smt run command is quick and convenient, but it does require you to
change the way you launch your simulations/analyses.
One way to avoid this, if you use Python, is to use the sumatra package within
your own scripts to perform the record-keeping tasks performed by smt run.
You may also wish to write your own custom script for creating a Sumatra project,
instead of using smt init, but we do not cover this scenario here.
We will start with a simple example script, a dummy simulation, that reads a parameter file, generates some random numbers, and writes some data to file:
import numpy
import sys

def main(parameters):
    numpy.random.seed(parameters["seed"])
    distr = getattr(numpy.random, parameters["distr"])
    data = distr(size=parameters["n"])
    output_file = "example.dat"
    numpy.savetxt(output_file, data)

parameter_file = sys.argv[1]
parameters = {}
execfile(parameter_file, parameters)  # this way of reading parameters
                                      # is not necessarily recommended
main(parameters)
Let’s suppose this script is in a file named myscript.py, and that we have a
parameter file named defaults.param, which contains:
seed = 65784
distr = "uniform"
n = 100
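The execfile() built-in used in the script above exists only in Python 2. As a minimal sketch, assuming the same name = value parameter file format, a Python 3 script could read the file and pass its contents to exec() instead. The helper name read_parameters is hypothetical, and like the original approach it executes whatever code the file contains, so it is only suitable for trusted input.

```python
import os
import tempfile


def read_parameters(path):
    """Execute a Python-syntax parameter file and return its names as a dict.

    Hypothetical helper for illustration; not part of Sumatra.
    """
    namespace = {}
    with open(path) as f:
        # Passing the file name to compile() gives readable tracebacks
        code = compile(f.read(), path, "exec")
    exec(code, namespace)
    # exec() inserts __builtins__ into the namespace; drop it so that
    # only the parameters defined in the file remain
    namespace.pop("__builtins__", None)
    return namespace


# Demonstration with a temporary copy of defaults.param
with tempfile.TemporaryDirectory() as tmp:
    param_path = os.path.join(tmp, "defaults.param")
    with open(param_path, "w") as f:
        f.write('seed = 65784\ndistr = "uniform"\nn = 100\n')
    parameters = read_parameters(param_path)
```

The resulting dictionary can be passed to main() exactly as in the original script.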
Without Sumatra, we would normally run this script using something like:
$ python myscript.py defaults.param
To run the script using the smt command line tool, we would use:
$ smt run --reason="reason for running this simulation" defaults.param
(This assumes we have previously used smt init or smt configure to specify that
our executable is python and our main file is myscript.py.)
To benefit from the functionality of Sumatra without having to use smt run, we
have to integrate the steps performed by smt run into our script.
First, we have to load the Sumatra project:
from sumatra.projects import load_project
project = load_project()
We’re going to want to record the simulation duration, so we import the standard
Python time module and record the start time:
import time
start_time = time.time()
We need to slightly modify the procedure for reading parameters. Sumatra stores
the parameters for later use in searching and comparison, so they need to be
transformed into a form Sumatra can use. This is very simple: we just replace
the execfile() call with a build_parameters() call:
from sumatra.parameters import build_parameters
parameters = build_parameters(parameter_file)
Now we create a new Record object, telling it that the script is the
current file; this automatically registers information about the simulation environment:
record = project.new_record(parameters=parameters,
                            main_file=__file__,
                            reason="reason for running this simulation")
Now comes the main body of the simulation, which is unchanged except that we take the opportunity to give the output data file a more informative name. We add the record label to the parameter set under the key sumatra_label, then use it to build the file name:
output_file = "%s.dat" % parameters["sumatra_label"]
At the end of the simulation, we calculate the simulation duration and search for newly created files:
record.duration = time.time() - start_time
record.output_data = record.datastore.find_new_data(record.timestamp)
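How Sumatra's data store performs this search is not shown here. As a rough illustration only of what searching for newly created files can involve, the sketch below walks a directory and keeps files whose modification time is at or after a given start time. The function name find_new_files, the use of modification times, and the use of a POSIX timestamp are all assumptions made for this example, not Sumatra's implementation.

```python
import os
import tempfile
import time


def find_new_files(root, since):
    """Return paths under `root` (relative to it) modified at or after `since`.

    `since` is a POSIX timestamp, as returned by time.time().
    Hypothetical helper for illustration; not part of Sumatra.
    """
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if os.path.getmtime(full_path) >= since:
                new_files.append(os.path.relpath(full_path, root))
    return sorted(new_files)


# Demonstration: one file from "before the run", one from "after"
with tempfile.TemporaryDirectory() as root:
    old_path = os.path.join(root, "old.dat")
    new_path = os.path.join(root, "new.dat")
    for path in (old_path, new_path):
        open(path, "w").close()
    start = time.time()
    # Set explicit modification times so the comparison is unambiguous
    os.utime(old_path, (start - 60, start - 60))
    os.utime(new_path, (start + 1, start + 1))
    found = find_new_files(root, start)
```

In the script itself none of this is needed, since the find_new_data() call takes care of the search.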
Now we add this simulation record to the project, and save the project:
project.add_record(record)
project.save()
Putting this all together:
import numpy
import sys
import time
from sumatra.projects import load_project
from sumatra.parameters import build_parameters

def main(parameters):
    numpy.random.seed(parameters["seed"])
    distr = getattr(numpy.random, parameters["distr"])
    data = distr(size=parameters["n"])
    output_file = "%s.dat" % parameters["sumatra_label"]
    numpy.savetxt(output_file, data)

parameter_file = sys.argv[1]
parameters = build_parameters(parameter_file)

project = load_project()
record = project.new_record(parameters=parameters,
                            main_file=__file__,
                            reason="reason for running this simulation")
parameters.update({"sumatra_label": record.label})
start_time = time.time()

main(parameters)

record.duration = time.time() - start_time
record.output_data = record.datastore.find_new_data(record.timestamp)
project.add_record(record)
project.save()
Now you can run the simulation in the original way:
$ python myscript.py defaults.param
and still have the simulation recorded in your Sumatra project. For such a
simple script and simple run environment there is no advantage to doing it this
way: smt run is much simpler. However, if you already have a fairly complex run
environment, this provides a straightforward way to integrate Sumatra’s
functionality into your existing system.
You will have noticed that much of the Sumatra code you have to add is
effectively boilerplate, which will be the same for all your scripts. To save
time and typing, Sumatra provides a @capture decorator for your main() function:
import numpy
import sys
from sumatra.parameters import build_parameters
from sumatra.decorators import capture

@capture
def main(parameters):
    numpy.random.seed(parameters["seed"])
    distr = getattr(numpy.random, parameters["distr"])
    data = distr(size=parameters["n"])
    numpy.savetxt("%s.dat" % parameters["sumatra_label"], data)

parameter_file = sys.argv[1]
parameters = build_parameters(parameter_file)
main(parameters)
This is now hardly any longer than the original script.
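To make the hidden boilerplate concrete, here is a rough sketch of how a decorator of this kind could be written using only the calls from the full script above. It is not Sumatra's actual implementation: the name capture_sketch, its load_project, main_file and reason arguments, and the label value are all hypothetical, and a mock object stands in for the project so that the example runs without Sumatra installed.

```python
import functools
import time
from unittest.mock import MagicMock


def capture_sketch(load_project, main_file, reason=""):
    """Build a decorator that records each call of the wrapped function.

    `load_project` is any callable returning a project object offering the
    new_record(), add_record() and save() methods used in the full script.
    Hypothetical helper for illustration; not Sumatra's @capture.
    """
    def decorator(main):
        @functools.wraps(main)
        def wrapped_main(parameters, *args, **kwargs):
            project = load_project()
            record = project.new_record(parameters=parameters,
                                        main_file=main_file,
                                        reason=reason)
            # Make the record label available to the simulation code
            parameters.update({"sumatra_label": record.label})
            start_time = time.time()
            result = main(parameters, *args, **kwargs)
            record.duration = time.time() - start_time
            record.output_data = record.datastore.find_new_data(record.timestamp)
            project.add_record(record)
            project.save()
            return result
        return wrapped_main
    return decorator


# Stand-in project with a made-up record label (assumption for the demo)
fake_project = MagicMock()
fake_project.new_record.return_value.label = "20240101-120000"


@capture_sketch(lambda: fake_project, main_file="myscript.py", reason="demo run")
def main(parameters):
    # A real main() would run the simulation; here we only build the file name
    return "%s.dat" % parameters["sumatra_label"]


output_name = main({"seed": 65784})
```

The wrapped main() is called exactly like the undecorated one, which is why the decorated script stays so close to the original.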