bdpg Top Level Control

Bill Langford

2018-10-18

CAUTION about spacing issues and errors from yaml

Many examples in this vignette are taken from yaml files. If you try to cut and paste code from the text boxes in this document into yaml files of your own, you may find that they don’t work. Yaml is hypersensitive to spacing and indentation and the formatting during generation of this vignette may not preserve spaces and indentation.

If you’re having trouble getting the tzar project.yaml file to work, one thing to try is to cut and paste from the example yaml files associated with this vignette rather than from this document. Even then, you have to be careful. If things still don’t work right, the first things to check are the spacing and indentation of each line.

Introduction to generating sets of runs

There are currently two main ways of doing experiments using bdpg. One is to do a single type of action repeatedly, e.g., create a new biodiversity problem or run a reserve selector on an existing problem. The other is to perform many different actions at one time, e.g., create a new problem, then add error to it, wrap another problem around it, and run various reserve selections on all of these different problems. These two views are represented by the bdpg functions called single_action_using_tzar_reps and gen_4_basic_variants.

Single action runset

The single action view is most useful when you are running a large number of experiments that will take a long time and you want to be able to examine all of the results of one type of action while other actions are still running. For example, computing igraph network metrics on a problem is currently quite slow, but the results are only relevant to one kind of experimental question. Consequently, it may be more practical to generate a set of problems and run reserve selectors on them before doing any of the network calculations. Then, you can examine the reserve selector performance while the network metrics are running. Similarly, you may want to add a new kind of reserve selector after all of the other experiments have been done, in which case you just want to run that reserve selector across all of the existing problems.

4 variants runset

The 4 variants view is most useful when developing some new option that needs exercising over the full set of problem types, i.e., the four variants. These are a basic correct problem, a basic apparent problem, a wrapped correct problem, and an apparent version of a wrapped problem. A different use of the 4 variants approach is in generating a large initial set of problems and results all at one time. In that case, a generator would just run the 4 variants code many times with different random seeds.

Unlike the single action view, the 4 variants view does not generally need a list of input files to act over.

Custom runset

While these two views are the most commonly used ones, you can write a custom handler that does something more specific to your own needs, e.g., a method that generates many apparent problems for a given existing correct problem and then runs network metrics and reserve selectors on each of the apparent problems.

================================================================================

Top level R code required to use these runset controls

Here is some example mainline code for calling these runset controllers (without any error checking) given an input parameter list generated by tzar or by your own code.

if (parameters$gen_4_basic_variants)
    {
    gen_4_basic_variants (parameters, round)

    } else if (parameters$single_action_using_tzar_reps)
    {
    single_action_using_tzar_reps (parameters, round)

    } else
    {
    stop ("No matching actions found.")
    }
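
The mainline above does no error checking. As a minimal sketch of what a check might look like, the helper below returns the name of the controller to call and stops if neither flag, or both flags, are TRUE. The function choose_runset_controller is hypothetical and not part of bdpg. It is also stricter than the mainline above, which simply takes the first matching branch, and it assumes that a flag missing from the list should be treated as FALSE.

```r
# Hypothetical helper, not part of bdpg: decide which runset controller
# the mainline should call, treating missing flags as FALSE.
choose_runset_controller <- function (parameters)
    {
    controller_names <- c ("gen_4_basic_variants",
                           "single_action_using_tzar_reps")

        #  isTRUE() guards against NULL, NA, and non-logical values.
    is_set <- vapply (controller_names,
                      function (nm) isTRUE (parameters [[nm]]),
                      logical (1))

    if (sum (is_set) != 1)
        stop ("Expected exactly one runset controller flag to be TRUE, found ",
              sum (is_set), ".")

    controller_names [is_set]
    }

parameters <- list (single_action_using_tzar_reps = TRUE)
chosen <- choose_runset_controller (parameters)
```

The returned name could then drive the if/else shown above, so that a misconfigured parameters list fails early with a specific message.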

================================================================================

Specifics of each view

Single action overview

Except for the gen_COR_prob action, all of the single actions require you to specify the location of an rds file or set of locations of rds files to use as input to the desired action. If you’re specifying more than one rds file, then you will also have to give a value for the cur_input_prob_idx variable to index into the set of rds files. All of this is detailed below, but in short, it means that most single actions will include commands similar to the following in the yaml file:

base_params:
    single_action_using_tzar_reps: TRUE
    run_network_metrics_on_prob: TRUE
    prob_src:  rds_file_set_from_file
    rds_file_set_path: "~/Work/net_rds_input_file_paths.txt"

repetitions:
    generators:
        - key: cur_input_prob_idx
          generator_type: linear_step
          start: 1
          step_size: 1
          count: 3

Details of each aspect of single actions are given below. If you prefer to start from examples, an example of each action is given at the end of this section. Very specific details of exactly how the single action view works can be found in the code for the function single_action_using_tzar_reps() and other functions in the source file single_action_using_tzar_reps.R.

Finally, note that each single action creates a separate tzar run for each value of parameters$cur_input_prob_idx when running under tzar.

Control of a single action runset

  • Input parameter single_action_using_tzar_reps set to TRUE
  • One and only one of the recognized actions below can be requested when the single_action_using_tzar_reps parameter is TRUE

Recognized single action parameters

  • gen_COR_prob
    • Calls gen_single_bdprob_COR ()
    • no input file specification required
  • gen_WRAP_prob
    • Calls gen_single_bdprob_WRAP ()
    • Requires ???
  • gen_APP_prob
    • Calls gen_single_bdprob_APP ()
    • Requires ???
  • run_rs_on_COR_prob
    • One or more reserve selectors will be run based on the settings of reserve selector variables elsewhere in the parameters list.
    • Calls do_COR_rs_analysis_and_output ()
    • Requires ???
  • run_rs_on_APP_prob
    • One or more reserve selectors will be run based on the settings of reserve selector variables elsewhere in the parameters list.
    • Calls do_APP_rs_analysis_and_output ()
    • Requires ???
  • run_network_metrics_on_prob
    • One or more network metric calculations will be run based on the settings of network metric variables elsewhere in the parameters list.
    • Calls init_object_graph_data ()
    • Requires:
      • Setting net_batch_out_dir_name parameter to specify the location where the modified problems will be written out after the network results are attached to them.
      • Setting problem source parameters to specify the location where the input problem(s) “.rds” files will be found. See “Problem source specification for a single action runset”.
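
The "one and only one" rule can be checked before any work starts. The sketch below is only an illustration: the action names come from the list above, but requested_single_action is a hypothetical helper, not a bdpg function, and the real check inside single_action_using_tzar_reps() may differ.

```r
# Action names are taken from the list above; the helper itself is a
# hypothetical illustration and not a bdpg function.
recognized_single_actions <- c ("gen_COR_prob",
                                "gen_WRAP_prob",
                                "gen_APP_prob",
                                "run_rs_on_COR_prob",
                                "run_rs_on_APP_prob",
                                "run_network_metrics_on_prob")

requested_single_action <- function (parameters)
    {
        #  Keep only the recognized actions whose flag is exactly TRUE.
    requested <- Filter (function (nm) isTRUE (parameters [[nm]]),
                         recognized_single_actions)

    if (length (requested) != 1)
        stop ("Exactly one single action must be TRUE, but found ",
              length (requested), ": ",
              paste (requested, collapse = ", "))

    requested
    }

parameters <- list (single_action_using_tzar_reps = TRUE,
                    run_network_metrics_on_prob   = TRUE)
action <- requested_single_action (parameters)
```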

Problem source specification for a single action runset

Any problem that is to be acted on by one of the single action commands is read in from a “.rds” file. These are files in the format R uses to save objects to disk. Whenever bdpg creates a problem, it saves that problem to disk in a “.rds” file using R’s saveRDS() function, and that saved file can be used as input to a single action command.

There are 3 ways to specify the problem source for a single action. Each is selected by the value given to the prob_src parameter, as in the examples below, and each has a complementary parameter specifying the location of the associated information.

  • rds_file
    • Specifies that just one problem will be read.
    • The location of the rds file is given in a parameter named rds_file_path.
    • Example yaml:

      prob_src:  "rds_file"
      rds_file_path: "~/Work/some_problem.rds"
  • rds_file_set_from_file
    • Specifies that one or more problems will be read and their locations will come from a text file that contains one location per line.
    • The location of the text file is given in a parameter named rds_file_set_path.
    • Example yaml:

      prob_src:  "rds_file_set_from_file"
      rds_file_set_path: "~/Work/Tzar_input_files/NET_rds_input_file_paths.txt"
    • Example input text file “~/Work/Tzar_input_files/NET_rds_input_file_paths.txt” containing locations:

      ~/tzar/outputdata/bdpgxupaper/default_runset/49_marxan_simulated_annealing/RSprob-COR-Base.fb9e78fc-5a6d-49b7-b8a9-74e918a86c92/saved.RSprob-COR-Base.fb9e78fc-5a6d-49b7-b8a9-74e918a86c92.rds
      ~/tzar/outputdata/bdpgxupaper/default_runset/51_marxan_simulated_annealing/RSprob-COR-Base.7151efe2-e13b-428b-ac76-62edc3b6b6a5/saved.RSprob-COR-Base.7151efe2-e13b-428b-ac76-62edc3b6b6a5.rds
      ~/tzar/outputdata/bdpgxupaper/default_runset/50_marxan_simulated_annealing/RSprob-COR-Base.0881452f-709a-4a4e-bd4c-9f502e1306c5/saved.RSprob-COR-Base.0881452f-709a-4a4e-bd4c-9f502e1306c5.rds
  • rds_file_set_from_yaml_array
    • Specifies that one or more problems will be read and their locations will come from an array given directly in the yaml file, with one location per array element.
    • The locations of the rds files are given in a parameter named rds_file_set_yaml_array.
    • Example yaml:

      prob_src: "rds_file_set_from_yaml_array"
      rds_file_set_yaml_array:
      - "~/tzar/outputdata/bdpgxupaper/default_runset/49_marxan_simulated_annealing/RSprob-COR-Base.fb9e78fc-5a6d-49b7-b8a9-74e918a86c92/saved.RSprob-COR-Base.fb9e78fc-5a6d-49b7-b8a9-74e918a86c92.rds"
      - "~/tzar/outputdata/bdpgxupaper/default_runset/51_marxan_simulated_annealing/RSprob-COR-Base.7151efe2-e13b-428b-ac76-62edc3b6b6a5/saved.RSprob-COR-Base.7151efe2-e13b-428b-ac76-62edc3b6b6a5.rds"
      - "~/tzar/outputdata/bdpgxupaper/default_runset/50_marxan_simulated_annealing/RSprob-COR-Base.0881452f-709a-4a4e-bd4c-9f502e1306c5/saved.RSprob-COR-Base.0881452f-709a-4a4e-bd4c-9f502e1306c5.rds"
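
To make the three options concrete, here is a minimal sketch of how a parameters list might be turned into the single “.rds” path to read on a given run. The function resolve_rds_path and the example file names are hypothetical. The parameter names are the ones described above, and the sketch assumes cur_input_prob_idx is only needed for the two set-based sources.

```r
# Hypothetical helper, not part of bdpg: return the path of the ".rds" file
# that the current run should read, based on the prob_src parameter.
resolve_rds_path <- function (parameters)
    {
    prob_src <- parameters [["prob_src"]]
    if (!is.character (prob_src) || length (prob_src) != 1)
        stop ("prob_src must be a single character string.")

        #  A single file needs no index.
    if (identical (prob_src, "rds_file"))
        return (parameters [["rds_file_path"]])

    rds_paths <- switch (prob_src,
        rds_file_set_from_file       = readLines (parameters [["rds_file_set_path"]]),
        rds_file_set_from_yaml_array = unlist (parameters [["rds_file_set_yaml_array"]]),
        stop ("Unrecognized prob_src: ", prob_src))

    idx <- parameters [["cur_input_prob_idx"]]
    if (is.null (idx) || idx < 1 || idx > length (rds_paths))
        stop ("cur_input_prob_idx must be between 1 and ", length (rds_paths))

    rds_paths [idx]
    }

    #  Stand-in text file with made-up locations, one per line.
path_list_file <- tempfile (fileext = ".txt")
writeLines (c ("~/probs/a.rds", "~/probs/b.rds", "~/probs/c.rds"), path_list_file)

parameters <- list (prob_src           = "rds_file_set_from_file",
                    rds_file_set_path  = path_list_file,
                    cur_input_prob_idx = 2)
chosen_path <- resolve_rds_path (parameters)
```

The chosen path would then be handed to readRDS() to load the saved problem.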

Building sets of “.rds” file locations

While the sets of “.rds” file locations can be generated by hand, an easier way is often to pipe the unix find and grep commands into a file, as in the following example:

find ~/tzar/outputdata/single_action_WRAP_prob/default_runset |
    grep "saved" > WRAP_rds_input_file_paths.txt
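
If you would rather stay inside R, something similar can be sketched with list.files(). The helper write_rds_path_list is hypothetical and not part of bdpg. It also differs slightly from the find and grep pipeline: it assumes the saved problems follow the “saved.<something>.rds” naming seen in the examples above and matches on the file name only, whereas grep "saved" matches anywhere in the path.

```r
# Hypothetical helper, not part of bdpg: collect the saved ".rds" files under
# a tzar output directory and write their paths, one per line, to a text file.
write_rds_path_list <- function (runset_dir, out_file)
    {
    rds_paths <- list.files (runset_dir,
                             pattern    = "^saved.*\\.rds$",
                             recursive  = TRUE,
                             full.names = TRUE)
    writeLines (rds_paths, out_file)
    invisible (rds_paths)
    }

    #  Small throwaway directory standing in for a tzar runset directory.
demo_dir <- tempfile ("demo_runset")
dir.create (file.path (demo_dir, "1_run"), recursive = TRUE)
saveRDS (list (id = 1), file.path (demo_dir, "1_run", "saved.prob1.rds"))
writeLines ("not a problem", file.path (demo_dir, "1_run", "notes.txt"))

out_file <- tempfile (fileext = ".txt")
found <- write_rds_path_list (demo_dir, out_file)
```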

Required index variable

Single actions that are to be run across a set of problems need to have an index variable for that set passed in as an element of the parameters list. This is essentially an index to use in a for loop over the elements of the set. Tzar’s generators load this variable automatically if you set up a repetitions section in the project.yaml file that specifies cur_input_prob_idx as the key. The index just specifies which element of the set will be accessed on a given run.

  • This variable must be called cur_input_prob_idx to satisfy what single_action_using_tzar_reps() expects.
  • The only thing you need to change in the example yaml code below is the value given to count. It just has to match the number of lines/files given in the yaml array or text file of rds file locations that you are feeding to bdpg.
repetitions:
    generators:
        - key: cur_input_prob_idx
          generator_type: linear_step
          start: 1
          step_size: 1
          count: 3
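
The relationship between count and the set of files can be sketched in plain R. This is only an illustration: the paths are made up, and the sequence below reflects an assumed reading of the linear_step generator (start, start + step_size, and so on for count values) rather than tzar's own code.

```r
# Values mirror the repetitions block above; the path list is a stand-in.
start     <- 1
step_size <- 1
count     <- 3

rds_paths <- c ("~/probs/a.rds", "~/probs/b.rds", "~/probs/c.rds")  # hypothetical

    #  Assumed reading of linear_step: start, start + step_size, ...
idx_values <- seq (from = start, by = step_size, length.out = count)

    #  count has to match the number of rds file locations.
stopifnot (length (rds_paths) == count)

for (cur_input_prob_idx in idx_values)
    cat ("run", cur_input_prob_idx, "reads", rds_paths [cur_input_prob_idx], "\n")
```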

Running under tzar vs. under tzar emulation

Tzar emulation can’t handle repetitions that have a count > 1. It will run tzar to generate a directory for each of the runs, but it will only run your code on the last run of the set. To run the code under tzar itself instead of under the emulator, there are several things you need to do:

  • Set the tzar emulation flag to FALSE in tzar_emulation.yaml.
emulating_tzar:                         FALSE     #TRUE
  • Copy model.R.tzar into model.R so that tzar will find a model.R. (The tzar emulator normally deletes the model.R file so that it doesn’t interfere with building a package.)
cp model.R.tzar model.R
  • Run the tzar execlocalruns command if you’re going to run the code on your local machine. If you’re going to do the runs on a cluster of machines using tzar’s pollandrun facility, a different command is necessary; for simplicity, it is not explained here.
> java -jar <your-tzar-jar-dir>/tzar.jar execlocalruns <your-R-code-dir>

Created 3 runs. 

Outputdir: ~/tzar/outputdata/Easy/default_runset/15_Easy_single_action_gen_3_COR_prob.inprogress 
Running model: ~/D/Projects/ProblemDifficulty/pkgs/bdpgxupaper/R, run_id: 15, Project name: Easy, Scenario name: Easy_single_action_gen_3_COR_prob, Flags:  
Run 15 succeeded. 

Outputdir: ~/tzar/outputdata/Easy/default_runset/16_Easy_single_action_gen_3_COR_prob.inprogress 
Running model: ~/D/Projects/ProblemDifficulty/pkgs/bdpgxupaper/R, run_id: 16, Project name: Easy, Scenario name: Easy_single_action_gen_3_COR_prob, Flags:  
Run 16 succeeded. 

Outputdir: ~/tzar/outputdata/Easy/default_runset/17_Easy_single_action_gen_3_COR_prob.inprogress 
Running model: ~/D/Projects/ProblemDifficulty/pkgs/bdpgxupaper/R, run_id: 17, Project name: Easy, Scenario name: Easy_single_action_gen_3_COR_prob, Flags:  
Run 17 succeeded. 

Executed 3 runs: 3 succeeded. 0 failed 

Tzar-based examples

This section gives an example of the tzar project.yaml file commands for each of the recognized single action types.

  • gen_COR_prob

    gen_COR_prob: TRUE
  • gen_WRAP_prob on base COR problem

    gen_WRAP_prob: TRUE
    prob_src: "rds_file_set_from_file"
    rds_file_set_path: "~/Work/Tzar_input_files/COR_rds_input_file_paths.txt"
  • gen_APP_prob on base COR problem

    gen_APP_prob: TRUE
    prob_src: "rds_file_set_from_file"
    rds_file_set_path: "~/Work/Tzar_input_files/COR_rds_input_file_paths.txt"
  • gen_APP_prob on wrap COR problem

    #find ~/tzar/outputdata/bdpgxupaper_single_action_WRAP_prob/default_runset | grep saved > WRAP_rds_input_file_paths.txt
    
    gen_APP_prob: TRUE
    prob_src: "rds_file_set_from_file"
    rds_file_set_path: "~/Work/Tzar_input_files/WRAP_rds_input_file_paths.txt"
  • run_rs_on_COR_prob on base COR problem

    run_rs_on_COR_prob: TRUE
    prob_src: "rds_file_set_from_file"
    rds_file_set_path: "~/Work/Tzar_input_files/COR_rds_input_file_paths.txt"
  • run_rs_on_APP_prob on base APP problem

    #find ~/tzar/outputdata/bdpgxupaper_single_action_APP_of_cor_base_prob/default_runset | grep saved > ~/Work/Tzar_input_files/APP_cor_rds_input_file_paths.txt
    
    run_rs_on_APP_prob: TRUE
    prob_src: "rds_file_set_from_file"
    rds_file_set_path: "~/Work/Tzar_input_files/APP_cor_rds_input_file_paths.txt"
  • run_rs_on_APP_prob on wrap APP problem

    #find ~/tzar/outputdata/bdpgxupaper_single_action_APP_of_cor_base_prob/default_runset | grep saved > ~/Work/Tzar_input_files/APP_cor_rds_input_file_paths.txt
    
    run_rs_on_APP_prob: TRUE
    prob_src: "rds_file_set_from_file"
    rds_file_set_path: "~/Work/Tzar_input_files/APP_wrap_rds_input_file_paths.txt"
  • run_network_metrics_on_prob

    #find ~/tzar/outputdata/bdpgxupaper_single_action_APP_of_cor_base_prob/default_runset | grep saved > ~/Work/Tzar_input_files/APP_cor_rds_input_file_paths.txt
    
    run_network_metrics_on_prob: TRUE
    prob_src: "rds_file_set_from_file"
    rds_file_set_path: "~/Work/Tzar_input_files/APP_wrap_rds_input_file_paths.txt"

R only examples

This section gives an example of R code to invoke each of the recognized single action types if you were not using the tzar project.yaml file to control the bdpg code.

  • gen_COR_prob

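    num_runs <- 3
    parameters <- list (gen_COR_prob = TRUE)

        #  Stand in for tzar's repetitions generator by setting the index
        #  variable by hand on each pass through the loop.
    for (cur_idx in seq_len (num_runs))
    {
    parameters$cur_input_prob_idx <- cur_idx
    single_action_using_tzar_reps (parameters, round)
    }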
  • gen_WRAP_prob

  • gen_APP_prob

  • run_rs_on_COR_prob

  • run_rs_on_APP_prob

  • run_network_metrics_on_prob
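
For the actions that read existing problems, an R-only driver would presumably follow the same pattern as the gen_COR_prob example, with the problem source parameters added. The sketch below only builds the parameters lists. The helper make_single_action_params is hypothetical, the call into bdpg is left as a comment, and a real run is likely to need further parameters (for example, net_batch_out_dir_name for run_network_metrics_on_prob).

```r
# Hypothetical helper, not part of bdpg: build the parameters list for one
# run of a single action that reads its input from a file of rds locations.
make_single_action_params <- function (action, rds_file_set_path, idx)
    {
    parameters <- list (prob_src           = "rds_file_set_from_file",
                        rds_file_set_path  = rds_file_set_path,
                        cur_input_prob_idx = idx)
    parameters [[action]] <- TRUE
    parameters
    }

rds_file_set_path <- "~/Work/Tzar_input_files/COR_rds_input_file_paths.txt"
num_runs <- 3    #  must match the number of lines in the file above

param_sets <- lapply (seq_len (num_runs),
                      function (idx) make_single_action_params ("gen_WRAP_prob",
                                                                rds_file_set_path,
                                                                idx))

    #  With bdpg loaded, each element would then be passed on, e.g.:
    #  for (parameters in param_sets)
    #      single_action_using_tzar_reps (parameters, round)
```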

4 variants view

Simple examples

Control of a 4 variants runset

  • Input parameter gen_4_basic_variants set to TRUE

Recognized 4 variants parameters

Input source specification for a 4 variants runset

Custom runset

Simple examples

Control of a Custom runset

  • a yaml or parameters list keyword
  • a function to call when the keyword is invoked
  • possibly a list of input files or objects
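
One way to tie these three pieces together is a small lookup table from keyword to handler function. Everything in the sketch below is hypothetical: the keyword gen_many_APP_for_COR, its placeholder handler, and run_custom_action are invented for illustration and are not part of bdpg.

```r
# Hypothetical sketch, not part of bdpg: map custom keywords to handlers.
gen_many_APP_for_COR <- function (parameters, input_files)
    {
        #  Placeholder body; a real handler would call bdpg functions here.
    paste ("would process", length (input_files), "input file(s)")
    }

    #  The keyword is the list name; the value is the function to call.
custom_handlers <- list (gen_many_APP_for_COR = gen_many_APP_for_COR)

run_custom_action <- function (parameters, input_files = character (0))
    {
    requested <- Filter (function (nm) isTRUE (parameters [[nm]]),
                         names (custom_handlers))
    if (length (requested) != 1)
        stop ("Expected exactly one custom keyword to be TRUE.")

    custom_handlers [[requested]] (parameters, input_files)
    }

result <- run_custom_action (list (gen_many_APP_for_COR = TRUE),
                             input_files = c ("~/probs/a.rds"))
```

Adding a new custom action then means writing one handler and adding one entry to the table, without touching the dispatch code.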

Recognized Custom parameters

Input source specification for a Custom runset
