Configuration Files

Albert Huang edited this page Jun 25, 2014 · 5 revisions

Introduction

Configuration files are the recommended way to configure the tool. The alternative is to use command-line arguments. Although powerful, command-line arguments are very verbose, and a long command line is easy to misread later.

Command-line arguments are implemented as overrides: any setting given on the command line replaces the corresponding setting in the configuration file.

Typical usage can look like:

pyradmon.py --config-file=config.yaml plot

If overrides are needed:

pyradmon.py --config-file=config.yaml plot --plot-define-axes="plot1|sub1|x:ticks=5,label=Hello world!"

Use the configuration file whenever possible. It is much easier to write and offers as much power as the command line does. Command-line arguments are meant only for overriding settings, for example when testing a configuration change without modifying the configuration file itself.
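The override rule can be pictured as a merge in which command-line values win. The following is only a minimal sketch of that idea, not PyRadmon's actual implementation: `apply_overrides` and both dictionaries are hypothetical, the keys are borrowed from the data sourcing section further down this page, and it assumes the command-line arguments have already been parsed into the same nested layout as the file.

```python
def apply_overrides(file_config, cli_overrides):
    """Return a new dict in which values from cli_overrides replace file values."""
    merged = dict(file_config)  # shallow copy so the file settings stay untouched
    for key, value in cli_overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested mappings: merge them level by level.
            merged[key] = apply_overrides(merged[key], value)
        else:
            # Anything else given on the command line simply wins.
            merged[key] = value
    return merged


# Hypothetical settings: one set read from the file, one from the command line.
file_config = {"config": {"data_channels": "1-7", "data_assim_only": True}}
cli_overrides = {"config": {"data_channels": 4}}

merged = apply_overrides(file_config, cli_overrides)
print(merged)  # {'config': {'data_channels': 4, 'data_assim_only': True}}
```

Settings that are not mentioned on the command line keep the value from the file, which is why an override can stay short.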

Introduction to Configuration

"OK, OK, I'm convinced. I'm going to write configuration files. But what are they?"

Configuration files have two main components:

  • Data sourcing configuration
  • Plot generation configuration

The former is a very short configuration defining your source data. The latter is a slightly longer configuration defining the plots to be created.

Both parts of the configuration are stored in one configuration file. Only one configuration file is loaded on execution. The configuration is stored in an easy-to-read format called YAML. That said...

YAML - YAML Ain't Markup Language

(Yes, that's correct. It's a recursive acronym!)

Introduction

YAML is often lumped in with markup languages (despite its self-denial), but strictly speaking it is a data serialization format. In their own words:

YAML is a human friendly data serialization standard for all programming languages.

Why YAML? Why not _____?

YAML is a very clean and organized format, and works perfectly for PyRadmon configuration. Other solutions didn't seem to do well in achieving simple and concise configuration.

  • XML: XML is an old but commonly used markup language. It has a hierarchical structure and various configuration styles that would prove very useful for PyRadmon. However, it is very verbose. It also isn't fun to write XML, nor is it very human readable. For data sourcing configuration, this may be fine, but for plot generation configuration, this would be a nightmare to work with.
  • JSON: JSON is not as old, and in many places it is also a commonly used data format. (This is especially true for web applications, as it translates directly to JavaScript objects.) It has a strong hierarchical structure and various configuration styles and types that would prove very useful for PyRadmon. However, it is a bit verbose, especially with regard to the brackets and commas required. It isn't as bad as XML to write, but it can be tedious, and it may not look very nice unless it is really spaced out.
  • ConfigParser: ConfigParser is a configuration format used often in Python, and its style is well known from Windows .ini files. It is a very easy to use and easy to read configuration standard. However, it lacks the hierarchical features that we would need, and it only stores strings. (To be precise, if you stored "[ 1, 2, 3 ]", it would not be turned into a native list of 1, 2, 3; it would remain a string. You could eval() it, but that would be a bad idea...)

And finally:

  • YAML: YAML is a new-ish configuration format. It was created around the time JSON was created, but only received attention very recently. It inherits all of the positives of JSON, plus it is easy to read, write, and use! The only catch is that there are some strange aspects to its format (and it's VERY strict about it), but otherwise, it's a very capable format!
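To make the ConfigParser limitation concrete, here is a small sketch using only Python's standard library (Python 3 module names; the section and key names are made up for illustration). The same text that ConfigParser hands back as a plain string is turned into a native list by a JSON parser, and YAML loaders convert types in the same spirit.

```python
import configparser
import json

# Hypothetical .ini content holding something that looks like a list.
INI_TEXT = """\
[data]
items = [ 1, 2, 3 ]
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

raw_items = parser["data"]["items"]
print(type(raw_items).__name__)     # str: ConfigParser keeps the value as text

# A typed format parses the very same text into a real list.
parsed_items = json.loads(raw_items)
print(type(parsed_items).__name__)  # list
```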

For samples to compare with, see the footnote at the end of this page.

Convinced? Let's learn some YAML!

YAML Primer

Let's use an example to work off of:

data:
  name: Grocery Store Sales
  items: [ pizza, chicken, bacon ]
  specials:
    - soup
    - ice cream
  locations:
    chicago:
      phone_number: 123-456-7890
      address: 123 Main St., Chicago, IL USA
      inventory: 9000
      open: true
      comments: Serves Chicago style pizza.
    new_york:
      phone_number: 987-654-3210
      address: 321 Main St., New York, NY USA
      inventory: 1337
      open: false
      comments: null
  slogan: |-
    We don't just sell grocery...
    ...we live and breathe it!

Organization

The first thing that you might notice is the spacing: each nested block of data is indented further than the block that contains it. YAML is organized by indentation. In this guide, each level of data is indented by 2 spaces. (YAML itself only requires that the indentation be consistent and made of spaces, never tabs.)

In our example above, the data block has its elements indented by 2 spaces to indicate that they belong to it. This can be repeated again and again to further levels, forming the hierarchy that we've mentioned previously.

The Data

There are 6 basic kinds of data:

  • Integers
  • Strings
  • Booleans
  • Nulls
  • Lists
  • Associative arrays

Integers are simply numbers. For instance, in the example above, 9000 is a number.

Strings are just strings. In YAML, you may specify them with or without quotes. Quotes are useful when you want to force something to be a string. (For instance, specifying '5' instead of 5 makes a string instead of an integer.) In our example, the address is an example of a string. If you want to store a multi-line string, you can use either a pipe and a dash:

  key_goes_here: |-
    multi value
    string goes here

... or a string with blank lines representing line breaks, surrounded by quotes:

  slogan: "We don't just sell grocery...

    ...we live and breathe it!"

The latter is preferred, since PyYAML (the library for parsing YAML in Python) is able to understand this output better. The former may work, but the latter is the only kind that has been tested. (Outputting YAML from PyYAML uses the latter format as well.)
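For reference, PyYAML itself should load both multi-line styles to the same Python string. A quick way to check this on your own installation is sketched below; it assumes PyYAML is installed and says nothing about how PyRadmon treats the value afterwards.

```python
import yaml  # PyYAML

# Style 1: pipe and dash (literal block, trailing newline stripped).
LITERAL = """\
slogan: |-
  We don't just sell grocery...
  ...we live and breathe it!
"""

# Style 2: quoted string, with a blank line standing for the line break.
QUOTED = """\
slogan: "We don't just sell grocery...

  ...we live and breathe it!"
"""

literal_value = yaml.safe_load(LITERAL)["slogan"]
quoted_value = yaml.safe_load(QUOTED)["slogan"]

print(repr(literal_value))
print(literal_value == quoted_value)
```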

Booleans store either a true or false value. They are stored simply as true or false. For instance, in the example above, open is a boolean.

Nulls store nothing. They indicate a blank value. They are stored simply as null. In our example, a null can be found in one of our comment attributes, indicating that the comment field is blank.

Lists store an array of data. They translate into an array with any kind of data stored inside of it, homogeneous or non-homogeneous. There are two ways to specify lists - with the traditional bracket syntax, or with dashed bullet points. In our example, items is a traditional array, and specials is a bullet point array. Either way will work!

Associative arrays are arrays that store a key and a value. They are useful for storing attributes. In the example above, anything that has a "key" and a "value" is an associative array! (Specifically, you are looking for some text, followed by a colon, followed by additional text.) The entire YAML example is one large associative array, since data is an associative array. Values can either be indented on a new line or placed on the same line as the key.
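To see how these kinds of data arrive in Python, the sketch below loads a trimmed copy of the grocery example with PyYAML and prints the resulting types. It assumes PyYAML is installed; `yaml.safe_load` is used because it only builds plain Python objects.

```python
import yaml  # PyYAML

# Trimmed copy of the grocery example from this primer.
GROCERY_YAML = """\
data:
  name: Grocery Store Sales
  items: [ pizza, chicken, bacon ]
  specials:
    - soup
    - ice cream
  locations:
    chicago:
      inventory: 9000
      open: true
      comments: Serves Chicago style pizza.
    new_york:
      inventory: 1337
      open: false
      comments: null
"""

data = yaml.safe_load(GROCERY_YAML)["data"]
chicago = data["locations"]["chicago"]
new_york = data["locations"]["new_york"]

print(type(data).__name__)                  # dict     (associative array)
print(type(data["items"]).__name__)         # list     (bracket syntax)
print(type(data["specials"]).__name__)      # list     (dashed syntax)
print(type(data["name"]).__name__)          # str
print(type(chicago["inventory"]).__name__)  # int
print(type(chicago["open"]).__name__)       # bool
print(new_york["comments"])                 # None     (null)

# Quotes force a string: '5' stays text, while 5 becomes an integer.
forced = yaml.safe_load("quoted: '5'\nplain: 5")
print(repr(forced["quoted"]), repr(forced["plain"]))  # '5' 5
```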

Configuration Structure and Options

Recall that we have two kinds of components for our configuration:

  • Data sourcing configuration
  • Plot generation configuration

Data Sourcing Configuration

This configuration, as mentioned before, is really simple. It starts with the base, config, and has multiple elements inside it:

config:
  base_directory: MERRA2/
  data_step: anl|ges
  data_start_date: 1991-01-01 00z
  data_end_date: 1991-02-28 18z
  experiment_id: d5124_m2_jan91
  data_instrument_sat: ssmi_f08
  data_channels: 1-7
  data_assim_only: true

Some quick information about each part:

  • base_directory: Directory location of the data. (Mandatory)
  • data_step: The type of data to use. You can specify either anl, ges, or anl|ges. If you need to make a plot or grab some data with a particular type, you need to specify it here! (Mandatory)
  • data_start_date: The start date and time of the range of data you want to use. This is specified in YYYY-MM-DD HHz format.
  • data_end_date: The end date and time of the range of data you want to use. This is specified in YYYY-MM-DD HHz format.
  • experiment_id: The experiment ID you would like to use. (Mandatory)
  • data_instrument_sat: The instrument/satellite combination you would like to use. (Mandatory)
  • data_channels: An integer or an integer range specifying the channels you would like to use.
  • data_assim_only: A boolean (true or false) indicating whether you want to use assimilated data only or not. (This checks if iuse is <0 or not. If it's <0, it's not assimilated. If it's >0, it is.)
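The field formats listed above can be checked with a few lines of Python. This is a minimal sketch rather than PyRadmon's own parsing code: the helper names are hypothetical, the mandatory list simply mirrors the fields marked (Mandatory) above, and it assumes the date and channel-range values arrive as strings.

```python
from datetime import datetime

# Fields marked (Mandatory) in the list above.
MANDATORY_KEYS = ("base_directory", "data_step", "experiment_id", "data_instrument_sat")


def missing_mandatory(config):
    """Return the mandatory keys that are absent from the config mapping."""
    return [key for key in MANDATORY_KEYS if key not in config]


def parse_data_date(text):
    """Parse a 'YYYY-MM-DD HHz' string such as '1991-01-01 00z'."""
    return datetime.strptime(text, "%Y-%m-%d %Hz")


def parse_channels(value):
    """Expand an integer (4) or an integer range ('1-7') into a list of channels."""
    if isinstance(value, int):
        return [value]
    parts = str(value).split("-")
    first, last = int(parts[0]), int(parts[-1])
    return list(range(first, last + 1))


# The data sourcing example from above, written as the dict a YAML loader would return.
config = {
    "base_directory": "MERRA2/",
    "data_step": "anl|ges",
    "data_start_date": "1991-01-01 00z",
    "data_end_date": "1991-02-28 18z",
    "experiment_id": "d5124_m2_jan91",
    "data_instrument_sat": "ssmi_f08",
    "data_channels": "1-7",
    "data_assim_only": True,
}

print(missing_mandatory(config))                    # []
print(config["data_step"].split("|"))               # ['anl', 'ges']
print(parse_data_date(config["data_start_date"]))   # 1991-01-01 00:00:00
print(parse_channels(config["data_channels"]))      # [1, 2, 3, 4, 5, 6, 7]
```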

Plot Generation Configuration

Ah, the fun part! This configuration, as mentioned before, is NOT as simple. However, once you "get" it, it becomes relatively easy to use!

Introduction to Plotting

In PyRadmon, we use the concept of plot and subplot.

A plot is a single "page" of graphs. Specifically, it is a container of graphs. It has a title, some settings to determine the plot size and formatting, and an output file to save its "page" to.

A subplot is a single graph. It has data, a legend, axes, and a title. You can apply labels to the axes and the data, and set colors for the data. You can also set the number of ticks for the axes. Each set of axes has an x and a y.

A plot contains one or more subplots.

Plot configuration contains at least one plot.

Lots of italicized words! We definitely want to use an example for this one:

plot1:
  output: plots/test_plot_1_ch4_2plots_only.png
  settings:
    dpi: 50
    target_size: [595, 770]
  title: '%INSTRUMENT_SAT%   %START_DATE%-%END_DATE%

    Channel %CHANNEL%  %FREQUENCY%       %ASSIMILATION_STATUS%

    Global  All    %EXPERIMENT_ID%'
  plots:
  - subplot1_id:
      axes:
        x: {label: null, ticks: 6}
        y: {label: null, ticks: 5}
      data:
        colors: [blue, red]
        labels: ['Avg (K)

            %AVERAGE%', 'Sdv (K)

            %AVERAGE%']
        x: timestamp
        y: [ges|bc_total|mean, ges|bc_total|stddev]
      legend: {border: false, line: true}
      title: Total Bias

Ahh, that's where those italicized words were coming from!

Now that you know the properties of a plot and subplot, let's go into details. The following is a structure of a plot configuration:

PLOT_ID:
  output: OUTPUT_FILE
  settings:
    dpi: DPI
    target_size: [X, Y]
  title: PLOT_TITLE
  plots:
  - SUBPLOT_ID:
      axes:
        x: {label: LABEL, ticks: NUM_OF_TICKS}
        y: {label: LABEL, ticks: NUM_OF_TICKS}
      data:
        colors: [COLOR1, COLOR2]
        labels: [LABEL1, LABEL2]
        x: X_DATA
        y: [Y_DATA_1, Y_DATA_2]
      legend: {border: TRUE_OR_FALSE, line: TRUE_OR_FALSE, title: LEGEND_TITLE}
      title: SUBPLOT_TITLE

Lowercase indicates property titles, and uppercase indicates values to be plugged in.
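Once loaded, this structure is just nested dictionaries and lists, with each entry under plots being a one-key mapping from a subplot ID to its settings. The sketch below walks the example plot from earlier, written out as the Python dict a YAML loader would produce. `iter_subplots` is a hypothetical helper for illustration, not part of PyRadmon.

```python
# The example plot, as the Python structure a YAML loader would return.
plot_config = {
    "plot1": {
        "output": "plots/test_plot_1_ch4_2plots_only.png",
        "settings": {"dpi": 50, "target_size": [595, 770]},
        "title": (
            "%INSTRUMENT_SAT%   %START_DATE%-%END_DATE%\n"
            "Channel %CHANNEL%  %FREQUENCY%       %ASSIMILATION_STATUS%\n"
            "Global  All    %EXPERIMENT_ID%"
        ),
        "plots": [
            {
                "subplot1_id": {
                    "axes": {
                        "x": {"label": None, "ticks": 6},
                        "y": {"label": None, "ticks": 5},
                    },
                    "data": {
                        "colors": ["blue", "red"],
                        "labels": ["Avg (K)\n%AVERAGE%", "Sdv (K)\n%AVERAGE%"],
                        "x": "timestamp",
                        "y": ["ges|bc_total|mean", "ges|bc_total|stddev"],
                    },
                    "legend": {"border": False, "line": True},
                    "title": "Total Bias",
                }
            }
        ],
    }
}


def iter_subplots(plot_config):
    """Yield (plot_id, subplot_id, subplot) for every subplot in the config."""
    for plot_id, plot in plot_config.items():
        for entry in plot["plots"]:  # each list item is a one-key mapping
            for subplot_id, subplot in entry.items():
                yield plot_id, subplot_id, subplot


for plot_id, subplot_id, subplot in iter_subplots(plot_config):
    print(plot_id, subplot_id, subplot["title"], subplot["data"]["y"])
```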

Additional details: (values to be changed are in uppercase)

  • PLOT_ID - a string indicating an ID for the plot. Can be anything as long as it is a string.
    • output: OUTPUT_FILE - output file path for graph. Can be a PNG or JPG file. PNG is preferred as it has the best quality.
    • settings:
      • dpi: DPI - the DPI, or Dots Per Inch, of the plot. Generally, this indicates how big the graph will look. Experiment carefully! We recommend a value of 50.
      • target_size: [X, Y] - the target (or desired) size of the plot, in pixels, specified as an array of X and Y. If you want a 500x400 px image, you would specify [500, 400] here.
    • title: PLOT_TITLE - the plot title. This will appear on the top. You can have newlines in the title.
      • plots: (This contains a list of associative arrays - note the dash in front of each ID!)
      • SUBPLOT_ID - a string indicating an ID for the subplot. Can be anything as long as it is a string.
        • axes:
          • x:
            • label: LABEL - the X axis label. This will appear underneath the X axis.
            • ticks: NUM_OF_TICKS - the number of ticks for the X axis. This is an integer.
          • y:
            • label: LABEL - the Y axis label. This will appear alongside the Y axis.
            • ticks: NUM_OF_TICKS - the number of ticks for the Y axis. This is an integer.
        • data:
          • colors: [COLOR1, COLOR2] - the colors for each Y data. If there's one, just specify one. If there's multiple, specify multiple. Use canonical color names, e.g. "red", "green", "blue", etc. Avoid strange color names, such as "pink jellyfish", "striking red", "sandwich brown", etc.
          • labels: [LABEL1, LABEL2] - the labels for each Y data. This will show up in the legend. If there's one, just specify one. If there's multiple, specify multiple.
          • x: X_DATA - the data column to use for the X axis. Generally, you would want to use timestamp here for a time-series graph.
          • y: [Y_DATA_1, Y_DATA_2] - the data column(s) to use for the Y axis. If there's only one, just specify one. If there's multiple, specify multiple.
        • legend:
          • border: TRUE_OR_FALSE - specify whether to have a border around the legend or not. (DISABLED - this is not an available option at the moment.)
          • line: TRUE_OR_FALSE - specify whether to have a line for the data in the legend or not. (DISABLED - this is not an available option at the moment.)
          • title: LEGEND_TITLE - the title for the legend.
        • title: SUBPLOT_TITLE - the subplot title.
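Two of the settings above lend themselves to a quick sanity check before plotting. The sketch below is hypothetical and makes two assumptions that this page does not confirm: that the plotting backend sizes figures in inches (as matplotlib does), so the pixel target is divided by the DPI, and that colors, labels, and y are expected to have one entry per Y series. It uses Python 3 division.

```python
def figure_size_inches(target_size, dpi):
    """Convert a [width, height] pixel size into inches at the given DPI."""
    width_px, height_px = target_size
    return (width_px / dpi, height_px / dpi)


def series_lengths_match(data):
    """Check that colors, labels and y each have one entry per Y series."""
    y_series = data["y"] if isinstance(data["y"], list) else [data["y"]]
    return len(data["colors"]) == len(data["labels"]) == len(y_series)


# Values taken from the example plot on this page.
settings = {"dpi": 50, "target_size": [595, 770]}
data = {
    "colors": ["blue", "red"],
    "labels": ["Avg (K)\n%AVERAGE%", "Sdv (K)\n%AVERAGE%"],
    "x": "timestamp",
    "y": ["ges|bc_total|mean", "ges|bc_total|stddev"],
}

print(figure_size_inches(settings["target_size"], settings["dpi"]))  # (11.9, 15.4)
print(series_lengths_match(data))                                    # True
```

Under the first assumption, raising the DPI while keeping target_size fixed gives a physically smaller figure with the same pixel dimensions, which is one reason to experiment with the DPI carefully.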

Finishing up

Save your configuration file as myconfigfile.yaml, where myconfigfile can be any name you'd want.

Then run:

pyradmon.py --config-file=myconfigfile.yaml plot

If all goes well, you'll have a shiny new plot image! (The plot image should be stored in the output path you've specified above. If you didn't specify one... eek!)

Additional Commands

  • List - This command lists the data available for use.

    pyradmon.py --config-file=myconfigfile.yaml list

  • Dump - This command dumps the data specified in your data sourcing configuration. It prints the data out in a very pretty table. It is recommended that you save the output to a file, especially if there are many columns.

    pyradmon.py --config-file=myconfigfile.yaml dump > dump.txt

  • Config - This command provides an interactive display of the configuration you wrote in a very friendly format. We recommend that you maximize the window to ensure that the output is displayed properly. This is very useful for debugging problems with plotting, as this displays very clearly how PyRadmon is interpreting your configuration.

    pyradmon.py --config-file=myconfigfile.yaml config

Debugging

If something crashes, you may be asked for a debug log. Simply add PYRADMON_DEBUG=1 in front of pyradmon.py to make a debug log! (NOTE: This is NOT guaranteed to produce a debug log - in the future this will change!)

PYRADMON_DEBUG=1 pyradmon.py --config-file=myconfigfile.yaml plot

That's it!

If you have any bugs, suggestions, or general comments, please let me know!


Footnote: Samples of Markups

  • XML:
<?xml version="1.0" encoding="UTF-8" ?>
<plot1>
    <output>plots/test_plot_1_ch%CHANNEL%.png</output>
    <plots>
        <subplot1_id>
            <axes>
                <x>
                    <label />
                    <ticks>6</ticks>
                </x>
                <y>
                    <label />
                    <ticks>5</ticks>
                </y>
            </axes>
            <data>
                <colors>blue</colors>
                <colors>red</colors>
                <labels>Avg (K)
%AVERAGE%</labels>
                <labels>Sdv (K)
%AVERAGE%</labels>
                <x>timestamp</x>
                <y>ges|bc_total|mean</y>
                <y>ges|bc_total|stddev</y>
            </data>
            <legend>
                <border>false</border>
                <line>true</line>
            </legend>
            <title>Total Bias</title>
        </subplot1_id>
    </plots>
    <settings>
        <dpi>50</dpi>
        <target_size>595</target_size>
        <target_size>770</target_size>
    </settings>
    <title>%INSTRUMENT_SAT%   %START_DATE%-%END_DATE%
Channel %CHANNEL%  %FREQUENCY%       %ASSIMILATION_STATUS%
Global  All    %EXPERIMENT_ID%</title>
</plot1>
  • JSON:
{
    "plot1": {
        "output": "plots/test_plot_1_ch%CHANNEL%.png",
        "plots": [
            {
                "subplot1_id": {
                    "axes": {
                        "x": {
                            "label": null,
                            "ticks": 6
                        },
                        "y": {
                            "label": null,
                            "ticks": 5
                        }
                    },
                    "data": {
                        "colors": [
                            "blue",
                            "red"
                        ],
                        "labels": [
                            "Avg (K)\n%AVERAGE%",
                            "Sdv (K)\n%AVERAGE%"
                        ],
                        "x": "timestamp",
                        "y": [
                            "ges|bc_total|mean",
                            "ges|bc_total|stddev"
                        ]
                    },
                    "legend": {
                        "border": false,
                        "line": true
                    },
                    "title": "Total Bias"
                }
            }
        ],
        "settings": {
            "dpi": 50,
            "target_size": [
                595,
                770
            ]
        },
        "title": "%INSTRUMENT_SAT%   %START_DATE%-%END_DATE%\nChannel %CHANNEL%  %FREQUENCY%       %ASSIMILATION_STATUS%\nGlobal  All    %EXPERIMENT_ID%"
    }
}
  • ConfigParser: Not trivial to write
  • YAML:
plot1:
  output: plots/test_plot_1_ch%CHANNEL%.png
  settings:
    dpi: 50
    target_size: [595, 770]
  title: '%INSTRUMENT_SAT%   %START_DATE%-%END_DATE%

    Channel %CHANNEL%  %FREQUENCY%       %ASSIMILATION_STATUS%

    Global  All    %EXPERIMENT_ID%'
  plots:
  - subplot1_id:
      axes:
        x: {label: null, ticks: 6}
        y: {label: null, ticks: 5}
      data:
        colors: [blue, red]
        labels: ['Avg (K)

            %AVERAGE%', 'Sdv (K)

            %AVERAGE%']
        x: timestamp
        y: [ges|bc_total|mean, ges|bc_total|stddev]
      legend: {border: false, line: true}
      title: Total Bias