User Guide

Your First Module

mymodule.py
import argschema


class MySchema(argschema.ArgSchema):
    a = argschema.fields.Int(default=42, description='my first parameter')


if __name__ == '__main__':
    mod = argschema.ArgSchemaParser(schema_type=MySchema)
    mod.logger.warning("this module does nothing useful")
    print(mod.args)

Running this code produces:

$ python mymodule.py
{'a': 42, 'log_level': u'ERROR'}
$ python mymodule.py --a 2
{'a': 2, 'log_level': u'ERROR'}
$ python mymodule.py --a 2 --log_level WARNING
WARNING:argschema.argschema_parser:this module does nothing useful
{'a': 2, 'log_level': u'WARNING'}
$ python mymodule.py -h
usage: mymodule.py [-h] [--output_json OUTPUT_JSON] [--input_json INPUT_JSON]
                   [--a A] [--log_level LOG_LEVEL]

optional arguments:
  -h, --help            show this help message and exit

MySchema:
  --output_json OUTPUT_JSON
                        file path to output json file
  --input_json INPUT_JSON
                        file path of input json file
  --a A                 my first parameter (default=42)
  --log_level LOG_LEVEL
                        set the logging level of the module (default=ERROR)

Great, you are thinking, that is basically argparse. Congratulations!

But there is more: you can also give your module a dictionary in an interactive session.

>>> from argschema import ArgSchemaParser
>>> from mymodule import MySchema
>>> d = {'a':5}
>>> mod = ArgSchemaParser(input_data=d,schema_type=MySchema)
>>> print(mod.args)
{'a': 5, 'log_level': u'ERROR'}

Or you can write out a JSON file and pass its path on the command line:

myinput.json
{
    "a":99
}
$ python mymodule.py --input_json myinput.json
{'a': 99, 'log_level': u'ERROR', 'input_json': u'myinput.json'}

Or override a parameter if you want:

$ python mymodule.py --input_json myinput.json --a 100
{'a': 100, 'log_level': u'ERROR', 'input_json': u'myinput.json'}
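
The input file used above can also be generated from Python instead of being written by hand. The following is a minimal sketch using only the standard library; the temporary directory and the variable names are illustrative choices, and mymodule.py is the script shown earlier.

```python
import json
import os
import tempfile

# Parameters to hand to the module; keys must match fields in MySchema.
params = {"a": 99}

# Write them to a JSON file (a temporary directory is used here for illustration).
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "myinput.json")
with open(input_path, "w") as f:
    json.dump(params, f, indent=4)

# The command line you would then run against the generated file.
command = ["python", "mymodule.py", "--input_json", input_path]
print(" ".join(command))
```

Building the file this way keeps the parameters in one place when another script drives the module.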

Plus, no matter how you give it parameters, they are always validated before any of your code runs.

Whether from the command line:

$ python mymodule.py --input_json ../examples/myinput.json --a 5!
Traceback (most recent call last):
  File "mymodule.py", line 9, in <module>
    mod = argschema.ArgSchemaParser(schema_type=MySchema)
  File "build/bdist.linux-x86_64/egg/argschema/argschema_parser.py", line 175, in __init__
  File "build/bdist.linux-x86_64/egg/argschema/argschema_parser.py", line 274, in load_schema_with_defaults
  File "build/bdist.linux-x86_64/egg/argschema/utils.py", line 422, in load
marshmallow.exceptions.ValidationError: {'a': [u'Not a valid integer.']}

or from a dictionary:

>>> from argschema import ArgSchemaParser
>>> from mymodule import MySchema
>>> d={'a':'hello'}
>>> mod = ArgSchemaParser(input_data=d,schema_type=MySchema,args=[])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/forrestcollman/argschema/argschema/argschema_parser.py", line 159, in __init__
    raise mm.ValidationError(json.dumps(result.errors, indent=2))
marshmallow.exceptions.ValidationError: {
  "a": [
    "Not a valid integer."
  ]
}

Fields

argschema uses marshmallow (http://marshmallow.readthedocs.io/) under the hood to define parameter schemas. marshmallow comes with a basic set of fields that you can use to define your schemas, and one of its powerful features is that you can define custom fields that do arbitrary validation. argschema.fields contains all the built-in marshmallow fields, plus some useful custom ones, such as InputFile, OutputFile, and InputDir, which validate that paths exist and have the permissions needed for files to be read or written.
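
To make the idea of a path-validating field concrete, here is a rough sketch of the kind of checks such a field might perform. It is plain Python, not argschema's implementation; check_input_file and check_output_file are hypothetical helper names introduced only for this illustration.

```python
import os
import tempfile


def check_input_file(path):
    """Hypothetical helper: raise ValueError unless path is a readable file."""
    if not os.path.isfile(path):
        raise ValueError("%s is not a file" % path)
    if not os.access(path, os.R_OK):
        raise ValueError("%s is not readable" % path)
    return path


def check_output_file(path):
    """Hypothetical helper: raise ValueError unless path's directory is writable."""
    directory = os.path.dirname(os.path.abspath(path))
    if not os.path.isdir(directory):
        raise ValueError("%s is not a directory" % directory)
    if not os.access(directory, os.W_OK):
        raise ValueError("%s is not writable" % directory)
    return path


# A temporary file is known to exist and to live in a writable directory.
with tempfile.NamedTemporaryFile(suffix=".json") as tmp:
    existing = check_input_file(tmp.name)
    writable = check_output_file(tmp.name)

# A path that (presumably) does not exist is rejected before any work is done.
missing_path = os.path.join(tempfile.gettempdir(), "argschema_demo_missing.json")
try:
    check_input_file(missing_path)
    missing_error = None
except ValueError as err:
    missing_error = str(err)
print(missing_error)
```

Doing these checks at parse time means a bad path fails immediately rather than partway through a long computation.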

Other fields, such as NumpyArray, will deserialize ordered lists of lists directly into a numpy array with the dtype of your choosing.

Finally, an important field to know is Nested, which allows you to define hierarchical nested structures. Note that if you use Nested schemas, your nested schemas should subclass DefaultSchema so that they properly fill in default values, as marshmallow.Schema does not do that by itself.

Another common question about Nested is how to make it optional while controlling when its defaults are filled in. There are two cases. You may want the field not required, but always filled in with whatever default values exist in the schema it references. Alternatively, you may want it not required, with the default values used only if the input dictionary refers to it. The distinction comes down to including default={} (which causes the defaults of the subschema to be filled in) versus leaving default unspecified (which triggers the subschema defaults only if the original input contains a reference to an element of that subschema).

This example illustrates the difference between the two approaches:

nested_example.py
import argschema


class MyNest(argschema.schemas.DefaultSchema):
    a = argschema.fields.Int(default=1)
    b = argschema.fields.Int(default=2)


class MySchemaFill(argschema.ArgSchema):
    nest = argschema.fields.Nested(MyNest,
                                   required=False,
                                   default={},
                                   description='nested schema that fills in defaults')


class MySchema(argschema.ArgSchema):
    nest = argschema.fields.Nested(MyNest,
                                   required=False,
                                   description='nested schema that does not always fill defaults')


mod = argschema.ArgSchemaParser(schema_type=MySchema)
print('MySchema')
print(mod.args)
mod2 = argschema.ArgSchemaParser(schema_type=MySchemaFill)
print('MySchemaFill')
print(mod2.args)
$ python nested_example.py
MySchema
{'log_level': u'ERROR'}
MySchemaFill
{'nest': {'a': 1, 'b': 2}, 'log_level': u'ERROR'}
$ python nested_example.py --nest.a 4
MySchema
{'nest': {'a': 4, 'b': 2}, 'log_level': u'ERROR'}
MySchemaFill
{'nest': {'a': 4, 'b': 2}, 'log_level': u'ERROR'}

One important use case for Nested is where you want your JSON to contain a list of dictionaries. You might be tempted to use the field List with a field type of Dict; however, you should use Nested with many=True.
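
The shape of data this describes is a list in which every element is a dictionary matching the same nested schema. The sketch below shows that shape and a per-element check in plain Python. It is only a stand-in for what a schema would do: NEST_FIELDS and validate_many are hypothetical names, and the error format is an illustration rather than argschema's output.

```python
import json

# JSON with a list of dictionaries, the shape that Nested with many=True describes.
raw = '{"nests": [{"a": 1, "b": 2}, {"a": 10, "b": 20}]}'

# Plain-Python stand-in for a nested schema: field name -> expected type.
NEST_FIELDS = {"a": int, "b": int}


def validate_many(items, fields):
    """Check each dictionary in a list against the same field spec.

    Returns a dict of errors keyed by list index (empty if all items pass).
    """
    errors = {}
    for index, item in enumerate(items):
        item_errors = {}
        for name, value in item.items():
            if name not in fields:
                item_errors[name] = ["Unknown field."]
            elif not isinstance(value, fields[name]):
                item_errors[name] = ["Not a valid %s." % fields[name].__name__]
        if item_errors:
            errors[index] = item_errors
    return errors


data = json.loads(raw)
print(validate_many(data["nests"], NEST_FIELDS))

# The second element carries a string where an integer is expected.
bad = [{"a": 1}, {"a": "hello"}]
print(validate_many(bad, NEST_FIELDS))
```

Because every element is checked against named, typed fields, a mistake in one entry can be reported precisely, which a List of untyped Dict values cannot offer.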

The template_module example shows how you might combine these features to define a more complex parameter structure.

template_module.py
from argschema import ArgSchemaParser, ArgSchema
from argschema.fields import NumpyArray, Boolean, Int, Str, Nested
from argschema.schemas import DefaultSchema
import numpy as np
import pprint as pp


# these are the core parameters for my module


class MyNestedParameters(DefaultSchema):
    name = Str(required=True, description='name of vector')
    increment = Int(required=True, description='value to increment')
    array = NumpyArray(dtype=float, required=True,
                       description='array to increment')
    write_output = Boolean(required=False, default=True)


# but I'm going to nest them inside a subsection called inc


class MyParameters(ArgSchema):
    inc = Nested(MyNestedParameters)


# this is another schema we will use to validate and deserialize our output
class MyOutputParams(DefaultSchema):
    name = Str(required=True, description='name of vector')
    inc_array = NumpyArray(dtype=float, required=True,
                           description='incremented array')


if __name__ == '__main__':

    # this defines a default dictionary that will be used if input_json is not specified
    example_input = {
        "inc": {
            "name": "from_dictionary",
            "increment": 5,
            "array": [0, 2, 5],
            "write_output": True
        },
        "output_json": "output_dictionary.json"
    }

    # here is my ArgSchemaParser that processes my inputs
    mod = ArgSchemaParser(input_data=example_input,
                          schema_type=MyParameters,
                          output_schema_type=MyOutputParams)

    # pull out the inc section of the parameters
    inc_params = mod.args['inc']

    # do my simple addition of the parameters
    inc_array = inc_params['array'] + inc_params['increment']

    # define the output dictionary
    output = {
        'name': inc_params['name'],
        'inc_array': inc_array
    }

    # if the parameters are set as such write the output
    if inc_params['write_output']:
        mod.output(output)

    pp.pprint(mod.args)

Now we can run the example commands found in run_template.sh:

input.json
  {
      "inc": {
            "name": "from_json",
            "increment": 1,
            "array": [3, 2, 1],
            "write_output": true
       }
  }
$ python template_module.py \
  --output_json output_command.json \
  --inc.name from_command \
  --inc.increment 2
{'inc': {'array': array([0., 2., 5.]),
         'increment': 2,
         'name': u'from_command',
         'write_output': True},
 'log_level': u'ERROR',
 'output_json': u'output_command.json'}
$ python template_module.py \
  --input_json input.json \
  --output_json output_fromjson.json
{'inc': {'array': array([3., 2., 1.]),
         'increment': 1,
         'name': u'from_json',
         'write_output': True},
 'input_json': u'input.json',
 'log_level': u'ERROR',
 'output_json': u'output_fromjson.json'}
$ python template_module.py
{'inc': {'array': array([0., 2., 5.]),
         'increment': 5,
         'name': u'from_dictionary',
         'write_output': True},
 'log_level': u'ERROR',
 'output_json': u'output_dictionary.json'}

Command-Line Specification

As mentioned in the section Your First Module, argschema supports setting arguments at the command line, as well as providing them in an input JSON file or passing a dictionary directly as input_data. Values passed at the command line take precedence over those passed to the parser or in the input JSON.
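
The precedence rule can be pictured as a recursive dictionary merge in which command-line values are applied last. The sketch below is not argschema's implementation; merge is a hypothetical helper, and the data mirrors the template_module.py run above, where the command line overrides inc.name and inc.increment but leaves inc.array untouched.

```python
import copy


def merge(base, override):
    """Return a copy of base with override merged in, recursing into dictionaries."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            result[key] = merge(result[key], value)
        else:
            # Otherwise the overriding value wins outright.
            result[key] = copy.deepcopy(value)
    return result


# Values passed to the parser as input_data.
input_data = {"inc": {"name": "from_dictionary", "increment": 5, "array": [0, 2, 5]}}
# Values given at the command line (--inc.name from_command --inc.increment 2).
cli = {"inc": {"name": "from_command", "increment": 2}}

merged = merge(input_data, cli)
print(merged)
```

Only the keys named at the command line change; everything else keeps the value supplied by the dictionary or JSON file.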

Arguments are specified with --argument_name <value>, where value is passed by the shell. If there are spaces in the value, it must be wrapped in quotes, and any special characters must be escaped with \. Booleans are set with True or 1 for true and False or 0 for false.
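
Quoting and escaping are handled by the shell before the module ever sees the arguments. The sketch below uses the standard library's shlex to show how a POSIX shell would split a command line, followed by a small parser for the boolean spellings listed above. The --name and --path flags and the parse_bool helper are hypothetical, introduced only for illustration.

```python
import shlex

# Quotes keep a value containing spaces together as one argument,
# and a backslash escapes the space in the file name.
argv = shlex.split(r'mymodule.py --name "two words" --path file\ name.txt')
print(argv)

# Boolean spellings named in the text above; anything else is rejected here.
TRUE_VALUES = {"True", "1"}
FALSE_VALUES = {"False", "0"}


def parse_bool(text):
    """Hypothetical helper: map a command-line string to a bool."""
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError("Not a valid boolean: %r" % text)


print(parse_bool("1"), parse_bool("False"))
```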

An exception to this rule is list formatting. If a schema contains a List and does not set the cli_as_single_argument keyword argument to True, lists will be parsed as --list_name <value1> <value2> .... In argschema 2.0, lists will be parsed in the same way as other arguments, which allows more flexibility in list types and more clearly represents the intended data structure.

An example script showing old and new list settings:

deprecated_example.py
from argschema import ArgSchema, ArgSchemaParser
from argschema.fields import List, Float


class MySchema(ArgSchema):
    list_old = List(Float, default=[1.1, 2.2, 3.3],
                    description="float list with deprecated cli")
    list_new = List(Float, default=[4.4, 5.5, 6.6],
                    cli_as_single_argument=True,
                    description="float list with supported cli")


if __name__ == '__main__':
    mod = ArgSchemaParser(schema_type=MySchema)
    print(mod.args)

Running this code demonstrates the differences in command-line usage:

$ python deprecated_example.py --help
/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/latest/local/lib/python2.7/site-packages/argschema-1.17.6-py2.7.egg/argschema/utils.py:346: FutureWarning: '--list_old' is using old-style command-line syntax with each element as a separate argument. This will not be supported in argschema after 2.0. See http://argschema.readthedocs.io/en/master/user/intro.html#command-line-specification for details.
usage: deprecated_example.py [-h] [--output_json OUTPUT_JSON]
                             [--input_json INPUT_JSON]
                             [--list_old [LIST_OLD [LIST_OLD ...]]]
                             [--log_level LOG_LEVEL] [--list_new LIST_NEW]

optional arguments:
  -h, --help            show this help message and exit

MySchema:
  --output_json OUTPUT_JSON
                        file path to output json file
  --input_json INPUT_JSON
                        file path of input json file
  --list_old [LIST_OLD [LIST_OLD ...]]
                        float list with deprecated cli (default=[1.1, 2.2,
                        3.3])
  --log_level LOG_LEVEL
                        set the logging level of the module (default=ERROR)
  --list_new LIST_NEW   float list with supported cli (default=[4.4, 5.5,
                        6.6])
$ python deprecated_example.py --list_old 9.1 8.2 7.3 --list_new [6.4,5.5,4.6]
/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/latest/local/lib/python2.7/site-packages/argschema-1.17.6-py2.7.egg/argschema/utils.py:346: FutureWarning: '--list_old' is using old-style command-line syntax with each element as a separate argument. This will not be supported in argschema after 2.0. See http://argschema.readthedocs.io/en/master/user/intro.html#command-line-specification for details.
{'list_old': [9.1, 8.2, 7.3], 'list_new': [6.4, 5.5, 4.6], 'log_level': u'ERROR'}
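
The two styles can be imitated with argparse alone, which may help clarify why a single-argument list is more flexible. This is a sketch under assumptions: parsing the single argument with ast.literal_eval is one reasonable approach chosen for the illustration, not a statement of how argschema does it.

```python
import argparse
import ast

parser = argparse.ArgumentParser()
# Old style: each element is a separate command-line token.
parser.add_argument("--list_old", nargs="*", type=float)
# New style: the whole list is one token, parsed as a Python literal.
parser.add_argument("--list_new", type=ast.literal_eval)

args = parser.parse_args(
    ["--list_old", "9.1", "8.2", "7.3", "--list_new", "[6.4,5.5,4.6]"]
)
print(args.list_old)
print(args.list_new)

# A single token can also carry structure that separate tokens cannot,
# such as a list of lists.
nested = parser.parse_args(["--list_new", "[['foo','bar'],['baz','buz']]"]).list_new
print(nested)
```

With one token per list, nesting and mixed element types survive the trip through the shell, whereas the separate-token form can only express a flat sequence.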

We can explore some typical examples of command line usage with the following script:

cli_example.py
from argschema import ArgSchema, ArgSchemaParser
from argschema.fields import List, NumpyArray, Bool, Int, Nested, Str
from argschema.schemas import DefaultSchema


class MyNestedSchema(DefaultSchema):
    a = Int(default=42, description="my first parameter")
    b = Bool(default=True, description="my boolean")


class MySchema(ArgSchema):
    array = NumpyArray(default=[[1, 2, 3], [4, 5, 6]], dtype="uint8",
                       description="my example array")
    string_list = List(List(Str),
                       default=[["hello", "world"], ["lists!"]],
                       cli_as_single_argument=True,
                       description="list of lists of strings")
    int_list = List(Int, default=[1, 2, 3],
                    cli_as_single_argument=True,
                    description="list of ints")
    nested = Nested(MyNestedSchema, required=True)


if __name__ == '__main__':
    mod = ArgSchemaParser(schema_type=MySchema)
    print(mod.args)
$ python cli_example.py --help
usage: cli_example.py [-h] [--output_json OUTPUT_JSON]
                      [--input_json INPUT_JSON] [--log_level LOG_LEVEL]
                      [--int_list INT_LIST] [--string_list STRING_LIST]
                      [--array ARRAY] [--nested.a NESTED.A]
                      [--nested.b NESTED.B]

optional arguments:
  -h, --help            show this help message and exit

MySchema:
  --output_json OUTPUT_JSON
                        file path to output json file
  --input_json INPUT_JSON
                        file path of input json file
  --log_level LOG_LEVEL
                        set the logging level of the module (default=ERROR)
  --int_list INT_LIST   list of ints (default=[1, 2, 3])
  --string_list STRING_LIST
                        list of lists of strings (default=[['hello', 'world'],
                        ['lists!']])
  --array ARRAY         my example array (default=[[1, 2, 3], [4, 5, 6]])

nested:
  --nested.a NESTED.A   my first parameter (default=42)
  --nested.b NESTED.B   my boolean (default=True)

We can set some values and observe the output:

$ python cli_example.py --nested.b 0 --string_list "[['foo','bar'],['baz','buz']]"
{'string_list': [[u'foo', u'bar'], [u'baz', u'buz']], 'int_list': [1, 2, 3], 'log_level': u'ERROR', 'array': array([[1, 2, 3],
       [4, 5, 6]], dtype=uint8), 'nested': {'a': 42, 'b': False}}
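
The dotted names used for nested fields, such as --nested.b, map onto the nested dictionary structure of the schema. A minimal sketch of that mapping follows; dotted_to_nested is a hypothetical helper written for this illustration and is not taken from argschema.

```python
def dotted_to_nested(flat):
    """Hypothetical helper: expand {'nested.b': 0} into {'nested': {'b': 0}}."""
    nested = {}
    for dotted_key, value in flat.items():
        parts = dotted_key.split(".")
        target = nested
        # Walk (and create as needed) one dictionary level per dotted prefix.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


flat_args = {"nested.a": 7, "nested.b": False, "int_list": [1, 2, 3]}
expanded = dotted_to_nested(flat_args)
print(expanded)
```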

If we set a field to a value the parser can't cast (for example, an invalid literal), we will see a casting validation error:

$ python cli_example.py --array [1,foo,3]
Traceback (most recent call last):
  File "cli_example.py", line 25, in <module>
    mod = ArgSchemaParser(schema_type=MySchema)
  File "build/bdist.linux-x86_64/egg/argschema/argschema_parser.py", line 160, in __init__
  File "build/bdist.linux-x86_64/egg/argschema/utils.py", line 138, in args_to_dict
marshmallow.exceptions.ValidationError: {
  "array": [
    "Command-line argument can't cast to NumpyArray"
  ]
}
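
The failure above comes from the string [1,foo,3] not being a valid literal, since foo is a bare name. The sketch below reproduces that situation with the standard library and builds a report in the same shape as the error shown. cast_or_error is a hypothetical helper, and using ast.literal_eval is an assumption made for the illustration.

```python
import ast
import json


def cast_or_error(name, text, type_name):
    """Hypothetical helper: parse text as a literal, or return an error report."""
    try:
        return ast.literal_eval(text), {}
    except (ValueError, SyntaxError):
        message = "Command-line argument can't cast to %s" % type_name
        return None, {name: [message]}


# A bare name inside the list cannot be evaluated as a literal.
value, errors = cast_or_error("array", "[1,foo,3]", "NumpyArray")
print(json.dumps(errors, indent=2))

# A well-formed list of lists parses without errors.
good, no_errors = cast_or_error("array", "[[1,2,3],[4,5,6]]", "NumpyArray")
print(good)
```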

argschema does not support setting Dict fields at the command line.

Sphinx Documentation

argschema comes with an autodocumentation feature for Sphinx which will help you automatically add documentation of your Schemas and ArgSchemaParser classes to your project. This is how the documentation of the test suite included here was generated.

To configure Sphinx to use this function, you must be using the Sphinx autodoc module and add the following to your conf.py file:

from argschema.autodoc import process_schemas

def setup(app):
    app.connect('autodoc-process-docstring', process_schemas)
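
Since the hook relies on autodoc, conf.py also needs that extension enabled. A minimal fragment, assuming your project does not already list it (any other entries in your own extensions list would stay alongside it):

```python
# conf.py (fragment): enable Sphinx's built-in autodoc extension,
# which emits the 'autodoc-process-docstring' event used above.
extensions = [
    "sphinx.ext.autodoc",
]
```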

Installation

Install from source code:

$ python setup.py install

or with pip:

$ pip install argschema
