User Guide

Your First Module

mymodule.py
import argschema

class MySchema(argschema.ArgSchema):
    a = argschema.fields.Int(default = 42, description= 'my first parameter')
                            
if __name__ == '__main__':
    mod = argschema.ArgSchemaParser(schema_type=MySchema)
    print(mod.args)
    

running this code produces

$ python mymodule.py
{'a': 42, 'log_level': u'ERROR'}
$ python mymodule.py --a 2
{'a': 2, 'log_level': u'ERROR'}
$ python mymodule.py --a 2 --log_level WARNING
{'a': 2, 'log_level': u'WARNING'}
WARNING:argschema.argschema_parser:this program does nothing useful
$ python mymodule.py -h
usage: mymodule.py [-h] [--a A] [--output_json OUTPUT_JSON]
                [--log_level LOG_LEVEL] [--input_json INPUT_JSON]

optional arguments:
-h, --help            show this help message and exit
--a A                 my first parameter
--output_json OUTPUT_JSON
                        file path to output json file
--log_level LOG_LEVEL
                        set the logging level of the module
--input_json INPUT_JSON
                        file path of input json file

Great you are thinking, that is basically argparse, congratulations!

But there is more.. you can also give your module a dictionary in an interactive session

>>> from argschema import ArgSchemaParser
>>> from mymodule import MySchema
>>> d = {'a':5}
>>> mod = ArgSchemaParser(input_data=d,schema_type=MySchema)
>>> print(mod.args)
{'a': 5, 'log_level': u'ERROR'}

or you write out a json file and pass it the path on the command line

myinput.json
{
    "a":99
}
$ python mymodule.py --input_json myinput.json
{'a': 99, 'log_level': u'ERROR', 'input_json': u'myinput.json'}

or override a parameter if you want

$ python mymodule.py --input_json myinput.json --a 100
{'a': 100, 'log_level': u'ERROR', 'input_json': u'myinput.json'}

plus, no matter how you give it parameters, they will always be validated, before any of your code runs.

Whether from the command line

$ python mymodule.py --input_json myinput.json --a 5!
usage: mymodule.py [-h] [--a A] [--output_json OUTPUT_JSON]
                [--log_level LOG_LEVEL] [--input_json INPUT_JSON]
mymodule.py: error: argument --a: invalid int value: '5!'

or from a dictionary

>>> from argschema import ArgSchemaParser
>>> from mymodule import MySchema
>>> d={'a':'hello'}
>>> mod = ArgSchemaParser(input_data=d,schema_type=MySchema)
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/Users/forrestcollman/argschema/argschema/argschema_parser.py", line 159, in __init__
        raise mm.ValidationError(json.dumps(result.errors, indent=2))
    marshmallow.exceptions.ValidationError: {
    "a": [
        "Not a valid integer."
    ]
    }

Fields

argschema uses marshmallow (http://marshmallow.readthedocs.io/) under the hood to define the parameters schemas. It comes with a basic set of fields that you can use to define your schemas. One powerful feature of Marshmallow is that you can define custom fields that do arbitrary validation. fields contains all the built-in marshmallow fields, but also some useful custom ones, such as InputFile, OutputFile, InputDir that validate that the paths exist and have the proper permissions to allow files to be read or written.

Other fields, such as NumpyArray will deserialize ordered lists of lists directly into a numpy array of your choosing.

Finally, an important Field to know is Nested, which allows you to define heirarchical nested structures. Note, that if you use Nested schemas, your Nested schemas should subclass DefaultSchema in order that they properly fill in default values, as marshmallow.Schema does not do that by itself.

The template_module example shows how you might combine these features to define a more complex parameter structure.

template_module.py
from argschema import ArgSchemaParser, ArgSchema
from argschema.fields import OutputFile, NumpyArray, Boolean, Int, Str, Nested
from argschema.schemas import DefaultSchema
import numpy as np
import json

# these are the core parameters for my module
class MyNestedParameters(DefaultSchema):
    name = Str(required=True, description='name of vector')
    increment = Int(required=True, description='value to increment')
    array = NumpyArray(dtype=np.float, required=True, description='array to increment')
    write_output = Boolean(required=False, default=True)

# but i'm going to nest them inside a subsection called inc
class MyParameters(ArgSchema):
    inc = Nested(MyNestedParameters)

#this is another schema we will use to validate and deserialize our output
class MyOutputParams(DefaultSchema):
    name = Str(required=True, description='name of vector')
    inc_array = NumpyArray(dtype=np.float, required=True, description='incremented array')

if __name__ == '__main__':
    
    # this defines a default dictionary that will be used if input_json is not specified
    example_input = {
        "inc": {
            "name": "from_dictionary",
            "increment": 5,
            "array": [0, 2, 5],

            "write_output": True
        },
        "output_json": "output_dictionary.json"
    }

    # here is my ArgSchemaParser that processes my inputs
    mod = ArgSchemaParser(input_data=example_input,
                          schema_type=MyParameters,
                          output_schema_type=MyOutputParams)
                          
    # pull out the inc section of the parameters
    inc_params = mod.args['inc']

    # do my simple addition of the parameters
    inc_array = inc_params['array'] + inc_params['increment']

    # define the output dictionary
    output = {
        'name': inc_params['name'],
        'inc_array': inc_array
    }

    # if the parameters are set as such write the output
    if inc_params['write_output']:
        mod.output(output)

so now if run the example commands found in run_template.sh

input.json
  {
      "inc": {
            "name": "from_json",
            "increment": 1,
            "array": [3, 2, 1],
            "write_output": true
       }
  }
$ python template_module.py \
    --output_json output_command.json \
    --inc.name from_command \
    --inc.increment 2
{u'name': u'from_command', u'inc_array': [2.0, 4.0, 7.0]}
$ python template_module.py \
    --input_json input.json \
    --output_json output_fromjson.json
{u'name': u'from_json', u'inc_array': [4.0, 3.0, 2.0]}
$ python template_module.py
{u'name': u'from_dictionary', u'inc_array': [5.0, 7.0, 10.0]}

Sphinx Documentation

argschema comes with a autodocumentation feature for Sphnix which will help you automatically add documentation of your Schemas and ArgSchemaParser classes in your project. This is how the documentation of the test suite included here was generated.

To configure sphnix to use this function, you must be using the sphnix autodoc module and add the following to your conf.py file

from argschema.autodoc import process_schemas

def setup(app):
    app.connect('autodoc-process-docstring',process_schemas)

Installation

install via source code

$ python setup.py install

or pip

$ pip install argschema

Indices and tables