User Guide¶

Your First Module¶

mymodule.py¶

import argschema


class MySchema(argschema.ArgSchema):
    a = argschema.fields.Int(default=42, description='my first parameter')


if __name__ == '__main__':
    mod = argschema.ArgSchemaParser(schema_type=MySchema)
    mod.logger.warn("this module does nothing useful")
    print(mod.args)

Running this code produces:

$ python mymodule.py
mymodule.py:10: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  mod.logger.warn("this module does nothing useful")
{'a': 42, 'log_level': 'ERROR'}

$ python mymodule.py --a 2
mymodule.py:10: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  mod.logger.warn("this module does nothing useful")
{'a': 2, 'log_level': 'ERROR'}

$ python mymodule.py --a 2 --log_level WARNING
mymodule.py:10: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  mod.logger.warn("this module does nothing useful")
WARNING:argschema.argschema_parser:this module does nothing useful
{'a': 2, 'log_level': 'WARNING'}

$ python mymodule.py -h
usage: mymodule.py [-h] [--input_json INPUT_JSON] [--output_json OUTPUT_JSON]
                   [--log_level LOG_LEVEL] [--a A]

optional arguments:
  -h, --help            show this help message and exit

MySchema:
  --input_json INPUT_JSON
                        file path of input json file
  --output_json OUTPUT_JSON
                        file path to output json file
  --log_level LOG_LEVEL
                        set the logging level of the module (default=ERROR)
  --a A                 my first parameter (default=42)

"Great," you are thinking, "that is basically argparse. Congratulations!"

But there is more: you can also give your module a dictionary in an interactive session:

>>> from argschema import ArgSchemaParser
>>> from mymodule import MySchema
>>> d = {'a':5}
>>> mod = ArgSchemaParser(input_data=d,schema_type=MySchema)
>>> print(mod.args)
{'a': 5, 'log_level': 'ERROR'}

Or you can write out a JSON file and pass its path on the command line:

myinput.json¶

{
    "a":99
}

$ python mymodule.py --input_json myinput.json
mymodule.py:10: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  mod.logger.warn("this module does nothing useful")
{'a': 99, 'log_level': 'ERROR', 'input_json': 'myinput.json'}

Or override a parameter from the JSON file on the command line if you want:

$ python mymodule.py --input_json myinput.json --a 100
mymodule.py:10: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  mod.logger.warn("this module does nothing useful")
{'log_level': 'ERROR', 'a': 100, 'input_json': 'myinput.json'}

Plus, no matter how you give it parameters, they will always be validated before any of your code runs.

Whether from the command line

$ python mymodule.py --input_json ../examples/myinput.json --a 5!
Traceback (most recent call last):
  File "mymodule.py", line 9, in <module>
    mod = argschema.ArgSchemaParser(schema_type=MySchema)
  File "/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/argschema-3.0.1-py3.7.egg/argschema/argschema_parser.py", line 175, in __init__
  File "/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/argschema-3.0.1-py3.7.egg/argschema/argschema_parser.py", line 276, in load_schema_with_defaults
  File "/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/argschema-3.0.1-py3.7.egg/argschema/utils.py", line 418, in load
  File "/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/marshmallow/schema.py", line 707, in load
    postprocess=True,
  File "/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/marshmallow/schema.py", line 867, in _do_load
    raise exc
marshmallow.exceptions.ValidationError: {'a': ['Not a valid integer.']}

or from a dictionary

>>> from argschema import ArgSchemaParser
>>> from mymodule import MySchema
>>> d={'a':'hello'}
>>> mod = ArgSchemaParser(input_data=d,schema_type=MySchema,args=[])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/forrestcollman/argschema/argschema/argschema_parser.py", line 159, in __init__
    raise mm.ValidationError(json.dumps(result.errors, indent=2))
marshmallow.exceptions.ValidationError: {
  "a": [
    "Not a valid integer."
  ]
}
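The validate-first behaviour can be pictured with a small standard-library sketch. This is not how argschema or marshmallow are implemented; validate_int_field and load_args are hypothetical helpers that only mimic the shape of the error dictionary shown above, under the assumption that every parameter is an integer.

```python
def validate_int_field(name, value):
    """Return (parsed_value, errors) for one integer parameter (hypothetical helper)."""
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(value, bool):
        return None, {name: ["Not a valid integer."]}
    try:
        return int(value), {}
    except (TypeError, ValueError):
        return None, {name: ["Not a valid integer."]}


def load_args(raw_args):
    """Validate every parameter up front and raise before any module code runs."""
    parsed, errors = {}, {}
    for name, value in raw_args.items():
        parsed_value, field_errors = validate_int_field(name, value)
        errors.update(field_errors)
        if not field_errors:
            parsed[name] = parsed_value
    if errors:
        # all problems are reported together, keyed by parameter name
        raise ValueError(errors)
    return parsed


print(load_args({"a": "5"}))  # {'a': 5}
try:
    load_args({"a": "5!"})
except ValueError as exc:
    print(exc)  # {'a': ['Not a valid integer.']}
```

Because the check happens inside the loader, the source of the values (command line, JSON file, or dictionary) does not change the outcome.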

Fields¶

argschema uses marshmallow (http://marshmallow.readthedocs.io/) under the hood to define parameter schemas. It comes with a basic set of fields that you can use to define your schemas. One powerful feature of marshmallow is that you can define custom fields that do arbitrary validation. argschema.fields contains all the built-in marshmallow fields, plus some useful custom ones, such as InputFile, OutputFile, and InputDir, which validate that the paths exist and have the proper permissions to allow files to be read or written.
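As a rough illustration of the kind of check a path-validating field performs, here is a minimal standard-library sketch. It is an assumption about the general idea (the path exists and is readable), not the actual InputFile implementation, and validate_input_file is a hypothetical name.

```python
import os
import tempfile


def validate_input_file(path):
    """Raise ValueError unless `path` is an existing, readable file."""
    if not os.path.isfile(path):
        raise ValueError("%s is not a file" % path)
    if not os.access(path, os.R_OK):
        raise ValueError("%s is not readable" % path)
    return path


# an existing temporary file passes the check
with tempfile.NamedTemporaryFile(suffix=".json") as handle:
    print(validate_input_file(handle.name) == handle.name)  # True

# a path that does not exist is rejected
missing = os.path.join(tempfile.gettempdir(), "no_such_file_argschema_sketch.json")
try:
    validate_input_file(missing)
except ValueError as exc:
    print("rejected:", exc)
```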

Other fields, such as NumpyArray, will deserialize ordered lists of lists directly into a numpy array of your choosing.

Finally, an important field to know is Nested, which allows you to define hierarchical nested structures. Note that if you use Nested schemas, they should subclass DefaultSchema so that default values are filled in properly, as marshmallow.Schema does not do that by itself.

A common question about Nested is how to make it optional while controlling when the defaults of the referenced schema are filled in. There are two cases. If you include default={}, the defaults of the subschema are always filled in, even when the input never mentions the nested section. If you leave default unspecified, the subschema defaults are filled in only when the input contains a reference to at least one element of that subschema.

This example illustrates the difference between the two approaches:

nested_example.py¶

import argschema


class MyNest(argschema.schemas.DefaultSchema):
    a = argschema.fields.Int(default=1)
    b = argschema.fields.Int(default=2)


class MySchemaFill(argschema.ArgSchema):
    nest = argschema.fields.Nested(MyNest,
                                   required=False,
                                   default={},
                                   description='nested schema that fills in defaults')


class MySchema(argschema.ArgSchema):
    nest = argschema.fields.Nested(MyNest,
                                   required=False,
                                   description='nested schema that does not always fill defaults')


mod = argschema.ArgSchemaParser(schema_type=MySchema)
print('MySchema')
print(mod.args)
mod2 = argschema.ArgSchemaParser(schema_type=MySchemaFill)
print('MySchemaFill')
print(mod2.args)

$ python nested_example.py
MySchema
{'log_level': 'ERROR'}
MySchemaFill
{'nest': {'b': 2, 'a': 1}, 'log_level': 'ERROR'}

$ python nested_example.py --nest.a 4
MySchema
{'nest': {'b': 2, 'a': 4}, 'log_level': 'ERROR'}
MySchemaFill
{'nest': {'b': 2, 'a': 4}, 'log_level': 'ERROR'}
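The two behaviours in the output above can be summarised with a small plain-Python sketch. It is only a model of the observed results, not argschema's code; load_nest and its fill_when_missing flag are hypothetical names standing in for "default={}" versus "no default".

```python
NEST_DEFAULTS = {"a": 1, "b": 2}  # mirrors the defaults of MyNest


def load_nest(args, fill_when_missing):
    """Return a copy of `args` with the 'nest' defaults applied.

    fill_when_missing=True  plays the role of default={} (MySchemaFill);
    fill_when_missing=False plays the role of no default (MySchema).
    """
    result = dict(args)
    if "nest" in args:
        # any reference to the sub-schema triggers its defaults
        result["nest"] = {**NEST_DEFAULTS, **args["nest"]}
    elif fill_when_missing:
        # default={} behaves like an empty reference to the sub-schema
        result["nest"] = dict(NEST_DEFAULTS)
    return result


print(load_nest({}, fill_when_missing=False))  # {}
print(load_nest({}, fill_when_missing=True))   # {'nest': {'a': 1, 'b': 2}}
print(load_nest({"nest": {"a": 4}}, fill_when_missing=False))  # {'nest': {'a': 4, 'b': 2}}
```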

One important use case for Nested is a JSON input that contains a list of dictionaries. You might be tempted to use the field List with a field_type of Dict; however, you should use Nested with many=True.
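Conceptually, loading many items means that every element of the list is checked against the same sub-schema and receives the same defaults, which a list of free-form dictionaries would not give you. The sketch below models that idea with the standard library only; ITEM_FIELDS, load_item, and load_many are hypothetical names, and the two fields are made up for illustration.

```python
import json

# hypothetical sub-schema: field name -> (expected type, default value)
ITEM_FIELDS = {"name": (str, "unnamed"), "weight": (int, 1)}


def load_item(item):
    """Check one list element against the shared sub-schema."""
    if not isinstance(item, dict):
        raise ValueError("each element must be a dictionary")
    loaded = {}
    for key, (expected_type, default) in ITEM_FIELDS.items():
        value = item.get(key, default)
        if not isinstance(value, expected_type):
            raise ValueError({key: ["Not a valid %s." % expected_type.__name__]})
        loaded[key] = value
    return loaded


def load_many(items):
    """Apply the same sub-schema to every element of the list."""
    return [load_item(item) for item in items]


raw = json.loads('{"items": [{"name": "first"}, {"weight": 3}]}')
print(load_many(raw["items"]))
# [{'name': 'first', 'weight': 1}, {'name': 'unnamed', 'weight': 3}]
```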

The template_module example shows how you might combine these features to define a more complex parameter structure.

template_module.py¶

from argschema import ArgSchemaParser, ArgSchema
from argschema.fields import NumpyArray, Boolean, Int, Str, Nested
from argschema.schemas import DefaultSchema
import numpy as np
import pprint as pp


# these are the core parameters for my module


class MyNestedParameters(DefaultSchema):
    name = Str(required=True, description='name of vector')
    increment = Int(required=True, description='value to increment')
    array = NumpyArray(dtype=np.float, required=True,
                       description='array to increment')
    write_output = Boolean(required=False, default=True)


# but i'm going to nest them inside a subsection called inc


class MyParameters(ArgSchema):
    inc = Nested(MyNestedParameters)


# this is another schema we will use to validate and deserialize our output
class MyOutputParams(DefaultSchema):
    name = Str(required=True, description='name of vector')
    inc_array = NumpyArray(dtype=np.float, required=True,
                           description='incremented array')


if __name__ == '__main__':

    # this defines a default dictionary that will be used if input_json is not specified
    example_input = {
        "inc": {
            "name": "from_dictionary",
            "increment": 5,
            "array": [0, 2, 5],

            "write_output": True
        },
        "output_json": "output_dictionary.json"
    }

    # here is my ArgSchemaParser that processes my inputs
    mod = ArgSchemaParser(input_data=example_input,
                          schema_type=MyParameters,
                          output_schema_type=MyOutputParams)

    # pull out the inc section of the parameters
    inc_params = mod.args['inc']

    # do my simple addition of the parameters
    inc_array = inc_params['array'] + inc_params['increment']

    # define the output dictionary
    output = {
        'name': inc_params['name'],
        'inc_array': inc_array
    }

    # if the parameters are set as such write the output
    if inc_params['write_output']:
        mod.output(output)

    pp.pprint(mod.args)

Now we can run the example commands found in run_template.sh.

input.json¶

{
    "inc": {
        "name": "from_json",
        "increment": 1,
        "array": [3, 2, 1],
        "write_output": true
    }
}

$ python template_module.py \
    --output_json output_command.json \
    --inc.name from_command \
    --inc.increment 2
template_module.py:14: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  array = NumpyArray(dtype=np.float, required=True,
template_module.py:29: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  inc_array = NumpyArray(dtype=np.float, required=True,
{'inc': {'array': array([0., 2., 5.]),
         'increment': 2,
         'name': 'from_command',
         'write_output': True},
 'log_level': 'ERROR',
 'output_json': 'output_command.json'}

$ python template_module.py \
    --input_json input.json \
    --output_json output_fromjson.json
template_module.py:14: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  array = NumpyArray(dtype=np.float, required=True,
template_module.py:29: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  inc_array = NumpyArray(dtype=np.float, required=True,
{'inc': {'array': array([3., 2., 1.]),
         'increment': 1,
         'name': 'from_json',
         'write_output': True},
 'input_json': 'input.json',
 'log_level': 'ERROR',
 'output_json': 'output_fromjson.json'}

$ python template_module.py
template_module.py:14: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  array = NumpyArray(dtype=np.float, required=True,
template_module.py:29: DeprecationWarning: `np.float` is a deprecated alias for the builtin `float`. To silence this warning, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  inc_array = NumpyArray(dtype=np.float, required=True,
{'inc': {'array': array([0., 2., 5.]),
         'increment': 5,
         'name': 'from_dictionary',
         'write_output': True},
 'log_level': 'ERROR',
 'output_json': 'output_dictionary.json'}

Command-Line Specification¶

As mentioned in the section Your First Module, argschema supports setting arguments at the command line, in addition to providing them in an input JSON file or passing a dictionary directly as input_data. Values passed at the command line take precedence over those passed to the parser or in the input JSON.
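One way to picture this precedence is as a recursive merge in which later sources override earlier ones. The sketch below is an illustration under that assumption, not argschema's own merging code; merge_args is a hypothetical helper, and the values are taken from the myinput.json example in Your First Module.

```python
def merge_args(base, overrides):
    """Recursively merge `overrides` into a copy of `base`."""
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # descend so that only the overridden leaves change
            merged[key] = merge_args(merged[key], value)
        else:
            merged[key] = value
    return merged


schema_defaults = {"a": 42, "log_level": "ERROR"}
from_json = {"a": 99}            # contents of myinput.json
from_command_line = {"a": 100}   # --a 100

args = merge_args(merge_args(schema_defaults, from_json), from_command_line)
print(args)  # {'a': 100, 'log_level': 'ERROR'}
```

Merging recursively, instead of replacing whole sections, is what lets a single nested option change one value while the rest of that section keeps its values.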

Arguments are specified with --argument_name <value>, where value is passed by the shell. If there are spaces in the value, it will need to be wrapped in quotes, and any special characters will need to be escaped with \. Booleans are set with True or 1 for true and False or 0 for false.
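To see what the program actually receives after the shell has handled quotes and escapes, the standard-library shlex module can split a command line using POSIX shell rules. The sketch below also includes a strict boolean parser that accepts only the four spellings named above; parse_cli_bool and the --name, --path, and --flag options are hypothetical, and the real parser may accept additional spellings.

```python
import shlex

TRUE_TOKENS = {"True", "1"}
FALSE_TOKENS = {"False", "0"}


def parse_cli_bool(token):
    """Map a command-line token to a bool (hypothetical strict parser)."""
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    raise ValueError("Not a valid boolean: %r" % token)


# quoted values and backslash-escaped spaces each arrive as a single token
command = r'python mymodule.py --name "two words" --path my\ file.txt --flag 0'
argv = shlex.split(command)
print(argv)
# ['python', 'mymodule.py', '--name', 'two words', '--path', 'my file.txt', '--flag', '0']
print(parse_cli_bool(argv[-1]))  # False
```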

An exception to this rule is list formatting. If a schema contains a List and does not set the cli_as_single_argument keyword argument to True, its values are parsed in the form --list_name <value1> <value2> ... instead. In argschema 2.0, lists will be parsed in the same way as other arguments, as this allows more flexibility in list types and more clearly represents the intended data structure.

An example script showing old and new list settings:

deprecated_example.py¶

from argschema import ArgSchema, ArgSchemaParser
from argschema.fields import List, Float


class MySchema(ArgSchema):
    list_old = List(Float, default=[1.1, 2.2, 3.3],
                    description="float list with deprecated cli")
    list_new = List(Float, default=[4.4, 5.5, 6.6],
                    cli_as_single_argument=True,
                    description="float list with supported cli")


if __name__ == '__main__':
    mod = ArgSchemaParser(schema_type=MySchema)
    print(mod.args)

Running this code demonstrates the differences in command-line usage:

$ python deprecated_example.py --help
/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/argschema-3.0.1-py3.7.egg/argschema/utils.py:346: FutureWarning: '--list_old' is using old-style command-line syntax with each element as a separate argument. This will not be supported in argschema after 2.0. See http://argschema.readthedocs.io/en/master/user/intro.html#command-line-specification for details.
usage: deprecated_example.py [-h] [--input_json INPUT_JSON]
                             [--output_json OUTPUT_JSON]
                             [--log_level LOG_LEVEL]
                             [--list_old [LIST_OLD [LIST_OLD ...]]]
                             [--list_new LIST_NEW]

optional arguments:
  -h, --help            show this help message and exit

MySchema:
  --input_json INPUT_JSON
                        file path of input json file
  --output_json OUTPUT_JSON
                        file path to output json file
  --log_level LOG_LEVEL
                        set the logging level of the module (default=ERROR)
  --list_old [LIST_OLD [LIST_OLD ...]]
                        float list with deprecated cli (default=[1.1, 2.2,
                        3.3])
  --list_new LIST_NEW   float list with supported cli (default=[4.4, 5.5,
                        6.6])

$ python deprecated_example.py --list_old 9.1 8.2 7.3 --list_new [6.4,5.5,4.6]
/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/argschema-3.0.1-py3.7.egg/argschema/utils.py:346: FutureWarning: '--list_old' is using old-style command-line syntax with each element as a separate argument. This will not be supported in argschema after 2.0. See http://argschema.readthedocs.io/en/master/user/intro.html#command-line-specification for details.
{'list_new': [6.4, 5.5, 4.6], 'log_level': 'ERROR', 'list_old': [9.1, 8.2, 7.3]}
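The difference between the two styles can be reproduced with plain argparse. This is a sketch of the idea, not argschema's parser: it assumes the old style corresponds to nargs="*" and that the new style parses one token as a Python literal with ast.literal_eval.

```python
import argparse
import ast

parser = argparse.ArgumentParser()
# old style: each element is its own token, collected by nargs="*"
parser.add_argument("--list_old", nargs="*", type=float)
# new style: the whole list arrives as one token and is parsed as a literal
parser.add_argument("--list_new", type=ast.literal_eval)

args = parser.parse_args(
    ["--list_old", "9.1", "8.2", "7.3", "--list_new", "[6.4,5.5,4.6]"])
print(args.list_old)  # [9.1, 8.2, 7.3]
print(args.list_new)  # [6.4, 5.5, 4.6]

# only the single-argument style can express nested lists
nested = parser.parse_args(["--list_new", "[[1,2],[3]]"])
print(nested.list_new)  # [[1, 2], [3]]
```

A flat sequence of tokens has no way to mark where an inner list starts or ends, which is one reason the single-argument form represents the data structure more clearly.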

We can explore some typical examples of command line usage with the following script:

cli_example.py¶

from argschema import ArgSchema, ArgSchemaParser
from argschema.fields import List, NumpyArray, Bool, Int, Nested, Str
from argschema.schemas import DefaultSchema


class MyNestedSchema(DefaultSchema):
    a = Int(default=42, description="my first parameter")
    b = Bool(default=True, description="my boolean")


class MySchema(ArgSchema):
    array = NumpyArray(default=[[1, 2, 3], [4, 5, 6]], dtype="uint8",
                       description="my example array")
    string_list = List(List(Str),
                       default=[["hello", "world"], ["lists!"]],
                       cli_as_single_argument=True,
                       description="list of lists of strings")
    int_list = List(Int, default=[1, 2, 3],
                    cli_as_single_argument=True,
                    description="list of ints")
    nested = Nested(MyNestedSchema, required=True)


if __name__ == '__main__':
    mod = ArgSchemaParser(schema_type=MySchema)
    print(mod.args)

$ python cli_example.py --help
usage: cli_example.py [-h] [--input_json INPUT_JSON]
                      [--output_json OUTPUT_JSON] [--log_level LOG_LEVEL]
                      [--array ARRAY] [--string_list STRING_LIST]
                      [--int_list INT_LIST] [--nested.a NESTED.A]
                      [--nested.b NESTED.B]

optional arguments:
  -h, --help            show this help message and exit

MySchema:
  --input_json INPUT_JSON
                        file path of input json file
  --output_json OUTPUT_JSON
                        file path to output json file
  --log_level LOG_LEVEL
                        set the logging level of the module (default=ERROR)
  --array ARRAY         my example array (default=[[1, 2, 3], [4, 5, 6]])
  --string_list STRING_LIST
                        list of lists of strings (default=[['hello', 'world'],
                        ['lists!']])
  --int_list INT_LIST   list of ints (default=[1, 2, 3])

nested:
  --nested.a NESTED.A   my first parameter (default=42)
  --nested.b NESTED.B   my boolean (default=True)

We can set some values and observe the output:

$ python cli_example.py --nested.b 0 --string_list "[['foo','bar'],['baz','buz']]"
{'nested': {'a': 42, 'b': False}, 'string_list': [['foo', 'bar'], ['baz', 'buz']], 'array': array([[1, 2, 3],
       [4, 5, 6]], dtype=uint8), 'int_list': [1, 2, 3], 'log_level': 'ERROR'}
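The dotted option names used for nested fields, such as --nested.b, map onto nested dictionaries. A minimal sketch of that mapping follows; unflatten is a hypothetical helper written for illustration and is not part of argschema.

```python
def unflatten(flat):
    """Turn {'nested.b': False} into {'nested': {'b': False}}."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # create intermediate dictionaries on demand
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


print(unflatten({"nested.b": False, "nested.a": 42, "log_level": "ERROR"}))
# {'nested': {'b': False, 'a': 42}, 'log_level': 'ERROR'}
```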

If we set a field to a value the parser cannot cast (for example, an invalid literal), we will see a casting validation error:

$ python cli_example.py --array [1,foo,3]
Traceback (most recent call last):
  File "cli_example.py", line 25, in <module>
    mod = ArgSchemaParser(schema_type=MySchema)
  File "/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/argschema-3.0.1-py3.7.egg/argschema/argschema_parser.py", line 160, in __init__
  File "/home/docs/checkouts/readthedocs.org/user_builds/argschema/envs/master/lib/python3.7/site-packages/argschema-3.0.1-py3.7.egg/argschema/utils.py", line 138, in args_to_dict
marshmallow.exceptions.ValidationError: {
  "array": [
    "Command-line argument can't cast to NumpyArray"
  ]
}

argschema does not support setting Dict at the command line.

Sphinx Documentation¶

argschema comes with an autodocumentation feature for Sphinx that helps you automatically document the Schemas and ArgSchemaParser classes in your project. This is how the documentation of the test suite included here was generated.

To configure Sphinx to use this feature, you must be using the Sphinx autodoc extension and add the following to your conf.py file:

from argschema.autodoc import process_schemas

def setup(app):
    app.connect('autodoc-process-docstring', process_schemas)

Installation¶

Install from source code:

$ python setup.py install

or with pip:

$ pip install argschema