Usage

DoJSON is a simple Pythonic JSON to JSON converter.

The main goal of this package is to help manage a set of rules for manipulating Python dictionaries, with a focus on JSON serialization. Each rule is associated with a regular expression and an output key. When the regular expression matches a key in the source mapping, the rule produces a new value that is added to the output mapping under the new key.

Initialization

First, create an Overdo object that holds the index of rules.

>>> import dojson
>>> simple = dojson.Overdo()

The next step is to create rules that will manipulate a source object.

>>> @simple.over('first', '^.*st$')
... def first(self, key, value):
...     return value + 1
>>> @simple.over('second', '^.*nd$')
... def second(self, key, value):
...     return value + 2

Now we can match the source object against the rules and produce new data.

>>> data = simple.do({'1st': 1, '2nd': 2})
>>> assert 2 == data['first']
>>> assert 4 == data['second']
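
To make the matching mechanism concrete, here is a minimal sketch that uses only the standard library. MiniOverdo is a hypothetical toy class written for illustration; it is not DoJSON's implementation, and it assumes that the first matching rule wins and that unmatched keys are silently dropped.

```python
import re


class MiniOverdo:
    """Toy stand-in for a rule index; not DoJSON's actual implementation."""

    def __init__(self):
        # Each entry is (compiled regular expression, output key, function).
        self.rules = []

    def over(self, name, pattern):
        """Register the decorated function under output key ``name``."""
        def decorator(func):
            self.rules.append((re.compile(pattern), name, func))
            return func
        return decorator

    def do(self, source):
        """Apply the first matching rule to every key of ``source``."""
        output = {}
        for key, value in source.items():
            for regex, name, func in self.rules:
                if regex.match(key):
                    output[name] = func(self, key, value)
                    break  # assumption of this sketch: first match wins
        return output


mini = MiniOverdo()


@mini.over('first', '^.*st$')
def first(self, key, value):
    return value + 1


@mini.over('second', '^.*nd$')
def second(self, key, value):
    return value + 2


result = mini.do({'1st': 1, '2nd': 2})
print(result)  # {'first': 2, 'second': 4}
```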

Command line interface

The command line interface script is installed as dojson.

The easiest way to get started is to apply an already registered rule to JSON data.

{"245__": {"a": "Test title"}}

DoJSON comes with a set of rules for processing MARC21 fields.

$ echo '{"245__": {"a": "Test title"}}' | dojson do marc21
{"title_statement": {"title": "Test title"}}

Sometimes the input contains fields that do not match any rule. To list such fields, use the missing command.

$ echo '{"999__": {"a": "Test title"}}' | dojson missing marc21
999__
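
The idea behind this check can be sketched in a few lines of plain Python. The missing_keys helper and the '^245..$' pattern below are hypothetical and chosen only for illustration; the real MARC21 rule set and the internals of the missing command may differ.

```python
import re


def missing_keys(patterns, record):
    """Return the keys of ``record`` that match none of ``patterns``."""
    compiled = [re.compile(pattern) for pattern in patterns]
    return [
        key for key in record
        if not any(regex.match(key) for regex in compiled)
    ]


# Hypothetical rule pattern: field 245 followed by two indicator characters.
patterns = ['^245..$']
record = {
    '245__': {'a': 'Test title'},
    '999__': {'a': 'Test title'},
}

unmatched = missing_keys(patterns, record)
print(unmatched)  # ['999__']
```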

A common problem is reading other file formats, such as XML.

<?xml version='1.0' encoding='UTF-8'?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1=" " ind2=" ">
      <subfield code="a">Test title</subfield>
    </datafield>
  </record>
</collection>

You can specify a registered loader using the -l <NAME> argument. Save the above example as example.xml and run the following command.

$ dojson -i example.xml -l marcxml do marc21
{"title_statement": {"title": "Test title"}}

In a similar way, it is possible to specify a different output serializer (-d).

$ echo '{"title_statement": {"title": "Test title"}}' | \
  dojson -d marcxml do to_marc21
<?xml version='1.0' encoding='UTF-8'?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1=" " ind2=" ">
      <subfield code="a">Test title</subfield>
    </datafield>
  </record>
</collection>

Command chaining

Command chaining makes JSON manipulation even easier. As a first example, see the schema command, which accepts a string argument containing the URL of a JSON Schema that should be added to the $schema field.

$ dojson -i example.xml -l marcxml do marc21 \
  schema http://example.org/schema/marc21.json
..."$schema": "http://example.org/schema/marc21.json"...

The second example shows an easy way to verify that the rules, applied in both directions, act as an identity function: the diff output should be empty.

$ dojson -l marcxml -d marcxml do marc21 do to_marc21 < example.xml | \
  diff - example.xml
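
Conceptually, a chain of commands can be thought of as a series of processors, each consuming the iterator produced by the previous one. The sketch below illustrates that idea with the standard library only. The do, schema, and chain helpers, as well as the to_json and to_marc functions, are hypothetical stand-ins written for this example and are not DoJSON's actual code or rules.

```python
from functools import reduce


def do(rule):
    """Hypothetical command: wrap a per-record function as a processor."""
    def processor(records):
        for record in records:
            yield rule(record)
    return processor


def schema(url):
    """Hypothetical command: add ``url`` under the ``$schema`` key."""
    def processor(records):
        for record in records:
            updated = dict(record)  # copy so the input is left untouched
            updated['$schema'] = url
            yield updated
    return processor


def chain(processors, records):
    """Feed the output of each processor into the next one."""
    return reduce(lambda stream, proc: proc(stream), processors, records)


# Toy stand-ins for a pair of inverse rule sets (not the MARC21 rules).
def to_json(record):
    return {'title_statement': {'title': record['245__']['a']}}


def to_marc(record):
    return {'245__': {'a': record['title_statement']['title']}}


source = [{'245__': {'a': 'Test title'}}]

with_schema = list(chain(
    [do(to_json), schema('http://example.org/schema/marc21.json')],
    iter(source),
))
round_trip = list(chain([do(to_json), do(to_marc)], iter(source)))

print(with_schema)
print(round_trip == source)  # True
```

Because every processor takes an iterator and returns an iterator, any number of them can be composed without special cases.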

Extensibility

New commands, loaders, dumpers, or rules can be provided via entry points.

  • dojson.cli commands that return a processor accepting an iterator;
  • dojson.cli.load functions expecting a stream and returning Python dict or iterator;
  • dojson.cli.dump functions expecting a Python object and returning str;
  • dojson.cli.rule instances of dojson.overdo.Overdo with loaded rules.
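
As an illustration of the loader and dumper contracts listed above, here is a minimal sketch of a pair of functions and of how they might be declared as entry points. The function names, the jsonl and sortedjson entry point names, and the mypackage module path are all hypothetical; only the group names dojson.cli.load and dojson.cli.dump come from the list above.

```python
import io
import json


def load_json_lines(stream):
    """Hypothetical loader: yield one dict per non-empty line of JSON."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def dump_sorted_json(data):
    """Hypothetical dumper: serialize ``data`` to a str with sorted keys."""
    return json.dumps(data, sort_keys=True)


# Entry point declaration as it might be passed to setuptools'
# ``entry_points`` argument; ``mypackage`` is a made-up package name.
ENTRY_POINTS = {
    'dojson.cli.load': [
        'jsonl = mypackage.loaders:load_json_lines',
    ],
    'dojson.cli.dump': [
        'sortedjson = mypackage.dumpers:dump_sorted_json',
    ],
}

stream = io.StringIO('{"b": 2, "a": 1}\n\n{"c": 3}\n')
records = list(load_json_lines(stream))
print(records)                       # [{'b': 2, 'a': 1}, {'c': 3}]
print(dump_sorted_json(records[0]))  # {"a": 1, "b": 2}
```

If such a package were installed, the new names would presumably become selectable with the -l and -d options in the same way as marcxml.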