Using YAML with Python
Python, May 2nd, 2008, 9:24pm
Sooner or later, most Python programmers need to save some settings data, usually program state, user settings, etc. One way of doing this is the standard pickle/cPickle approach, by which you can save (serialize) and retrieve the state of any object to and from a file. The drawback of this approach is that the resulting data file is in a Python-specific format (binary, for most pickle protocols) that is not meant to be read or edited by hand. And if you are like me, you want your state data in an easily hackable and debuggable format: plain text.
Another approach is to use a traditional Windows INI-style file, with parameters separated into sections. This is a reasonably good approach, but it cannot handle nested data, i.e. sections within sections. And nesting is mighty handy if you, for instance, want to serialize a dictionary.
An XML file would seem like the obvious solution, being both text-based and capable of handling nested data. For me, however, there are two issues with this:
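To make the limitation concrete, here is a small sketch using the standard library's configparser module (the Python 3 name). The section and option names are made up for illustration. Sections form a single flat level and every value comes back as a string, so a nested dictionary would have to be flattened by hand.

```python
import configparser

# Hypothetical settings, invented for this illustration.
INI_TEXT = """
[window]
width = 800
height = 600

[window.toolbar]
visible = yes
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Sections are one flat list; "window.toolbar" is just a name containing a dot,
# not a child of "window".
sections = parser.sections()
print(sections)

# Values are always strings unless you convert them yourself.
width = parser["window"]["width"]
print(repr(width))
```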
- XML is not the easiest format to read if your settings file gets big
- Although Python provides out-of-the-box support for XML hacking, there is no direct support for saving and reading the aforementioned dictionary. There are several recipes for doing this on the net, but it is still something you will have to get working yourself.
The solution (well, one of several)
There is however another solution which comes with batteries included – namely the lesser known poor cousin of XML – YAML.
According to the not-exactly-Web 2.0-like homepage for YAML, it is “a human friendly data serialization standard for all programming languages”. Well, I don’t know about how YAML works for other programming languages, but the YAML implementation for Python (not surprisingly named PyYAML) works great. The latest version at the time of writing is v3.05.
YAML is a much more condensed and readable data format than XML, while retaining most of the benefits of XML, which makes it ideal for settings and profile files.
A simple example
The following example is a simple YAML document describing a tree structure.
# tree format
treeroot:
    branch1:
        name: Node 1
        branch1-1:
            name: Node 1-1
    branch2:
        name: Node 2
        branch2-1:
            name: Node 2-1
As can probably be seen, YAML uses indentation to specify levels, much like Python, and thus does not need the opening and closing tags of XML.
To read this data into a Python program, simply execute the following code (we assume that the YAML data is kept in a file called ‘tree.yaml’):
import yaml

# safe_load builds only plain Python types (dicts, lists, strings, numbers)
with open('tree.yaml') as f:
    dataMap = yaml.safe_load(f)
The variable ‘dataMap’ now contains a dictionary with the tree data. If you print ‘dataMap’ using the pprint module, you will get something like the following:
{'treeroot': {'branch1': {'branch1-1': {'name': 'Node 1-1'},
                          'name': 'Node 1'},
              'branch2': {'branch2-1': {'name': 'Node 2-1'},
                          'name': 'Node 2'}}}
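Once loaded, the data is an ordinary nested dictionary, so the usual indexing and iteration work on it. The sketch below is self-contained: it parses the same tree from a string instead of reading ‘tree.yaml’, and the helper walk() is a hypothetical name of my own, not part of PyYAML.

```python
import pprint
import yaml

TREE_YAML = """
# tree format
treeroot:
    branch1:
        name: Node 1
        branch1-1:
            name: Node 1-1
    branch2:
        name: Node 2
        branch2-1:
            name: Node 2-1
"""

# safe_load accepts a string as well as an open file.
dataMap = yaml.safe_load(TREE_YAML)
pprint.pprint(dataMap)

# Nested mappings become nested dicts, so plain indexing reaches any node.
leaf_name = dataMap['treeroot']['branch1']['branch1-1']['name']


def walk(node, depth=0):
    """Yield (depth, name) for every mapping with a 'name' key (hypothetical helper)."""
    if 'name' in node:
        yield depth, node['name']
    for value in node.values():
        if isinstance(value, dict):
            yield from walk(value, depth + 1)


names = sorted(name for _, name in walk(dataMap))
print(leaf_name, names)
```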
Saving data
So now we have seen how to get data into our Python program. Saving data is just as easy:
# 'with' closes the file even if dump raises an error
with open('newtree.yaml', 'w') as f:
    yaml.dump(dataMap, f)
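A quick way to convince yourself that nothing gets lost is a round trip: dump, load again, compare. The sketch below is only an illustration under a few assumptions: it writes to a temporary directory rather than the working directory, it uses safe_dump instead of dump, and it passes default_flow_style=False to ask PyYAML for the indented block style (as far as I know, newer releases already default to it).

```python
import os
import tempfile
import yaml

dataMap = {'treeroot': {'branch1': {'name': 'Node 1',
                                    'branch1-1': {'name': 'Node 1-1'}},
                        'branch2': {'name': 'Node 2',
                                    'branch2-1': {'name': 'Node 2-1'}}}}

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'newtree.yaml')

    # safe_dump accepts only plain types and emits only standard YAML tags.
    with open(path, 'w') as f:
        yaml.safe_dump(dataMap, f, default_flow_style=False)

    with open(path) as f:
        text = f.read()
    restored = yaml.safe_load(text)

print(text)
print(restored == dataMap)
```

Sticking to safe_dump and safe_load keeps the file free of Python-specific tags, which matters if another program or another language is supposed to read it.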
Summing up
YAML is an excellent data format language for Python which can be used for a wide range of purposes, including application configuration files, saving program state, user settings, profiles and much more.
May 11th, 2008 at 20:05
YAML is nice in many respects. it's more structured for arbitrary data "slugs" than, say, JSON for arbitrary object serialization, but at a cost: the YAML representation is restricted to that language. if you want to pass data from one implementation to another and you have these objects, you're screwed. example, from my punymud codebase looks like this:
- &id001 !!python/object:mud.Room
  exits:
    east: &id011 !!python/object:__main__.Room
      exits:
        west: *id001
      name: Porch
      oid: 17
    north: &id004 !!python/object:__main__.Room
      exits:
        south: *id001
      name: Office
      oid: 4
etc etc etc ... powerful, but at a cost.
but i have to say i much prefer YAML for data storage over XML.
some questions remain about YAML in python: why isn't it base? why is it so slow?
May 12th, 2008 at 03:05
Python’s pickle format is not a binary format, it’s a printable ASCII format. It is specific to Python, however, whereas YAML is not.
Python’s pickle and cPickle modules do support a binary file format as an option but the default is plain ASCII.