JSON Meets Pickle — Serialize Your Python Classes for Logging and Debugging

Greg Van Aken
5 min read · Oct 23, 2020


I work on software that controls scientific instruments. One of our projects uses django to connect client actions to the hardware. One of the wonderful things about django is that its ORM makes it easy to persist state to a database. The benefit is that I can do most of my python programming as if I were working with regular python classes, when in fact I am working with models whose attributes are fields in my database. So when I need to capture the state of my system for logging and diagnostic purposes, I can dump a serialization of a model or two of interest to log files.

If you’re unfamiliar with django, the best way to do this is with dumpdata, which takes any model (or set of models) and writes all of the associated data in the database to standard output. Normally, django-admin commands are run as needed to modify or interrogate the state of the app. For simply inspecting the state of the system at a particular time, I’d be more likely to use a tool like DBeaver to visualize / query my database.

But, let’s say there are a couple of special cases that can occur in your app and you need to know the state of your database at the exact moment they occur in order to make informed logs / debugging decisions. It is possible to have your django code call dumpdata from within your app code.
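As a rough sketch of what that can look like: django's call_command can run dumpdata programmatically and redirect its output into a buffer. The function name log_model_state and the "myapp.Instrument" label are hypothetical, and the function only works when called inside a configured django project.

```python
import io
import logging

logger = logging.getLogger(__name__)


def log_model_state(*model_labels):
    """Log a JSON dump of the given models, e.g. "myapp.Instrument" (a hypothetical label)."""
    # Imported lazily so this module can still be loaded outside a django project.
    from django.core.management import call_command

    buffer = io.StringIO()
    # dumpdata writes serialized rows to stdout; capture them instead of printing.
    call_command("dumpdata", *model_labels, indent=2, stdout=buffer)
    logger.error("Database state at time of failure:\n%s", buffer.getvalue())
```

One way to use this would be to call log_model_state("myapp.Instrument") from the except block that handles the special case.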

I have found this to be a powerful way to create verbose, context-dependent logging. It lets me capture the exact state of my objects (models) at any time. Now I can come in to work in the morning and see that a new log was written: “this exception occurred at 2 AM and here is what the database looked like exactly when it happened:” followed by a bunch of useful, readable JSON.

Needless to say, there are times when I can’t do all of my programming on top of the ORM. In this project, as performant as the django ORM is, pushing too much object-oriented logic into database reads and writes would have performance implications, particularly for real-time instrument control. Likewise, in non-django projects where a database plus web framework makes no architectural sense, we are left without the beautiful capabilities dumpdata grants us.

For this reason, I decided to explore what it would take to create a human-readable, JSON-like serialized representation of an arbitrary python class.

Caveat time: I have not done much work with Pickle. I know it exists. I’m pretty sure it does not expose the functionality I was seeking.

Request time: If you know of a standard python way / popular package to achieve what I ended up implementing below, please comment and let me know!

So, let’s say we have a class and it has a bunch of attributes.
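A minimal stand-in for such a class, with attribute names that mirror the serialclass example at the end of this post (the original snippet is not reproduced here, so treat the exact names as illustrative):

```python
class MyOtherClass:
    def __init__(self):
        self.etc = "..."


class MyClass:
    def __init__(self):
        # A mix of JSON-friendly values and one nested object
        self._number = 100
        self._string = "hello"
        self._list = [1, 2, 3, 4]
        self._bool = False
        self._my_other_class_object = MyOtherClass()
```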

How do we visualize this? Enter JSON:

{
  "MyClass": {
    "_number": 100,
    "_string": "hello",
    "_list": [1, 2, 3, 4],
    "_bool": false,
    "_my_other_class_object": {
      "MyOtherClass": {
        "etc": "..."
      }
    }
  }
}

Okay so for any class we need a few things:

  1. The name of the class as a string
  2. The name of all of the class’s user-defined attributes as strings
  3. The value of all of these attributes*

*If the values of these attributes are not JSON-standard types, we need a serialized representation of them

1. The name of the class as a string

In python (3) there is a pretty easy way to get this. Every class has the magic attribute __name__, which holds the class name as a string.

So it may seem like this is as simple as self.__name__, but self is an instance of the class, and instances do not carry __name__. We need to get the type of self and query that: type(self).__name__.
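A quick sketch of the difference (the class and method names here are made up for illustration):

```python
class MyClass:
    def class_name(self):
        # type(self) is the class object; __name__ lives there, not on the instance
        return type(self).__name__


obj = MyClass()
print(obj.class_name())          # MyClass
print(hasattr(obj, "__name__"))  # False
```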

2. The name of all of the class’s user-defined attributes as strings

Perhaps less well known than __name__ is the built-in dir(), which lists all member variables and functions of an object: dir(self). On an instance of a class with no member variables or functions defined, it returns the following list of “magic” methods and attributes:

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__']

I strongly encourage you to go check out these magic methods if you don’t know about them. Overriding them is a powerful way to create objects that fit seamlessly into pythonic code. For our purposes, we care about everything except these magic methods. We only want user-defined attributes. Python conveniently marks these magic names with a leading __, so with some list comprehension magic and a nice helper function, we can get a list of the names we are after.

self.class_attr(attr) returns True only if attr does not start with __ and is not callable. The latter check removes any methods (magic or otherwise) from the initial list returned by dir(self); the former removes any special python attributes. We can get away with only checking for a leading __ because even if the user defines an attribute with a leading __, name mangling means dir(self) lists it as _ClassName__attr rather than __attr, as is extremely well explained in Dan Bader’s blog post.
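A sketch of how such a helper and list comprehension might look. The class_attr name comes from the text above; Inspectable, attr_names, and Sample are hypothetical names for this illustration and not the library's actual source.

```python
class Inspectable:
    def class_attr(self, attr):
        """True for user-defined data attributes: no leading '__' and not callable."""
        return not attr.startswith("__") and not callable(getattr(self, attr))

    def attr_names(self):
        # dir() returns names in sorted order; keep only the ones that pass the filter
        return [attr for attr in dir(self) if self.class_attr(attr)]


class Sample(Inspectable):
    def __init__(self):
        self.visible = 1
        self._single = 2
        self.__mangled = 3  # name mangling stores this as _Sample__mangled

    def method(self):
        return None


print(Sample().attr_names())
# ['_Sample__mangled', '_single', 'visible']
```

Note that the mangled attribute survives the filter because its stored name starts with a single underscore.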

3. The value of these attributes

getattr(<object>, <attribute as string>) is all we really need for this one (and some more list comprehension magic).
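Putting the three pieces together, here is a minimal sketch of a base class that builds the nested dictionary. Serializable, to_dict, and the other names are assumptions made for this illustration; this is not the serialclass implementation, and the fallback to repr() for unknown types is a design choice of the sketch.

```python
import json


class Serializable:
    """Hypothetical base class for illustration; not the serialclass source."""

    def _attr_names(self):
        return [
            name for name in dir(self)
            if not name.startswith("__") and not callable(getattr(self, name))
        ]

    def _to_plain(self, value):
        # Recurse into nested serializable objects and containers
        if isinstance(value, Serializable):
            return value.to_dict()
        if isinstance(value, (list, tuple)):
            return [self._to_plain(item) for item in value]
        if isinstance(value, dict):
            return {str(k): self._to_plain(v) for k, v in value.items()}
        if isinstance(value, (str, int, float, bool)) or value is None:
            return value
        # Anything else is not a JSON-standard type; fall back to its repr
        return repr(value)

    def to_dict(self):
        return {
            type(self).__name__: {
                name: self._to_plain(getattr(self, name))
                for name in self._attr_names()
            }
        }


class Inner(Serializable):
    def __init__(self):
        self.etc = "..."


class Outer(Serializable):
    def __init__(self):
        self._number = 100
        self._inner = Inner()


print(json.dumps(Outer().to_dict(), indent=2))
```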

It is all of these ideas that have resulted in serialclass, a python library that exposes a base class that uses these principles to easily get a JSON-style representation of any class that inherits it (all in about 60 lines!). Check it out on GitHub, but here is a peek:

from serialclass import SerialClass


class MyOtherClass(SerialClass):
    def __init__(self):
        self.etc = "..."


class MyClass(SerialClass):
    def __init__(self):
        self._number = 100
        self._string = "hello"
        self._list = [1, 2, 3, 4]
        self._bool = False
        self._my_other_class_object = MyOtherClass()


myclass = MyClass()
print(myclass.serialize())
print(myclass.pstringify(indent=2))

Running this results in:

{'MyClass': {'_bool': False, '_list': [1, 2, 3, 4], '_my_other_class_object': {'MyOtherClass': {'etc': '...'}}, '_number': 100, '_string': 'hello'}}
{
  "MyClass": {
    "_bool": false,
    "_list": [
      1,
      2,
      3,
      4
    ],
    "_my_other_class_object": {
      "MyOtherClass": {
        "etc": "..."
      }
    },
    "_number": 100,
    "_string": "hello"
  }
}

The first is a dictionary representation of the class, and the second is a JSON string with indent=2, and matches exactly the representation we set out to achieve!

Peace, love, zen, and python!

Greg Van Aken

Based in Philadelphia, PA, I have a passion for designing and implementing software to facilitate scientific discovery and medical innovation.