get it right: controlling attribute access in Python

November 16, 2020

In Python, you use the dot (.) operator to access attributes of an object. Normally, that isn’t something you have to give much thought to. However, when you want to customize what happens during attribute access, things can get complicated.

In this post, we’ll see how to control what happens when you use the dot operator. Before we can talk about customizing attribute access, we need to discuss two related topics: class and instance attributes, and descriptors. If both of those are familiar to you, feel free to skip ahead.

Class and instance attributes
Descriptors
Customizing attribute access
Example 1: returning None when an attribute isn’t found
Example 2 - an attribute that updates itself when accessed
Further reading

Class and instance attributes

There are two kinds of attributes in Python: class and instance attributes. In the following class, volume is a class attribute, and line is an instance attribute.

class Speaker:
    volume = "low"
    
    def __init__(self, line):
        self.line = line
    
    def speak(self):
        return self.line if self.volume == "low" else self.line.upper()

Each instance of the class has its own line attribute. It’s easy to see why calling a.speak() below returns "hello" while b.speak() returns "goodbye":

>>> a = Speaker("hello")
>>> b = Speaker("goodbye")
>>> a.speak()
'hello'
>>> b.speak()
'goodbye'

The class attribute volume is shared by all instances. If you change its value, that change is visible to both a and b:

>>> Speaker.volume = "high"
>>> a.speak()
'HELLO'
>>> b.speak()
'GOODBYE'

Each instance has an instance dictionary where instance attributes are stored:

>>> a.__dict__
{'line': 'hello'}
>>> b.__dict__
{'line': 'goodbye'}

Class attributes are stored in a class dictionary:

>>> Speaker.__dict__
mappingproxy({'__module__': '__main__',
              'volume': 'high',
              '__init__': <function __main__.Speaker.__init__(self, line)>,
              'speak': <function __main__.Speaker.speak(self)>,
              '__dict__': <attribute '__dict__' of 'Speaker' objects>,
              '__weakref__': <attribute '__weakref__' of 'Speaker' objects>,
              '__doc__': None})

It’s important to remember this distinction between instance and class attributes (and where they are stored).
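
As a small sketch of why the distinction matters (the Speaker class is repeated so the block runs on its own): assigning to volume through an instance does not change the class attribute. It adds an entry to that instance's dictionary, which then shadows the class attribute for that instance only.

```python
class Speaker:
    volume = "low"  # class attribute, shared by all instances

    def __init__(self, line):
        self.line = line  # instance attribute, one per instance

    def speak(self):
        return self.line if self.volume == "low" else self.line.upper()


a = Speaker("hello")
b = Speaker("goodbye")

# Assigning through the instance adds 'volume' to a.__dict__;
# Speaker.volume itself is left untouched.
a.volume = "high"

print(a.__dict__)            # {'line': 'hello', 'volume': 'high'}
print(b.__dict__)            # {'line': 'goodbye'}
print(Speaker.volume)        # low
print(a.speak(), b.speak())  # HELLO goodbye
```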

Descriptors

Descriptors are an important concept related to attribute access. A descriptor is an object whose class defines one or more of the following methods:

__get__(),
__set__(),
or __delete__()

Below is a simple descriptor class. Its __get__() method always returns 0.

class ZeroAttribute:
    """
    Attribute that is always 0
    """
    def __get__(self, obj, owner=None):
        return 0

Descriptors only take effect when they are stored as class attributes:

class Foo:
    x = ZeroAttribute()

Accessing Foo.x will run its __get__() method:

>>> Foo.x  # calls ZeroAttribute.__get__()
0

Even though x was defined like a class attribute, you can also access x as an instance attribute, which also invokes the ZeroAttribute.__get__() method:

>>> a = Foo()
>>> a.x  # calls ZeroAttribute.__get__()
0
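
To illustrate why the descriptor has to live on the class, here is a minimal sketch that stores a second ZeroAttribute directly on an instance (the attribute name y is just for illustration). In that case __get__() is not invoked, and you get the descriptor object itself back.

```python
class ZeroAttribute:
    """Attribute that is always 0"""
    def __get__(self, obj, owner=None):
        return 0


class Foo:
    x = ZeroAttribute()  # stored in the class dictionary: __get__() runs


a = Foo()
a.y = ZeroAttribute()    # stored in the instance dictionary: __get__() does not run

print(a.x)                             # 0
print(isinstance(a.y, ZeroAttribute))  # True
```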

Arguments to the `__get__()` method

The __get__() method accepts two arguments, obj and owner.

If the __get__() method is called by accessing a class attribute, obj is set to None, and owner is set to the class.
If the __get__() method is called by accessing an instance attribute, obj is set to the instance, and owner is set to the type of the instance.

This lets you specify different behaviour for class attribute access and instance attribute access. For example, if you explicitly don’t want to allow class attribute access, you can do something like this:

class ZeroAttribute:
    """
    Attribute that is always 0
    """
    def __get__(self, obj, owner=None):
        if obj is None:
            raise AttributeError()  # don't allow accessing as a class attribute
        return 0

class Foo:
    x = ZeroAttribute()

>>> Foo.x  # accessing as class attribute, will raise
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 7, in __get__
AttributeError
>>>
>>> a = Foo()
>>> a.x  # accessing as instance attribute, OK
0
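
To see the two arguments directly, the sketch below uses a hypothetical descriptor, ShowArguments, that simply returns whatever __get__() received. The subclass Bar is also an assumption for illustration: it shows that owner is the class the lookup went through, not necessarily the class where the descriptor was defined.

```python
class ShowArguments:
    """Hypothetical descriptor that returns the arguments __get__() received."""
    def __get__(self, obj, owner=None):
        return (obj, owner)


class Foo:
    x = ShowArguments()


class Bar(Foo):
    pass


a = Foo()
b = Bar()

print(Foo.x == (None, Foo))  # True: class access, obj is None
print(a.x == (a, Foo))       # True: instance access, owner is type(a)
print(b.x == (b, Bar))       # True: owner is Bar, although x is defined on Foo
print(Bar.x == (None, Bar))  # True
```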

Data descriptors and non-data descriptors

A descriptor that only defines __get__() is called a non-data descriptor. A descriptor that also defines __set__() or __delete__() is called a data descriptor. As you’ll see in the next section, this is an important difference to remember.
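
If you are unsure which kind you are dealing with, the standard library's inspect.isdatadescriptor() can tell you. The sketch below uses two hypothetical descriptor classes, NonData and Data, plus two built-in cases: plain functions are non-data descriptors, and property objects are data descriptors.

```python
import inspect


class NonData:
    def __get__(self, obj, owner=None):
        return 0


class Data:
    def __get__(self, obj, owner=None):
        return 0

    def __set__(self, obj, value):
        pass


def plain_function():
    pass


print(inspect.isdatadescriptor(NonData()))       # False
print(inspect.isdatadescriptor(Data()))          # True
print(inspect.isdatadescriptor(plain_function))  # False: functions only define __get__()
print(inspect.isdatadescriptor(property()))      # True
```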

Customizing attribute access

Now that you know about class and instance attributes, and a little bit about descriptors, you’re ready to understand how attribute access really works in Python, and how to customize it.

When you write x.y, the x.__getattribute__() method is invoked. The default implementation of __getattribute__() does the following:

First, it checks if y is a data descriptor. If so, it returns the result of its __get__() method.
Next, it tries to find 'y' in the instance dictionary of x and return it.
Next, it checks if y is a non-data descriptor. If so, it returns the result of its __get__() method.
Next, it tries to find 'y' in the class dictionary of the type of x and return it.
Finally, if none of the above worked, it raises an AttributeError.

If __getattribute__() raises an AttributeError, x.__getattr__() is called if it is defined.
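
The lookup order above can be sketched in plain Python. This is a simplified illustration, not the real implementation: the function names default_lookup and find_on_type are made up, it only handles ordinary instances that have a __dict__, and it ignores the __getattr__() fallback.

```python
_MISSING = object()


def find_on_type(cls, name):
    """Search the class dictionaries of cls and its bases, in MRO order."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return vars(klass)[name]
    return _MISSING


def default_lookup(obj, name):
    """Simplified sketch of the default attribute lookup for instances."""
    cls = type(obj)
    class_attr = find_on_type(cls, name)
    attr_type = type(class_attr)
    has_get = class_attr is not _MISSING and hasattr(attr_type, "__get__")
    is_data = has_get and (
        hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")
    )

    if is_data:                      # 1. data descriptor on the class
        return attr_type.__get__(class_attr, obj, cls)
    if name in vars(obj):            # 2. instance dictionary
        return vars(obj)[name]
    if has_get:                      # 3. non-data descriptor on the class
        return attr_type.__get__(class_attr, obj, cls)
    if class_attr is not _MISSING:   # 4. plain class attribute
        return class_attr
    raise AttributeError(name)       # 5. nothing found


class Data:
    def __get__(self, obj, owner=None):
        return "from data descriptor"

    def __set__(self, obj, value):
        pass


class NonData:
    def __get__(self, obj, owner=None):
        return "from non-data descriptor"


class Example:
    d = Data()
    n = NonData()
    c = "from class dictionary"


obj = Example()
# Put competing entries straight into the instance dictionary.
obj.__dict__.update(d="from instance", n="from instance")

print(default_lookup(obj, "d"))  # from data descriptor
print(default_lookup(obj, "n"))  # from instance
print(default_lookup(obj, "c"))  # from class dictionary
```

For each of these names, the sketch returns the same value as the built-in getattr(obj, name).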

So you can control the way attribute access works in a few different ways:

Override __getattribute__
Write a __getattr__
Make the attribute a descriptor object

Let’s look at some examples that will help you understand when to use which.

Example 1: returning `None` when an attribute isn’t found

By default, accessing an attribute that doesn’t exist gives you an AttributeError:

>>> class MyClass: 
...     def __init__(self, x):
...         self.x = x
...
>>> obj = MyClass(42)
>>> obj.x
42
>>> obj.y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyClass' object has no attribute 'y'

Let’s say we want to get None (or some other custom behaviour) when we access non-existent attributes.

Overriding `__getattribute__()` - the wrong way

The first approach we might consider is overriding __getattribute__(). Try to spot the problem with the code below:

class MyClass:
    def __init__(self, x):
        self.x = x
        
    def __getattribute__(self, name):
        try:
            return self.__dict__[name]
        except KeyError:
            return None

If we try to access an attribute of a MyClass instance (whether or not it exists), we get a RecursionError!

>>> obj = MyClass(42)
>>> obj.y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in __getattribute__
  File "<stdin>", line 6, in __getattribute__
  File "<stdin>", line 6, in __getattribute__
  [Previous line repeated 996 more times]
RecursionError: maximum recursion depth exceeded

The problem is this line in MyClass.__getattribute__(), which itself calls MyClass.__getattribute__():

return self.__dict__[name]  # calls self.__getattribute__("__dict__") - infinite recursion!

Overriding `__getattribute__()` - the less wrong, but still wrong, way

To prevent MyClass.__getattribute__() from calling itself recursively, we could delegate to the base class implementation, object.__getattribute__(), instead. If that fails, we return None:

class MyClass:
    def __init__(self, x):
        self.x = x
        
    def __getattribute__(self, name):
        try:
            return object.__getattribute__(self, name)
        except AttributeError:
            return None

This gives us the behaviour we want:

>>> obj = MyClass(42)
>>> obj.x
42
>>> obj.y  # no error, returns None
>>>

Using `__getattr__` - the right approach

The more elegant solution is to write a __getattr__() method. Recall that the default implementation of __getattribute__() will raise AttributeError when it doesn’t find an attribute. When that happens, __getattr__() is called:

class MyClass:
    def __init__(self, x):
        self.x = x
        
    def __getattr__(self, name):
        return None

>>> obj = MyClass(42)
>>> obj.x  # OK - __getattribute__ will find attribute 'x'
42
>>> obj.y  # OK - __getattr__ is invoked and returns None
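
One consequence of a catch-all __getattr__() is worth keeping in mind. Because it never raises AttributeError, hasattr() reports every name as present, and the default argument of getattr() is never used. A quick sketch with the same MyClass (the attribute name no_such is arbitrary):

```python
class MyClass:
    def __init__(self, x):
        self.x = x

    def __getattr__(self, name):
        return None


obj = MyClass(42)

print(hasattr(obj, "x"))                     # True
print(hasattr(obj, "no_such"))               # True: __getattr__ returned None instead of raising
print(getattr(obj, "no_such", "fallback"))   # None: the default is ignored
```

If that matters for your use case, one option is to raise AttributeError from __getattr__() for the names you do not want to handle.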

Example 2 - an attribute that updates itself when accessed

As another example, consider writing an attribute that is updated every time you access it:

>>> obj = MyClass()
>>> obj.x
0
>>> obj.x
1
>>> obj.x
2

Using a non-data descriptor

When you want to control the access behaviour of a specific attribute, a descriptor is generally the right tool for the job.

The IncrementingAttribute.__get__() method below returns 0 the first time it is called for an instance. Subsequently, it returns 1, 2, 3, etc. It does this by storing an internal attribute _value in the instance.

class IncrementingAttribute:
    def __get__(self, obj, owner=None):
        # don't allow accessing as a class attribute
        if obj is None:
            raise AttributeError()
            
        # if accessing for the first time, return 0,
        # otherwise, increment by 1 and return the result
        if not hasattr(obj, "_value"):
            obj._value = -1
        obj._value += 1
        return obj._value

class MyClass:
    x = IncrementingAttribute()

The attribute x is updated each time it is accessed:

>>> obj = MyClass()
>>> obj.x
0
>>> obj.x
1

What happens if we “reset” the value of x and then try to access it?

>>> obj.x
1
>>> obj.x = 0  # reset to 0
>>> obj.x
0
>>> obj.x
0
>>> obj.x
0

Oh no! x has somehow lost the ability to update itself. To understand why, it’s first important to understand what happens when the following line is executed:

obj.x = 0

Because x does not define a __set__() method, this line will fall back to the default behaviour of setting an attribute. That is, it will add an entry 'x' to the instance dictionary of obj. You can see this by inspecting the __dict__ attribute before and after setting the attribute x:

>>> obj = MyClass()
>>> obj.x
0
>>> obj.__dict__
{'_value': 0}
>>> obj.x
1
>>> obj.__dict__
{'_value': 1}
>>> obj.x = 1   # adds entry 'x' to obj.__dict__
>>> obj.__dict__
{'_value': 1, 'x': 1}

Recall that x is a non-data descriptor, that is, it only defines a __get__() method. Because __getattribute__() looks in the instance dictionary before checking for non-data descriptors, it finds x in the instance dictionary and returns that.
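
A short sketch that makes the shadowing visible: once the 'x' entry is removed from the instance dictionary again (here with del, which falls back to the default deletion behaviour because the descriptor defines no __delete__() method), the descriptor becomes reachable and carries on from the stored _value.

```python
class IncrementingAttribute:
    def __get__(self, obj, owner=None):
        if obj is None:
            raise AttributeError()
        if not hasattr(obj, "_value"):
            obj._value = -1
        obj._value += 1
        return obj._value


class MyClass:
    x = IncrementingAttribute()


obj = MyClass()
print(obj.x, obj.x)   # 0 1
obj.x = 0             # stored in obj.__dict__, shadows the non-data descriptor
print(obj.x, obj.x)   # 0 0
del obj.x             # removes the 'x' entry from obj.__dict__
print(obj.__dict__)   # {'_value': 1}
print(obj.x)          # 2
```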

Using a data descriptor

The solution to the above problem is to also define a __set__() method in our descriptor class:

class IncrementingAttribute:
    def __get__(self, obj, owner=None):
        # don't allow accessing as a class attribute
        if obj is None:
            raise AttributeError()
        # if accessing for the first time, return 0,
        # otherwise, increment by 1 and return the result
        if not hasattr(obj, "_value"):
            obj._value = -1
        obj._value += 1
        return obj._value
        
    def __set__(self, obj, value):
        obj._value = value - 1

class MyClass:
    x = IncrementingAttribute()

Now, we get the expected reset behaviour:

>>> obj = MyClass()
>>> obj.x
0
>>> obj.x
1
>>> obj.x = 0
>>> obj.x
0
>>> obj.x
1
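
To confirm that assignment now goes through __set__() rather than the instance dictionary, you can inspect obj.__dict__ after the reset. The sketch below also forces an 'x' entry into the instance dictionary by hand, which is purely for illustration: the data descriptor still takes priority.

```python
class IncrementingAttribute:
    def __get__(self, obj, owner=None):
        if obj is None:
            raise AttributeError()
        if not hasattr(obj, "_value"):
            obj._value = -1
        obj._value += 1
        return obj._value

    def __set__(self, obj, value):
        obj._value = value - 1


class MyClass:
    x = IncrementingAttribute()


obj = MyClass()
obj.x
obj.x
obj.x = 0                     # handled by __set__(), no 'x' entry is created
print(obj.__dict__)           # {'_value': -1}

obj.__dict__["x"] = "shadow"  # bypass __set__() and write the entry directly
print(obj.x)                  # 0: the data descriptor wins over the instance dictionary
```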

What about `property`?

You could also use a property to achieve the same result:

class MyClass:
    @property
    def x(self):
        if not hasattr(self, "_value"):
            self._value = -1
        self._value += 1
        return self._value

    @x.setter
    def x(self, value):
        self._value = value - 1

In fact, @property is really just a way to define a (data) descriptor! So if you find yourself writing properties that all look the same (either in the same class or across different classes), that’s a sign that you should write a descriptor instead.
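
As a sketch of what such a reusable descriptor might look like, the variant below uses __set_name__(), which Python (3.6 and later) calls when the owning class is created, so that each attribute keeps its own counter. The storage naming scheme and the Counters class are assumptions made for this example, not part of the code above.

```python
class IncrementingAttribute:
    def __set_name__(self, owner, name):
        # Called once per attribute when the owning class is created.
        # Hypothetical naming scheme: one private slot per attribute.
        self._storage_name = "_" + name + "_value"

    def __get__(self, obj, owner=None):
        if obj is None:
            raise AttributeError("only available on instances")
        value = getattr(obj, self._storage_name, -1) + 1
        setattr(obj, self._storage_name, value)
        return value

    def __set__(self, obj, value):
        setattr(obj, self._storage_name, value - 1)


class Counters:
    hits = IncrementingAttribute()
    misses = IncrementingAttribute()


c = Counters()
print(c.hits, c.hits, c.misses)  # 0 1 0
c.hits = 10
print(c.hits, c.misses)          # 10 1
```

Writing the same thing with @property would need a getter and a setter for each attribute, which is the kind of repetition a descriptor avoids.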
