[Tutor] question about descriptors

Discussion:

Albert-Jan Roskam

2015-11-07 12:53:11 UTC

Hi,
First, before I forget, emails from hotmail/yahoo etc appear to end up in the spam folder these days, so apologies in advance if I do not appear to follow up to your replies.
OK, now to my question. I want to create a class with read-only attribute access to the columns of a .csv file. E.g. when a file has a column named 'a', that column should be returned as a list by using instance.a. At first I thought I could do this with the builtin 'property' class, but I am not sure how. I have now tried to use descriptors (__get__ and __set__), which are also used by 'property' (see also: https://docs.python.org/2/howto/descriptor.html).

In the "if __name__ == '__main__'" section, [a] is supposed to be shorthand for, i.e. equivalent to, [b]. But it's not. I suspect it has to do with the way attributes are looked up: once an attribute has been found in self.__dict__, aka "the usual place", the search stops, and __get__ is never called. But I may be wrong. I find the __getattribute__, __getattr__ and __get__ distinction quite confusing.
What is the best approach to do this? Ideally, the column values should only be retrieved when they are actually requested (the .csv could be big).
Thanks in advance!

import csv
from cStringIO import StringIO

class AttrAccess(object):

    def __init__(self, fileObj):
        self.__reader = csv.reader(fileObj, delimiter=";")
        self.__header = self.__reader.next()
        #[setattr(self, name, self.__get_column(name)) for name in self.header]
        self.a = range(10)

    @property
    def header(self):
        return self.__header

    def __get_column(self, name):
        return [record[self.header.index(name)] for record in self.__reader]  # generator expression might be better here.

    def __get__(self, obj, objtype=type):
        print "__get__ called"
        return self.__get_column(obj)
        #return getattr(self, obj)

    def __set__(self, obj, val):
        raise AttributeError("Can't set attribute")

if __name__ == "__main__":
    f = StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
    instance = AttrAccess(f)
    print instance.a  # [a] does not call __get__. Looks, and finds, in self.__dict__?
    print instance.__get__("a")  # [b] this is supposed to be equivalent to [a]
    instance.a = 42  # should throw AttributeError!

_______________________________________________
Tutor maillist - ***@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor

Alan Gauld

2015-11-07 13:31:34 UTC

On 07/11/15 12:53, Albert-Jan Roskam wrote:
> Ok, now to my question.
> I want to create a class with read-only attribute access
> to the columns of a .csv file.

Can you clarify what you mean by that?
The csvreader is by definition read only.
So is it the in-memory model that you want read-only?
Except you don't really have an in-memory model that I can see?

> E.g. when a file has a column named 'a', that column should
> be returned as list by using instance.a.

That appears to be a separate issue to whether the returned
list is read-only or not? As ever the issue of dynamically
naming variables at run time and then figuring out how to
access them later raises its head. It's hardly ever a good plan.

> At first I thought I could do this with the builtin 'property'
> class, but I am not sure how.

To use property I think you'd need to know the names of your
columns in advance. (Or dynamically build your classes)

> I now tried to use descriptors (__get__ and __set__),
> which are also used by ' property'

> In the " if __name__ == '__main__'" section, [a] is supposed
> to be a shorthand for == equivalent to [b].

I have no idea what you mean by that sentence?

> class AttrAccess(object):
>
>     def __init__(self, fileObj):
>         self.__reader = csv.reader(fileObj, delimiter=";")
>         self.__header = self.__reader.next()
>     @property
>     def header(self):
>         return self.__header
>
>     def __get_column(self, name):
>         return [record[self.header.index(name)] for record in self.__reader]  # generator expression might be better here.

You should only look up the index once; otherwise it could add a lot of time
for a long file (especially if there are a lot of columns):

    def __get_column(self, name):
        idx = self.header.index(name)
        return [record[idx] for record in self.__reader]

>     def __get__(self, obj, objtype=type):
>         print "__get__ called"
>         return self.__get_column(obj)
>         #return getattr(self, obj)
>
>     def __set__(self, obj, val):
>         raise AttributeError("Can't set attribute")

If you want everything read-only should this not be __setattr__()?
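
A minimal sketch of what blocking assignment through __setattr__() could look like. The class name ReadOnly and its keyword-argument constructor are made up for illustration, and the code is written so that it also runs under Python 3:

```python
class ReadOnly(object):
    """Hypothetical example: every attribute is frozen after __init__."""

    def __init__(self, **columns):
        # Go through object.__setattr__ so that our own override
        # does not reject the initial assignments.
        for name, value in columns.items():
            object.__setattr__(self, name, value)

    def __setattr__(self, name, value):
        # Called for *every* instance.name = value, whatever the name.
        raise AttributeError("can't set attribute %r" % name)


ro = ReadOnly(a=[1, 4, 7])
try:
    ro.a = 42
except AttributeError as exc:
    message = str(exc)
```

Unlike __set__(), which only matters on an object stored as a class attribute, __setattr__() is consulted on the instance's own class for each assignment, so it covers names that are not known in advance.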

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


Peter Otten

2015-11-07 14:03:44 UTC

I think the basic misunderstandings are that

(1) the __get__() method has to be implemented by the descriptor class
(2) the descriptor instances should be attributes of the class that is
supposed to invoke __get__(). E.g.:

class C(object):
    x = descriptor()

c = C()

c.x  # invokes type(c).__dict__["x"].__get__(c, C) under the hood.

As a consequence you need one class per set of attributes, instantiating the
same AttrAccess for csv files with differing layouts won't work.
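
To make the lookup rule concrete, here is a small sketch. The descriptor class Loud and its calls list are invented for the demonstration; nothing here comes from the original script:

```python
class Loud(object):
    """Hypothetical descriptor that records every lookup it serves."""

    def __init__(self):
        self.calls = []

    def __get__(self, obj, objtype=None):
        self.calls.append((obj, objtype))
        return "from descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class C(object):
    x = Loud()                      # stored in C.__dict__, so the protocol applies


c = C()
value = c.x                         # implicit call to __get__
descriptor = C.__dict__["x"]        # fetch the descriptor without triggering it
same = descriptor.__get__(c, C)     # the explicit equivalent of c.x

c.__dict__["x"] = "shadow"          # an instance entry does not win here...
still = c.x                         # ...because Loud also defines __set__
```

In the original AttrAccess class there is no descriptor object stored on any class, so instance.a is an ordinary lookup that finds the list in the instance __dict__, and the __get__ method defined on AttrAccess itself is never consulted.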

Here's how to do it all by yourself:

class ReadColumn(object):
    def __init__(self, index):
        self._index = index
    def __get__(self, obj, type=None):
        return obj._row[self._index]
    def __set__(self, obj, value):
        raise AttributeError("oops")

def first_row(instream):
    reader = csv.reader(instream, delimiter=";")

    # A fresh Row class is created on every call, so the descriptors
    # attached below never leak into other files' classes.
    class Row(object):
        def __init__(self, row):
            self._row = row

    for i, header in enumerate(next(reader)):
        setattr(Row, header, ReadColumn(i))

    return Row(next(reader))

f = StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
row = first_row(f)
print row.a
row.a = 42  # raises AttributeError: oops

Instead of a custom descriptor you can of course use the built-in property:

    for i, header in enumerate(next(reader)):
        setattr(Row, header, property(lambda self, i=i: self._row[i]))
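
The i=i default in that lambda matters. A short sketch of why, using throwaway names (late, bound) that are not part of the code above:

```python
# Closures look up "i" when they are *called*, not when they are created.
late = [lambda: i for i in range(3)]

# A default argument is evaluated when the lambda is *created*,
# so each function keeps its own value.
bound = [lambda i=i: i for i in range(3)]

late_results = [f() for f in late]
bound_results = [f() for f in bound]
```

Without the default, every property created in the loop would read the last column.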

In many cases you don't care about the specifics of the row class and use
collections.namedtuple:

import collections
import itertools

def rows(instream):
    reader = csv.reader(instream, delimiter=";")
    Row = collections.namedtuple("Row", next(reader))
    return itertools.imap(Row._make, reader)

f = StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
row = next(rows(f))
print row.a
row.a = 42  # raises AttributeError: namedtuple fields are read-only


Albert-Jan Roskam

2015-11-11 08:08:46 UTC

<snip>

> I think the basic misunderstandings are that
>
> (1) the __get__() method has to be implemented by the descriptor class
> (2) the descriptor instances should be attributes of the class that is
> supposed to invoke __get__(). E. g.:
>
> class C(object):
>     x = descriptor()
>
> c = C()
>
> c.x # invoke c.x.__get__(c, C) under the hood.

Exactly right, that was indeed my misunderstanding! I was thinking about __get__ and __set__ in the same terms as e.g. __getitem__ and __setitem__

> As a consequence you need one class per set of attributes, instantiating the
> same AttrAccess for csv files with differing layouts won't work.

That is no problem at all for me. One instance per file will be fine.

> Here's how to do it all by yourself:
>
> class ReadColumn(object):
>     def __init__(self, index):
>         self._index = index
>     def __get__(self, obj, type=None):
>         return obj._row[self._index]
>     def __set__(self, obj, value):
>         raise AttributeError("oops")

This appears to return one value, whereas I wanted to return all values of a column, i.e. as many values as there are rows.
But the logic probably won't change. The same applies to the use of namedtuple, I suppose (?). I have never used namedtuple like namedtuple("Column", self.header)(*self.columns).
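
A rough sketch of that column-wise namedtuple idea, written for Python 3. The function name columns and the type name Columns are placeholders, and the approach reads the whole file up front, so it is not lazy:

```python
import collections
import csv
import io


def columns(instream, delimiter=";"):
    """Return a namedtuple whose fields are whole columns (as lists)."""
    reader = csv.reader(instream, delimiter=delimiter)
    header = next(reader)
    Columns = collections.namedtuple("Columns", header)
    # zip(*rows) transposes rows into columns; this consumes the reader.
    return Columns(*(list(col) for col in zip(*reader)))


cols = columns(io.StringIO(u"a;b;c\n1;2;3\n4;5;6\n7;8;9\n"))
column_a = cols.a
try:
    cols.a = 42
    assignable = True
except AttributeError:
    assignable = False
```

The values stay strings because csv.reader does no type conversion. A file that contains only a header line would need separate handling, since the transpose then yields no columns at all. The header names must also be valid Python identifiers for namedtuple to accept them.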

> def first_row(instream):
>     reader = csv.reader(instream, delimiter=";")
>
>     class Row(object):
>         def __init__(self, row):
>             self._row = row
>
>     for i, header in enumerate(next(reader)):
>         setattr(Row, header, ReadColumn(i))
>
>     return Row(next(reader))
>
>
> f = StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
> row = first_row(f)
> print row.a
> row.a = 42
>
> Instead of a custom descriptor you can of course use the built-in property:
>
>     for i, header in enumerate(next(reader)):
>         setattr(Row, header, property(lambda self, i=i: self._row[i]))

This seems most attractive/straightforward to me.

> In many cases you don't care about the specifics of the row class and use
> collections.namedtuple:
>
>
> def rows(instream):
>     reader = csv.reader(instream, delimiter=";")
>     Row = collections.namedtuple("Row", next(reader))
>     return itertools.imap(Row._make, reader)
>
>
> f = StringIO("a;b;c\n1;2;3\n4;5;6\n7;8;9\n")
> row = next(rows(f))
> print row.a
> row.a = 42

Thanks a lot for helping me!


Steven D'Aprano

2015-11-07 14:24:58 UTC

On Sat, Nov 07, 2015 at 12:53:11PM +0000, Albert-Jan Roskam wrote:

[...]
> Ok, now to my question. I want to create a class with read-only
> attribute access to the columns of a .csv file. E.g. when a file has a
> column named 'a', that column should be returned as list by using
> instance.a. At first I thought I could do this with the builtin
> 'property' class, but I am not sure how.

90% of problems involving computed attributes (including "read-only"
attributes) are most conveniently solved with `property`, but I think
this may be an exception. Nevertheless, I'll give you a solution in
terms of `property` first.

I'm too busy/lazy to handle reading from a CSV file, so I'll fake it
with a dict of columns.

class ColumnView(object):
    _data = {'a': [1, 2, 3, 4, 5, 6],
             'b': [1, 2, 4, 8, 16, 32],
             'c': [1, 10, 100, 1000, 10000, 100000],
             }
    @property
    def a(self):
        return self._data['a'][:]
    @property
    def b(self):
        return self._data['b'][:]
    @property
    def c(self):
        return self._data['c'][:]

And in use:

py> cols = ColumnView()
py> cols.a
[1, 2, 3, 4, 5, 6]
py> cols.a = []
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: can't set attribute

Now, some comments:

(1) You must inherit from `object` for this to work. (Or use Python 3.)
It won't work if you just say "class ColumnView:", which would make it a
so-called "classic" or "old-style" class. You don't want that.

(2) Inside the property getter functions, I make a copy of the lists
before returning them. That is, I do:

return self._data['c'][:]

rather than:

return self._data['c']

The empty slice [:] makes a copy. If I did not do this, you could mutate
the list (say, by appending a value to it, or deleting items from it)
and that mutation would show up the next time you looked at the column.
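
A small sketch of the difference, using two made-up classes (Shared and Copied) that differ only in whether the getter copies:

```python
class Shared(object):
    _data = {'a': [1, 2, 3]}

    @property
    def a(self):
        return self._data['a']        # hands out the internal list itself


class Copied(object):
    _data = {'a': [1, 2, 3]}

    @property
    def a(self):
        return self._data['a'][:]     # hands out a shallow copy


shared, copied = Shared(), Copied()
shared.a.append(99)                   # changes the stored column
copied.a.append(99)                   # changes only the throwaway copy
```

The copy is shallow: if the column held mutable items (nested lists, say), those items would still be shared with the caller.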

(3) It's very tedious having to create a property for each column ahead
of time. But we can do this instead:

def make_getter(key):
    def inner(self):
        return self._data[key][:]
    inner.__name__ = key
    return property(inner)

class ColumnView(object):
    _data = {'a': [1, 2, 3, 4, 5, 6],
             'b': [1, 2, 4, 8, 16, 32],
             'c': [1, 10, 100, 1000, 10000, 100000],
             }
    for key in _data:
        locals()[key] = make_getter(key)
    del key

and it works as above, but without all the tedious manual creation of
property getters.

Do you understand how this operates? If not, ask, and someone will
explain. (And yes, this is one of the few times that writing to locals()
actually works!)
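
For anyone uneasy about writing to locals(), one possible variation is to attach the properties after the class statement has finished. This is only a sketch with shortened sample data, not a claim that it is the better way:

```python
def make_getter(key):
    def inner(self):
        return self._data[key][:]
    inner.__name__ = key
    return property(inner)


class ColumnView(object):
    _data = {'a': [1, 2, 3], 'b': [1, 2, 4]}


# The class object exists now, so setattr() can add to its namespace.
for key in ColumnView._data:
    setattr(ColumnView, key, make_getter(key))
del key

cols = ColumnView()
try:
    cols.a = []
    writable = True
except AttributeError:
    writable = False
```

Either way the property objects end up in the class namespace, which is what makes them work.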

(4) But what if you don't know what the columns are called ahead of
time? You can't use property, or descriptors, because you don't know
what to call the damn things until you know what the column headers are,
and by the time you know that, the class is already well and truly
created. You might think you can do this:

class ColumnView(object):
    def __init__(self):
        # read the columns from the CSV file
        self._data = ...
        # now create properties to suit
        for key in self._data:
            setattr(self, key, property( ... ))

but that doesn't work. Properties only perform their "magic" when they
are attached to the class itself. By setting them as attributes on the
instance (self), they lose their power and just get treated as ordinary
attributes. To be technical, we say that the descriptor protocol is only
enacted when the attribute is found in the class, not in the instance.
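
A quick way to see this for yourself. Holder is a throwaway class invented for the demonstration:

```python
class Holder(object):
    pass


h = Holder()
h.a = property(lambda self: [1, 2, 3])        # stored in h.__dict__
Holder.b = property(lambda self: [1, 2, 3])   # stored in Holder.__dict__

on_instance = h.a     # the property object itself, never called
on_class = h.b        # the getter's return value
```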

You might be tempted to write this instead:

setattr(self.__class__, key, property( ... ))

but that's even worse. Now, every time you create a new ColumnView
instance, *all the other instances will change*. They will grow new
properties, or overwrite existing properties. You don't want that.
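
A sketch of that failure mode, assuming a constructor that takes the column dict directly (the data argument is an invention for this example):

```python
class ColumnView(object):
    def __init__(self, data):
        self._data = data
        for key in data:
            # This mutates the shared class, not just this instance.
            setattr(type(self), key,
                    property(lambda self, key=key: self._data[key][:]))


first = ColumnView({'a': [1, 2]})
second = ColumnView({'b': [3, 4]})

leaked = 'b' in ColumnView.__dict__   # created by second, visible to first
try:
    first.b
    outcome = "ok"
except KeyError:
    outcome = "KeyError"              # first has no 'b' column to return
```

The first instance now advertises a column it does not have, and asking for it fails inside the getter with a KeyError rather than a clean AttributeError.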

Fortunately, Python has a mechanism for solving this problem:
the `__getattr__` method and friends.

class ColumnView(object):
    _data = {'a': [1, 2, 3, 4, 5, 6],
             'b': [1, 2, 4, 8, 16, 32],
             'c': [1, 10, 100, 1000, 10000, 100000],
             }
    def __getattr__(self, name):
        if name in self._data:
            return self._data[name][:]
        else:
            raise AttributeError
    def __setattr__(self, name, value):
        if name in self._data:
            raise AttributeError('read-only attribute')
        super(ColumnView, self).__setattr__(name, value)
    def __delattr__(self, name):
        if name in self._data:
            raise AttributeError('read-only attribute')
        super(ColumnView, self).__delattr__(name)

--
Steve

Albert-Jan Roskam

2015-11-11 08:33:17 UTC

> On Sat, Nov 07, 2015 at 12:53:11PM +0000, Albert-Jan Roskam wrote:
>
> [...]
> > Ok, now to my question. I want to create a class with read-only
> > attribute access to the columns of a .csv file. E.g. when a file has a
> > column named 'a', that column should be returned as list by using
> > instance.a. At first I thought I could do this with the builtin
> > 'property' class, but I am not sure how.
>
> 90% of problems involving computed attributes (including "read-only"
> attributes) are most conveniently solved with `property`, but I think
> this may be an exception. Nevertheless, I'll give you a solution in
> terms of `property` first.
>
> I'm too busy/lazy to handle reading from a CSV file, so I'll fake it
> with a dict of columns.

Actually, I want to make this work for any iterable, as long as I can get the header names and as long as it returns one record per iteration.
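
Something along these lines is what I have in mind: a rough sketch, not a finished design. The class name LazyColumns and its (header, records) signature are placeholders, and it combines the __getattr__ idea with loading the data only on first access:

```python
class LazyColumns(object):
    """Hypothetical sketch: columns are built on first attribute access."""

    def __init__(self, header, records):
        # Use object.__setattr__ because __setattr__ below refuses everything.
        object.__setattr__(self, '_header', list(header))
        object.__setattr__(self, '_records', records)
        object.__setattr__(self, '_columns', None)

    def _load(self):
        # A one-shot iterable can be consumed only once, so all columns
        # are built together and cached.
        rows = list(self._records)
        columns = dict((name, [row[i] for row in rows])
                       for i, name in enumerate(self._header))
        object.__setattr__(self, '_columns', columns)

    def __getattr__(self, name):
        # Only reached when the normal lookup fails, i.e. for column names.
        if name.startswith('_') or name not in self._header:
            raise AttributeError(name)
        if self._columns is None:
            self._load()
        return self._columns[name][:]

    def __setattr__(self, name, value):
        raise AttributeError("read-only attribute %r" % name)


records = iter([(1, 2, 3), (4, 5, 6), (7, 8, 9)])
view = LazyColumns(['a', 'b', 'c'], records)
untouched = view._columns is None     # nothing has been read yet
column_b = view.b                     # triggers the one-time load
```

The first access still pulls every record into memory. Truly lazy per-column access would need an iterable that can be restarted, which a plain iterator cannot offer.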

> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     @property
>     def a(self):
>         return self._data['a'][:]
>     @property
>     def b(self):
>         return self._data['b'][:]
>     @property
>     def c(self):
>         return self._data['c'][:]

Interesting. I never would have thought to define a separate class for this.

> And in use:
>
> py> cols = ColumnView()
> py> cols.a
> [1, 2, 3, 4, 5, 6]
> py> cols.a = []
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> AttributeError: can't set attribute
>
>
>
> Now, some comments:
>
> (1) You must inherit from `object` for this to work. (Or use Python 3.)
> It won't work if you just say "class ColumnView:", which would make it a
> so-called "classic" or "old-style" class. You don't want that.

Are there any use cases left where one still must use old-style classes? Or should new code always inherit from object (unless one wants to inherit from another "true" class, of course)?

> (2) Inside the property getter functions, I make a copy of the lists
> before returning them. That is, I do:
>
> return self._data['c'][:]
>
> rather than:
>
> return self._data['c']
>
>
> The empty slice [:] makes a copy. If I did not do this, you could mutate
> the list (say, by appending a value to it, or deleting items from it)
> and that mutation would show up the next time you looked at the column.

These mutability problems always make me pull my hair out! :-) I like the [:] notation, but:

In [1]: giant = range(10 ** 7)

In [2]: %timeit copy1 = giant[:]
10 loops, best of 3: 97 ms per loop

In [3]: from copy import copy

In [4]: %timeit copy2 = copy(giant)
10 loops, best of 3: 90 ms per loop

In [5]: import copy

In [6]: %timeit copy2 = copy.copy(giant)
10 loops, best of 3: 88.6 ms per loop

Hmmm, wicked, when I looked earlier this week the difference appeared to be bigger.

> (3) It's very tedious having to create a property for each column ahead
> of time. But we can do this instead:
>
>
> def make_getter(key):
>     def inner(self):
>         return self._data[key][:]
>     inner.__name__ = key
>     return property(inner)
>
>
> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     for key in _data:
>         locals()[key] = make_getter(key)
>     del key
>
>
> and it works as above, but without all the tedious manual creation of
> property getters.
>
> Do you understand how this operates? If not, ask, and someone will
> explain. (And yes, this is one of the few times that writing to locals()
> actually works!)

I think so. I still plan to write several working implementations to get a better idea about which strategy to choose.

> (4) But what if you don't know what the columns are called ahead of
> time? You can't use property, or descriptors, because you don't know
> what to call the damn things until you know what the column headers are,
> and by the time you know that, the class is already well and truly
> created. You might think you can do this:
>
> class ColumnView(object):
>     def __init__(self):
>         # read the columns from the CSV file
>         self._data = ...
>         # now create properties to suit
>         for key in self._data:
>             setattr(self, key, property( ... ))
>
>
> but that doesn't work. Properties only perform their "magic" when they
> are attached to the class itself. By setting them as attributes on the
> instance (self), they lose their power and just get treated as ordinary
> attributes. To be technical, we say that the descriptor protocol is only
> enacted when the attribute is found in the class, not in the instance.

Ha! That is indeed exactly what I tried! :-))

> You might be tempted to write this instead:
>
> setattr(self.__class__, key, property( ... ))

I thought about defining a classmethod, then inside it do setattr(cls, key, property( ... ))
But that is probably the same?
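
One variation that would sidestep the shared-class issue, sketched here under the assumption that one throwaway class per file is acceptable, is to build a new class with type() each time. The factory name make_column_view is made up:

```python
def make_column_view(data):
    """Build a one-off class for this particular set of columns."""
    namespace = {'_data': data}
    for key in data:
        namespace[key] = property(lambda self, key=key: self._data[key][:])
    # type(name, bases, namespace) creates a brand-new class on every call.
    return type('ColumnView', (object,), namespace)()


first = make_column_view({'a': [1, 2]})
second = make_column_view({'b': [3, 4]})
```

This is the same idea as defining the Row class inside first_row(): the properties live on a class, but each file gets its own class.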

> but that's even worse. Now, every time you create a new ColumnView
> instance, *all the other instances will change*. They will grow new
> properties, or overwrite existing properties. You don't want that.
>
> Fortunately, Python has a mechanism for solving this problem:
> the `__getattr__` method and friends.
>
>
> class ColumnView(object):
>     _data = {'a': [1, 2, 3, 4, 5, 6],
>              'b': [1, 2, 4, 8, 16, 32],
>              'c': [1, 10, 100, 1000, 10000, 100000],
>              }
>     def __getattr__(self, name):
>         if name in self._data:
>             return self._data[name][:]
>         else:
>             raise AttributeError
>     def __setattr__(self, name, value):
>         if name in self._data:
>             raise AttributeError('read-only attribute')
>         super(ColumnView, self).__setattr__(name, value)
>     def __delattr__(self, name):
>         if name in self._data:
>             raise AttributeError('read-only attribute')
>         super(ColumnView, self).__delattr__(name)

That also seems very straightforward. Why does "if name in self._data:" not cause a recursion? self._data calls __getattr__, which has self._data in it, which...etc.
