Python's groupby pitfall
Python's itertools.groupby() function has one feature that can lead to an
unexpected pitfall. The documentation says:
> …The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list…
Example:
    from itertools import groupby

    data = ((1, "one"), (1, "one too"), (1, "also one"),
            (2, "two"), (2, "another two"), (2, "two again"))

    grp = groupby(data, lambda p: p[0])
    # grp = map(lambda p: (p[0], list(p[1])), grp)  # (A)
    maxg, maxi = max(grp, key=lambda g: g[0])
    print("maxg =", maxg, "maxi =", list(maxi))
Output without (A) is:
maxg = 2 maxi = [(2, 'two again')]
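This is unexpected: the intent was to find the group with the largest key and then iterate over its items, of which there are three. By the time max() returns, however, the groupby() object has been advanced to the end, so the returned group no longer yields its full contents. The exact leftover depends on the interpreter version: since Python 3.7 a group becomes exhausted as soon as groupby() advances, so there the printed list is empty.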
With line (A) uncommented, the output is:
maxg = 2 maxi = [(2, 'two'), (2, 'another two'), (2, 'two again')]
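The effect of line (A) can be packaged as a small generator. What follows is only a minimal sketch: the name `materialized_groups` is hypothetical and not part of itertools. It copies each group into a list while that group is still the current one, so the items remain available after groupby() moves on.

```python
from itertools import groupby


def materialized_groups(iterable, key=None):
    """Yield (key, list_of_items) pairs instead of (key, lazy_group) pairs.

    Hypothetical helper, not part of itertools: each group is copied into
    a list before groupby() advances, so it stays valid afterwards.
    """
    for group_key, group in groupby(iterable, key):
        yield group_key, list(group)


pairs = ((1, "one"), (1, "one too"), (1, "also one"),
         (2, "two"), (2, "another two"), (2, "two again"))

# max() consumes the whole generator, but every group is already a list.
max_key, max_items = max(materialized_groups(pairs, key=lambda p: p[0]),
                         key=lambda g: g[0])
print("max_key =", max_key, "max_items =", max_items)
```

The price is memory: every group is held as a list, which may matter for very large inputs.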
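The difference arises because the group sub-iterators are not independent: they all read from the same underlying iterable as groupby() itself, so iterating over the groups also "eats" the items of each previous group.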
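The shared source can be observed step by step. The snippet below is a small illustration with made-up data (the `pairs` list and variable names are assumptions for the example): it takes one item from the first group, advances groupby(), and then checks what each group can still produce.

```python
from itertools import groupby

pairs = [(1, "a"), (1, "b"), (2, "c")]
groups = groupby(pairs, key=lambda p: p[0])

key1, group1 = next(groups)
first_item = next(group1)    # takes (1, "a") from the shared source

key2, group2 = next(groups)  # advancing skips the rest of group 1
leftover = list(group1)      # (1, "b") is no longer reachable via group1
current = list(group2)       # the current group is still intact

print(first_item, leftover, current)
# prints: (1, 'a') [] [(2, 'c')]
```

The item (1, "b") was never returned to the caller: groupby() consumed it while looking for the start of the next group.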