Python's groupby pitfall

Python's itertools.groupby() has one feature that can cause an unexpected pitfall. Its documentation warns:

> …The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list…
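Following that advice, here is a minimal sketch that copies each group into a list inside the loop, before groupby() moves on to the next key. The word list and the first-letter key are illustrative assumptions, not part of the original example.

```python
from itertools import groupby

# Illustrative data: already sorted, so each first letter forms one run.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

# list(items) runs before groupby() advances to the next key,
# so every stored list keeps its full contents.
groups = [(letter, list(items))
          for letter, items in groupby(words, key=lambda w: w[0])]

for letter, items in groups:
    print(letter, items)
```

Because each list is built while its group is still the active one, the stored lists remain valid after the whole groupby() object has been consumed.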

Example:

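from itertools import groupby

# Input is already sorted by key, so each key forms a single group.
pairs = ((1, "one"), (1, "one too"), (1, "also one"),
         (2, "two"), (2, "another two"), (2, "two again"))

grp = groupby(pairs, lambda p: p[0])
# grp = map(lambda p: (p[0], list(p[1])), grp)   # (A) copy each group into a list
# max() consumes grp completely while comparing keys, so by the time
# maxi is read, the outer iterator has already moved past its items.
maxg, maxi = max(grp, key=lambda g: g[0])
print("maxg =", maxg, "maxi =", list(maxi))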

Output without (A) is:

maxg = 2 maxi = [(2, 'two again')]

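This is unexpected: we meant to find the "last" (maximal) group and then iterate over its items, of which there are three, yet only one is left. On Python 3.7 and later even that one is gone: a group is invalidated as soon as groupby() advances, so the same code prints maxi = [].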

With line (A) uncommented, the output is:

maxg = 2 maxi = [(2, 'two'), (2, 'another two'), (2, 'two again')]
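Line (A) copies every group into a list. If only the maximal group is needed, one alternative (a sketch, not the approach used above) keeps just the best group seen so far and copies a group only while it is still the active one. The names best_key and best_items are illustrative.

```python
from itertools import groupby

pairs = ((1, "one"), (1, "one too"), (1, "also one"),
         (2, "two"), (2, "another two"), (2, "two again"))

best_key, best_items = None, []   # illustrative names, not from the original
for key, items in groupby(pairs, lambda p: p[0]):
    # Copy the group now, while it is still the active one. Only the
    # best group seen so far is kept; the others are simply skipped.
    if best_key is None or key > best_key:
        best_key, best_items = key, list(items)

print("maxg =", best_key, "maxi =", best_items)
```

Using a strict > keeps the first maximal group on ties, which matches what max() returns.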

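This happens because the group sub-iterators are not independent: they share the underlying iterator with groupby(), so advancing over the groups (as max() does while comparing them) also consumes the items of every group it passes.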
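To watch this "eating" happen, here is a sketch that wraps the example data in a hypothetical logging generator, source(), which records each item as groupby() pulls it from the shared iterator. The helper and the variable names are illustrative assumptions.

```python
from itertools import groupby

pulled = []  # every value the source has handed out so far, in order


def source():
    """Hypothetical logging wrapper around the example data."""
    data = ((1, "one"), (1, "one too"), (1, "also one"),
            (2, "two"), (2, "another two"), (2, "two again"))
    for item in data:
        pulled.append(item[1])
        yield item


grp = groupby(source(), lambda p: p[0])

key1, group1 = next(grp)
after_first = list(pulled)    # only the first item has been read

key2, group2 = next(grp)
after_second = list(pulled)   # skipping to group 2 read all of group 1

leftover1 = list(group1)      # group 1 has nothing left to give
current2 = list(group2)       # the active group is still intact

print(after_first)
print(after_second)
print(leftover1)
print(current2)
```

A single step of the outer iterator is enough to read through the rest of the previous group; max() takes that step for every group.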