Pandas Pivot Or Groupby For Dynamically Generated Columns
I have a dataframe with sales information in a supermarket. Each row in the dataframe represents an item, with several characteristics as columns. The original DataFrame is somethi
Solution 1:
One possible way is to use groupby to collect each ticket's items into a list, and then expand those lists into columns:
In [24]: res = df.groupby(['ticket_number', 'ticket_price'])['item'].apply(list).apply(pd.Series)
In [25]: res
Out[25]:
                                 0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork
Then, after cleaning up this result a bit:
In [27]: res.columns = ['item' + str(i + 1) for i in res.columns]
In [29]: res.reset_index()
Out[29]:
  ticket_number ticket_price   item1   item2 item3
0           001           21  tomato   candy  soup
1           002           12    soup    cola   NaN
2           003           56    beef  tomato  pork
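For reference, the steps above can be put together as a self-contained sketch. The sample DataFrame is reconstructed from the output shown in this answer, so treat it as an assumption about the original data. The sketch also swaps `.apply(pd.Series)` for `pd.DataFrame(lists.tolist(), ...)`, which is a common alternative and should give the same layout:

```python
import pandas as pd

# Sample data reconstructed from the output shown above (an assumption;
# the question's original DataFrame is not fully visible here).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})

# Collect each ticket's items into one list per group.
lists = df.groupby(["ticket_number", "ticket_price"])["item"].apply(list)

# Expand the lists into columns; shorter lists are padded with missing values.
res = pd.DataFrame(lists.tolist(), index=lists.index)

# Rename the positional columns 0, 1, 2, ... to item1, item2, item3, ...
res.columns = [f"item{i + 1}" for i in res.columns]
res = res.reset_index()
print(res)
```

The number of item columns is driven by the longest list, so it adapts to whatever the largest ticket contains.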
Another possible way is to create a new column that numbers the items within each group, using groupby.cumcount:
In [38]: df['item_number'] = df.groupby('ticket_number').cumcount()
In [39]: df
Out[39]:
     item ticket_number ticket_price  item_number
0  tomato           001           21            0
1   candy           001           21            1
2    soup           001           21            2
3    soup           002           12            0
4    cola           002           12            1
5    beef           003           56            0
6  tomato           003           56            1
7    pork           003           56            2
And then do some reshaping:
In [40]: df.set_index(['ticket_number', 'ticket_price', 'item_number']).unstack(-1)
Out[40]:
                              item
item_number                      0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork
From here, after cleaning up the column names, you get the same result as above.
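The column cleanup is not spelled out in the answer, so here is one possible sketch of it. It assumes the same reconstructed sample data as the output above, and it selects the `item` column before unstacking so that the result has a single level of columns:

```python
import pandas as pd

# Sample data reconstructed from the output shown above (an assumption).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})

# Number the items within each ticket: 0, 1, 2, ...
df["item_number"] = df.groupby("ticket_number").cumcount()

# Move item_number from the row index into the columns.
wide = (
    df.set_index(["ticket_number", "ticket_price", "item_number"])["item"]
      .unstack("item_number")
)

# Turn the column labels 0, 1, 2 into item1, item2, item3.
wide.columns = [f"item{n + 1}" for n in wide.columns]
wide = wide.reset_index()
print(wide)
```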
The reshaping step with set_index and unstack could also be done with pivot_table:
df.pivot_table(columns=['item_number'], index=['ticket_number', 'ticket_price'], values='item', aggfunc='first')
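As a rough, runnable version of the pivot_table variant, again assuming the sample data reconstructed from the output above: `aggfunc='first'` is needed because the values are strings, and each (ticket, item_number) pair holds exactly one item, so taking the first value loses nothing.

```python
import pandas as pd

# Sample data reconstructed from the output shown above (an assumption).
df = pd.DataFrame({
    "item": ["tomato", "candy", "soup", "soup", "cola", "beef", "tomato", "pork"],
    "ticket_number": ["001", "001", "001", "002", "002", "003", "003", "003"],
    "ticket_price": [21, 21, 21, 12, 12, 56, 56, 56],
})
df["item_number"] = df.groupby("ticket_number").cumcount()

# One row per ticket, one column per item position.
pivoted = df.pivot_table(
    index=["ticket_number", "ticket_price"],
    columns=["item_number"],
    values="item",
    aggfunc="first",
)

# Same cleanup as before: item1, item2, ... and a flat index.
pivoted = pivoted.rename(columns=lambda n: f"item{n + 1}").reset_index()
pivoted.columns.name = None
print(pivoted)
```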