## NumPy: Create a dict() by grouping values in a column by another column values

Question

Suppose I have a 2-D NumPy array like the one below:

```
import numpy
arr = numpy.array([[1, 0], [1, 4.6], [2, 10.1], [2, 0], [2, 3.53]])
arr
Out[39]:
array([[ 1.  ,  0.  ],
       [ 1.  ,  4.6 ],
       [ 2.  , 10.1 ],
       [ 2.  ,  0.  ],
       [ 2.  ,  3.53]])
```

What would be the fastest way to group the values in the second column by the values in the first column and build a dict from the result? The desired output is:

```
{1: [0, 4.6], 2: [10.1, 0, 3.53]}
```

Currently I use a loop. Because the actual array has more than 1 million rows and the first column has more than 5000 unique values, it's quite slow. I prefer *not* to use pandas.
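
For reference, here is a sketch of one loop of this kind that tends to be slow at this scale. It is an assumption, because the question doesn't show the actual loop, and the name `group_with_masks` is hypothetical. The loop builds one boolean mask per unique key, so every key rescans all rows. With about 1 million rows and 5000 keys, the work grows roughly as rows × keys.

```python
import numpy as np


def group_with_masks(arr):
    """Hypothetical baseline loop: one full-array boolean mask per unique key."""
    result = {}
    for key in np.unique(arr[:, 0]):
        # arr[:, 0] == key scans every row, so the total work grows with
        # (number of rows) x (number of unique keys).
        result[key.item()] = arr[arr[:, 0] == key, 1].tolist()
    return result


arr = np.array([[1, 0], [1, 4.6], [2, 10.1], [2, 0], [2, 3.53]])
print(group_with_masks(arr))
# {1.0: [0.0, 4.6], 2.0: [10.1, 0.0, 3.53]}
```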


## Answers (4)

You may do it without `numpy` by using `collections.defaultdict`. In fact, based on the example you provided, you don't even need the NumPy array: Python `list`s are good enough for your requirement. Below is the example, where the final content is held by `my_dict`.
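
Here is a minimal sketch of the `defaultdict` grouping. It assumes the input is the question's NumPy array `arr`. The function name `group_with_defaultdict` is hypothetical, and `my_dict` mirrors the name used above. Calling `.tolist()` first means the loop works on plain Python floats rather than NumPy scalars.

```python
from collections import defaultdict

import numpy as np


def group_with_defaultdict(arr):
    """Group arr[:, 1] by arr[:, 0] using a defaultdict of lists."""
    my_dict = defaultdict(list)
    # .tolist() copies the array into nested Python lists of floats up front;
    # looping over plain Python objects is usually faster than over NumPy rows.
    for key, value in arr.tolist():
        my_dict[key].append(value)
    return dict(my_dict)


arr = np.array([[1, 0], [1, 4.6], [2, 10.1], [2, 0], [2, 3.53]])
print(group_with_defaultdict(arr))
# {1.0: [0.0, 4.6], 2.0: [10.1, 0.0, 3.53]}
```

The keys come out as floats (`1.0`, `2.0`) because the array has a float dtype. `result[1]` still works, since `1 == 1.0` and both hash the same in Python. If the first column only holds whole numbers and you need integer keys, wrap the key in `int(...)`.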

You can use `np.split`:
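
Here is a minimal sketch of an `np.split`-based version. It assumes the rows may arrive in any order, so it first sorts by the first column. The sort is stable, so values keep their original order within each key. `np.unique(..., return_index=True)` then gives the start of each group in the sorted keys. The name `group_with_split` is hypothetical.

```python
import numpy as np


def group_with_split(arr):
    """Group arr[:, 1] by arr[:, 0] with a stable sort, np.unique and np.split."""
    # Stable sort keeps values of the same key in their original row order.
    order = np.argsort(arr[:, 0], kind="stable")
    keys = arr[order, 0]
    values = arr[order, 1]
    # return_index gives the first position of each unique key in `keys`.
    unique_keys, starts = np.unique(keys, return_index=True)
    # Splitting at every start except the first yields one sub-array per key.
    groups = np.split(values, starts[1:])
    return dict(zip(unique_keys.tolist(), groups))


arr = np.array([[1, 0], [1, 4.6], [2, 10.1], [2, 0], [2, 3.53]])
print(group_with_split(arr))
```

Each group is a NumPy array rather than a list. Call `.tolist()` on a group if you need the exact format shown in the question.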

Here's an approach -
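
Here is a minimal sketch of a sort-then-split approach of this kind, under stated assumptions: the name `group_by_sorting` is hypothetical, and `arr` is assumed to have at least one row. Instead of calling `np.unique`, which sorts internally, it finds group boundaries by comparing each key with the previous one after a single stable sort.

```python
import numpy as np


def group_by_sorting(arr):
    """Sort by the first column, find where the key changes, split column 2."""
    # Stable sort so values keep their original order within each key.
    arr = arr[np.argsort(arr[:, 0], kind="stable")]
    keys = arr[:, 0]
    # Positions where a new key starts (key differs from the previous row).
    cut_idx = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    groups = np.split(arr[:, 1], cut_idx)
    # The first row of every group carries that group's key.
    group_keys = keys[np.concatenate(([0], cut_idx))]
    return dict(zip(group_keys.tolist(), groups))


arr = np.array([[1, 0], [1, 4.6], [2, 10.1], [2, 0], [2, 3.53]])
out = group_by_sorting(arr)
print({k: v.tolist() for k, v in out.items()})
# {1.0: [0.0, 4.6], 2.0: [10.1, 0.0, 3.53]}
```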

Sample run -

Runtime test

Other approaches -

Timings -
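
Here is a rough harness for producing timings like these on your own machine. The synthetic data (1 million rows, 5000 integer-valued keys, random values) is an assumption meant to mirror the sizes in the question, and the helper names are hypothetical. Actual numbers will depend on hardware and NumPy version.

```python
import functools
import timeit
from collections import defaultdict

import numpy as np


def group_with_defaultdict(arr):
    # Python-level loop over a list-of-lists copy of the array.
    my_dict = defaultdict(list)
    for key, value in arr.tolist():
        my_dict[key].append(value)
    return dict(my_dict)


def group_by_sorting(arr):
    # Vectorized: stable sort, find key boundaries, split the second column.
    arr = arr[np.argsort(arr[:, 0], kind="stable")]
    keys = arr[:, 0]
    cut_idx = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    groups = np.split(arr[:, 1], cut_idx)
    return dict(zip(keys[np.concatenate(([0], cut_idx))].tolist(), groups))


def make_data(n_rows=1_000_000, n_keys=5000, seed=0):
    # Synthetic stand-in for the question's data (sizes are assumptions):
    # integer-valued float keys in column 0, random floats in column 1.
    rng = np.random.default_rng(seed)
    keys = rng.integers(0, n_keys, size=n_rows).astype(float)
    values = rng.random(n_rows)
    return np.column_stack((keys, values))


if __name__ == "__main__":
    data = make_data()
    for func in (group_with_defaultdict, group_by_sorting):
        # Best of 3 single runs, to reduce noise from other processes.
        seconds = min(timeit.repeat(functools.partial(func, data), number=1, repeat=3))
        print(f"{func.__name__}: {seconds:.3f} s")
```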

Bottlenecks for the approaches:

With the `defaultdict`-based approach, the conversion to `list` with `.tolist()` seems to be the heavy part (>50% of total runtime) -

For the other two approaches, the sorting at the start (if needed), along with the splitting/loop-comprehension at the end, are the time-consuming portions. The sorting step alone accounts for ~50% of the total runtime -
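
One way to check where the time goes is to time each phase separately with `time.perf_counter`. The sketch below shows how such a breakdown could be measured. It is an assumption, not the profiling used above, and the function names and data sizes are hypothetical.

```python
import time
from collections import defaultdict

import numpy as np


def phases_defaultdict(arr):
    """Fraction of runtime spent in .tolist() vs. the Python grouping loop."""
    t0 = time.perf_counter()
    rows = arr.tolist()
    t1 = time.perf_counter()
    my_dict = defaultdict(list)
    for key, value in rows:
        my_dict[key].append(value)
    t2 = time.perf_counter()
    total = t2 - t0
    return {"tolist": (t1 - t0) / total, "loop": (t2 - t1) / total}


def phases_sorting(arr):
    """Fraction of runtime spent sorting vs. cutting, splitting and dict building."""
    t0 = time.perf_counter()
    # Sorting phase: argsort plus reordering the rows.
    arr = arr[np.argsort(arr[:, 0], kind="stable")]
    t1 = time.perf_counter()
    keys = arr[:, 0]
    cut_idx = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    groups = np.split(arr[:, 1], cut_idx)
    dict(zip(keys[np.concatenate(([0], cut_idx))].tolist(), groups))
    t2 = time.perf_counter()
    total = t2 - t0
    return {"sort": (t1 - t0) / total, "split": (t2 - t1) / total}


rng = np.random.default_rng(0)
n_rows, n_keys = 1_000_000, 5000  # assumed sizes, mirroring the question
data = np.column_stack((rng.integers(0, n_keys, n_rows).astype(float),
                        rng.random(n_rows)))
print(phases_defaultdict(data))
print(phases_sorting(data))
```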

Assuming that your first column is in sorted order, this will work.
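
Here is a minimal sketch of the sorted-input case. It assumes the first column is already in non-decreasing order, so it skips the `argsort` step entirely (the part measured above as roughly half the runtime). The only extra cost is an O(n) check that the assumption holds. The name `group_sorted` is hypothetical.

```python
import numpy as np


def group_sorted(arr):
    """Group arr[:, 1] by arr[:, 0], assuming arr[:, 0] is already sorted."""
    keys = arr[:, 0]
    steps = np.diff(keys)
    # Cheap check of the sortedness assumption; drop it if the input is trusted.
    if np.any(steps < 0):
        raise ValueError("first column must be sorted in non-decreasing order")
    # In sorted keys, a nonzero step marks the start of a new group.
    cut_idx = np.flatnonzero(steps) + 1
    groups = np.split(arr[:, 1], cut_idx)
    group_keys = keys[np.concatenate(([0], cut_idx))]
    return dict(zip(group_keys.tolist(), groups))


arr = np.array([[1, 0], [1, 4.6], [2, 10.1], [2, 0], [2, 3.53]])
print({k: v.tolist() for k, v in group_sorted(arr).items()})
# {1.0: [0.0, 4.6], 2.0: [10.1, 0.0, 3.53]}
```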