## numpy: Compressing block matrix

Consider a matrix `M1`

giving values for all combinations `x,y`

. Consider a partition `f(x)->X`

and a partition `g(y)->Y`

. Furthermore consider an operation `p(A)`

on a set `A`

of numbers, i.e. `max(A)`

or `sum(A)`

.

The mappings `f,g`

can be used to create from `M1`

a block matrix `M2`

where all `x`

that are mapped to the same `X`

are adjacent, and the same for all `y`

.

This matrix `M2`

has a block for each combination of the 'sets' `X,Y`

.

Now I would like to condense this matrix `M2`

into another matrix `M3`

by applying `p`

on each block separately. `M3`

has one value for each combination of `X,Y`

.

Ideally, I would like to skip the transformation of `M1`

into `M2`

using `f`

and `g`

on the fly.

What would be the most efficient way to perform such operation and would it be possible to deploy `numpy`

or `scipy`

for it?

**Special case:** Actually, in my case `x`

and `y`

are identical and there is only one function `f`

applied to both of them. I only care about the part of `M2`

that is under the diagonal.

Show source

## Answers ( 2 )

The most straightforward way I can think of to do this, although perhaps not the most efficient (especially if your matrix is huge), is to convert your matrix to a one-dimensional array, and then have corresponding arrays for the partition group indices

`X`

and`Y`

. You can then group by the partition group indices and finally restructure the matrix back into its original form.For example, if your matrix is

and your partitions are

From that point, there are several ways to implement the reshaping and subsequent grouping. You can do it with Pandas, for instance, by constructing a

`DataFrame`

and using its`stack()`

method to "stack" all the rows on top of each other in a single column, indexed by their original row and column indices.(I have truncated the output for readability, and I trust that you can evaluate the rest of these examples yourself if you want to see their output.) You can then add columns representing the partition group indices:

Then you can group by those indices and apply your aggregation function of choice.

You will have to define

`p`

(or find an existing definition) such that it takes a one-dimensional Numpy array and returns a single number. If you want to use something like`sum()`

, you can just use`st.groupby(...).sum()`

because Pandas has built-in support for that and a few other standard functions, but`agg`

is general and works for any reduction function`p`

you can provide.Finally, the

`unstack()`

method will convert the DataFrame back into the properly 2D "matrix form", and then if you want you can use the`as_matrix()`

method to turn it back into a pure Numpy array.If you don't want to bring in Pandas, there are other libraries that do the same thing. You might look at numpy-groupies, for example, but as I write this it appears to be only compatible with Python 2. However I haven't found any library that does true two-dimensional grouping, which you might need if you are working with very large matrices, large enough that having an additional 2 or 3 copies of them would exhaust the available memory.

Let

`M1`

be a numpy`n`

x`m`

array. You can start by determining which partitions you have. The set constructor removes repeated entries, but orders them arbitrarily. I sort them just to have a well-defined ordering:To build a block matrix for each

`X,Y`

you can use numpy boolean indexing along with the grid-construction helper`ix_`

to select only rows and columns belonging to`X`

and`Y`

, respectively. Finally, apply`p`

to the selected submatrix:The partitions

`f`

and`g`

have to apply element-wise to numpy arrays for this to work. As mentioned in the other answer the`numpy.vectorize`

decorator can be used to achieve this.To give an example: