## Optimizing numpy array multiplication: * faster than numpy.dot?

Questions:

1) How is it that `numpy.dot()` is slower than `*` in the example code below when BLAS is being used?

2) Is there a way to use `numpy.dot()` instead of `*` in this case for faster array multiplication? I think I am missing a crucial piece of information that would answer question 1 and mean that `numpy.dot()` is at least as fast as `*`, if not faster.

Details included below. Thanks in advance for answers and help.

Details:

I am writing a program which solves coupled PDEs using Python 2.7 (64-bit), numpy 1.11.2, and Anaconda2 on Windows 7. To improve the accuracy of the program's output I need to use large arrays (shape (2, 2^14) and larger) and small integration steps, resulting in a very large number of array multiplications per simulation, which I need to optimize for speed.

Having looked around, it seems as though `numpy.dot()` should be faster than `*` for array multiplication as long as BLAS is installed and working with numpy; this is frequently recommended. However, with the timer script below, `*` is faster than `numpy.dot()` by at least a factor of 7, and in some cases by factors of more than 1000:

```python
from __future__ import division
import numpy as np
import timeit

def dotter(a, b):
    return np.dot(a, b)

def timeser(a, b):
    return a*b

def wrapper(func, a, b):
    def wrapped():
        return func(a, b)
    return wrapped

size = 100
num = int(3e5)

a = np.random.random_sample((size, size))
b = np.random.random_sample((size, size))

wrapped = wrapper(dotter, a, b)
dotTime = timeit.timeit(wrapped, number=num)/num
print "\nTime for np.dot: ", dotTime

wrapped = wrapper(timeser, a, b)
starTime = timeit.timeit(wrapped, number=num)/num
print "\nTime for *: ", starTime

print "dotTime / starTime: ", dotTime/starTime
```

This outputs:

```
Time for np.dot:  8.58201189949e-05
Time for *:  1.07564737429e-05
dotTime / starTime:  7.97846218436
```

Both `numpy.dot()` and `*` are distributed across multiple cores, which suggests that BLAS is working to some extent, at least.

Looking at `numpy.__config__.show()` it appears as though I am using BLAS and lapack (although not openblas_lapack?):

```
lapack_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_opt_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
openblas_lapack_info:
    NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_mkl_info:
    libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
    library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
```
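If you want to confirm that the multi-core behaviour really comes from MKL's threading, one possible check (a sketch on my part, not something verified in the post) is to pin MKL to a single thread via the `MKL_NUM_THREADS` environment variable before importing numpy, then re-run the timings and watch core usage:

```python
import os

# Pin MKL to one thread; this must be set before numpy is first imported,
# since MKL reads it at load time.
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np

a = np.random.random_sample((100, 100))
b = np.random.random_sample((100, 100))

# With the setting above, this matrix product should run single-threaded;
# if the timing and CPU usage change versus the default, MKL threading
# was responsible for the multi-core behaviour.
c = np.dot(a, b)
print(c.shape)  # (100, 100)
```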

2017-01-05 15:01

## Answers (1)

1. `np.dot` performs matrix-matrix multiplication, while `*` is element-wise multiplication; they compute different things, so their timings are not directly comparable. For two n×n matrices, matrix multiplication requires on the order of n³ floating-point operations versus n² for element-wise multiplication, so `np.dot` is expected to be slower even with a fast BLAS. In Python 3.5+ the symbol for matrix-matrix multiplication is `@`.
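A minimal sketch of the distinction on a 2×2 example (element values are my own, chosen for illustration):

```python
import numpy as np

a = np.array([[1.0, 2.0],
              [3.0, 4.0]])
b = np.array([[5.0, 6.0],
              [7.0, 8.0]])

elementwise = a * b       # same as np.multiply(a, b): n*n products
matmul = np.dot(a, b)     # true matrix product: n*n dot products of length n

print(elementwise)  # [[ 5. 12.]
                    #  [21. 32.]]
print(matmul)       # [[19. 22.]
                    #  [43. 50.]]
```

So if the PDE solver needs element-wise products, `*` (or `np.multiply`) is already the right, fast operation, and `np.dot` would simply compute the wrong thing.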