Python Pandas Matrix Multiplication Multiple Operations in one

Question

I am trying to use the pandas matrix multiplication. I have four steps

  1. Create 2500 new columns by dividing 2500 columns by 2500 columns
  2. Use the result above and subtract 1 from all the columns
  3. Use the result of point 2 and divide all 2500 columns by 1 column
  4. Use the result of point 3 and add 1 column to all 2500

Is there a way to write this better, perhaps in a single line of code? It is currently quite slow and is CPU heavy.

The code snippet is below:

df_baseScenario[end_columns+ "fwd_rate"] =  df_baseScenario[start_columns].divide(df_baseScenario[end_columns].values,axis=0)
df_baseScenario[end_columns+ "fwd_rate"] =  df_baseScenario[end_columns+ "fwd_rate"].subtract(1)
df_baseScenario[end_columns+ "fwd_rate"] =  df_baseScenario[end_columns+ "fwd_rate"].multiply(1/df_baseScenario['Calc_Period'].values,axis=0)
df_baseScenario[end_columns+ "fwd_rate"] =  df_baseScenario[end_columns+ "fwd_rate"].add(df_baseScenario['Rate Index Spread'].values,axis=0)

Show source
| numpy   | pandas   | python   | performance   | matrix-multiplication   2017-01-03 14:01 2 Answers

Answers ( 2 )

  1. 2017-01-03 14:01

    I create code by text in question, no by sample code:

    df_baseScenario = pd.DataFrame({'Calc_Period':[1,2,3],
                       'Rate Index Spread':[4,5,6],
                       'D':[1,3,5],
                       'E':[5,3,6],
                       'F':[7,4,3],
                       'G':[4,2,9],
                       'H':[5,7,8],
                       'I':[8,5,4]})
    
    print (df_baseScenario)
    start_columns = ['D','E','F']
    end_columns = ['G','H','I']
    
    df = df_baseScenario[start_columns].mul(df_baseScenario[end_columns].values)  \
                                        .sub(1) \
                                        .div(df_baseScenario['Calc_Period'], axis=0) \
                                        .add(df_baseScenario['Rate Index Spread'], axis=0)
    #assign calumne names by end_columns
    df.columns = end_columns
    #add prefix
    df = df.add_suffix('fwd_rate')
    print (df)
       Gfwd_rate  Hfwd_rate  Ifwd_rate
    0   7.000000  28.000000  59.000000
    1   7.500000  15.000000  14.500000
    2  20.666667  21.666667   9.666667
    

    #concat columns to original
    print (pd.concat([df_baseScenario, df], axis=1))
       Calc_Period  D  E  F  G  H  I  Rate Index Spread  Gfwd_rate  Hfwd_rate  \
    0            1  1  5  7  4  5  8                  4   7.000000  28.000000   
    1            2  3  3  4  2  7  5                  5   7.500000  15.000000   
    2            3  5  6  3  9  8  4                  6  20.666667  21.666667   
    
       Ifwd_rate  
    0  59.000000  
    1  14.500000  
    2   9.666667  
    
  2. 2017-01-03 14:01

    To make it a bit shorter and more readable, you can employ chaining as follows:

    df = df_baseScenario
    df["fwd_rate"] = df[start_cols] \
        .divide(df[end_cols].values, axis=0) \
        .subtract(1) \
        .multiply(1/df['Calc_Period'], axis=0) \
        .add(df['Rate Index Spread'], axis=0)
    
◀ Go back