I wrote a function that replaces the NaNs in a pandas dataframe with the means of the respective columns. It worked on a small test dataframe, but when I applied it to a much larger one (30,000 rows, 9 columns) I got the error message: IndexError: index out of bounds
The function is the following:
    # The 'update' function replaces all the NaNs in a dataframe
    # with the mean of the respective columns
    def update(df):  # takes one argument, the dataframe that will be updated
        ncol = df.shape[1]  # number of columns in the dataframe
        for i in range(0, ncol):  # loops over all the columns
            # subsets the df using the isnull() method, extracting the positions
            # in each column where the value is NaN
            df.iloc[:, i][df.isnull().iloc[:, i]] = df.mean()[i]
        return df
The small dataframe I used to test the function is the following:
         0     1   2   3
    0  NaN   NaN   3   4
    1  NaN   NaN   7   8
    2  9.0  10.0  11  12
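To make the test case reproducible, here is a minimal sketch that rebuilds the small dataframe and computes the result I expect. The name `df_small` is just an illustrative choice, and the comparison uses the built-in `fillna` with the column means, which I assume is equivalent to what my function is meant to do.

```python
import numpy as np
import pandas as pd

# Rebuild the small test dataframe (default integer row and column labels)
df_small = pd.DataFrame(
    [[np.nan, np.nan, 3, 4],
     [np.nan, np.nan, 7, 8],
     [9.0, 10.0, 11, 12]]
)

# Expected result: every NaN replaced by the mean of its own column.
# df_small.mean() skips NaNs by default and is aligned to the column labels.
expected = df_small.fillna(df_small.mean())
print(expected)
```

With this frame, columns 0 and 1 each contain a single non-missing value, so their NaNs should all become that value, while columns 2 and 3 should stay unchanged.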
Could you explain the error? Your advice will be appreciated.