Well, shouldn't it be argmin instead of argmax when calculating the BMU?
`max_activation_unit = np.argmax(distances)`
You want the unit that is closest to the input, which means argmin if you are using Euclidean distance. argmax only makes sense if you are using a similarity measure such as cosine similarity, where a larger value means a better match.
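A minimal sketch of what I mean, on toy data. The names `weights`, `x`, `bmu_euclidean`, and `bmu_cosine` are my own for illustration and not from your code, and I am assuming a flat weight matrix of shape (n_units, dim) rather than your grid layout:

```python
import numpy as np

# Hypothetical toy data: 4 units with 2-D weights, and one input vector.
weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.7, 0.7],
                    [-1.0, 0.0]])
x = np.array([0.6, 0.8])

# Euclidean distance: the BMU is the unit with the SMALLEST distance.
distances = np.linalg.norm(weights - x, axis=1)
bmu_euclidean = int(np.argmin(distances))

# Cosine similarity: the BMU is the unit with the LARGEST similarity.
similarities = (weights @ x) / (np.linalg.norm(weights, axis=1) * np.linalg.norm(x))
bmu_cosine = int(np.argmax(similarities))

# Taking argmax over distances selects the farthest unit instead of the closest.
farthest_unit = int(np.argmax(distances))

print(bmu_euclidean, bmu_cosine, farthest_unit)
```

On this data both the Euclidean argmin and the cosine argmax pick the same unit, while argmax over the distances picks the unit pointing away from the input.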
Also, why do you divide the weight matrix by its norm and reshape it to (matrix_side, matrix_side, 1)?