Hi there, thank you for the explanation. But I want to ask something:

Nov 5, 2019

The w in the formulation is updated using w = 1/n * sum(X * g) - 1/n * sum(g' * w). Why does the code implementation become: w_new = (X * g(np.dot(w.T, X))).mean(axis=1) - g_der(np.dot(w.T, X)).mean() * w?

I don't understand why g' is averaged first and only then multiplied by w, instead of averaging the product g' * w as in the formula.
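
To make the comparison concrete, here is a minimal sketch of the two forms side by side. It assumes g = tanh and g' = 1 - tanh^2 (my assumption; the article's g may differ), random data for X with shape (features, samples), and a random w. The names second_term_formula and second_term_code are only illustrative. In this sketch the two results appear to agree numerically, which would be consistent with w being the same for every sample and therefore factoring out of the average, but I would appreciate confirmation that this is the intended reasoning.

```python
import numpy as np

def g(u):
    # Assumed nonlinearity: tanh
    return np.tanh(u)

def g_der(u):
    # Derivative of tanh
    return 1.0 - np.tanh(u) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(3, 1000))  # toy data, shape (features, samples)
w = rng.normal(size=3)          # toy weight vector

u = np.dot(w.T, X)              # projections w^T x_i, shape (samples,)

# Formula as written: average the product g'(u_i) * w over all samples
second_term_formula = (g_der(u)[None, :] * w[:, None]).mean(axis=1)

# Code as written: average g'(u_i) first, then multiply by w
second_term_code = g_der(u).mean() * w

first_term = (X * g(u)).mean(axis=1)
w_new_formula = first_term - second_term_formula
w_new_code = first_term - second_term_code

# Check whether the two forms agree up to floating-point error
print(np.allclose(w_new_formula, w_new_code))
```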

I would really appreciate an answer. Thank you.

Written by Deny Rahmalianto