Nov 5, 2019
Hi there, thank you for the explanation. But I want to ask something:
In the formulation, w is updated using w = 1/n(sum(X*g)) - 1/n(sum(g'*w)). Why does the code implementation become: w_new = (X * g(np.dot(w.T, X))).mean(axis=1) - g_der(np.dot(w.T, X)).mean() * w?
I don't understand why g' has to be averaged before it is multiplied by w.
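To make the comparison concrete, here is a minimal sketch of the two forms side by side. It is only my own check, not the article's code: the choice g = tanh, the random data X, and the names w_code and w_formula are all assumptions for illustration. Since w does not depend on the sample index, it looks like it can be factored out of the sum, in which case both forms should give the same vector.

```python
import numpy as np

# Assumed nonlinearity for illustration: g = tanh, g' = 1 - tanh^2
def g(u):
    return np.tanh(u)

def g_der(u):
    return 1.0 - np.tanh(u) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 1000))  # hypothetical data: rows = signals, columns = samples
w = rng.standard_normal(3)          # hypothetical weight vector

u = np.dot(w.T, X)  # projection of every sample onto w, shape (1000,)

# Form from the code: average g' over the samples first, then scale w
w_code = (X * g(u)).mean(axis=1) - g_der(u).mean() * w

# Form as I wrote the formula: multiply g' by w for each sample, then average
per_sample = g_der(u)[None, :] * w[:, None]  # shape (3, 1000)
w_formula = (X * g(u)).mean(axis=1) - per_sample.mean(axis=1)

same = np.allclose(w_code, w_formula)
print(same)
```

If that is right, averaging g' first would just be the cheaper way to write the same thing, because it avoids building the full per-sample array.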
Really appreciate the answer. Thank you.