Newton-Galerkin method, the most carefully guarded secret of finite element business. All examples I have seen so far feel like 'special tricks' and therefore I want to cover it once-and-for-all thoroughly, even just to remind myself.
Let us first consider searching an approximation to the zeros of a function f : \mathbb{R} \rightarrow \mathbb{R} (i.e. standard Newton's method) because this is very analogous to what we do when solving zeros of functionals. Thus, let us solve
f(x)=0. Suppose that we have a sufficiently good approximation x_0 to x. Then we attempt to compute the correct perturbation \Delta x satisfying
x = x_0 + \Delta x. Then using Taylor's expansion we get
0 = f(x) = f(x_0 + \Delta x) = f(x_0)+f'(x_0) \Delta x + \mathcal{O}(\Delta x^2). Neglecting the higher order terms and solving for \Delta x gives the approximate equation
\Delta x \approx - \frac{f(x_0)}{f'(x_0)}. Then we have a new approximation to x:
x \approx x_0 - \frac{f(x_0)}{f'(x_0)}, which you might remember from high school (back then we used some stupid geometrical argument).
Similarly, let us consider a non-linear PDE for which the weak formulation is: find u \in V such that
a(u;v)=l(v),\quad \forall v \in V, where a(u;v) is linear in v, non-linear in u, and l(v) is linear. Here V is some function space with the proper boundary conditions
Example. A weak form like this could, for example, arise from the strong formulation
-\nabla \cdot (u^3 \nabla u) = f,\quad \text{in $\Omega$,} with the boundary condition u|_{\partial \Omega}=0. In this case
a(u;v)=\int_\Omega u^3 \nabla u \cdot \nabla v \,\mathrm{d}x and the usual
l(v)=\int_\Omega f v\,\mathrm{d}x.
Anyways, we again assume that we have an approximation u_0 to u and search for the correction \delta u satisfying
u = u_0 + \delta u. Then a similar Taylor's expansion as before gives
0 = a(u_0+\delta u;v) - l(v) = a(u_0;v)+D_u a(u_0;v) \delta u + \mathcal{O}(\delta u^2) - l(v), where D_u a(u_0;v) \delta u is the derivative of a with respect to the first argument into the direction \delta u evaluated at (u_0,v). Rearranging and neglecting the HOT's gives
D_u a(u_0;v) \delta u \approx l(v)-a(u_0;v). Note that the derivative D_u a(u_0;v) is a linear operator acting on \delta u and therefore in general we must discretize the previous equation before solving for \delta u. However, first we should compute an explicit representation for D_u a(u_0;v) \delta u and this depends on how a is defined. An example follows.
Example. Suppose
a(u;v)=\int_\Omega u^3 \nabla u \cdot \nabla v\,\mathrm{d}x as in the previous example. Then by the definition of the directional derivative
\begin{align*}
D_u a(u_0;v) \delta u &= \frac{\partial a(u_0+t \delta u;v)}{\partial t}\Big|_{t=0}\\
&= \frac{\partial}{\partial t} \int_\Omega (u_0 + t \delta u)^3 \nabla(u+t \delta u) \cdot \nabla v\,\mathrm{d}x \Big|_{t=0} \\
&= \int_\Omega 3 u_0^2 \delta u \nabla u_0 \cdot \nabla v\,\mathrm{d}x + \int_\Omega u_0^3 \nabla \delta u \cdot \nabla v\,\mathrm{d}x.
\end{align*}
Now one step of the Newton-Galerkin iteration reads: find \delta u \in V such that
D_u a(u_0;v) \delta u = l(v)-a(u_0;v) holds for every v \in V.