We introduce a new highly parallel and memory efficient deformable image registration algorithm to handle challenging clinical applications. The algorithm is based on the normalized gradient fields (NGF) distance measure and Gauss-Newton numerical optimization. By carefully analyzing the mathematical structure of the problem, a matrix-free Hessian-vector multiplication for NGF is derived, giving a highly integrated formulation. Embedding the new scheme in a full, non-linear image registration algorithm enables fast calculations on high resolutions with dramatically reduced memory consumption. The new approach provides linear scalability compared with a traditional sparse-matrix-based scheme. The algorithm is evaluated on a challenging problem from radiotherapy, where pelvis cone-beam CT and planning CT images are registered. Speedups up to a factor of 149.3 for a single Hessian-vector multiplication and of 20.3 for a complete non-linear registration are achieved.