Why do norm-based loss functions come from the assumption that data is drawn from a Gaussian distribution?
I was reading the paper “A Review on Deep Learning Techniques for Video Prediction” by S. Oprea et al.
In Section 2.4, “The Devil is in the Loss Function”, it says: “Most distance-based loss functions, such as those based on the \(\ell_p\) norm, come from the assumption that data is drawn from a Gaussian distribution.”
I could not understand how. Then I found this blog post, which reminded me that distance minimization arises when we try to fit a data distribution under a Gaussian assumption. I will put the formulas below as a refresher.
With a dataset \(X = \{ x^{(1)}, x^{(2)}, \dots, x^{(m)} \}\) drawn from an unknown distribution \(p_{data}(x)\), we want to model it with \(p_{model}(x;\theta)\). The maximum-likelihood estimate is

\[\theta_{ML} = \arg\max_{\theta} p_{model}(X;\theta) = \arg\max_{\theta} \prod^{m}_{i=1} p_{model}(x^{(i)};\theta).\]

Taking the logarithm for numerical stability (it does not change the argmax),

\[\theta_{ML} = \arg\max_{\theta} \sum^{m}_{i=1} \log p_{model}(x^{(i)};\theta).\]

Now assume \(p_{model}(x;\theta)\) is Gaussian with mean \(\theta\) and unit variance, \(p_{model}(x;\theta) = \mathcal{N}(x;\theta,1)\). Then
\[\begin{aligned}
\theta_{ML} &= \arg\max_{\theta} \sum^{m}_{i=1} \log p_{model}(x^{(i)};\theta) \\
&= \arg\max_{\theta} \sum^{m}_{i=1} \log \frac{1}{\sqrt{2\pi}} \exp\!\left(-\tfrac{1}{2}(x^{(i)}-\theta)^2\right) \\
&= \arg\max_{\theta} \sum^{m}_{i=1} \left( \log \frac{1}{\sqrt{2\pi}} - \tfrac{1}{2}(x^{(i)}-\theta)^2 \right) \\
&= \arg\max_{\theta} \left( -m \log\sqrt{2\pi} - \tfrac{1}{2}\sum^{m}_{i=1}(x^{(i)}-\theta)^2 \right) \\
&= \arg\max_{\theta} -\sum^{m}_{i=1}(x^{(i)}-\theta)^2 \\
&= \arg\min_{\theta} \sum^{m}_{i=1}(x^{(i)}-\theta)^2,
\end{aligned}\]

which is exactly the squared \(\ell_2\) loss; its minimizer is the sample mean \(\frac{1}{m}\sum^{m}_{i=1} x^{(i)}\).
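To sanity-check this equivalence numerically, here is a minimal Python sketch (assuming NumPy and SciPy are available; the variable and function names are my own, not from the paper): maximizing the Gaussian log-likelihood over \(\theta\) and minimizing the sum of squared errors should both recover the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=1000)  # data drawn from N(3, 1)

def neg_log_likelihood(theta):
    # Negative Gaussian log-likelihood in theta with variance fixed at 1;
    # the constant term is kept for clarity but does not affect the argmin.
    return 0.5 * len(x) * np.log(2 * np.pi) + 0.5 * np.sum((x - theta) ** 2)

def squared_error(theta):
    # Sum of squared errors, i.e. the squared l2-norm loss in theta.
    return np.sum((x - theta) ** 2)

theta_mle = minimize_scalar(neg_log_likelihood).x
theta_l2 = minimize_scalar(squared_error).x

# All three agree up to solver tolerance.
print(theta_mle, theta_l2, x.mean())
```

Both optimizers land on the sample mean, matching the closed-form minimizer from the derivation above.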