I am reading *Probabilistic Machine Learning: An Introduction* by Kevin Murphy. In Chapter 4 (Statistics), page 109, the empirical distribution is defined as
$$p_D(y) = \frac{1}{N}\sum_{n=1}^N \delta(y - y_n)$$
and the KL divergence to the model $p(y|\theta)$ is then computed as
$$\begin{align*}
D_{KL}(p_D(y)\,\|\,p(y|\theta)) &= \sum_y \left[p_D \log p_D - p_D \log p(y|\theta)\right] \\
&= -H(p_D) - \frac{1}{N}\sum_{n=1}^N \log p(y_n|\theta).
\end{align*}$$
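Numerically, the identity does seem to hold. Here is a quick sanity check I wrote (a toy example of my own, treating $y$ as a discrete variable with a small support and an arbitrary categorical model; none of this code is from the book):

    import numpy as np

    # Toy discrete data: y_n in {0, 1, 2}
    rng = np.random.default_rng(0)
    y = rng.integers(0, 3, size=1000)
    N = len(y)

    # Empirical distribution p_D over the support {0, 1, 2}
    support = np.arange(3)
    p_D = np.array([(y == k).mean() for k in support])

    # An arbitrary categorical model p(y|theta)
    p_model = np.array([0.2, 0.5, 0.3])

    # Left-hand side: KL(p_D || p_model), summing only where p_D > 0
    mask = p_D > 0
    kl = np.sum(p_D[mask] * (np.log(p_D[mask]) - np.log(p_model[mask])))

    # Right-hand side: -H(p_D) - (1/N) * sum_n log p(y_n|theta)
    neg_entropy = np.sum(p_D[mask] * np.log(p_D[mask]))
    avg_loglik = np.mean(np.log(p_model[y]))
    rhs = neg_entropy - avg_loglik

    print(kl, rhs)  # the two values agree up to floating-point error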
Still, I don't understand how the second term in the last line is derived. When I try to derive it myself, I end up with an extra factor $\delta(y_n - y_n)$, which is infinite:
$$\begin{align*}
\sum_y p_D \log p(y|\theta) &= \sum_y \frac{1}{N}\sum_{n=1}^N \delta(y-y_n)\log p(y|\theta) \\
&= \frac{1}{N}\sum_{n=1}^N \sum_y \delta(y-y_n)\log p(y|\theta) \\
&= \frac{1}{N}\sum_{n=1}^N \delta(y_n-y_n)\log p(y_n|\theta)
\end{align*}$$
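I also tried transcribing the inner sum over $y$ into code, assuming $y$ is discrete and $\delta$ is the Kronecker delta (that assumption may itself be where I go wrong; the toy data and model below are my own, not from the book):

    import numpy as np

    rng = np.random.default_rng(0)
    y = rng.integers(0, 3, size=1000)          # toy discrete samples y_n
    support = np.arange(3)                     # possible values of y
    log_p = np.log(np.array([0.2, 0.5, 0.3]))  # log p(y|theta) on the support

    # Literal transcription of my derivation:
    # (1/N) * sum_n sum_y delta(y - y_n) * log p(y|theta)
    total = 0.0
    for yn in y:
        for yv in support:
            delta = 1.0 if yv == yn else 0.0   # Kronecker delta: delta(0) = 1
            total += delta * log_p[yv]
    term = total / len(y)

    # Matches (1/N) * sum_n log p(y_n|theta), with no infinite factor
    print(term, np.mean(log_p[y]))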
Could someone explain, step by step, how the equations in the textbook are derived, and in particular how the delta term is handled?