"We can approximate $p(x,y) = p(x)\,p(y \mid x)$ using the empirical data distribution $p(x,y) = \frac{1}{N} \sum_{n=1}^{N} \delta_{x_n}(x)\,\delta_{y_n}(y)$"

Kameron Wang

Answered question

2022-11-18

Understanding Empirical Data Distribution
"We can approximate $p(x,y) = p(x)\,p(y \mid x)$ using the empirical data distribution
$p(x,y) = \frac{1}{N} \sum_{n=1}^{N} \delta_{x_n}(x)\,\delta_{y_n}(y)$"
In another part of the paper they say $p(y \mid y_n) = \delta_{y_n}(y)$.
I have some background in probability but none in statistics; I was able to figure out what an Empirical CDF is, but not a pdf like here, so I'm not sure exactly what the authors are doing. Does the δ refer to the Dirac delta distribution?

Answer & Explanation

Ryan Davies

Beginner | 2022-11-19 | Added 18 answers

Step 1
The empirical data distribution is a probability distribution that allocates probability $1/N$ to each point in the training dataset and 0 to everything else. More formally, it is supported on the $N$ points $(x_n, y_n)$ of the training set, each carrying probability mass $1/N$, so all other points have mass 0.
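The description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy dataset `train` and the helper name `empirical_pmf` are hypothetical and do not come from the paper being quoted.

```python
from collections import Counter
from fractions import Fraction


def empirical_pmf(samples):
    """Return p(x, y), the empirical joint mass of a list of (x, y) pairs."""
    n = len(samples)
    counts = Counter(samples)  # how many times each (x, y) pair occurs

    def p(x, y):
        # A missing pair has count 0, so its mass is 0.
        return Fraction(counts[(x, y)], n)

    return p


# Hypothetical toy training set with N = 4 distinct points.
train = [(0, "a"), (1, "b"), (2, "a"), (3, "c")]
p = empirical_pmf(train)

print(p(1, "b"))  # a training point: mass 1/4
print(p(1, "a"))  # not a training point: mass 0
print(sum(p(x, y) for (x, y) in set(train)))  # total mass over the support: 1
```

If a pair appears k times in the training set, this sketch gives it mass k/N, which is what the sum of N point masses produces when some of them coincide.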
Step 2
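Yes, $\delta_{x_n}(x)$ is a point mass at $x_n$, which is the Dirac delta when $x$ is continuous. Read as a probability mass function, it acts as an indicator that is 1 when $x = x_n$ and 0 otherwise; similarly, $\delta_{y_n}(y)$ is 1 when $y = y_n$ and 0 otherwise.
$p(x,y)$ is then the joint probability mass, and you can check that it is 0 for any point not in the training set and $1/N$ for a point in the training set (assuming the training points are distinct).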
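One consequence is worth writing out, since it is the usual reason for introducing the empirical distribution: expectations under it reduce to sample averages. The function $f$ below is an arbitrary test function introduced only for illustration, and the integrals should be read as sums when $x$ and $y$ are discrete.

```latex
\begin{align*}
\mathbb{E}_{p}\bigl[f(x,y)\bigr]
  &= \iint f(x,y)\,\frac{1}{N}\sum_{n=1}^{N}\delta_{x_n}(x)\,\delta_{y_n}(y)\,dx\,dy \\
  &= \frac{1}{N}\sum_{n=1}^{N}\iint f(x,y)\,\delta_{x_n}(x)\,\delta_{y_n}(y)\,dx\,dy \\
  &= \frac{1}{N}\sum_{n=1}^{N} f(x_n, y_n).
\end{align*}
```

The last step uses the defining property of the point mass: integrating a function against $\delta_{x_n}$ returns its value at $x_n$.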
