Data that follows a distribution with non-finite expectation
Bailee Short
Answered question
2022-06-10
I find it difficult to get my head around the idea that some random variables follow distributions (such as the Cauchy distribution) that do not have a finite mean. How does one actually interpret data from such random variables? Do we give up and not investigate sample means, or is there something else we should do? Please help clarify this.
Answer & Explanation
Elianna Douglas
2022-06-11
There are several aspects to this. The Cauchy distribution has a heavy tail, which is the reason why its mean does not exist. No real-world data set will truly have a heavy tail in the empirical sense: there will always be a bound on how large our data points are. Why, then, would you ever use the Cauchy distribution as a model for your data, when the model cannot even match something as simple as the sample mean of the data (the sample mean is always finite, while the mean of the Cauchy distribution does not exist)?

The reason is that you are often not that interested in the mean when you model data with a heavy-tailed distribution. When you have real-world data and choose to model it with a heavy-tailed distribution, such as the Cauchy or, probably more common in practice, a heavy-tailed Pareto distribution, you do so because you are interested in what is going on in the tail. You believe that your data behaves in such a way that you can come across very large data points (far out in the tail) with a probability that is not negligible.

This is done all the time in non-life insurance. Suppose you run an insurance business and sell boat insurance. Everything is going well for you, and so far the worst payment you have had to make was $100,000 on a claim for a very nasty scratch on a yacht. The insurance business has learned the hard way that even though everything looks fine, you should not under any circumstances use a light-tailed distribution (such as the normal distribution) to model your claims. One day an oil tanker will have a nasty spill and partly sink, ruining a lot of its equipment. This is going to be expensive, and you had better be prepared for this sort of claim to show up.

So you choose to model your data with a heavy-tailed distribution. What you are usually interested in are quantiles in the far end of the distribution. You can ask questions such as "What is the 99.5% quantile of my fitted distribution?"
The answer might be 5,000,000, which means that you expect a data point larger than 5,000,000 only 0.5% of the time. Again, fitting heavy-tailed distributions and finding their quantiles is a common problem in non-life insurance, since regulation dictates that you must hold enough capital to cover such rare events.
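
To make the point about the sample mean concrete, here is a minimal simulation sketch (not part of the original answer). It draws standard Cauchy samples with the inverse-CDF formula tan(π(u − 1/2)) and prints the sample mean and sample median for growing sample sizes. The seed, the sample sizes, and the helper name `cauchy_sample` are arbitrary choices made for illustration. One would expect the mean to keep jumping around while the median settles near 0, the centre of the distribution.

```python
import math
import random
import statistics


def cauchy_sample(n, rng):
    """Draw n standard Cauchy variates by inverse-CDF sampling."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
data = cauchy_sample(100_000, rng)

# Compare the two summaries on growing prefixes of the same sample.
for n in (100, 1_000, 10_000, 100_000):
    prefix = data[:n]
    mean_n = statistics.fmean(prefix)
    median_n = statistics.median(prefix)
    print(f"n={n:>7}  mean={mean_n:>12.4f}  median={median_n:>8.4f}")
```

So the sample mean can always be computed, but for Cauchy-like data it is not estimating anything; a quantile such as the median is a summary that still behaves well.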
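
The quantile question can be sketched in code as well. The example below is an illustration under stated assumptions, not the procedure an insurer or regulator actually uses: the claims are simulated, the minimum claim size `X_MIN` and the tail index `TRUE_ALPHA` are hypothetical values, and the threshold is treated as known. It fits a Pareto tail index by maximum likelihood and reads off the 99.5% quantile from the fitted distribution. The quantile formula is valid for any positive tail index, including values of 1 or below, where the Pareto mean is infinite.

```python
import math
import random


def fit_pareto_alpha(claims, x_min):
    """Maximum-likelihood estimate of the Pareto tail index for a known x_min."""
    return len(claims) / sum(math.log(x / x_min) for x in claims)


def pareto_quantile(p, alpha, x_min):
    """Level-p quantile of Pareto(alpha, x_min): solves 1 - (x_min / x)**alpha = p."""
    return x_min * (1.0 - p) ** (-1.0 / alpha)


rng = random.Random(1)   # fixed seed (arbitrary) so the run is repeatable
X_MIN = 10_000.0         # hypothetical smallest claim size
TRUE_ALPHA = 1.5         # hypothetical tail index used to simulate claims

# random.paretovariate returns values >= 1, so scaling gives claims >= X_MIN.
claims = [X_MIN * rng.paretovariate(TRUE_ALPHA) for _ in range(5_000)]

alpha_hat = fit_pareto_alpha(claims, X_MIN)
q995 = pareto_quantile(0.995, alpha_hat, X_MIN)

# Sanity check: roughly 0.5% of the simulated claims should exceed the quantile.
share_above = sum(c > q995 for c in claims) / len(claims)

print(f"estimated tail index:     {alpha_hat:.3f}")
print(f"fitted 99.5% quantile:    {q995:,.0f}")
print(f"share of claims above it: {share_above:.4f}")
```

With real claims data the threshold is usually not known in advance and has to be chosen, which is a modelling decision this sketch leaves out.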