by Alan L. Montgomery
It is interesting. I had lots of census variables and just picked the ones I thought were plausible. So I think this would be a perfect application of the variable selection problem. It needs to be done, because right now the posterior is overstated: it is too tight because I'm not reflecting the uncertainty in variable selection.
It does bother me. At the same time, you have to ask: which posterior do you want to believe? One reason I did it the way I did is that I can say: here's one posterior, here's another, and here's another, each corresponding to a value of the K parameter. So if this one is K = .1, then this one is K = 1 and this one is K = 10. This is a sweep prior. You can go back and ask what prior probability you assign to K = .1, 1, and 10. If I think they are all equally weighted, I ask what the implication is when I weight all of these things together. That's one way to get around it. This is not satisfactory. The critical issue is how much to shrink. First, we don't know how much to shrink. Second, these Wishart distributions do not reflect how we would want to shrink. What we really want is not some distribution like this for our prior beliefs (slide?) but a prior distribution that looks like this. The question is how to put these new priors into the model. This is another direction I would like to take.
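The weighting over K described above can be sketched numerically. The following is a minimal illustration, not the method used in the paper: it assumes each posterior is summarized by a mean and a variance for a single parameter, and it mixes them with fixed weights. The function name `mix_posteriors` and all the numbers are hypothetical. Note also that a full Bayesian treatment would update the equal prior weights by each K's marginal likelihood; the sketch keeps the weights fixed, as in the informal description.

```python
import numpy as np

def mix_posteriors(means, variances, weights):
    """Mean and variance of a weighted mixture of posteriors.

    Each posterior (one per value of K) is summarized by its mean
    and variance; weights are the probabilities assigned to each K.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1

    mix_mean = np.sum(weights * means)
    # Law of total variance: average within-posterior variance
    # plus the spread of the posterior means across values of K.
    within = np.sum(weights * variances)
    between = np.sum(weights * (means - mix_mean) ** 2)
    return mix_mean, within + between

# Hypothetical posterior summaries for one elasticity at K = 0.1, 1, 10.
post_means = [-2.4, -2.0, -1.5]
post_vars = [0.04, 0.09, 0.25]
equal_weights = [1 / 3, 1 / 3, 1 / 3]

mix_mean, mix_var = mix_posteriors(post_means, post_vars, equal_weights)
print(f"mixture mean = {mix_mean:.3f}, mixture variance = {mix_var:.3f}")
```

Because the between-K term is never negative, the mixture variance is at least as large as the weighted average of the individual variances, which is one way the combined posterior comes out fatter than any single choice of K would suggest.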
The other point is that the posterior looks like this. I want the posterior to be fatter rather than skinnier. This is the opposite of the way most people think. I think I'm overconfident about my pricing strategy, not underconfident. We need to find better ways to reflect uncertainty in the model selection problem and our uncertainty about how we parameterize these priors.
I was surprised by the comment that I did better than Blattberg & George. I thought I did worse, because I only thought about shrinking each of the individual stores back to some overall mean. I forgot about the individual brand problem. What I really would have liked to do is two things:
The final comment is about what Wagner talked about. He raised an interesting point about trying to compare and explain this by looking at other data and why this made sense to him. People have so much more information out there. It tells me something about the prior. Maybe it tells me about how I should structure the problem. It makes me firmly believe that these Bayesian hierarchical models are a much better way to go because now I can think about using another stage of this model and using other external data in a nested stage of this hierarchy.