So, in this case, what I was saying was that I don't want to - I
don't need to really say anything about the error covariance
matrices so I'm going to have a diffuse prior. This is going to
be non-informative. Then on top of that I don't really want to
say anything about the demographics, so again, I'm going to have
a diffuse prior on that. Now, what is going to be important is
to say something about this random variation across the stores.
So you remember that I've got 192 parameters and each of these
192 parameters is going to be dropped in this xxx hierarchical
distribution. Now the problem I'm running into is that I've only
got 83 stores, so if I don't have an informative prior, then I'm
not going to have a well defined Wishart distribution. So, for
the Wishart to be defined, I've got to have an informative prior
at this stage. Now, this creates a problem, because I
don't exactly know what this prior should be, and there are
going to be all these scaling differences. You know, does the
price elasticity for Minute Maid vary more than the price
elasticity for Tropicana? So what I'm going to do there is go
back and use a kind of empirical technique to try to set this
prior. So, what I'm going to do is essentially postulate that
there is some type of independence across each of these
parameters in this prior, and I'm going to go back and compute
the least squares estimates and then
I'm going to take the variance of those and then I'm going to
scale those by k and then I also want to include the
in there, and it's easier just to go back and think about this
in terms of what's the expectations of my prior. Well, the
expectation of my prior is down here, so the expectation is
essentially going to be this matrix here.
So if I go back, I've got the k's, and the k's are sort of
telling me what the expectation is. If k is 1, then my
expectation is essentially putting me at the same place that I
would start from with an empirical prior.
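
The empirical step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the talk: the store-level least squares estimates are replaced by synthetic numbers, the names `ls_estimates` and `empirical_prior_scale` are hypothetical, and only the scaling by k is shown. The sketch also checks why 83 stores cannot pin down a covariance matrix over 192 parameters without help from the prior.

```python
import numpy as np

rng = np.random.default_rng(0)

n_stores, n_params = 83, 192  # counts quoted in the talk

# Hypothetical stand-in for the per-store least squares estimates:
# one row per store, one column per parameter. Synthetic data only.
true_scale = rng.uniform(0.1, 2.0, size=n_params)
ls_estimates = rng.normal(0.0, true_scale, size=(n_stores, n_params))

# With fewer stores than parameters, the sample covariance across stores
# cannot have full rank, so it is singular.
sample_cov = np.cov(ls_estimates, rowvar=False)
rank = np.linalg.matrix_rank(sample_cov)


def empirical_prior_scale(estimates, k):
    """Diagonal matrix: k times the cross-store variance of each parameter.

    The diagonal form encodes the assumed independence across parameters.
    """
    variances = estimates.var(axis=0, ddof=1)
    return k * np.diag(variances)


prior_k1 = empirical_prior_scale(ls_estimates, k=1.0)     # empirical starting point
prior_small = empirical_prior_scale(ls_estimates, k=0.1)  # tighter prior
```

Because each parameter gets its own variance on the diagonal, parameters that vary more across stores are allowed more variation in the prior, which is one way to handle the scaling differences between brands.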
The reason that I chose this parameterization is that I'd like
to go back and have something that puts me somewhere close to
the least squares estimates, somewhere that's close to the pooled
estimates, and somewhere between the pooled and least squares estimates.
So, if my k=0 and my
is 0, then my prior is going to
say that there's no demographic effects, it's essentially going
to mimic a pooled model. So
there's no cross-store variation and there's no demographic effects
and all the stores are going to have the same parameters. The way we want to
think about this is, I've got my individual parameter of
and that's going to equal
to this
and
this
plus the demographic
effects plus this independent
variable or this random variable. Now the next case is going to
say that we've got some type of direct demographic independent
interactions. So here is k =0, if
is not equal to
zero, what I'm saying is there is no demographic xxx, there are
no random effects, but these demographic effects are new. Now since
I haven't put an informative prior on
I'm not saying xx
I'm going to try xx the data from what the results are here. The
next notion is, where should I set the k? Should k be small,
should k be large? Well, if I set k large, then the
estimates are going to converge to the individual store models.
So, it's going to be essentially like going back and saying that
each of the stores is unrelated to all the others. The other
option is to go back and just set k=1, so it's essentially
some type of empirical Bayes prior. Or the other option is to
set k to a small value, so suppose I said
of the
empirical Bayes prior. So how do I determine what would be a
good k? Well, in a pure approach I would just say k
is this number, or if I'm not sure about it, I would pick some
kind of prior on the k. Now in this case, I'm interested in
what this k should be. So, what I'm going to do is I'm going
to go back and allow the k to vary.
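
To see how k moves the estimates between the pooled model and the individual store models, here is a minimal sketch using a scalar normal-normal model. This is a deliberate simplification and not the full hierarchical model from the talk: the function `shrink`, the three store elasticities, and both variances are hypothetical values chosen only for illustration.

```python
import numpy as np


def shrink(store_est, pooled_est, sampling_var, prior_var, k):
    """Posterior mean in a scalar normal-normal model.

    The prior is centered on the pooled estimate with variance k * prior_var;
    each store estimate is assumed to have the given sampling variance.
    """
    scaled = k * prior_var
    weight = scaled / (scaled + sampling_var)  # weight on the store's own estimate
    return weight * store_est + (1.0 - weight) * pooled_est


store_est = np.array([-2.4, -1.1, -1.9])  # hypothetical per-store elasticities
pooled_est = store_est.mean()
sampling_var, prior_var = 0.5, 0.3        # hypothetical variances

# Sweep k from the pooled extreme toward the individual-store extreme.
for k in (0.0, 0.1, 1.0, 10.0, 1e6):
    print(k, shrink(store_est, pooled_est, sampling_var, prior_var, k))
```

At k = 0 every store collapses onto the pooled estimate, and as k grows the weight on each store's own estimate approaches one, so the stores behave as if they were unrelated.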