Tuesday, April 28, 2009

Mediterranean Diet Score - What else does the model tell us?

A further analysis of the expected (from the model) versus the observed numbers within each diet score category, for people eating particular amounts of each food group, throws up more interesting observations. This doesn't mean that we are claiming that people should behave like the model, or that we expected their food choices not to be internally correlated in some way. In fact, we expected that they would be (so do these researchers, which is why they are trying to describe a dietary pattern); this data gives us information on what these correlations are.

We have already noted that meat & poultry intake is actually largely independent of the diet score. So, when researchers claim that a Mediterranean diet is low in meat, we will know that that is not borne out by this data.

Comparison of expected vs actual numbers in the low score (0-3) group shows that there are more people than expected scoring 0 for vegetables, fruit & nuts, legumes or fish consumption (i.e. they don't eat 'enough' of these). Conversely, in the high scoring group (6-9) there are more people than expected scoring 1 in these same categories. There are also more people than expected scoring 0 for meat & poultry, dairy and cereals in the high scoring group, i.e. they get a high score while still eating 'bad' amounts of these food groups. The suggestion here, then, is that the diet score is mainly a reflection of vegetable, fruit & nut, legume and fish consumption, is much less related to dairy and cereal consumption, and is independent of meat consumption.

What about the predicted correlation between meat & poultry, dairy and the monounsaturated/saturated fat ratio? Whether this has any impact is very hard to determine, mainly because the postulated effect is obscured by the way the data are presented. To see why, consider this thought experiment:

Let's assume our subject has a score of 6 so far and we have meat & poultry, dairy and the fat ratio still to determine. If either meat & poultry or dairy score 0 (i.e. greater than median consumption), this increases the chance that the fat ratio will also be scored 0, giving a final score of 7; if both meat & poultry and dairy score 0, then there is an even greater chance of fat ratio being 0, giving a final score of 6. But, we can't see the difference (i.e. that there are proportionally more scores of 6 and 7 and fewer of 8 than expected) because all scores 6-9 are lumped together. The same argument works in the other direction, making the scores 2 and 3 potentially more frequent than scores of 1, but again this effect is obscured when the low score includes 0-3. Perhaps this is why?


Saturday, April 25, 2009

Data Patterns in the Mediterranean Diet Score

The original construction of the edifice known as the Mediterranean Diet began with a paper which used a scoring system to handle the mass of data that results when you give thousands of people food frequency questionnaires. The data manipulation is roughly like this: people say how often and how much they eat of typical dishes and foodstuffs; these quantities and frequencies are converted to daily food group consumption, for which a score is given. Result: massive amounts of data reduced to one number.

Let's recall the basics: the food groups were: vegetables, fruits & nuts, legumes, meat & poultry, fish, dairy products, cereals, monounsaturated to saturated fat intake ratio and alcohol intake. The bad score was 0 for eating more of the bad groups (meat & poultry, dairy, low mono to saturated fat ratio and alcohol) and not enough of the good groups (vegetables, fruit & nuts, legumes, fish, cereals) and the good score was 1 for doing the opposite. When these scores are added up, the lowest possible score (bad) is 0 and the highest (good) is 9.

Now, it turns out that the pattern of scores expected can be modeled by a mathematical probability distribution known as the binomial distribution. Strictly speaking, to adopt this model of the situation we need to make two assumptions about the behaviour of the participants. Firstly, we assume that each participant is operating (i.e. choosing foodstuffs and quantities of these to eat) independently of (i.e. not influenced by) each other participant (this is quite likely). Secondly, we need to assume that a participant's score on each food group is independent of (i.e. not influenced by) their score on the other food groups. This second assumption is not entirely true; for example, it is clear that there would be some correlation between both dairy products and meat & poultry consumption and the monounsaturated/saturated fat ratio. However, for now, let's make this assumption and then we can check whether the actual data support this view.

With this model for the scoring process, it becomes possible, given the total number of participants, to calculate the expected numbers with different scores. It is quite easy to see that, given the way the scoring system has been constructed around the median, there will be a full spread of scores, because on every food group there is a 50-50 chance of scoring 0. Another point to consider is that, with the exception of scores 0 and 9, there are multiple ways of obtaining each score. For example, a score of 1 may be obtained by scoring 1 on one and only one of the 9 food groups, which means there are 9 different ways of getting this score. A score of 2, on the other hand, may be obtained by scoring 1 on any 2 of the 9 groups: there are 36 ways of doing this. For the most likely scores of 4 and 5, it can be shown that there are 126 different ways to obtain each of these scores.
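The counts quoted above can be checked with a few lines of Python. This is only a sketch of the model as described here: nine food groups, each scored 0 or 1 with equal probability and independently of the others. The function names are my own, not anything from the paper.

```python
from math import comb

N_GROUPS = 9  # number of food groups that make up the diet score


def ways(score, n=N_GROUPS):
    """Number of distinct combinations of food groups that give this total score."""
    return comb(n, score)


def probability(score, n=N_GROUPS):
    """Probability of this total score if every group is an independent 50-50 coin flip."""
    return comb(n, score) / 2 ** n


# Print score, number of ways, and model probability for every possible score.
for s in range(N_GROUPS + 1):
    print(s, ways(s), round(probability(s), 4))
```

Multiplying each probability by the number of participants gives the expected number of people with that score under the model.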

Table 2 (and you should look at this table while reading the next bit) in the paper shows the individual food group scores versus the Mediterranean diet score. This table is important because it gives some insight into the raw results of the scoring process which is otherwise obscured because score results for the individuals are grouped into three categories: low diet score (0-3), medium (4-5) and high (6-9). This is distinctly unhelpful and does not let us see how many got each individual score. However we can see – within each diet score category – how many people scored 0 or 1 for a particular food group (i.e. how many people ate more than the median amount and how many ate less than it - or vice versa).

Using the binomial distribution and the total number of individuals, it is possible to predict how many people should score 0 or 1 for each food group in the three diet score categories under our assumptions outlined above. (But note that we will get the same prediction for each and every food group because the model makes no distinction between these.) For example, for men in the category of low diet score (0-3), we would expect 2257 individuals, with only 643 (28%) scoring 1 and 1616 (72%) scoring 0. In the category of medium diet score (4-5), 4378 individuals are expected, with equal numbers scoring 0 and 1. In the category of high diet score (6-9), we would expect to see the mirror image of the low category: 643 of 2257 (28%) scoring 0 and 1616 (72%) scoring 1.
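For anyone who wants to reproduce the 28%/72% split, here is a rough sketch of the calculation under the same independence assumptions. The trick is that if one food group is fixed at 0 or 1, the remaining eight groups must supply the rest of the total. The names and the category dictionary are my own shorthand, and the total number of participants is left as an argument rather than assumed.

```python
from math import comb

N_GROUPS = 9
CATEGORIES = {"low": range(0, 4), "medium": range(4, 6), "high": range(6, 10)}


def category_probability(scores):
    """P(total score falls in `scores`) under the independent 50-50 model."""
    return sum(comb(N_GROUPS, s) for s in scores) / 2 ** N_GROUPS


def joint_probability(scores, item_score):
    """P(one given food group scores `item_score` AND the total falls in `scores`)."""
    others = N_GROUPS - 1  # the remaining eight groups supply the rest of the total
    n_ways = sum(
        comb(others, s - item_score)
        for s in scores
        if 0 <= s - item_score <= others
    )
    return 0.5 * n_ways / 2 ** others


def expected_split(n_participants, scores):
    """Expected head counts (scoring 0, scoring 1) on one food group within a category."""
    return tuple(n_participants * joint_probability(scores, k) for k in (0, 1))


# Share of each category expected to score 0 and 1 on any single food group.
for name, scores in CATEGORIES.items():
    total = category_probability(scores)
    share0 = joint_probability(scores, 0) / total
    share1 = joint_probability(scores, 1) / total
    print(name, round(share0, 3), round(share1, 3))
```

Calling expected_split with the number of men in the study and one of the score ranges turns these shares into expected head counts.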

How does the prediction compare to the actual values? Quite well, in fact, for legumes and fruit & nuts (both 23%/77% in the low and high categories and 50/50 in the medium score category) and dairy products (31%/69% in the low and high categories and 50/50 in the medium category). Not quite so well for fish, vegetables and fat ratio, where the ratios are actually 'more extreme' than predicted (18%/82% or 20%/80% in the low and high score categories). Cereals (36%/64%) and meat & poultry show the worst correspondence; here the ratios are 'less extreme' than predicted. Overall we slightly overestimate the number in the medium score category (actual number 3808) and underestimate the numbers in the low and high categories.

There are some foodstuffs included in Table 2 which are not included in the calculation of the Mediterranean diet score: eggs, potatoes and sweets. As they are not used to calculate the score, it can be expected that any participant in any diet score group would be equally likely to be above as below the median consumption of these items, and so there would be approximately a 50%/50% split in the consumption in each diet score category. A sizeable departure from these figures would suggest that the participant's consumption of these non-scoring foodstuffs is in some way dependent on or linked to consumption of a scoring food group. However, this is clearly not the case, except possibly for potatoes, which show about 16% deviation from the expected 50-50 split in favour of (unsurprisingly) vegetables.

This leads to the most notable finding in this table, a point which went completely unmentioned in the original article. The distribution of meat and poultry consumption appears to be essentially independent of the Mediterranean diet score. Within each scoring category, the distribution of above and below median consumption for meat and poultry is much more like that of eggs, potatoes and sweets than it is like that of the other items making up the diet score. Whatever the diet score group, there is close to a 50-50 split in the distribution of individuals' meat consumption. On a chi-squared test of the actual versus the expected values, the meat and poultry item shows the strongest result for independence, in common with the foodstuffs which are independent of the diet score (e.g. eggs, potatoes, sweets) because they are not used in its calculation. It is interesting that this result is not remarked upon in the paper. In fact, to the contrary, when giving an example based on the link between a 2-point increment in the diet score and improved survival, it is mentioned that such an increment could be achieved by 'making a substantial reduction in meat intake', despite the evidence that many high scorers score highly in spite of an above median meat intake!
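For readers unfamiliar with the chi-squared test, here is a minimal sketch of one way to test whether a food item is independent of the diet score category, using a 2 x 3 table (above/below median by low/medium/high score). This is not necessarily the exact calculation used for this post, and the counts below are made up purely for illustration; they are not the figures from Table 2. With 2 rows and 3 columns there are 2 degrees of freedom, for which the p-value has the simple closed form exp(-x/2).

```python
from math import exp


def chi_squared_independence(table):
    """Pearson chi-squared statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def p_value_two_df(stat):
    """Upper-tail p-value for a chi-squared statistic with exactly 2 degrees of freedom."""
    return exp(-stat / 2)


# HYPOTHETICAL counts, for illustration only (rows: above / below median;
# columns: low / medium / high diet score category).
looks_independent = [[500, 1000, 500], [500, 1000, 500]]
linked_to_score = [[300, 1000, 700], [700, 1000, 300]]

for name, table in [("independent", looks_independent), ("linked", linked_to_score)]:
    stat = chi_squared_independence(table)
    print(name, round(stat, 2), p_value_two_df(stat))
```

A statistic near zero (p-value near 1) is what an item unrelated to the score would give; an item that drives the score gives a large statistic and a vanishingly small p-value.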

Wednesday, April 1, 2009

When epidemiology works

Sometimes epidemiology produces a reasonable result. Take this recent story.

Researchers note that somewhere has a very high rate of a relatively rare condition: in this case, a province of Iran with a high rate of oesophageal cancer. The first step in such a situation is often the case control study. This is a study done by matching people already diagnosed with the condition as closely as possible with controls who do not have the condition, and then attempting to find significant ways in which the cases and the controls differ. Case control studies can be problematic because the results can be manipulated by the choice of controls. Also, this type of study is purely observational, and it is after-the-fact observation. You are relying on people's recall, and what they recall may be influenced by their present condition, particularly when they have a serious disease. However, in this study the results were quite striking:

Compared with drinking warm or lukewarm tea (65C or less), drinking hot tea (65-69C) was associated with twice the risk of oesophageal cancer, and drinking very hot tea (70C or more) was associated with an eight-fold increased risk.

The speed with which people drank their tea was also important.

Drinking a cup of tea in under two minutes straight after it was poured was associated with a five-fold higher risk of cancer compared with drinking tea four or more minutes after being poured.

There was no association between the amount of tea consumed and risk of cancer.

Compare this with the purported increase in risk of death from eating red meat (based on the responses to one labyrinthine food frequency questionnaire ten years earlier) of about 0.31 times (men) to 0.36 times (women), i.e. a risk roughly 1.3 to 1.4 times the baseline rather than two, five or eight times.

And what was also nice was that having asked people to estimate how hot they drank their tea, they then went and measured the actual temperature.