The importance of multi-sourcing for the representativeness of your data

Stefan Boom
Managing Director, Benelux and Nordics, Dynata

It’s a fact, and one that has been known for a long time, that different online panels produce different results. The results are not necessarily different enough to change the overall narrative you’d draw from them, but they are different enough to disturb your tracking trends. And, perhaps, enough to disturb your mind.

To understand why this is so, we need to think about sampling in traditional terms. Samples, if drawn randomly, will represent the frame from which they are drawn. In traditional methods, the frame and the population were synonymous. Everyone had an equal chance of selection. There was a little non-coverage (some people didn’t have a telephone, and some addresses could not or would not be approached), but this was marginal. The frames for both telephone and face-to-face were universal, established and independent of the market research industry.

This has never been the case for online sampling. We cannot be universal since there is no one place on the internet where everyone goes, nor one repository of personal contact information of all internet users, and (of course) sending spam text messages to random mobile phone numbers would be wrong.

Why online panels produce different results – and how sampling bias can be reduced

So, we rely on online access panels and exchanges. What do these “frames” represent? Nothing more than the sources they were recruited from. Each company has its own jealously guarded set of recruitment sources, and it is this difference – plus what you might call “house effects”, coming from panel maintenance practice – that causes each panel to produce different results.

Now, every single source is, by definition, biased, since everyone coming from that source has that one thing in common. It stands to reason then that the least bias is obtained from maximising the number of sources used to recruit from. Least bias, coupled with the correct application of quota control, will bring about the maximum representivity. This is no different to how sample points would have been selected for face-to-face interviews. To put it in the words of a sampling textbook: what multi-sourcing in online sampling gives you is a reduction in clustering.
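The reduction in clustering can be illustrated with a small simulation. This is only a sketch under stated assumptions: each source is given its own bias drawn independently from the same normal distribution, and every source contributes equally to the panel. The function name, the bias size and the number of simulated panels are all hypothetical and chosen purely for illustration.

```python
import random
import statistics


def blended_bias_spread(n_sources, n_panels=2000, source_bias_sd=5.0, seed=42):
    """Spread (standard deviation) of the net bias across simulated panels.

    Each simulated panel recruits equally from n_sources sources, and each
    source carries its own independent bias. All numbers are illustrative
    assumptions, not measured values.
    """
    rng = random.Random(seed)
    panel_biases = []
    for _ in range(n_panels):
        # One bias value per source; the panel's net bias is their average.
        source_biases = [rng.gauss(0.0, source_bias_sd) for _ in range(n_sources)]
        panel_biases.append(statistics.fmean(source_biases))
    return statistics.pstdev(panel_biases)


# Compare panels built from 1, 10 and 100 sources.
for k in (1, 10, 100):
    print(k, round(blended_bias_spread(k), 2))
```

Under these assumptions the spread of the net bias shrinks as the number of sources grows, which is the same effect as adding more sample points in a face-to-face design. If source biases were correlated rather than independent, the reduction would be smaller.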

Now here we are not talking about one or two sources being mixed, we are talking hundreds if not thousands. The ways in which online advertising is bought and sold means that no panel provider can truly know how many sources have been used for recruitment. This is a good thing.

Reliability of your panel is crucial, even more so than representivity

Of course, while representivity is, for many researchers, very important, reliability is often more important. Reliability means getting the same result over and over again, when the same result is what is expected. In the case of a tracker, reliability is almost always many times more important than validity (i.e. the truth, which comes from representivity). It is vital that any change seen in the data is the result of changes in the market, and not the result of changes in the sampling.

This means that monitoring and maintaining the sources that feed the panel becomes all important, and only the panel company is in a position to do this. The easiest way to do this is to ensure that each source is a very small percentage of the panel. If you do, then any change in sourcing can have no real impact on the panel itself.

For good business reasons, this may not always be possible. Some sources of recruitment may be relatively large. In that case, it is important that the source is relatively neutral in its opinions – that is to say, its answers should tend towards the mean. Again, if this is the case, then the loss of that source does not affect the data coming from the panel overall and may only have an effect on feasibility.
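The arithmetic behind the last two paragraphs can be sketched as follows. The example treats the panel-level answer as a share-weighted average of the source-level answers, which is a simplifying assumption, and the source names, shares and means are invented for illustration.

```python
def shift_if_source_lost(shares, source_means, lost):
    """Change in the panel-level mean if the source `lost` disappears.

    shares: dict of source name -> share of the panel (assumed to sum to 1)
    source_means: dict of source name -> mean answer from that source
    """
    before = sum(shares[s] * source_means[s] for s in shares)
    # Drop the lost source and re-normalise the remaining shares.
    remaining = {s: w for s, w in shares.items() if s != lost}
    total = sum(remaining.values())
    after = sum(w * source_means[s] for s, w in remaining.items()) / total
    return after - before


# Hypothetical panel: shares and mean scores are made up for illustration.
shares = {"small_skewed": 0.01, "large_neutral": 0.30,
          "large_skewed": 0.30, "others": 0.39}
means = {"small_skewed": 70.0, "large_neutral": 51.0,
         "large_skewed": 60.0, "others": 45.0}

for name in shares:
    print(name, round(shift_if_source_lost(shares, means, name), 3))
```

Algebraically, the shift equals the lost source's share times the gap between the panel mean and that source's mean, divided by one minus the share. It is therefore small when the share is small or when the source sits close to the panel mean, and large only when a source is both big and skewed.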

Assessing your different panel recruitment sources to minimise their potential bias

The same is true of major opportunities to increase the overall size of the panel. The greater the opportunity, the more carefully the source needs to be examined for where it fits in the panoply of all sources. It needs to be assessed for its overall biasing potential.

Of course, the larger the source is, the less likely it is to suffer extreme bias coming from clustering effects since it is likely to be more general in nature.

By extension, the issues faced in sourcing also apply to individual panels where the panel company runs multiple branded panels, perhaps as a result of consolidation or merger activity.

A company like Dynata then expends a great deal of effort understanding its panels and sources of recruitment. It groups them together into channels of similar sources and panels, and controls these channels at the survey level. This is a pragmatic solution that brings with it data consistency at the survey level, while allowing flexibility at the source level for change – whether planned or unplanned.
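One way to picture control at the survey level is a fixed channel mix applied to every wave of a study. The sketch below is not a description of Dynata's actual mechanism, which the article does not detail. The channel names, the proportions and the function name are hypothetical, and the rounding method (largest remainder) is simply one reasonable choice.

```python
def allocate_completes(target_n, channel_mix):
    """Split a survey's target sample across channels in fixed proportions.

    channel_mix: dict of channel name -> share (assumed to sum to 1).
    Uses largest-remainder rounding so the allocations sum to target_n.
    """
    raw = {c: target_n * share for c, share in channel_mix.items()}
    alloc = {c: int(v) for c, v in raw.items()}
    shortfall = target_n - sum(alloc.values())
    # Hand the leftover completes to the channels with the largest remainders.
    by_remainder = sorted(raw, key=lambda c: raw[c] - alloc[c], reverse=True)
    for c in by_remainder[:shortfall]:
        alloc[c] += 1
    return alloc


# Hypothetical channel mix, held constant from wave to wave.
mix = {"loyalty": 0.5, "open_web": 0.3, "social": 0.2}
print(allocate_completes(1001, mix))
```

Because the mix is defined per channel rather than per source, individual sources inside a channel can come and go without changing the blend that each survey receives.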

The smart approach to panel building, and how to manage very niche audiences

You may think then that the smart approach to panel building is to work with only large sources of recruitment that are closest to the mean in terms of the answers they return. But this would fail to address the final need that researchers have.

This is the need to access very niche audiences, those that are hardest to find.

To find these people in general sources is, by definition, hard. The smart way to find them is to go to the places where they congregate on the internet, and approach them there. There is the potential, however, that these people, having their “niche-ness” in common, might also be biased in how they answer general questions. And so we circle back to the problem we started with: how to deal with biased sources.

The solution depends on the nature of the niche they occupy and how in-demand it is for research. If it is in high demand, then it can make sense to set up a distinct panel of such people and use them only when they are the target of the research. Pulling an audience like this out of the general panel causes no problems for the validity of the overall panel – after all, it makes no real difference if the haystack you are looking at contains no needles!

But if they are not in demand enough to set up a specialist panel – bearing in mind that a panel unused will tend to melt away – then they need to exist in the general panel, in potentially large numbers.

This problem can be solved by the final weapon in the panel company’s arsenal: using sources not for recruitment into a panel, but as ad-hoc partners that feed respondents directly into surveys, with volume ramped up and down according to demand.

From all this, it should be clear that multi-sourcing by panel companies is the solution to researchers’ practical needs for sample, and that having a coherent approach to the management of multiple sources is the major role of the panel company today.

***
This article was initially published in Dutch by Data & Insights Network.
