The insurance market is increasingly drawn to the benefits of ‘big data’. Yet one of the principal ways in which insurers are assembling this big data is highly questionable on ethical grounds. The practice has implications for every policyholder, for the ability of claims departments to differentiate good claims from bad, and for the regulator’s responsibility for maintaining market confidence.
‘Big data’ relies on the accumulation of datasets containing information about people the insurer may, or may not, want to do business with. For the insurer to draw the most value from those datasets, the information within them needs to be integrated and interpreted. In combining these datasets, several variables need to be taken into account:
- some datasets will be sourced from within the insurer, but most will have been purchased from third parties;
- some will contain data wholly relating to insurance, but most will be sourced from other types of businesses;
- some will contain data that can be directly connected to particular individuals, while others will have personal data that has been anonymised – that is, it has had ‘personal identifiers’ removed.
The value of these datasets will vary according to how much insight insurers expect to gain from them. A major hurdle to gaining such insight is the degree of anonymisation within a dataset. Let’s explore anonymised data a little more.
All businesses acquire data from their customers and many sell it on to third parties. Some only sell it on with personal identifiers like name and address stripped out. Others will sell it on with personal identifiers in place if the customer has consented to their details being shared with third parties. Where no such consent has been given, most firms will strip out personal identifiers, bundle up what remains and sell it on. In the latter case, the firm will say that it has abided by its usual policy of not selling on ‘your details’ – it has simply stripped out the ‘your’ from the ‘details’. Or so it thinks.
It turns out to be relatively easy to get round that anonymisation. Researchers and security experts are now able to de-anonymise datasets through relatively simple processes of aggregation and correlation with other datasets in the same general sphere as the so-called anonymous one. Some service providers circling round the insurance sector openly promote just such a service, highlighting how their own particular dataset can help insurers reattach personal identifiers to datasets from which those identifiers had apparently been stripped. This is done by matching relationship patterns across different sources of data. Hey presto, that set of numbers suddenly comes with a name and address.
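To make the correlation step concrete, here is a minimal sketch of a generic record-linkage join, assuming the ‘anonymised’ dataset still carries a few ordinary attributes (here postcode, date of birth and sex, often called quasi-identifiers) that also appear in an identified dataset from another source. All records, field names and the `reidentify` function are hypothetical and invented for illustration; this is not the method of any particular service provider.

```python
# Hypothetical "anonymised" records: name and address have been removed,
# but quasi-identifiers (postcode, birth date, sex) remain.
anonymised_claims = [
    {"postcode": "AB1 2CD", "birth_date": "1971-03-14", "sex": "F", "claims_last_5y": 3},
    {"postcode": "EF3 4GH", "birth_date": "1985-11-02", "sex": "M", "claims_last_5y": 0},
]

# Hypothetical identified dataset from a different source (e.g. a marketing list)
# that happens to share the same quasi-identifiers.
identified_list = [
    {"name": "Jane Example", "address": "1 High Street",
     "postcode": "AB1 2CD", "birth_date": "1971-03-14", "sex": "F"},
    {"name": "John Sample", "address": "9 Mill Lane",
     "postcode": "EF3 4GH", "birth_date": "1985-11-02", "sex": "M"},
    {"name": "Ann Other", "address": "5 Park Road",
     "postcode": "AB1 2CD", "birth_date": "1990-07-21", "sex": "F"},
]

# Assumed set of attributes the two datasets have in common.
QUASI_IDENTIFIERS = ("postcode", "birth_date", "sex")


def key(record):
    """Build a lookup key from the shared quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymised, identified):
    """Reattach name and address wherever exactly one identified person matches."""
    # Index the identified records by their quasi-identifier combination.
    index = {}
    for person in identified:
        index.setdefault(key(person), []).append(person)

    matches = []
    for record in anonymised:
        candidates = index.get(key(record), [])
        # A unique match re-identifies the record; ambiguous matches are skipped.
        if len(candidates) == 1:
            person = candidates[0]
            matches.append({**record, "name": person["name"], "address": person["address"]})
    return matches


for match in reidentify(anonymised_claims, identified_list):
    print(match["name"], match["address"], match["claims_last_5y"])
```

Nothing in the join is sophisticated: a combination of postcode, birth date and sex is often unique to one person, so a plain dictionary lookup is enough to turn a nameless claims record back into a named one.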
How serious is this? We all benefit in some way (both as individuals and as society as a whole) when the terms under which we disclose information about ourselves are treated with respect. Each of us wants to retain some privacy about the various strands of our identity and is only willing to forgo some of that privacy in return for a benefit that we personally understand and receive. When control over our identities is taken from us and various slices of it become market commodities, many of us become concerned. After all, think of how you would feel if your local GP practice sold your ‘anonymised’ medical records to a pharmaceutical company, which then aggregated them with other datasets and started sending you tailored mailshots about its latest wonder drug.
Insurers need to take a very careful look at the intersection between their privacy commitments, their handling of data relating to policyholders and their obligation to act with integrity.
Two further issues arise from insurers’ use of anonymised data, and I’ll set these out in a second post shortly.