Researchers have found that data on Google searches could provide insight into the interests and concerns of different demographic groups.
The study by academics at Warwick Business School, UK, found that while internet users in US states with higher birth rates search for more information about pregnancy, those in states with lower birth rates look up more information about cats.
A separate analysis of the relationship between Google search data and infant mortality, measured as the number of infant deaths per 1,000 births, showed that internet users in US states with higher infant mortality rates search for more information on credit, loans and sexually transmitted diseases.
Using the Google Correlate service, the researchers – Adrian Letchford, Tobias Preis and Suzy Moat, of Warwick Business School’s Data Science Lab – sifted through millions of potential correlations with birth rates and infant mortality rates.
The researchers stress that such analyses require careful statistical precautions to avoid identifying spurious correlations – a danger when handling large datasets.
Their paper, titled ‘Quantifying the search behaviour of different demographics using Google Correlate’ and published in PLOS ONE, introduces a method to address this problem.
“To carry out an analysis like this, our method has to take into account that Google users search for a huge number of different phrases,” said Dr Letchford, Research Fellow in Data Science. “This means that we would expect to find some phrases with strong correlations by chance.
“We compared large amounts of random data for US states with Google search data for those states. Once we know how strong the correlations between random data and Google search data tend to be, we can work out whether the correlations we see between socioeconomic data and Google search data are likely to be spurious or not.”
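The idea Dr Letchford describes can be illustrated with a short Python sketch. This is not the authors' implementation, and it does not use Google Correlate: all data are synthetic, and the function names, the number of phrases, the number of random series and the 95% cut-off are assumptions chosen for illustration. The sketch correlates many random state-level series against every search phrase, records the strongest correlation each random series achieves, and treats a real correlation as noteworthy only if it exceeds what random data typically produce.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def null_threshold(phrases, n_states, n_random, quantile, rng):
    """Estimate how strong a correlation random data can reach by chance.

    For each random state-level series, record the strongest absolute
    correlation found across all phrases, then return the requested
    quantile of those maxima.
    """
    maxima = []
    for _ in range(n_random):
        noise = [rng.gauss(0, 1) for _ in range(n_states)]
        maxima.append(max(abs(pearson(noise, p)) for p in phrases))
    maxima.sort()
    return maxima[min(int(quantile * n_random), n_random - 1)]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
N_STATES = 50            # one value per US state
N_PHRASES = 200          # hypothetical; the real phrase set is far larger

# Synthetic stand-in for a socioeconomic indicator such as birth rate.
indicator = [rng.gauss(0, 1) for _ in range(N_STATES)]

# Synthetic stand-in for per-state search volumes; all unrelated noise ...
phrases = [[rng.gauss(0, 1) for _ in range(N_STATES)] for _ in range(N_PHRASES)]
# ... except phrase 0, which is deliberately built to track the indicator.
phrases[0] = [v + rng.gauss(0, 0.3) for v in indicator]

threshold = null_threshold(phrases, N_STATES, n_random=200, quantile=0.95, rng=rng)

# Keep only phrases whose correlation beats the chance-level threshold.
significant = [i for i, p in enumerate(phrases)
               if abs(pearson(indicator, p)) > threshold]

print(f"chance-level threshold: {threshold:.2f}")
print(f"phrases above threshold: {significant}")
```

Taking the maximum over all phrases for each random series is one simple way to account for the number of phrases being searched: the more phrases there are, the stronger the best chance correlation becomes, and the higher the threshold rises.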
The study analysed data from the Centers for Disease Control and Prevention on the number of births per 1,000 people in each US state and the number of infant deaths per 1,000 births in each state.
“Using our approach, we find that people in states with higher birth rates search more frequently for phrases like ‘pregnancy workout’, ‘pregnancy calendar’ and ‘baby constipation’,” said Dr Letchford.
“In contrast, people in states with lower birth rates search for more information about cats, typing in phrases like ‘dry cat food’, ‘cat not eating’ and ‘friskies’.
“We also found that people in US states with higher infant mortality rates search for more information about credit, loans and sexually transmitted diseases.
“Here, we see people typing in terms like ‘loans for people with bad credit’ and ‘sexually transmitted diseases’ more frequently. However, our method does not identify any phrases that people search for more frequently in US states with lower mortality rates.”
Dr Moat, Associate Professor of Behavioural Science and co-director of the Data Science Lab at Warwick Business School, said: “Data from search engines such as Google give us an unprecedented opportunity to measure what information people are looking for on an incredibly large scale.
“These results show that geographic differences in the information people are seeking can offer striking and sometimes saddening insights into the different worlds that different people inhabit.”
Dr Preis, Associate Professor of Behavioural Science and Finance and co-director of the Data Science Lab, said: “Our previous work has provided evidence that data from online services can help us reduce delays in measuring human behaviour in the real world, and sometimes even identify early warning signs of future changes in behaviour. The results we present here show how online data can help us understand differences in human behaviour and experience across different geographic regions too.”