
Geolocation Data and the Destination Tourism Industry: The Whole Story

Written by Alexandra V. Pasi, PhD


Alexandra V. Pasi holds a PhD in Mathematics, and has broad industry and academic experience in machine learning, statistics, data science, and quantitative and qualitative modeling and forecasting with applications to a variety of fields including finance, genomics, and geopolitics. She is passionate about communicating the fundamentals of good mathematical and scientific practice to decision-makers looking to make data-informed decisions. 

Executive Summary:

  1. Geolocation data helps to answer a rich portfolio of questions around visitor movement within a destination to improve a DMO's destination marketing and management activities. 

  2. User privacy is protected through opt-in collection policies, anonymization, and the aggregation of data to observe general trends, so that analysis surfaces the movement patterns of groups of travelers rather than of individuals. 

  3. Skilled data scientists, working with raw data instead of generic snapshots, provide the most flexibility in the range of questions that can be answered through geolocation data. Reliable and robust analytic insights are further achieved by interweaving geolocation data with other types of data.

Geolocation Data and the Destination Tourism Industry

The use of geolocation data collected from mobile apps is revolutionizing the travel industry, allowing for real-time, targeted insights. Geolocation data can help answer questions about where visitors come from, what specific locations they travel to, what routes they take, and what days and times they travel. When this data is interwoven with other data sources such as credit card transaction and tax data, it can also help answer vital questions about the economic impact of visitation to a region to stimulate economic opportunity, monitor demand and shape visitation trends.

But with the rise of big data comes the nuance of an ever-shifting digital privacy landscape, the volatility of which is reflected in the volatility of the data itself. Luckily, for those committed to doing ethical data science, the solution to this volatility and to concerns about privacy is the same. By using only data collected with proper opt-in and anonymization processes, and by aggregating that data at a larger scale, we can further safeguard individuals’ privacy while creating a more stable view of the data.

Knowing how to perform this aggregation in a way that surfaces the features relevant to the desired insight requires knowledge of the particular limitations and quirks of the data. These limitations can be addressed both with a variety of statistical and mathematical techniques and by supplementing with diverse data sources. Interweaving multiple data sources into a coherent and meaningful picture calls for navigating many subtle considerations of the larger real-world context from which the data is drawn. It is through this synergistic unification of big data with the human capacity for identifying context and meaning that modern analytic tools can be effectively leveraged to navigate change and drive growth. 

Privacy-Protective Geolocation Practices 

Mobile geolocation data offers insight into the dynamic interactions of physical spaces and the local economy at a degree of granularity which was previously unthinkable. This insight allows for targeted and effective policy and marketing decision-making. This potential for increasingly granular views must be balanced against concerns for privacy and volatility in the data. The solution to both of these problems is aggregation. Mobile app geolocation data is collected on an opt-in basis and then anonymized in order to remove any personally identifying information linked uniquely to a device. But aggregation provides an important additional privacy safeguard, while also being an indispensable tool for creating reliable and generalizable insights. 

Aggregating the data–that is, analyzing larger swaths of the data as a carefully combined whole–allows us to eliminate noise, smooth volatility, and uncover the general trends behind the fragmented snapshots of individual movement. Understanding how data should be aggregated or woven together to provide a specific kind of insight requires a careful analysis of the limitations particular to that type of data. 
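The aggregation-plus-suppression idea can be sketched in a few lines of Python. The ping records, zone names, and the k = 5 suppression threshold below are all hypothetical choices for illustration, not a description of any particular vendor's pipeline.

```python
# Minimal sketch: aggregate anonymized pings into (hour, zone) device counts,
# suppressing any cell with fewer than k distinct devices so no small group
# of individuals can be singled out. All values here are illustrative.
pings = [
    ("a1", 9, "downtown"), ("b2", 9, "downtown"), ("c3", 9, "downtown"),
    ("d4", 9, "downtown"), ("e5", 9, "downtown"), ("f6", 9, "downtown"),
    ("a1", 9, "canyon"), ("b2", 10, "canyon"),  # sparse cells: suppressed
]

def aggregate(pings, k=5):
    """Count distinct devices per (hour, zone); drop cells below k."""
    cells = {}
    for device, hour, zone in pings:
        cells.setdefault((hour, zone), set()).add(device)
    return {cell: len(devs) for cell, devs in cells.items() if len(devs) >= k}

counts = aggregate(pings)  # only the well-populated downtown cell survives
```

Suppressing sparse cells is what makes the aggregate both privacy-protective and statistically stable: the cells most likely to identify an individual are also the noisiest.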

Minding the Gaps

There are blind spots in every data set; it’s the unavoidable nature of observation and measurement. Generally speaking, the more observations you make—that is, the larger your sample is—the fewer gaps you have, which makes intuitive sense. However, the underlying methodology through which your data is collected can leave you with blind spots that persist despite a very large number of observations. While geolocation data does provide a much larger and more granular sample than many of the tourism-industry standards that preceded it, like all data sets, it is not immune to its own blind spots. 

Knowing how to take the right slice of the right data to answer a particular question requires understanding the gaps in the coverage of the data. Sometimes these limitations are corrected or controlled for using a battery of statistical techniques, and other times by supplementing with another type of data that does have visibility into those blind spots, such as credit card transaction data, tax data, and the like. Here are a few blind spots particular to mobile app geolocation data:

  1. International Travelers: This population tends to be significantly underrepresented by mobile app geolocation data within the US, likely due in part to varying international privacy regulations and a tendency for international travelers to limit their data usage due to exorbitant roaming fees. However, this population has a significant impact on the tourism economy, with the International Trade Administration estimating that international travelers spent over $80 billion in 2021 on U.S. travel and tourism-related goods and services.
  2. Older Adults: According to Pew Research, only 61% of Americans aged 65+ and 83% of those aged 50-64 owned a smartphone, compared to over 95% for adults under 50. Thus, mobile geolocation data will be less representative of older adults. 
  3. Visitors to Wilderness Areas: Mobile app geolocation data sources have limited visibility in backcountry wilderness areas with poor mobile service coverage. 
  4. Lower-Income Adults and Those with Less Formal Education: According to the same Pew Research polling, only 75% of adults with a high school education or less owned a smartphone, as did 76% of those making $30,000 a year or less, compared with 96% of those making over $75,000 a year. This consideration is of particular relevance to legislators and policy-makers looking to use geolocation information to inform programs and policies for impoverished communities. 
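One standard way to correct this kind of under-coverage is inverse-probability reweighting: scale each group's observed device count by that group's smartphone-ownership rate. The ownership rates below are the Pew figures cited above; the observed device counts and age groupings are invented for illustration.

```python
# Sketch: reweight observed device counts by smartphone-ownership rates so
# under-covered groups (e.g., adults 65+) are not underestimated.
ownership = {"18-49": 0.95, "50-64": 0.83, "65+": 0.61}  # Pew ownership rates

def reweight(observed_devices):
    """Estimate the true visitor mix from device counts and ownership rates."""
    est = {g: n / ownership[g] for g, n in observed_devices.items()}
    total = sum(est.values())
    return {g: v / total for g, v in est.items()}

# Invented device counts; the 65+ share rises once ownership is accounted for.
mix = reweight({"18-49": 5200, "50-64": 2100, "65+": 1200})
```

In practice this simple scaling would be refined with census population shares and travel-propensity data, but the principle is the same: a known gap can be modeled and corrected rather than ignored.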

Identifying these gaps is the critical first step to implementing robust mathematical approaches and interweaving a broader base of data sets that are cumulatively more resilient to such limitations. 

What is the Truth?

Careful consideration of this discussion naturally raises the questions: if every data set has its blind spots, how do we know what the truth is? How do we ground, calibrate, or validate our data?

These questions underscore the necessity of having access to the raw data in order to perform an analysis. High-level snapshots of data that have not been carefully processed, curated, and communicated with a particular use case in mind are liable to mislead anyone looking to wield the power of big data. Working productively with these data sets requires both sophisticated data science and guidance grounded in an understanding of the ultimate goals one wishes to achieve with these analytic insights. 

A key point here is that different data sources, at their core, usually measure different things, even when they’re presumed to be reflecting the same thing. Take as an example two different data sources which, on the surface, purport to reflect the number of people staying in hotels in a given area. On the one hand we have hotel occupancy data for a given day, and on the other hand we have mobile app geolocation data that gives us the number of mobile devices observed in hotels on that day. 

[Figure 1: Hotel occupancy compared with mobile devices observed in hotels]

Most of the time, we would expect these numbers to correlate closely with each other, and they do. Each hotel room will hold an average of one or two individuals, and these individuals comprise the bulk of traffic through the hotels. During times of very high traffic, however, such as certain conferences or spring break, we would expect to see more people per hotel room. This will register as an increase in the number of mobile devices observed in hotels, even though hotel occupancy cannot rise past 100%. Similarly, when a hotel hosts a large conference or event, attendees’ devices will be observed at the hotel, so the geolocation data set will rightfully reflect a much higher volume than the hotel-occupancy data shows: many attendees may drive in from outside the area for the day, or stay in less traditional accommodations such as privately owned rentals because of higher hotel rates during peak demand. Utilizing two distinct data sets thus allows for more robust and differentiated kinds of insights than either would provide alone. 

If we accept hotel occupancy data as the ground truth a priori, and adjust our geolocation-based visitation estimates to match occupancy numbers alone, then we may significantly underestimate overall visitation to the region during times of peak traffic. At the very least, our visitation estimates will be less accurate during heavy-traffic periods under this approach. However, we can still use hotel occupancy data to validate our general approach to modeling visitation trends, since we should expect these two metrics to correlate overall. The takeaway is that different data sources should harmonize with one another, but they should not always be expected, or forced, to sing the same note at the same time. Nonetheless, using multiple types of data to triangulate is an important tool for grounding analytics in reality. 
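This kind of validation can be sketched directly: compute the correlation between the two daily series and check that they track each other overall, without forcing them to match point by point. Both series below are fabricated for illustration.

```python
import statistics

# Fabricated daily series: occupancy saturates at 100% on peak days while
# device counts keep rising, yet the two still correlate strongly overall.
occupancy = [0.62, 0.65, 0.70, 0.88, 0.97, 1.00, 0.71]  # share of rooms filled
devices = [1250, 1300, 1400, 1900, 2600, 3100, 1450]    # devices seen in hotels

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

r = pearson(occupancy, devices)  # high r validates the general approach
```

A strong overall correlation validates the modeling approach, while the divergence on peak days is itself informative: it marks exactly the periods when occupancy data alone undercounts true visitation.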

The Whole Story

Any sufficiently large and robust data set is liable to be as complicated and messy as the real world from which it’s plucked. Interweaving multiple types of data into a picture of the world is as much an art as it is a science. This unavoidable reality illustrates the principle that data analytics should be used as a vital but supplementary tool for decision-making, not a high-tech Magic 8 Ball. Data does not speak on its own; it must be put in the context of a larger story. As advanced as the cutting-edge machine-learning techniques are, for the time being people remain the superior story-tellers. It is only in combining this human capacity for meaning-making, a robust toolbox of statistical and mathematical techniques, and the dizzyingly vast scale of insight provided by big data, that the true power of modern analytics is realized.