Written by Alexandra V. Pasi, PhD
Alexandra V. Pasi holds a PhD in Mathematics and has broad industry and academic experience in machine learning, statistics, data science, and quantitative and qualitative modeling and forecasting, with applications to fields including finance, genomics, and geopolitics. She is passionate about communicating the fundamentals of good mathematical and scientific practice to decision-makers looking to make data-informed decisions.
Geolocation Data and the Destination Tourism Industry
The use of geolocation data collected from mobile apps is revolutionizing the travel industry, allowing for real-time, targeted insights. Geolocation data can help answer questions about where visitors come from, what specific locations they travel to, what routes they take, and what days and times they travel. When this data is interwoven with other data sources, such as credit card transaction and tax data, it can also help answer vital questions about the economic impact of visitation to a region, helping destinations stimulate economic opportunity, monitor demand, and shape visitation trends.
But with the rise of big data comes the nuance of an ever-shifting digital privacy landscape, the volatility of which is reflected in the volatility of the data itself. Luckily, for those committed to doing ethical data science, the solution to this volatility and to concerns about privacy is the same. By using only data collected with proper opt-in and anonymization processes, and by aggregating that data at a larger scale, we can further safeguard individuals' privacy while creating a more stable view of the data.
Knowing how to perform this aggregation in a way that surfaces the features relevant to the desired insight requires knowledge of the particular limitations and quirks of the data. These limitations can be addressed both with a variety of statistical and mathematical techniques and by supplementing with diverse data sources. The task of interweaving multiple data sources into a coherent and meaningful picture calls for navigating many subtle considerations of the larger real-world context from which the data is drawn. It is through this synergistic unification of big data with the human capacity for identifying context and meaning that modern analytic tools can be effectively leveraged to navigate change and drive growth.
Privacy-Protective Geolocation Practices
Mobile geolocation data offers insight into the dynamic interactions of physical spaces and the local economy at a degree of granularity which was previously unthinkable. This insight allows for targeted and effective policy and marketing decision-making. This potential for increasingly granular views must be balanced against concerns for privacy and volatility in the data. The solution to both of these problems is aggregation. Mobile app geolocation data is collected on an opt-in basis and then anonymized in order to remove any personally identifying information linked uniquely to a device. But aggregation provides an important additional privacy safeguard, while also being an indispensable tool for creating reliable and generalizable insights.
Aggregating the data (that is, analyzing larger swaths of the data as a carefully combined whole) allows us to eliminate noise, smooth volatility, and uncover the general trends behind the fragmented snapshots of individual movement. Understanding how data should be aggregated or woven together to provide a specific kind of insight requires a careful analysis of the limitations particular to that type of data.
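To make the idea of privacy-protective aggregation concrete, here is a minimal sketch. All names, thresholds, and data below are hypothetical illustrations, not any vendor's actual pipeline: anonymized device observations are rolled up into zone-by-day counts, and any cell with fewer than a minimum number of distinct devices is suppressed so small groups cannot be singled out.

```python
# Hypothetical anonymized pings: (hashed_device_id, zone, date).
pings = [
    ("a1", "waterfront", "2023-07-04"),
    ("b2", "waterfront", "2023-07-04"),
    ("c3", "waterfront", "2023-07-04"),
    ("d4", "old-town",   "2023-07-04"),
    ("a1", "waterfront", "2023-07-05"),
]

MIN_DEVICES = 3  # illustrative suppression threshold


def aggregate(pings, min_devices=MIN_DEVICES):
    """Count distinct devices per (zone, date); withhold cells that are too small."""
    devices_per_cell = {}
    for device, zone, date in pings:
        devices_per_cell.setdefault((zone, date), set()).add(device)
    return {
        cell: len(devices)
        for cell, devices in devices_per_cell.items()
        if len(devices) >= min_devices  # drop cells too small to publish
    }


print(aggregate(pings))
# → {('waterfront', '2023-07-04'): 3}
# The waterfront cell on 2023-07-04 clears the threshold; the other
# two cells each contain a single device and are suppressed.
```

Beyond privacy, the same suppression step stabilizes the output: cells with only a handful of devices are exactly the ones whose counts swing wildly from day to day.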
Minding the Gaps
There are blind spots in every data set; it's the unavoidable nature of observation and measurement. Generally speaking, the more observations you make—that is, the larger your sample is—the fewer gaps you have, which makes intuitive sense. However, the underlying methodology through which your data is collected can leave you with limited visibility in certain areas, blind spots that persist despite a very large number of observations. While geolocation data does provide a much larger and more granular sample than many of the tourism-industry standards that preceded it, like all data sets, it is not immune to its own blind spots.
Knowing how to take the right slice of the right data to answer a particular question requires understanding the gaps in the data's coverage. Sometimes these limitations can be corrected or controlled for with a battery of statistical techniques; other times, the answer is to supplement with another type of data that does have visibility into those blind spots, such as credit card transaction data or tax data. Mobile app geolocation data comes with a number of such blind spots of its own.
Identifying these gaps is the critical first step to implementing robust mathematical approaches and interweaving a broader base of data sets that is cumulatively more resilient to such limitations.
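One common statistical way to partially correct for a known coverage gap is to reweight observed counts by an estimated coverage rate for each segment, in the spirit of post-stratification. The sketch below is purely illustrative: the segments, observed counts, and penetration rates are invented, and in practice the penetration estimates would come from an external source such as a survey.

```python
# Observed distinct-device counts per visitor segment (illustrative).
observed_devices = {"age_18_34": 420, "age_35_54": 380, "age_55_plus": 150}

# Hypothetical panel penetration: the estimated fraction of each segment
# that appears in the geolocation panel at all. Older segments often have
# lower coverage, so their observed counts understate true visitation more.
panel_penetration = {"age_18_34": 0.07, "age_35_54": 0.05, "age_55_plus": 0.02}


def estimate_visitors(observed, penetration):
    """Scale each segment's observed count up by its estimated coverage rate."""
    return {seg: round(count / penetration[seg]) for seg, count in observed.items()}


estimates = estimate_visitors(observed_devices, panel_penetration)
print(estimates)
# → {'age_18_34': 6000, 'age_35_54': 7600, 'age_55_plus': 7500}
```

Note how the ranking flips: the youngest segment dominates the raw device counts, but after adjusting for coverage, the older segments are estimated to be comparable or larger. That reversal is exactly the kind of error an uncorrected blind spot produces.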
What is the Truth?
This discussion naturally raises the questions: if every data set has its blind spots, how do we know what the truth is? How do we ground, calibrate, or validate our data?
These questions underscore the necessity of having access to the raw data in order to perform an analysis. High-level snapshots of data that has not been carefully processed, curated, and communicated with a particular use case in mind are liable to mislead anyone looking to wield the power of big data. Working productively with these data sets requires both sophisticated data science and guidance grounded in an understanding of the ultimate goals one wishes to achieve with these analytic insights.
A key point here is that different data sources, at their core, usually measure different things, even when they’re presumed to be reflecting the same thing. Take as an example two different data sources which, on the surface, purport to reflect the number of people staying in hotels in a given area. On the one hand we have hotel occupancy data for a given day, and on the other hand we have mobile app geolocation data that gives us the number of mobile devices observed in hotels on that day.
Most of the time, we would expect these numbers to correlate closely with each other, and they do: each hotel room houses an average of one or two individuals, and these individuals comprise the bulk of traffic through the hotels. During periods of very high traffic, however, such as certain conferences or spring break, we would expect more people per hotel room. The geolocation data set will register this as an increase in the number of devices observed in hotels, even though hotel occupancy cannot rise past 100%. Additionally, when a large conference or event is held at a hotel, attendees' devices will be observed there in the geolocation data, so that data set will rightfully reflect a much higher volume than the hotel occupancy data shows: many attendees may drive in from outside the area for the day, or stay in less traditional accommodations such as privately owned rentals, since hotel rates are higher during times of peak demand. Using two distinct data sets therefore allows for more robust and differentiated insights than either one alone would provide.
If we accept hotel occupancy data as the ground truth a priori, and adjust our geolocation-based visitation estimates to match occupancy numbers alone, we may significantly underestimate overall visitation to the region during times of peak traffic. At the very least, this approach will make our visitation estimates less accurate precisely when traffic is heaviest. However, we can still use hotel occupancy data to validate our general approach to modeling visitation trends, since we should expect these two metrics to correlate overall. The takeaway is that different data sources should harmonize with one another, but they should not always be expected, or forced, to sing the same note at the same time. Using multiple types of data to triangulate remains an important tool for grounding analytics in reality.
The Whole Story
Any sufficiently large and robust data set is liable to be as complicated and messy as the real world from which it's plucked. Interweaving multiple types of data into a picture of the world is as much an art as it is a science. This unavoidable reality illustrates the principle that data analytics should be used as a vital but supplementary tool for decision-making, not a high-tech Magic 8 Ball. Data does not speak on its own; it must be put in the context of a larger story. As advanced as cutting-edge machine-learning techniques are, for the time being people remain the superior storytellers. It is only by combining this human capacity for meaning-making, a robust toolbox of statistical and mathematical techniques, and the dizzyingly vast scale of insight provided by big data that the true power of modern analytics is realized.