Investigating Impact of Weather on Customer Behavior

The weather has a profound impact on mobility choices and hence on the performance of entities dependent on mobility. However, investigating what impact the changing weather has on behaviour is a difficult task. This article presents a solution in Python using the notion of a Weather Rating and historical weather data from the Netherlands.

Task

Weather conditions are often one of the factors that influence customers’ behaviour. That was the case here: my company, which operates in the parking industry, had long wondered whether changes in the weather affect the number of customers in its facilities in any way. This problem is linked to a larger question about the influence of weather on mobility choices in general, for instance the choice of transport mode.

Literature

Initial research into this topic revealed that the issue isn’t as straightforward as it seems.

Some studies (Hassan and Barker, 1999; Keay and Simmonds, 2005) show a reduction in car traffic with rainfall, while others (Sabir, 2011) suggest that people are more likely to choose motorized transport modes in the rain. Sabir (2011) and Cools et al. (2010) claim that temperature has a smaller effect on mobility than precipitation. Both they and dozens of other, mostly European, studies (Böcker, 2014) show that car use decreases on warm days in favour of walking and cycling. Meanwhile, other studies (Datla and Sharma, 2010) suggest that temperature doesn’t affect mobility on its own but only in combination with other factors. Wind is frequently omitted from these studies (Böcker, 2014). When it is mentioned, some (Aaheim and Hauge, 2005) find it to be an important factor that decreases the volumes of pedestrians and cyclists, while others (Burke et al., 2006) cannot confirm a significant association.

Problems

Several issues come up when you start comparing any type of transactional data with the weather. At first, it seems logical to take features such as temperature, precipitation and wind speed and compare their values to the performance data. However, all of these features are strongly affected by the time of year: the temperature is generally lower in winter, and precipitation is higher in summer (at least in the Benelux countries). As a result, comparing customer behaviour data to these features only uncovers the yearly seasonality, which may well not be associated with the weather at all (it may have to do with higher traffic during the summer holidays, for example).

It quickly becomes evident that another way of quantifying the weather is necessary, one which combines several features and also adjusts them for seasonality. An algorithmic approach to this problem would require an extensive dataset with different atmospheric conditions as features and an outcome in the form ‘nice day’/‘bad day’. However, even then the algorithm could assign more weight than necessary to the yearly variance of conditions and measure climate instead of weather.

Another apparent solution is to compare a condition, e.g. the temperature, with its average for a given week or month. The problem in this case is that the temperature doesn’t usually deviate much from its short-term average. We therefore end up with small values of similar magnitude throughout the period which, when compared to another metric, don’t yield high statistical confidence.
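The limitation of this approach can be illustrated with a minimal sketch. The function name, the 7-day window and the temperature values are all made up for illustration and are not taken from the original analysis. It subtracts a centred rolling mean from each daily value, which removes the seasonal level but leaves only small deviations behind.

```python
from statistics import mean


def deviation_from_rolling_mean(values, window=7):
    """Return each value minus the mean of the centred window around it."""
    half = window // 2
    deviations = []
    for i, value in enumerate(values):
        lo = max(0, i - half)                 # window is truncated at the edges
        hi = min(len(values), i + half + 1)
        deviations.append(value - mean(values[lo:hi]))
    return deviations


# Hypothetical daily mean temperatures in degrees Celsius (illustrative only)
winter = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0, 1.5, 2.0, 4.5, 3.0, 2.5, 1.0, 3.5, 2.0]
summer = [t + 16.0 for t in winter]           # same pattern, 16 degrees warmer

winter_dev = deviation_from_rolling_mean(winter)
summer_dev = deviation_from_rolling_mean(summer)

# The seasonal offset disappears, and what is left is a narrow band of values.
print(max(abs(d) for d in winter_dev))
print(max(abs(d) for d in summer_dev))
```

In this toy example the winter and summer series produce practically the same deviations, all within a couple of degrees, which is the narrow range of values the paragraph above refers to.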

Solution

The solution to this problem seems to require some way of describing the weather subjectively. This description shouldn’t purely quantify the temperature, precipitation, wind and other conditions, but should rather answer the question ‘is this a nice day?’. Luckily, it turned out that such a description system exists and that it’s largely underrated. But let’s go through it step by step.

Data

It all starts with the data. Fortunately, weather data aren’t too difficult to acquire: they’re usually freely accessible, of decent quality, and provide historical records for long periods of time. One of the well-known open APIs for this purpose is https://meteostat.net/en. However, my experience with this source was that the data were rather basic and lacked some important metrics.

Screenshot from Meteostat API Documentation at https://dev.meteostat.net/api/stations/daily.html

Instead of using such open datasets, it is usually better to use the data from national meteorological institutes. Almost every country (sometimes even a state or a region) has an agency responsible for measuring and recording the weather (NOAA is the one for the US), and in most cases their datasets are the richest and of the best quality. After all, services such as Meteostat use data provided by these institutes, but in a limited scope. Using the primary sources, however, usually makes the most sense when the analysis concerns only one or at most a few countries.

For my case, since it concerns the Netherlands, I found a dataset of historical weather data from the Royal Netherlands Meteorological Institute (Koninklijk Nederlands Meteorologisch Instituut, KNMI).

Screenshot from KNMI’s historical data site at http://projects.knmi.nl/klimatologie/daggegevens/selectie.cgi (machine translation from Dutch)

The data can be retrieved from this application. There are dozens of weather stations from all across the Netherlands and several parameters to choose from. The data also reach far back into history, and records from as early as the beginning of the previous century can be obtained. The main problem with this application is that the export is in TXT format; it is comma-separated, so it’s easy to copy the entire thing into Excel, but it still requires some data engineering. There is, however, also a way to automate the process using an API.

Sample export from KNMI: each row corresponds to one day and each column to one weather parameter; the meaning of the parameters is explained above the table

Below is a Python script I wrote for processing the data which converts the TXT file into a CSV leaving only the relevant rows:
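The snippet that follows is a minimal sketch of such a conversion rather than the original script. It assumes that explanatory lines in the export start with `#`, that the last `#` line containing commas holds the column names, and that the columns of interest are the date, wind speed, sunshine percentage, precipitation and cloud cover. The function names, the sample text and the column codes (`FG`, `SP`, `RH`, `NG`) are assumptions for illustration and should be checked against the legend in your own export.

```python
import csv
import io

# Hypothetical excerpt that mimics the layout of an export (values are made up)
SAMPLE = """\
# FG = daily mean wind speed (in 0.1 m/s)
# SP = percentage of longest possible sunshine duration
# RH = daily precipitation amount (in 0.1 mm)
# NG = mean daily cloud cover (in octants)
# TG = daily mean temperature (in 0.1 degrees Celsius)
# STN,YYYYMMDD,   FG,   TG,   SP,   RH,   NG
  260,20180101,   62,   65,   10,   35,    7
  260,20180102,   48,   58,   23,   12,    6
"""

KEEP = ("YYYYMMDD", "FG", "SP", "RH", "NG")


def knmi_txt_to_rows(txt, keep=KEEP):
    """Parse the export text into a list of dicts holding only the `keep` columns."""
    header = None
    rows = []
    for line in txt.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: the last comment line with commas is the column header
            if "," in line:
                header = [name.strip() for name in line.lstrip("#").split(",")]
            continue
        if header is None:
            continue  # data before any header cannot be labelled, so skip it
        values = [value.strip() for value in line.split(",")]
        record = dict(zip(header, values))
        rows.append({name: record.get(name, "") for name in keep})
    return rows


def rows_to_csv(rows, fieldnames=KEEP):
    """Serialize the parsed rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv(knmi_txt_to_rows(SAMPLE)))
```

For a real export, the text would be read from the downloaded TXT file and the returned string written to a `.csv` file.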

Weather Rating

This is the most exciting part of this solution: a largely underrated way to express the weather as a single score that tells whether the conditions are nice or poor. In other words, it allows translating the subjective perception of weather into an objective evaluation. The idea was introduced by the Belgian Hugo Poppe and further developed by several other meteorologists from the region.

In short, the rating takes into account cloud cover, fog and mist, precipitation, and wind. We start with a score of 10 and, depending on the magnitude of these four elements, a certain number of points is deducted. The result is a Weather Rating between 1 and 10 (if we obtain a value lower than 1, it’s still considered a 1), where a sunny, dry day with little wind gets a 10 and a day with poor weather gets a 1. More details about the Rating can be found on this website.

Instruction for calculating Weather Rating | Source

The temperature is not included in the weather rating. This has been done on purpose, for two reasons:

1. The temperature is linked to the other elements that are used. Little wind, a lot of sunshine and no rain, almost always give a pleasant feel.

2. The temperature actually plays a limited role in how weather quality is experienced. A cool day with little wind, a lot of sunshine and no rain is still highly valued by most people.

Algorithm

Based on the description above, coding the calculation of the Weather Rating is quite a straightforward task. The main problem is the availability of all the required data. In this case, the data on Dutch weather from KNMI don’t contain information about fog or mist. I therefore decided to replace this metric with the “Percentage of the longest possible sunshine duration”: if the sun shines for less than 50% of the day, 1 point is subtracted, and if the value is less than 20%, 2 points are subtracted.

The code below is in Python. It iterates through the data frame obtained from the CSV file with the data, where each column is one parameter and each row is a separate day. Based on a few key parameters, the algorithm calculates the Weather Rating for each row and returns it in a new column in the data frame:
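What follows is a minimal sketch of the calculation for a single day, not the original code. Only the structure (start at 10, deduct points, never go below 1) and the sunshine deduction (1 point below 50%, 2 points below 20%) come from the description above. The thresholds and points for cloud cover, precipitation and wind are hypothetical placeholders, as are all names, and should be replaced with the values from the Weather Rating instructions.

```python
# Each table lists (threshold, points) pairs, most severe condition first.
SUNSHINE_DEDUCTIONS = ((20, 2), (50, 1))          # from the article: <20% -> 2, <50% -> 1
CLOUD_DEDUCTIONS = ((7, 2), (5, 1))               # HYPOTHETICAL: cloud cover in octants
PRECIP_DEDUCTIONS = ((10.0, 4), (2.0, 2), (0.1, 1))  # HYPOTHETICAL: precipitation in mm
WIND_DEDUCTIONS = ((10.0, 2), (6.0, 1))           # HYPOTHETICAL: mean wind speed in m/s


def points_at_or_above(value, table):
    """Points for the first threshold that `value` reaches, else 0."""
    for threshold, points in table:
        if value >= threshold:
            return points
    return 0


def points_below(value, table):
    """Points for the first threshold that `value` falls below, else 0."""
    for threshold, points in table:
        if value < threshold:
            return points
    return 0


def weather_rating(cloud_octants, sunshine_pct, precipitation_mm, wind_ms):
    """Start from 10, subtract the deductions and floor the result at 1."""
    rating = 10
    rating -= points_at_or_above(cloud_octants, CLOUD_DEDUCTIONS)
    rating -= points_below(sunshine_pct, SUNSHINE_DEDUCTIONS)
    rating -= points_at_or_above(precipitation_mm, PRECIP_DEDUCTIONS)
    rating -= points_at_or_above(wind_ms, WIND_DEDUCTIONS)
    return max(1, rating)


print(weather_rating(0, 90, 0.0, 2.0))    # sunny, dry and calm day
print(weather_rating(8, 5, 15.0, 12.0))   # overcast, wet and windy day
```

With a pandas data frame, a function like this could be applied row by row, for example through `DataFrame.apply` with `axis=1`, after converting the raw columns to the units the function expects.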

Below is the Weather Rating for each day in 2018 and 2019 and the average rating for each month. The overall trend is that summers see nicer days than winters, which is as expected. However, there are still nice days in winter and poor days in summer, and the monthly averages are close to one another. This means that when comparing other data to the Weather Rating, the annual trends shouldn’t pose problems. Using weather data from several years instead of one additionally ensures a more even distribution of ratings across different months (e.g. July 2018 had really nice weather but the weather in July 2019 was poorer) and different days of the week.

Weather Rating in the Netherlands for all days of 2018 and 2019

The code I used for this task can be found in the GitHub repository: https://github.com/mateuszwiza/weather-rating

Results

To generate results for my problem and answer the question of whether the weather has any influence on customers’ behaviour, I needed to combine the Weather Rating with the numbers of our transactions. For that, I retrieved the number of transactions for each day of 2018 and 2019 and created a table where each row contained a date, the number of transactions and a Weather Rating. The last step was to aggregate the transaction numbers (take their average) per Weather Rating and observe whether the numbers change.
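The aggregation step can be sketched as follows. This is a minimal illustration with made-up rows and a hypothetical function name, not the code or data used in the analysis. It groups the daily transaction counts by Weather Rating and takes the mean of each group.

```python
from collections import defaultdict
from statistics import mean


def average_by_rating(days):
    """Map each Weather Rating to the mean daily number of transactions.

    `days` is an iterable of (date, transactions, rating) tuples.
    """
    grouped = defaultdict(list)
    for _date, transactions, rating in days:
        grouped[rating].append(transactions)
    return {rating: mean(values) for rating, values in sorted(grouped.items())}


# Made-up rows for illustration: (date, number of transactions, Weather Rating)
days = [
    ("2018-01-01", 120, 3),
    ("2018-01-02", 110, 3),
    ("2018-01-03", 100, 6),
    ("2018-01-04", 90, 9),
    ("2018-01-05", 94, 9),
]

for rating, average in average_by_rating(days).items():
    print(rating, average)
```

The same result could be obtained with a group-by on a data frame; the plain-Python version is used here only to keep the sketch self-contained.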

The results showed that the weather has an effect on the number of customers. In 85% of the nearly 200 tested locations, there was a significant negative trend in the number of customers as the weather improved, i.e. with nicer weather we saw fewer customers.

Results: the vertical axis is the average daily number of customers; the horizontal axis is the Weather Rating

Examining the results again, this time as the relative change in the number of transactions from their mean, additionally revealed that on days with a Weather Rating of 6 we see normal numbers of customers. At the same time, on days with the poorest weather we see ~7% more customers, and on the best days we see as much as 10% fewer customers!
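The relative change can be computed along these lines. The sketch is illustrative only: the function name is hypothetical, and the input numbers are made up and merely chosen so that the output mirrors the percentages quoted above.

```python
def relative_change(averages, overall_mean):
    """Percentage difference of each per-rating average from the overall daily mean."""
    return {
        rating: 100 * (value - overall_mean) / overall_mean
        for rating, value in averages.items()
    }


# Made-up per-rating averages and overall daily mean (illustrative only)
averages = {1: 107, 6: 100, 10: 90}
overall_mean = 100

for rating, change in relative_change(averages, overall_mean).items():
    print(f"rating {rating}: {change:+.1f}%")
```

A value near zero marks a rating at which the number of customers is close to normal, a positive value means more customers than usual and a negative value means fewer.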

Finally, the last important observation was that the trend changes slightly depending on the function of the parking facility. In general, the downward trend was stronger and more prevalent in leisure-oriented locations, e.g. at supermarkets, and less visible in work-, business- or commuter-oriented locations.

In the end, what’s especially interesting about these results is that they not only reveal the behaviour of the customers of a single company but also show general trends in choosing a car as a means of transport under different weather conditions and in different situations. The results also demonstrate the usefulness and simplicity of the Weather Rating in such comparisons.

Data Scientist @ Q-Park