For those new to location data analysis, there is one question that pops up time and time again: “How do I turn my list of addresses into points on a map?” This is the first and most crucial hurdle to location data analysis and unfortunately some give up at this step and go no further.
In this article, I will explain the basic concept of “geocoding” – the process of turning an address or other location reference (such as a postcode or zip code) into a point on a map – and show you how to start geocoding with the Mapcite Excel Add-In in less than 30 seconds.
Once you have read this article, why not read part two which delves a little further into geocoding?
What is geocoding and why do it?
Put simply, geocoding is the process of turning an address or other location reference into a set of latitude/longitude coordinates. We do this so that the data can be displayed on a web-map as a point or pin.
Customers who download the Mapcite Excel Add-In want to analyse the location content of their data. But unless their data already has latitude/longitude coordinates, they will need to geocode the data to obtain these coordinates before seeing the data on a map.
It is possible to geocode manually – pinpoint the location using an online map and copy the latitude/longitude into two new columns in your data (please check the T&Cs first to ensure you are allowed to do this). But this process is inefficient and labour intensive. If you have as little as three rows of data to locate, automated geocoding is usually the quicker option and it is why Mapcite built a geocoder engine into its Excel Add-In.
What is a geocoder?
A geocoder is a piece of software that works out the latitude/longitude coordinates of an address. It does this by taking the address in your data and comparing this to its own database of addresses and place names. It uses fuzzy matching algorithms to calculate which of its addresses is the closest match.
A geocoder is a piece of software that works out the latitude/longitude coordinates of an address. It does this by taking the address in your data and comparing this to its own database of addresses and place names. It uses fuzzy matching algorithms to calculate which of its addresses is the closest match.
This is not as easy at it sounds. In my previous article on Why you need an Address Master Data Management Strategy, I pointed out that addresses are a very poor way of storing and managing locations in a machine environment. This is because addresses are text strings and can be written in different ways and can be misspelt or mistyped. A good geocoder will look at the various elements within the address, weight their significance, correct for common misspellings and even disregard some elements.
Also do not underestimate the challenge in building and maintaining a global list of addresses to put inside the geocoder software. Address format, quality and availability varies from country to country as does the availability of accurate locations for the addresses.
Done well, geocoding can batch process millions of records without human intervention. In the Mapcite Consulting team, we have batch geocoded 60 million records for one customer before. And we have customers using the Mapcite Excel Pro geocoder to locate tens of thousands of addresses each month themselves.
Using the Mapcite geocoder in Excel
First download the free Mapcite Excel Add-In from the Microsoft Office App Store. and open the spreadsheet you want to geocode. Then follow these simple steps:
- Click on the menu option ‘Mapcite’ in the Excel Ribbon
- Press ‘Geocode Data’ button
- Match the columns in your data with the columns expected by the geocoder
- Press ‘Start Geocoding’ and sit back and watch!
Getting the best out of geocoding
To give the geocoder the best chance of success, it is important to make the input data (your list of addresses) as ‘clean’ as possible. This means you may need to spend time fixing your data before geocoding. You should pay close attention to:
- Address Structure – use separate columns in your spreadsheet for different components of the address to help the geocoder understand your data. The Mapcite Excel geocoder divides addresses into different elements – Street, City, County, Postal Code, Country. If the columns in your data matches this structure, the chances of successful matching are better
- Unstructured data – If the whole address is contained in a single cell and it is too much effort to split it into parts, you need to do “Unstructured” geocoding. This tells the geocoder to expect the whole address as a single line of text. The Mapcite Excel Add-In geocoder has an option for Unstructured geocoding.
- Address Format – ensure the address data is placed in the correct columns. For example, ensure the postcode column only contains postcodes. Try to ensure the data is free from spelling errors and abbreviations (such as Ave. instead of Avenue or St. instead of Street). You can also fill in blanks, such as populating the County or Country column if it is missing in your data and is easy to add.
Interpreting the results
Once the geocoder has run, it is important to check the results. This is because geocoders can make false matches. There are different ways to check such as:
- Looking at the data on the map – sometimes you can easily spot bad matches by showing the geocoded data on the map as pins. For instance, if all your data is within a state or country, it is easy to spot pins outside this area. Pro tip: in the Mapcite Excel Add In, if you click on a pin, it will highlight the row in the spreadsheet, making it easier to check the data.
- Comparing results in the spreadsheet – when a match is made by the geocoder, it returns not only the coordinates but also the address it has found. By comparing this to your original address, you can see whether to accept the match.
Geocoding as a service
At Mapcite, our consulting services team regularly undertakes geocoding as a service for customers. This can be particularly useful if your dataset is large or messy. This can be done as a one-off cleansing activity or a regular service.
About the Author
Richard Crump is Head of Consulting at Mapcite, a location data analytics company and previously held a similar role at Ordnance Survey, the National Mapping Agency for Great Britain.