From 1ddd7cc45ea3c74df15c3edb17668f6f007e5157 Mon Sep 17 00:00:00 2001
From: Julian Simioni
Date: Mon, 28 May 2018 23:29:50 -0400
Subject: [PATCH] Add documentation on result quality and confidence scores

---
 README.md         | 22 ++++++++++++-----
 result_quality.md | 61 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+), 6 deletions(-)
 create mode 100644 result_quality.md

diff --git a/README.md b/README.md
index 187eddf..cbb80db 100644
--- a/README.md
+++ b/README.md
@@ -5,15 +5,22 @@ Here is where you can find all documentation for the [Pelias geocoder](https://g
 ## Table of Contents
 
 ### Core Features and API Documentation
-- [Forward geocoding](search.md) to find a place by searching for an address or name
-- [Reverse geocoding](reverse.md) to find what is located at a certain coordinate location
-- [Autocomplete](autocomplete.md) to give real-time result suggestions without having to type the whole location
+
+#### Endpoint descriptions
+- [Forward geocoding](search.md) (**/v1/search**) to find a place by searching for an address or name
+- [Reverse geocoding](reverse.md) (**/v1/reverse**) to find what is located at a certain coordinate location
+- [Autocomplete](autocomplete.md) (**/v1/autocomplete**) to give real-time result suggestions without having to type the whole location
+- [Structured Geocoding](structured-geocoding.md) (**/v1/search/structured**) to find a place with data already separated into housenumber, street, city, etc
+- [Place endpoint](place.md) (**/v1/place**) for details on a place returned from a previous query
+
+#### Query parameters and options
 - [Global coverage with prioritized local results](search.md#prioritize-results-by-proximity)
-- [Structured Geocoding](structured-geocoding.md) to find a place with data already separated into housenumber, street, city, etc
-- [Glossary of common terms](glossary.md)
-- [Place endpoint](place.md) for details on a place returned from a previous query
 - [Language support](language-codes.md) for seeing results in different languages
 
+#### Response Properties
+
+- [Confidence scores, match\_types and other tools for determining result quality](result_quality.md)
+
 ### Data Sources
 - [Pelias data sources](data-sources.md)
 
@@ -22,3 +29,6 @@ Here is where you can find all documentation for the [Pelias geocoder](https://g
 
 ### Pelias project development
 - [Release notes](release-notes.md). See notable changes in Pelias over time
+
+### Misc
+- [Glossary of common terms](glossary.md)
diff --git a/result_quality.md b/result_quality.md
new file mode 100644
index 0000000..0292544
--- /dev/null
+++ b/result_quality.md
@@ -0,0 +1,61 @@
+# Determining Result Quality
+
+Each result returned from Pelias contains several different properties to help you programmatically determine if a result is good enough for your purposes.
+
+
+### layer
+This is essentially what type of result was returned, for example whether the result was an address, a city, or a country (a [full list](https://github.com/pelias/documentation/blob/master/search.md#filter-by-data-type) of possible layers can be found in the search endpoint documentation).
+
+The `layers` parameter can be used to filter on this if you know ahead of time what you want. A result from a different layer than you expect is generally an incorrect result.
+
+### match\_type
+
+This field is present on queries to the [search](search.md) and [structured search](structured-geocoding.md) endpoints only.
+
+There are three possible values: `exact`, `interpolated`, and `fallback`.
+
+If Pelias found exactly what it believes you were looking for, the `match_type` value will be `exact`.
+
+If Pelias determined you are querying for a street address, and could not find that exact address, but was able to estimate where that address might be (if it exists) via the [interpolation engine](https://github.com/pelias/interpolation/), the match type will be `interpolated`.
+
+If Pelias wasn't able to return exactly what it thinks you asked for, it will try to return something that relates to your query in an intelligent way. These fallback results will be records that follow the relationships of places in the real world.
+
+#### Some examples:
+
+A query for [1600 Pennsylvania Avenue, Seattle, Washington](/v1/search?text=1600 Pennsylvania Avenue, Seattle, WA) returns the city of Seattle, since there is no Pennsylvania Avenue in Seattle. In previous versions of Pelias, this query would return 1600 Pennsylvania Avenue addresses in other parts of the world (such as the famous White House address in Washington, D.C.).
+
+A query for [France](http://pelias.github.io/compare/#/v1/search%3Ftext=France) will return one result, with `match_type` `exact`. However, a query for the non-existent city of [Berlin, France](http://pelias.github.io/compare/#/v1/search%3Ftext=France) will also return France, but in this case with a match type of `fallback`. Pelias knows you were looking for something in the country of France called Berlin. It couldn't find it, so instead of returning one of the many [other Berlins](http://pelias.github.io/compare/#/v1/search%3Ftext=berlin), it returns France. This demonstrates that the `match_type` value depends on the query _and_ the result.
+
+### confidence
+This is a general score estimating how likely a result is to be what was asked for. It's meant to be a combination of all the information available to Pelias.
+It's not super sophisticated, and results may not be sorted in confidence-score order. In that case results returned first should be trusted more. Confidence scores are floating point numbers ranging from `0.0` to `1.0`.
+
+Confidence scores are calculated differently for different endpoints:
+
+For **reverse geocoding** it's based on distance from the reverse geocoded point. The progression of confidence scores is as follows:
+
+| distance | confidence score |
+| --- | --- |
+| < 1 meter | 1.0 |
+| 1 - 10 meters | 0.9 |
+| 11 - 100 meters | 0.8 |
+| 101 - 250 meters | 0.7 |
+| 251 - 1000 meters | 0.6 |
+
+For **forward geocoding and autocomplete**, several factors affect the score. These factors are:
+
+* the `match_type`, as described above
+* whether the postal code matched (postal codes must be an optional part of all Pelias queries since not all records have known postal codes)
+* for address results, whether the housenumber and street name match the input query
+* whether any other fields are obviously non-matching (such as the city, region, or country fields).
+
+In all cases, confidence scores for `fallback` results will be reduced.
+
+
+### accuracy
+
+The accuracy field gives information on the accuracy of the latitude/longitude point returned with the given result. This value is a property of the result itself and won't change based on the query. There are currently two possible values for the `accuracy` field: `point` and `centroid`.
+
+`point` results are generally addresses, venues, or interpolated addresses. A point result means the record is a place that can reasonably be represented by a single latitude/longitude point.
+
+`centroid` results, on the other hand, are records that represent a larger area, such as a city or country. Pelias cannot currently return results with geometries made of polygons or lines, so all such records are estimated with a centroid.
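
As a rough illustration of how the properties described in `result_quality.md` might be used together, here is a minimal Python sketch. It is not part of this patch and is not Pelias code. The function names `reverse_confidence` and `is_good_enough`, the `0.8` default threshold, the handling of fractional distances at the table boundaries, and returning `None` beyond 1000 meters (the table does not cover that range) are all assumptions made for the example. It also assumes each result's properties are available as a plain dictionary keyed by the property names above.

```python
from typing import Optional

# (upper bound in meters, score) pairs copied from the reverse geocoding table.
# Treating each upper bound as inclusive is an assumption; the table only
# lists whole-meter ranges.
REVERSE_CONFIDENCE_STEPS = [(10, 0.9), (100, 0.8), (250, 0.7), (1000, 0.6)]


def reverse_confidence(distance_m: float) -> Optional[float]:
    """Look up the documented reverse geocoding score for a distance in meters.

    Returns None beyond 1000 meters, because the table gives no value there.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m < 1:
        return 1.0
    for upper_bound, score in REVERSE_CONFIDENCE_STEPS:
        if distance_m <= upper_bound:
            return score
    return None


def is_good_enough(properties: dict, expected_layer: str,
                   min_confidence: float = 0.8) -> bool:
    """Hypothetical acceptance check combining layer, match_type and confidence."""
    # A result from an unexpected layer generally means an incorrect result.
    if properties.get("layer") != expected_layer:
        return False
    # match_type is only present for search and structured search results,
    # so a missing value is not treated as a failure here.
    if properties.get("match_type") == "fallback":
        return False
    return properties.get("confidence", 0.0) >= min_confidence


# Example with made-up property values:
example = {"layer": "address", "match_type": "exact",
           "confidence": 0.9, "accuracy": "point"}
print(is_good_enough(example, expected_layer="address"))
print(reverse_confidence(42.0))
```

Rejecting every `fallback` result is a deliberately strict choice. An application that only needs city-level precision could instead accept a fallback whose layer is still specific enough for its purposes.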