
De-Mapzenify data source documentation

pull/221/head
Julian Simioni 7 years ago committed by GitHub
parent
commit
30f3f14e9e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
  1. 26
      data-sources.md

@@ -1,8 +1,8 @@
-# Data sources in Mapzen Search
+# Data sources with supported importers
-Mapzen Search is powered by several major open data sets and owes a tremendous debt of gratitude to the individuals and communities which produced them.
+Pelias is built with a mostly data-agnostic architecture: any datasource that can be converted into the Elasticsearch document format used by Pelias can be imported and geocoded against. Of course, building a good importer takes time. Pelias currently has official support for five importers from four different open data projects. We owe a tremendous debt of gratitude to the individuals and communities which produced these datasets.
-Attribution is required for many of the Mapzen Search data providers. Some license information is listed here, but you are responsible for researching each project to follow their license terms.
+Attribution is required for many of data providers. Some license information is listed here, but you are responsible for researching each project to follow their license terms.
 ## OpenAddresses
@@ -14,10 +14,10 @@ Layers:
 [OpenAddresses](http://openaddresses.io/) is a collection of over 300 million addresses around the world. Data in OpenAddresses only comes from national, state, and local governments, so this data is highly authoritative. Because it consists of entirely bulk imports, OpenAddresses is a large, global, and rapidly growing dataset. Many countries, particularly in Europe, now have every address represented in OpenAddresses.
-OpenAddresses is by far the largest dataset by number of records used by Mapzen Search. Even though it only contains address data (as in no building names or other metadata), it's a great resource for global geocoding.
+OpenAddresses is by far the largest dataset by number of records used by Pelias. Even though it only contains address data (as in no building names or other metadata), it's a great resource for global geocoding.
 The license for each individual source within OpenAddresses differs. Many of the sources require [attribution](https://mapzen.com/rights/), and many others have a share-alike clause.
-*Note:* Mapzen Search does _not_ currently return license information directly, but the license and attribution requirements for each source within OpenAddresses can be determined from the machine-readable [state.txt](http://results.openaddresses.io/state.txt) file published on the OpenAddresses website.
+*Note:* Pelias does _not_ currently return license information directly, but the license and attribution requirements for each source within OpenAddresses can be determined from the machine-readable [state.txt](http://results.openaddresses.io/state.txt) file published on the OpenAddresses website.
 ## Who's on First
@@ -36,7 +36,7 @@ Layers:
 - `neighbourhood`
 - `coarse` (alias for simultaneously using all the above)
-[Who's on First](https://whosonfirst.mapzen.com) is an open-data directory of worldwide administrative places. Created by Mapzen, it is the primary provider of:
+[Who's on First](https://www.whosonfirst.org/) is an open-data directory of worldwide administrative places. Originally started at Mapzen, it is the primary provider of:
 - Countries
 - Macroregions (for example, England is a Macroregion within the United Kingdom)
@@ -46,7 +46,7 @@ Layers:
 - Localities (cities, towns, hamlets)
 - Neighbourhoods
-Additionally, for addresses, venues, and points of interest coming from OpenStreetMap, Geonames, and OpenAddresses, Mapzen Search uses Who's on First to provide standardized fields for the country, region, locality, and neighbourhood.
+Additionally, for addresses, venues, and points of interest coming from OpenStreetMap, Geonames, and OpenAddresses, Pelias uses Who's on First to provide standardized fields for the country, region, locality, and neighbourhood.
 [License](https://github.com/whosonfirst/whosonfirst-data/blob/master/LICENSE.md)
@@ -62,10 +62,12 @@ Layers:
 [OpenStreetMap](https://www.openstreetmap.org/) is a community-driven, editable map of the world. It prioritizes local knowledge and individual contributions over bulk imports, which often means it has excellent coverage even in remote areas where no large-scale mapping efforts have been attempted. OpenStreetMap contains information on landmarks, buildings, roads, and natural features.
-With its coverage of roads as well as rich metadata, OpenStreetMap is arguably the most valuable dataset used by Mapzen Search for general usage.
+With its coverage of roads as well as rich metadata, OpenStreetMap is arguably the most valuable dataset used by Pelias for general usage.
 All OpenStreetMap data is licensed under the [ODbL](http://opendatacommons.org/licenses/odbl/), a [share-alike](https://en.wikipedia.org/wiki/Share-alike) license which also requires attribution.
+**Note:** There are _two_ importers for OSM data. The main importer, [pelias/openstreetmap](https://github.com/pelias/openstreetmap/), handles venues and addresses. The [pelias/polylines](https://github.com/pelias/polylines) importer handles streets, since dealing with line geometry is a special challenge.
 ## Geonames
 `sources=geonames` | `sources=gn`
@@ -82,20 +84,20 @@ Layers:
 - `neighbourhood`
 - `coarse` (alias for simultaneously using all the above)
-[Geonames](http://www.geonames.org/) is an aggregation of many authoritative and non-authoritative datasets. It contains information on everything from country borders to airport names to geographical features. While Geonames does not contain any shape data (such as country borders), it does have a powerful and well defined hierarchy to describe the relationships between different records. This custom hierarchy makes it harder to use in combination with data from other sources, but the Mapzen [Who's On First](http://whosonfirst.mapzen.com/) project will help by providing concordance between Geonames and other datasets.
+[Geonames](http://www.geonames.org/) is an aggregation of many authoritative and non-authoritative datasets. It contains information on everything from country borders to airport names to geographical features. While Geonames does not contain any shape data (such as country borders), it does have a powerful and well defined hierarchy to describe the relationships between different records. This custom hierarchy makes it harder to use in combination with data from other sources, but the [Who's On First](https://www.whosonfirst.org) project will help by providing concordance between Geonames and other datasets.
-In the meantime, Geonames still provides a wide variety of useful data that helps augment the other datasets used by Mapzen Search.
+In the meantime, Geonames still provides a wide variety of useful data that helps augment the other datasets used by Pelias.
 Geonames data is licensed [CC-BY-3.0](http://creativecommons.org/licenses/by/3.0/).
 # Deprecated sources
-Certain data sources used to be supported by Mapzen Search but are no longer offered part of the core service and have been superseded by a new data source.
+Certain data sources used to be supported by Pelias but are no longer offered part of the core service and have been superseded by a new data source.
 ## Quattroshapes
 `sources=quattroshapes` | `sources=qs`
-Quattroshapes used to be supported by Mapzen Search and its use was discontinued in April 2016.
+Quattroshapes used to be supported by Pelias and its use was discontinued in April 2016. The importer can still be found at [pelias-deprecated/quattroshapes](https://github.com/pelias-deprecated/quattroshapes).
 It has been replaced by Who's on First, which continues to provide global administrative place data (countries, regions, counties, cities) and administrative lookup (_"what country, region, and city is this address part of?"_).
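
Reviewer note: the `sources=geonames` | `sources=gn` and `sources=quattroshapes` | `sources=qs` values in the documentation above are query parameters. The sketch below shows one way a client might expand the short aliases and build a request URL. Only the two alias pairs come from the documentation; the base URL, the `/v1/search` path, the `text` parameter, and the comma-separated form for multiple sources are assumptions made for illustration, and `build_search_url` is a hypothetical helper, not part of Pelias.

```python
from urllib.parse import urlencode

# Alias pairs taken from the documentation in this diff.
SOURCE_ALIASES = {"gn": "geonames", "qs": "quattroshapes"}


def build_search_url(base_url, text, sources):
    """Build a search URL restricted to the given data sources.

    Hypothetical helper: the endpoint path and the `text` parameter
    are assumptions, not confirmed by the documentation above.
    """
    # Expand short aliases; leave unknown names untouched.
    names = [SOURCE_ALIASES.get(s, s) for s in sources]
    # Keep the comma literal so multiple sources stay readable in the URL.
    query = urlencode({"text": text, "sources": ",".join(names)}, safe=",")
    return f"{base_url}/v1/search?{query}"


if __name__ == "__main__":
    # The host below is a placeholder for a locally running instance.
    print(build_search_url("http://localhost:4000", "London", ["gn"]))
```

Expanding aliases on the client side is optional, since the documentation lists both forms as accepted; it is done here only to make the mapping explicit.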
