
First round of tweaks

pull/112/head
Julian Simioni 9 years ago
commit c25d6190ba
installing.md (33 changed lines)

@@ -15,7 +15,7 @@ In general, Pelias will require:
   a single machine or across several
 * [Node.js](https://nodejs.org/) 0.12 or newer (Node 4 or 5 is recommended)
 * Up to 100GB disk space to download and extract data
-* Lots of RAM. A full North America OSM import just barely fits on a machine with 16GB RAM
+* Lots of RAM. At least 2-4GB. A full North America OSM import just barely fits on a machine with 16GB RAM
 
 ## Choose your branch
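Before going further, it can be worth confirming those basics are in place. A minimal sanity check, assuming a Linux host (the commands below are illustrative and not part of installing.md itself):

```bash
# Check the Node.js version (0.12+, ideally 4 or 5 per the requirements above)
node --version

# Check available RAM and free disk space in the directory you'll download data to
free -h
df -h .
```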
@@ -69,7 +69,7 @@ selected files.
 ### Openstreetmap
 
-Openstreetmap has a nearly limitless array of download options, and any of them will work as long as
+Openstreetmap has a nearly limitless array of download options, and any of them should work as long as
 they're in [PBF](http://wiki.openstreetmap.org/wiki/PBF_Format) format. Generally the files will
 have the extension `.osm.pbf`. Good sources include the [Mapzen Metro Extracts](https://mapzen.com/data/metro-extracts/)
 (feel free to submit pull requests for additional cities or regions if needed), and planet files
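Fetching a PBF extract might look like the following; the URL and destination directory here are placeholders, so substitute whichever extract and data directory you are actually using:

```bash
# Hypothetical example: download a .osm.pbf extract into the directory the
# Openstreetmap importer is configured to read from (URL and path are placeholders)
wget https://example.com/your-region.osm.pbf -P /var/pelias/data/openstreetmap
```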
@@ -115,14 +115,16 @@ compare names, so it can tell that records with `101 Main St` and `101 Main Stre
 refer to the same place.
 
 Unfortunately, our current implementation is very slow, and requires about 50GB of scratch disk
-space during a full planet import.
+space during a full planet import. It's worth noting that Mapzen Search currently does _not_
+deduplicate any data, although we hope to improve the performance of deduplication and resume using
+it eventually.
 
 ## Considerations for full-planet builds
 
 As may be evident from the dataset section above, importing all the data in all four supported datasets is
 worthy of its own discussion. Current [full planet builds](https://pelias-dashboard.mapzen.com/pelias)
-weigh in at over 300 million documents, and about 140GB total storage in Elasticsearch. Needless to
-say, a full planet build is not likely to succeed on most personal computers.
+weigh in at over 300 million documents, and require about 140GB total storage in Elasticsearch.
+Needless to say, a full planet build is not likely to succeed on most personal computers.
 
 Fortunately, because of services like AWS and the scalability of Elasticsearch, full planet builds
 are possible without too much extra effort. To set expectations, a cluster of 4
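Once an import is underway, one way to see how document count and on-disk size are tracking against those full-planet figures is Elasticsearch's cat API (a sketch assuming Elasticsearch is on localhost:9200; the index name will depend on your schema setup):

```bash
# Show document counts and store size for all indices, including the Pelias index
curl 'http://localhost:9200/_cat/indices?v'
```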
@@ -133,10 +135,11 @@ c4.8xlarge instance running the importers can complete a full planet build in ab
 ### Download the Pelias repositories
 
-At a minimum, you'll need the Pelias [schema] and [api] repositories, as well as at least one of the
-importers. Here's a bash snippet that will download all the repositories (they are all small enough
-that you don't have to worry about the space of the code itself), check out the production branch
-(which is probably the one you want), and install all the node module dependencies.
+At a minimum, you'll need the Pelias [schema](https://github.com/pelias/schema/) and
+[api](https://github.com/pelias/api/) repositories, as well as at least one of the importers. Here's
+a bash snippet that will download all the repositories (they are all small enough that you don't
+have to worry about the space of the code itself), check out the production branch (which is
+probably the one you want), and install all the node module dependencies.
 
 ```bash
 for repository in schema api whosonfirst geonames openaddresses openstreetmap; do
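The body of that loop isn't visible in this hunk, but based on the surrounding description it presumably does something along these lines (a sketch only; the actual snippet in installing.md may differ in detail):

```bash
# Sketch of the clone/checkout/install loop described above
for repository in schema api whosonfirst geonames openaddresses openstreetmap; do
  git clone "https://github.com/pelias/${repository}.git"
  pushd "$repository"
  git checkout production   # the branch the guide recommends
  npm install               # install node module dependencies
  popd
done
```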
@@ -165,7 +168,7 @@ you can see the Elasticsearch configuration looks something like this:
 ```json
 {
-  esclient: {
+  "esclient": {
     "hosts": [{
       "host": "localhost",
       "port": 9200
@@ -251,7 +254,7 @@ If you want to reset the schema later (to start over with a new import or becaus
 has been updated), you can drop the index and start over like so:
 
 ```bash
-# warning: this will remove all your data from pelias~
+# !! WARNING: this will remove all your data from pelias!!
 node scripts/drop_index.js # it will ask for confirmation first
 node scripts/create_index.js
 ```
@@ -268,13 +271,13 @@ the importers with simply `cd $importer_directory; npm start`. Unfortunately onl
 and Openstreetmap importers works that way right now.
 
 For [Geonames](https://github.com/pelias/geonames/) and [Openaddresses](https://github.com/pelias/openaddresses),
-please see their respective READMEs, which detail the process of running them. We'd love to see pull
-requests that allow them to read configuration from `pelias.json` like the other importers.
+please see their respective READMEs, which detail the process of running them. By the way, we'd
+love to see pull requests that allow them to read configuration from `pelias.json` like the other
+importers.
 
 Depending on how much data you've imported, now may be a good time to grab a coffee. Without admin
 lookup, the fastest speeds you'll see are around 10,000 records per second. With admin lookup,
-1000/sec is pretty fast.
+expect around 800-1000 inserts per second.
 
 ### Start the API
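To put those rates in perspective for a full-planet build of roughly 300 million documents: at about 10,000 inserts per second that is on the order of 8-9 hours, while at 800-1000 inserts per second with admin lookup it is closer to 3.5-4.5 days, before any other bottlenecks. If you want to measure your own throughput, timing an importer run is straightforward (illustrative only; `openstreetmap` stands in for whichever importer you are running):

```bash
# Time a single importer run to get a feel for your own insert rate
cd openstreetmap
time npm start
```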
