# Installing Pelias

Mapzen offers the Mapzen Search service in hopes that as many people as possible will use it,
but we also encourage people to set up their own Pelias instance.

For most cases, it's useful to have much of the installation process automated, so we suggest
looking at the [Pelias Vagrant image](https://github.com/pelias/vagrant).

However, for more in-depth usage, to learn more about the working of Pelias, or to contribute back,
manual setup is useful. These instructions will help you install Pelias from scratch manually.

## Installation Overview

The steps for fully installing Pelias look like this:

1. Decide which datasets and settings will be used
2. Download appropriate data
3. Download Pelias code, using the appropriate branches
4. Set up Elasticsearch
5. Install the Elasticsearch schema using pelias-schema
6. Use one or more importers to load data into Elasticsearch
7. Start the API server to begin handling queries

## System Requirements

In general, Pelias will require:

* A working [Elasticsearch](https://www.elastic.co/products/elasticsearch) 2.3 cluster. It can be on
  a single machine or across several
* [Node.js](https://nodejs.org/) 0.12 or newer (Node 4 or 6 is recommended)
* Up to 100GB disk space to download and extract data
* Lots of RAM: 8GB is a good minimum. A full North America OSM import just fits in 16GB RAM


## Choose your datasets

Pelias can currently import data from four different sources. The contents and description of these
sources are available on our [data sources page](./data-sources.md). Here we'll just focus on what to
download for each one.

### Who's on First

The [Who's on First](https://github.com/pelias/whosonfirst#data) importer contains code and
instructions for downloading WOF data.

Alternatively, there are two other ways to download Who's on First data. The first is to use the pre-created
[bundles](https://whosonfirst.mapzen.com/bundles/). These consist of a series of archives that can
be easily extracted (instructions are on the page).

For more advanced uses, or to contribute back to Who's on First, use the
[whosonfirst-data](https://github.com/whosonfirst/whosonfirst-data) GitHub repository. Again, there
are [instructions](https://github.com/whosonfirst/whosonfirst-data#git-and-github). Note that this
repo requires [git-lfs](https://git-lfs.github.com/), a lot of bandwidth, and 27GB (currently) of
disk space.

### Geonames

The [pelias/geonames](https://github.com/pelias/geonames/#importing-data) importer contains code and
instructions for downloading Geonames data automatically. Individual countries, or the entire planet
(1.3GB compressed) can be specified.

### OpenAddresses

The OpenAddresses project includes [numerous download options](https://results.openaddresses.io/),
all of which are `.zip` downloads. The full dataset is just over 6 gigabytes compressed (the
extracted files are around 30GB), but there are numerous subdivision options. In any case, the
`.zip` files simply need to be extracted to a directory of your choice, and Pelias can be configured
to either import every `.csv` in that directory, or only selected files.

### OpenStreetMap

OpenStreetMap has a nearly limitless array of download options, and any of them should work as long as
they're in [PBF](http://wiki.openstreetmap.org/wiki/PBF_Format) format. Generally the files will
have the extension `.osm.pbf`. Good sources include the [Mapzen Metro Extracts](https://mapzen.com/data/metro-extracts/)
(which has popular cities available immediately, or custom areas that take only
a few minutes to build), and planet files listed on the [OSM wiki](http://wiki.openstreetmap.org/wiki/Planet.osm).
A full planet PBF file is about 36GB.

## Choose your import settings

There are several options that should be discussed before starting any data imports, as they require
a compromise between import speed and resulting data quality and richness.

### Admin Lookup

Most data that is imported by Pelias comes to us incomplete: many data sources don't supply what we
call admin hierarchy information: the neighbourhood, city, country, or other region that contains
the record. In OpenAddresses, for example, many records contain only a housenumber, street name, and
coordinates.

Fortunately, Who's on First contains a well-developed set of geometries for all admin regions from the
neighbourhood to continent level. Through
[point-in-polygon](https://en.wikipedia.org/wiki/Point_in_polygon) lookup, our importers can
[derive](https://github.com/pelias/wof-admin-lookup) this information!

The downsides to enabling admin lookup are increased memory requirements and longer import times.
Because geometry data is quite large, expect to use about 6GB of RAM (not disk) during import just
for this geometry data. And because of the complexity of the required calculations, imports with
admin lookup are up to 10 times slower than without.

Who's on First, of course, always includes full hierarchy information because it's built into the
dataset itself, so there's no tradeoff to be made. Who's on First data will always import quite fast
and with full hierarchy information.

### Address Deduplication

OpenAddresses data contains lots of addresses, but it also contains lots of duplicate data. To help
reduce this problem we've built an [address-deduplicator](https://github.com/pelias/address-deduplicator)
that can be run at import. It uses the [OpenVenues deduplicator](https://github.com/openvenues/address_deduper)
to remove records that are near each other and have names that are likely to be duplicates. Note
that it's considerably smarter than simply doing exact comparisons of names and coordinates: it uses
[Geohash prefixes](https://en.wikipedia.org/wiki/Geohash) to compare nearby records, and the
[libpostal address normalizer](https://github.com/openvenues/libpostal#examples-of-normalization) to
compare names, so it can tell that records with `101 Main St` and `101 Main Street` are likely to
refer to the same place.
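
The idea of combining a Geohash prefix with a normalized name can be sketched in a few lines of JavaScript. This is only an illustration of the technique, not the code used by the address-deduplicator: `normalizeName` is a toy stand-in for libpostal with a hypothetical three-entry abbreviation table, the record shape (`name`, `lat`, `lon`) is assumed, and only records that fall in the same Geohash cell are compared (neighbouring cells are ignored).

```js
// Standard Geohash base-32 alphabet (no "a", "i", "l" or "o").
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz';

// Encode a coordinate as a Geohash string of the given length.
function geohash(lat, lon, precision) {
  const latRange = [-90, 90];
  const lonRange = [-180, 180];
  let hash = '';
  let bits = 0;
  let bitCount = 0;
  let useLon = true; // bits alternate between longitude and latitude

  while (hash.length < precision) {
    const range = useLon ? lonRange : latRange;
    const value = useLon ? lon : lat;
    const mid = (range[0] + range[1]) / 2;
    if (value >= mid) {
      bits = (bits << 1) | 1;
      range[0] = mid; // keep the upper half
    } else {
      bits = bits << 1;
      range[1] = mid; // keep the lower half
    }
    useLon = !useLon;
    bitCount += 1;
    if (bitCount === 5) { // every 5 bits become one base-32 character
      hash += BASE32[bits];
      bits = 0;
      bitCount = 0;
    }
  }
  return hash;
}

// Hypothetical abbreviation table; libpostal handles far more cases.
const ABBREVIATIONS = new Map([['st', 'street'], ['ave', 'avenue'], ['rd', 'road']]);

// Toy normalizer: lowercase, strip punctuation, expand known abbreviations.
function normalizeName(name) {
  return name.toLowerCase().replace(/[.,]/g, '').split(/\s+/).filter(Boolean)
    .map((token) => ABBREVIATIONS.get(token) || token)
    .join(' ');
}

// Records with the same 7-character Geohash cell and normalized name are
// treated as duplicates; the first one seen is kept.
function deduplicate(records) {
  const seen = new Map();
  for (const record of records) {
    const key = geohash(record.lat, record.lon, 7) + '|' + normalizeName(record.name);
    if (!seen.has(key)) {
      seen.set(key, record);
    }
  }
  return Array.from(seen.values());
}

const records = [
  { name: '101 Main St', lat: 40.7128, lon: -74.0060 },
  { name: '101 Main Street', lat: 40.71281, lon: -74.00601 },
  { name: '102 Main St', lat: 40.7128, lon: -74.0060 }
];
console.log(deduplicate(records).map((record) => record.name));
```

A 7-character Geohash cell is roughly 150 metres across, so a shorter prefix compares records over a wider area at the cost of more false matches.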

Unfortunately, our current implementation is very slow, and requires about 50GB of scratch disk
space during a full planet import. It's worth noting that Mapzen Search currently does _not_
deduplicate any data, although we hope to improve the performance of deduplication and resume using
it eventually.

## Considerations for full-planet builds

As may be evident from the dataset section above, importing all the data in all four supported datasets is
worthy of its own discussion. Current [full planet builds](https://pelias-dashboard.mapzen.com/pelias)
weigh in at over 320 million documents, and require about 230GB total storage in Elasticsearch.
Needless to say, a full planet build is not likely to succeed on most personal computers.

Fortunately, because of services like AWS and the scalability of Elasticsearch, full planet builds
are possible without too much extra effort. To set expectations, a cluster of 4
[r3.xlarge](https://aws.amazon.com/ec2/instance-types/) AWS instances running Elasticsearch, and one
c4.8xlarge instance running the importers can complete a full planet build in about two days.
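
To get a feel for what these numbers mean for a smaller build, the full-planet figures above can be turned into a rough per-document average. The sketch below is plain arithmetic on the 320 million documents and 230GB quoted above (using decimal gigabytes); the function name is made up for illustration, and real document sizes vary between datasets, so treat the result as an order-of-magnitude estimate only.

```js
// Figures quoted above for a full planet build.
const FULL_PLANET_DOCUMENTS = 320e6;
const FULL_PLANET_BYTES = 230e9; // 230GB, decimal

// Average Elasticsearch storage per document, in bytes.
const bytesPerDocument = FULL_PLANET_BYTES / FULL_PLANET_DOCUMENTS;

// Rough storage estimate (in GB) for a build with the given number of documents.
function estimateStorageGb(documentCount) {
  return (documentCount * bytesPerDocument) / 1e9;
}

console.log(bytesPerDocument);        // 718.75
console.log(estimateStorageGb(10e6)); // 7.1875
```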

## Choose your Pelias code branch

As part of the setup instructions below, you'll be downloading several Pelias packages from source
on GitHub. All of these packages offer three branches for various use cases. Based on your needs, you
should pick one of these branches and use the same one across all of the Pelias packages.

`production`: contains only code that has been tested against a full-planet build and is live on
Mapzen Search. This is the "safest" branch and it will change the least frequently, although we
generally release new code at least once a week.

`staging`: these branches contain the code that is currently being tested against a full planet
build for imminent release to Mapzen Search. It's useful to track what code will be going out in the
next release, but not much else.

`master`: master branches contain the latest code that has passed code review, unit/integration
tests, and is ready to be included in the next release. While we try to avoid it, the nature of the
master branch is that it will sometimes be broken. That said, these are the branches to use for
development of new features.

## Installation

### Download the Pelias repositories

At a minimum, you'll need the Pelias [schema](https://github.com/pelias/schema/) and
[api](https://github.com/pelias/api/) repositories, as well as at least one of the importers. Here's
a bash snippet that will download all the repositories (they are all small enough that you don't
have to worry about the space of the code itself), check out the production branch (which is
probably the one you want), and install all the node module dependencies.

```bash
for repository in schema api whosonfirst geonames openaddresses openstreetmap; do
	git clone "git@github.com:pelias/${repository}.git"
	pushd "$repository" > /dev/null
	git checkout production # or staging, or remove this line to stay with master
	npm install
	popd > /dev/null
done
```

### Customize Pelias Config

Nearly all configuration for Pelias is driven through a single config file: `pelias.json`. By
default, Pelias will look for this file in your home directory, but you can configure where it
looks. For more details, see the [pelias-config](https://github.com/pelias/config) repository.
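
As a sketch of how that lookup might work, the snippet below resolves the config path from an environment variable and falls back to the home directory. It assumes the `PELIAS_CONFIG` environment variable described in the pelias-config README; `resolveConfigPath` is a hypothetical helper written for this example, not a function exported by pelias-config, so check the README for the current behaviour.

```js
const os = require('os');
const path = require('path');

// Return the path of the config file: the value of PELIAS_CONFIG if it is
// set (an assumption, see above), otherwise pelias.json in the home directory.
function resolveConfigPath(env) {
  return env.PELIAS_CONFIG || path.join(os.homedir(), 'pelias.json');
}

console.log(resolveConfigPath(process.env));
```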

The two main things of note to configure are where on the network to find Elasticsearch, and where
to find the downloaded data files.

Pelias will by default look for Elasticsearch on `localhost` at port 9200 (the standard
Elasticsearch port).

By taking a look at the [default config](https://github.com/pelias/config/blob/master/config/defaults.json#L2),
you can see the Elasticsearch configuration looks something like this:

```js
{
  "esclient": {
    "hosts": [{
      "host": "localhost",
      "port": 9200
    }]
  },

  ... // rest of config
}
```

If you want to connect to Elasticsearch somewhere else, change `localhost` as needed. You can
specify multiple hosts if you have a large cluster. In fact, the entire `esclient` section of the
config is sent along to the [elasticsearch-js](https://github.com/elastic/elasticsearch-js) module, so
any of its [configuration options](https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/configuration.html)
are valid.
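
For example, an `esclient` section for a two-node cluster could be generated like this. The hostnames are placeholders, and `requestTimeout` is shown only as one example of an elasticsearch-js client option; the script simply prints JSON that you could paste into `pelias.json`.

```js
// Hypothetical two-node cluster; replace the hostnames with your own.
const esclient = {
  hosts: [
    { host: 'es-node-1.example.com', port: 9200 },
    { host: 'es-node-2.example.com', port: 9200 }
  ],
  requestTimeout: 120000 // milliseconds, passed through to elasticsearch-js
};

// Print the section in the form it would take inside pelias.json.
const esclientJson = JSON.stringify({ esclient: esclient }, null, 2);
console.log(esclientJson);
```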

The other major section, `imports`, defines settings for each importer. The defaults look like this:

```json
{
  "imports": {
    "geonames": {
      "datapath": "./data",
      "adminLookup": false
    },
    "openstreetmap": {
      "datapath": "/mnt/pelias/openstreetmap",
      "adminLookup": false,
      "leveldbpath": "/tmp",
      "import": [{
        "filename": "planet.osm.pbf"
      }]
    },
    "openaddresses": {
      "datapath": "/mnt/pelias/openaddresses",
      "adminLookup": false,
      "files": []
    },
    "whosonfirst": {
      "datapath": "/mnt/pelias/whosonfirst"
    }
  }
}
```

As you can see, the default datapaths are meant to be changed. This is also where you can enable
admin lookup by overriding the default value.
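
A minimal sketch of such an override is shown below. It assumes that the values in your `pelias.json` are merged over the defaults, so only the settings that differ need to be listed; the data path and the file name are hypothetical and should be replaced with your own.

```js
// Only the settings that differ from the defaults are listed here.
const overrides = {
  imports: {
    openaddresses: {
      datapath: '/data/openaddresses',      // hypothetical location of the extracted .csv files
      adminLookup: true,                    // enable admin lookup for this importer
      files: ['us/ny/city_of_new_york.csv'] // hypothetical file; an empty list is the default
    }
  }
};

// Print the JSON that would go into pelias.json.
const overridesJson = JSON.stringify(overrides, null, 2);
console.log(overridesJson);
```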

### Install Elasticsearch

Other than requiring Elasticsearch 2.3, nothing special in the Elasticsearch setup is required for
Pelias, so please refer to the [official 2.3 install docs](https://www.elastic.co/guide/en/elasticsearch/reference/2.3/setup.html).

Older versions of Elasticsearch are not supported.

Make sure Elasticsearch is running and reachable, and then you can continue with the Pelias-specific
setup and importing. Using a plugin like [head](https://mobz.github.io/elasticsearch-head/)
or [Marvel](https://www.elastic.co/products/marvel) can help monitor Elasticsearch as you import
data.

If you're using a terminal, you can also search and monitor Elasticsearch using its [APIs](https://www.elastic.co/guide/en/elasticsearch/reference/2.3/api-conventions.html).
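
As one way to confirm that Elasticsearch is reachable before importing, the sketch below asks the cluster health endpoint (`/_cluster/health`) for its status using only Node's built-in `http` module. The helper names are made up for this example, and `localhost:9200` is assumed as the default; nothing is sent until `checkElasticsearch` is called.

```js
const http = require('http');

// Build the URL of the cluster health endpoint.
function clusterHealthUrl(host, port) {
  return 'http://' + (host || 'localhost') + ':' + (port || 9200) + '/_cluster/health';
}

// Fetch the cluster health document and resolve with the parsed JSON.
function checkElasticsearch(host, port) {
  return new Promise((resolve, reject) => {
    http.get(clusterHealthUrl(host, port), (res) => {
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}

// Usage (requires a running Elasticsearch):
// checkElasticsearch().then(
//   (health) => console.log('cluster status:', health.status),
//   (err) => console.error('Elasticsearch is not reachable:', err.message)
// );
```

The `status` field of the response is `green`, `yellow` or `red`; a single-machine setup will commonly report `yellow` because replica shards cannot be assigned to a second node.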

**Note:** On large imports, Elasticsearch can be very sensitive to memory issues. Be sure to change its [heap size](https://www.elastic.co/guide/en/elasticsearch/guide/2.x/heap-sizing.html) from the default configuration to something more appropriate for your machine.

### Set up the Elasticsearch Schema

The Elasticsearch Schema is analogous to the layout of a table in a traditional relational database,
like MySQL or PostgreSQL. While Elasticsearch attempts to auto-detect a schema that works when
inserting new data, this generally leads to non-optimal results. In the case of Pelias, inserting
data without first applying the Pelias schema will cause all queries to fail completely: Pelias
requires specific configuration settings for both performance and accuracy reasons.

Fortunately, now that your `pelias.json` file is configured with how to connect to Elasticsearch,
the Schema repository can automatically create the Pelias index and configure it exactly as needed:

```bash
cd schema # assuming you've just run the bash snippet to download the repos from earlier
node scripts/create_index.js
```

If you want to reset the schema later (to start over with a new import or because the schema code
has been updated), you can drop the index and start over like so:

```bash
# !! WARNING: this will remove all your data from pelias!!
node scripts/drop_index.js # it will ask for confirmation first
node scripts/create_index.js
```

Note that Elasticsearch has no equivalent of a database migration, so you generally have to delete and
reindex all your data after making schema changes.

### Run the importers

Now that the schema is set up, you're ready to begin importing data.

Our [goal](https://github.com/pelias/pelias/issues/255) is that eventually you'll be able to run all
the importers with simply `cd $importer_directory; npm start`.

For the [Geonames](https://github.com/pelias/geonames/) importer in particular, please see its README file
for the most up-to-date instructions.

Depending on how much data you're importing, now may be a good time to grab a coffee. Without admin
lookup, the fastest speeds you'll see are around 10,000 records per second. With admin lookup,
expect around 800-2000 inserts per second.
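
To turn those rates into a waiting time, divide the number of records by the import speed. The sketch below does exactly that for a hypothetical dataset of 10 million records; the function name and the record count are made up for illustration, and the rates are the ones quoted above.

```js
// Seconds needed to import recordCount records at a steady rate.
function estimateImportSeconds(recordCount, recordsPerSecond) {
  return recordCount / recordsPerSecond;
}

const recordCount = 10e6; // hypothetical dataset size

// Without admin lookup (best case), then the admin lookup range.
const rates = [10000, 2000, 800];
rates.forEach((rate) => {
  const hours = estimateImportSeconds(recordCount, rate) / 3600;
  console.log(rate + ' records/s: about ' + hours.toFixed(1) + ' hours');
});
```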

### Start the API

As soon as you have any data in Elasticsearch, you can start running queries against the
[Pelias API server](https://github.com/pelias/api/).

Again, thanks to `pelias.json`, the API already knows how to connect to Elasticsearch, so all that's
required to start the API is `npm start`. You can now send queries to `http://localhost:3100/`!
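
As a quick example of what a query looks like, the sketch below builds a URL for the API's `/v1/search` endpoint with a `text` parameter. The helper name and the example address are made up for illustration; see the API repository for the full list of endpoints and parameters.

```js
// Build a search URL for a locally running Pelias API.
function searchUrl(text, baseUrl) {
  const base = baseUrl || 'http://localhost:3100';
  return base + '/v1/search?text=' + encodeURIComponent(text);
}

console.log(searchUrl('30 W 26th St, New York'));
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return a GeoJSON response once the API is running.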