# Considerations for full-planet builds

Pelias is designed to work with data ranging from a small city to the entire planet. Small cities do
not require particularly significant resources and should be easy. However, full planet builds
present many of their own challenges.

Current [full planet builds](https://pelias-dashboard.geocode.earth) weigh in at around 550 million
documents, and require about 375GB total storage in Elasticsearch.

Fortunately, because of services like AWS and the scalability of Elasticsearch, full planet builds
are possible without too much extra effort. The process is no different; it just requires more
hardware and takes longer.

To set expectations, a cluster of 4 [r4.xlarge](https://aws.amazon.com/ec2/instance-types/) AWS
instances (30GB RAM each) running Elasticsearch, plus one m4.4xlarge instance running the importers
and the PIP service, can complete a full planet build in about two days.

## Recommended processes

### Use Docker containers and orchestration

We strongly recommend using Docker to run Pelias. All our services include Dockerfiles and the
resulting images are pushed to [Docker Hub](https://hub.docker.com/r/pelias/) by our CI. Using these
images will drastically reduce the amount of work it takes to set up Pelias and will ensure you are
on a known good configuration, minimizing the number of issues you will encounter.

Additionally, there are many great tools for managing container workloads. Simple ones like
[docker-compose](https://github.com/pelias/docker/) can be used for small installations, and more
complex tools like [Kubernetes](https://github.com/pelias/kubernetes) can be great for larger
installations. Pelias is extensively tested on both.

### Use separate Pelias installations for indexing and production traffic

The requirements for a performant and reliable Elasticsearch cluster are very different when
importing new data than when serving queries. It is _highly_ recommended to use one cluster to do
imports, save the resulting Elasticsearch index into a snapshot, and then load that snapshot into
the cluster used to perform actual geocoding.
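
A minimal sketch of that snapshot-and-restore step, using Elasticsearch's standard snapshot REST
API: the host names, repository name, S3 bucket, and snapshot name below are placeholders for
illustration, the index is assumed to be named `pelias`, and the S3 repository type requires the
`repository-s3` plugin on both clusters.

```python
# Sketch only: snapshot the freshly built index on the import cluster,
# then restore it into the cluster that serves geocoding traffic.
# Host names, repository, bucket, and snapshot names are placeholders.
import requests

BUILD = "http://build-cluster:9200"   # cluster used for importing
PROD = "http://prod-cluster:9200"     # cluster used for queries
REPO_BODY = {"type": "s3", "settings": {"bucket": "my-pelias-snapshots"}}

# Register the same snapshot repository on both clusters
for host in (BUILD, PROD):
    requests.put(f"{host}/_snapshot/pelias-snapshots", json=REPO_BODY).raise_for_status()

# Snapshot the pelias index once the importers have finished
requests.put(
    f"{BUILD}/_snapshot/pelias-snapshots/full-planet-build",
    params={"wait_for_completion": "true"},
    json={"indices": "pelias"},
).raise_for_status()

# Restore that snapshot on the serving cluster (which must not already
# have an open index with the same name)
requests.post(
    f"{PROD}/_snapshot/pelias-snapshots/full-planet-build/_restore",
    json={"indices": "pelias"},
).raise_for_status()
```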

### Shard count

Historically, Mapzen Search used 24 Elasticsearch shards for its builds. However, the latest
guidance from the Elasticsearch team is that shards should be no larger than 50GB, and that
otherwise having as few shards as possible is best. At [geocode.earth](https://geocode.earth) we are
experimenting with 12 shard builds, and may eventually move to 6. We would appreciate performance
feedback from anyone doing large builds.
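
As a quick sanity check, the `_cat/shards` API reports per-shard store size, so you can verify that
your chosen shard count keeps primary shards under the ~50GB guidance. A minimal sketch, assuming
Elasticsearch is reachable on `localhost:9200` and the index is named `pelias`:

```python
# Sketch only: report primary shard sizes for the pelias index, to verify
# they stay under the ~50GB guidance. Assumes Elasticsearch is reachable
# at localhost:9200 and the index is named "pelias".
import requests

resp = requests.get(
    "http://localhost:9200/_cat/shards/pelias",
    params={"format": "json", "bytes": "gb", "h": "shard,prirep,store"},
)
resp.raise_for_status()

for shard in resp.json():
    if shard["prirep"] == "p":  # primary shards only
        print(f"primary shard {shard['shard']}: {shard['store']}GB")
```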

### Force merge your Elasticsearch indices

Pelias Elasticsearch indices are generally static, as we do not recommend querying from and
importing to an Elasticsearch cluster simultaneously. In such cases, the highest levels of
performance can be achieved by [force-merging](https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-forcemerge.html) the Elasticsearch index.
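
A minimal sketch of that step, assuming Elasticsearch on `localhost:9200` and an index named
`pelias`; the merge itself runs inside Elasticsearch and can take a long time on a full planet
index:

```python
# Sketch only: force merge the completed pelias index down to a single
# segment per shard. Assumes Elasticsearch at localhost:9200 and an index
# named "pelias"; run it only after all importers have finished.
import requests

resp = requests.post(
    "http://localhost:9200/pelias/_forcemerge",
    params={"max_num_segments": 1},
    timeout=None,  # the merge can take a long time on a full planet index
)
resp.raise_for_status()
print(resp.json())
```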

## Recommended hardware

For a production-ready instance of Pelias, capable of supporting a few hundred queries per second
across a full planet build, a setup like the following should be sufficient.

### Elasticsearch cluster for importing

The main requirement for the importing Elasticsearch cluster is plenty of disk: 400GB across the
cluster is a good minimum. Increased CPU power helps achieve higher throughput during imports, but
is not as important as RAM.
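
One quick way to confirm the cluster has that much headroom is the `_cat/allocation` API, which
reports per-node disk usage. A minimal sketch, assuming Elasticsearch is reachable on
`localhost:9200`:

```python
# Sketch only: sum the available disk reported per node, to confirm the
# cluster comfortably exceeds the ~400GB a full planet build needs.
# Assumes Elasticsearch is reachable at localhost:9200.
import requests

resp = requests.get(
    "http://localhost:9200/_cat/allocation",
    params={"format": "json", "bytes": "gb"},
)
resp.raise_for_status()

available = sum(
    float(node["disk.avail"])
    for node in resp.json()
    if node.get("disk.avail")  # the UNASSIGNED row has no disk stats
)
print(f"disk available across the cluster: {available:.0f}GB")
```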

### Elasticsearch cluster for querying

For queries, essentially the only bottleneck is CPU, although more RAM is helpful so Elasticsearch
data can be cached. On AWS, `c5` instances are significantly more performant than even the `c4`
instances, and should be used if high performance is needed.

_Example configuration:_ 4 `c5.4xlarge` (16 CPU, 32GB RAM) to serve 250 RPS

### Importer machine

Each importer is a single-threaded Node.js process that requires around 8GB of RAM with admin
lookup enabled. Faster CPUs will help increase the import speed. Running multiple importers in
parallel is recommended if the importer machine has enough RAM and CPU to support them.

_Example configuration:_ 1 `c4.4xlarge` (16 CPU, 30GB RAM), running two parallel importers

### Pelias services

Each Pelias service has different memory and CPU requirements. Here are some rough guidelines:

#### API
- RAM: 200MB per instance
- CPU: Single-threaded; one instance can serve around 500 RPS
- Disk: None

#### Placeholder
- RAM: 200MB per instance
- CPU: Single-threaded; supports [clustering](https://nodejs.org/api/cluster.html)
- Disk: Requires about 2GB for a full planet index

#### Libpostal
- RAM: 3GB per instance
- CPU: Multi-threaded, but extremely fast; a single core can serve 8000+ RPS
- Disk: About 2-3GB of data storage required

#### PIP
- RAM: ~6GB
- CPU: 2 cores per instance recommended, which is enough to serve 5000-7000 RPS

#### Interpolation
- RAM: 3GB per instance currently (please follow our efforts to [un-bundle libpostal](https://github.com/pelias/interpolation/issues/106) from the interpolation service)
- CPU: Single core; one instance can serve around 200 RPS
- Disk: 40GB needed for a full planet interpolation dataset