Short autocomplete inputs are difficult to serve with low latency: the shorter the input, the more documents match just about any input string.
In our testing, one to three character inputs generally match up to
100 million documents out of a 560 million document full-planet build.
There's no way to make scoring 100 million documents fast,
so to achieve acceptable performance (ideally, <100ms P99
latency), it's worth looking at ways to either avoid querying
Elasticsearch entirely or reduce the scope of autocomplete queries.
Short autocomplete queries without a `focus.point` parameter can be
cached: there are fewer than 48,000 possible 1-3 character alphanumeric
inputs (36 + 36² + 36³ = 47,988, assuming case-insensitive matching over
`[a-z0-9]`). At this time, caching is outside the scope of Pelias itself, but it
can easily be implemented with Varnish, Nginx, Fastly, CloudFront, and
many other tools and services.
Queries with a `focus.point`, however, are effectively uncacheable, since
the coordinate chosen will often be unique.
This PR uses the `focus.point` coordinate to build a
hard filter that limits the search to documents within a certain
radius of that coordinate. This can reduce the number of documents
searched and improve performance, while still returning useful results.
It takes two parameters, driven by `pelias-config`:
- `api.autocomplete.focusHardLimitTextLength`: the maximum length of input text
for which a hard distance filter will be constructed
- `api.autocomplete.focusHardLimitMultiplier`: the length of the input
text is multiplied by this number to get the total hard filter
radius in kilometers
For example, with `focusHardLimitTextLength` set to 4 and
`focusHardLimitMultiplier` set to 50, the following hard filters would be
constructed:

| text length | max distance (km) |
|-------------|-------------------|
| 1           | 50                |
| 2           | 100               |
| 3           | 150               |
| 4+          | unlimited         |
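A minimal sketch of how such a filter might be constructed (the function and parameter names here are illustrative rather than the actual Pelias internals, though `center_point` is the coordinate field in the Pelias schema); it produces a standard Elasticsearch `geo_distance` filter:

```js
// Sketch only: build an Elasticsearch geo_distance filter from the
// focus.point coordinate and the two pelias-config values described above.
function buildFocusHardFilter(params, config) {
  const maxLength = config.focusHardLimitTextLength;  // e.g. 4
  const multiplier = config.focusHardLimitMultiplier; // e.g. 50

  // no filter without a focus point, or for longer inputs (the 4+ row above)
  if (params['focus.point.lat'] === undefined || params.text.length >= maxLength) {
    return null;
  }

  const radiusKm = params.text.length * multiplier;

  return {
    geo_distance: {
      distance: `${radiusKm}km`,
      center_point: {
        lat: params['focus.point.lat'],
        lon: params['focus.point.lon']
      }
    }
  };
}

// a 2-character input with the example config above yields a 100km hard filter
```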
The NSP service is shutting down, and as far as I know we have not used
this tool much at all. npm now has better tooling built in, like
`npm audit`.
As I recall, NSP was only added to help mitigate issues where running
`npm ls` would not correctly show all installed dependencies (see
https://github.com/pelias/api/issues/1179).
This change moves semantic-release out of dev-dependencies, but keeps
its functionality by calling semantic-release as usual in a `release`
TravisCI build stage (see the sketch after the list below). There are
several advantages to this method:
1.) semantic-release runs only after all builds succeed. Our previous
approach could theoretically have run semantic-release even when some Node.js
versions failed with the current code
2.) semantic-release (and its many dependencies) are removed from
`node_modules`. This speeds up `npm install` in all cases,
and reduces the size of our Docker images by 20MB (from 284MB to 264MB)!
Since semantic-release is only ever needed on TravisCI anyway, it
seems pointless for every installation of Pelias to include it.
3.) Because semantic-release is no longer in `package.json`, Greenkeeper will
not attempt to update it. semantic-release updates _very_ frequently,
and each update attempt seems to have a decent chance of hitting a
random TravisCI failure, causing unwanted notifications.
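As a rough sketch, the `release` stage could look something like this in `.travis.yml` (the Node.js version and semantic-release major version below are placeholders, not the exact values we use):

```yaml
jobs:
  include:
    # build stages run sequentially; this stage only runs after all
    # regular test jobs have passed
    - stage: release
      node_js: '10'
      # install semantic-release on demand, since it is no longer a devDependency
      script: npm install semantic-release@15 && npx semantic-release
```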
There are probably downsides to this approach. For example, we should
consider pinning the major version of semantic-release during install.
Additionally, and for obvious reasons, we can't fully test this change
until it's merged to the `production` branch. We should consider testing
it first on a lower-priority repository.
If this change _does_ work well, we should consider adopting it
everywhere.
This is a somewhat roundabout fix for #1179, as a way to deal with the
persistent `npm ls` and commit hook troubles we were having due to
dependencies of the iso3166 package.
Additionally, it should make these ISO lookups faster, since the
existing approach was implemented using linear scans through an array
rather than map-based lookups.
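The performance difference boils down to the sketch below (the records and field names are illustrative, not the real data set):

```js
// illustrative records, not the actual ISO 3166 tables
const countries = [
  { alpha2: 'US', alpha3: 'USA', name: 'United States' },
  { alpha2: 'NZ', alpha3: 'NZL', name: 'New Zealand' }
];

// before: an O(n) linear scan on every lookup
function findByAlpha2Scan(code) {
  return countries.find(country => country.alpha2 === code);
}

// after: build the index once, then every lookup is O(1)
const byAlpha2 = new Map(countries.map(country => [country.alpha2, country]));

function findByAlpha2Map(code) {
  return byAlpha2.get(code);
}

console.log(findByAlpha2Map('NZ').alpha3); // 'NZL'
```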
The overhead of having to merge a Greenkeeper PR for every single change to
every single repository (which leads to cascading PRs) is too much.
Connects https://github.com/pelias/pelias/issues/366
When several conditions are met, it is possible for the API to return
results that are not sorted as intended.
These conditions are:
* over 10 results total were returned from Elasticsearch
* the interpolation middleware was called
* not all street results end up with possible interpolated address
matches, and some of those streets come before other interpolated
address records, necessitating a re-sorting of the results in the
interpolation middleware
In these cases, the ordering of streets as defined by Elasticsearch,
such as by linguistic match or distance from a focus point, will no
longer be respected in the results.
This is because Node.js's `Array.prototype.sort` uses an
[*un*stable QuickSort for arrays of size 11 or greater](https://github.com/nodejs/node/blob/master/deps/v8/src/js/array.js#L670).
The solution is to switch to a sorting algorithm that is always stable.
This ensures that whatever ordering was specified in the Elasticsearch
queries is observed, without any of that ordering logic having to be
duplicated somewhere it could fall out of sync.
Stable sorting is provided by the [stable](http://npmjs.com/stable) NPM
package.
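Usage is a drop-in replacement for a comparator-based `Array.prototype.sort` call (the comparator and records below are illustrative, not the middleware's actual sorting logic):

```js
const stable = require('stable');

// hypothetical results, in the order Elasticsearch returned them
const results = [
  { name: 'Main Street',    layer: 'street'  },
  { name: '10 Main Street', layer: 'address' },
  { name: 'Main Avenue',    layer: 'street'  }
];

// illustrative comparator: addresses sort ahead of bare streets;
// with a stable sort, ties keep their original Elasticsearch order
function compare(a, b) {
  const rank = (record) => (record.layer === 'address' ? 0 : 1);
  return rank(a) - rank(b);
}

// stable() returns a new sorted array; stable.inplace() sorts in place
const sorted = stable(results, compare);
console.log(sorted.map(r => r.name));
// [ '10 Main Street', 'Main Street', 'Main Avenue' ]
```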
This change will make semantic-release run only on the `production`
branch. This means only merges to `production` will create new NPM
packages, GitHub releases, git tags, and `:latest` Docker images on
Docker Hub.
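One way to express this, sketched with Travis's conditional job syntax (the exact stage layout here is a guess, not our final config):

```yaml
jobs:
  include:
    - stage: release
      # only run the release stage for builds of the production branch
      if: branch = production
      script: npx semantic-release
```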
Connects https://github.com/pelias/pelias/issues/721