Short autocomplete inputs are very difficult to serve in a performant
and low-latency way. With shorter inputs, many more documents match for
just about any input string.
In our testing, one to three character input texts generally match up to
100 million documents out of a 560 million document full planet build.
There's really no way to make scoring 100 million documents fast,
so in order to achieve acceptable performance (ideally, <100ms P99
latency), it's worth looking at ways to either avoid querying
Elasticsearch or reducing the scope of autocomplete queries.
Short autocomplete queries without a focus.point parameter can be
cached. There are only 47,000 possible 1-3 character alphanumerical
inputs. At this time, caching is outside the scope of Pelias itself but
can easily be implemented with Varnish, Nginx, Fastly, Cloudfront, and
lots of other tools and services.
Queries with a `focus.point` are effectively uncachable however, since
the coordinate chosen will often be unique.
This PR uses the `focus.point` coordinate to build a
hard filter limiting the search to documents only within a certain
radius of the coordinate. This can reduce the number of documents
searched and improve performance, while still returning results that are
useful.
It takes two parameters, driven by `pelias-config`:
- `api.autocomplete.focusHardLimitTextLength': the maximum length of text
for which a hard distance filter will be constructed
- `api.autocomplete.focusHardLimitMultiplier`: the length of the input
text will be multiplied by this number to get the total hard filter
radius in kilometers.
For example, with `focusHardLimitTextLength` 4, and
`focusHardLimitMultiplier` 50, the following hard filters would be
constructed:
| text length | max distance |
| ---- | ----|
| 1 | 50 |
| 2 | 100 |
| 3 | 150 |
| 4+ | unlimited |
By definition, all boundary.country query matches will either be
identical, or not a match. Thus, it does not make sense to put the query
clause for boundary.country in the `must` section of the query.
In theory, because our queries would generally combine this `must`
clause with others, there shouldn't be any performance improvement (or
regression) from this change.
However, semantically, this clause fits better as a `filter`, and in the
case of a bug causing a degenerate query with the `boundary.country`
query clause as the only one under the `must` section, this would have a
big impact.
This query extends the standard focus query view with hardcoded layers
for which the query applies. The intent was to apply the focus scoring
only to non-admin areas, but the list of layers was already out of date,
as it was missing streets.
The query is fundamentally problematic with custom layers as well.
rather than wholesale converting to libpostal in one release, the decision was made to only use libpostal for /search and not /autocomplete. Until such time that libpostal can be used for parsing autocomplete queries, text_parser_autocomplete.js will contain the converter between addressit and internal parsing format.
This was missed by me when working on https://github.com/pelias/api/pull/580, but caught by the acceptance tests!
Unfortunately it was caught after going to production.
Using the check-types module, check that lat/lon values are numbers,
instead of checking their truthyness, to ensure that queries for null
island work correctly.