Short autocomplete inputs are very difficult to serve in a performant
and low-latency way. With shorter inputs, many more documents match for
just about any input string.
In our testing, one to three character input texts generally match up to
100 million documents out of a 560 million document full planet build.
There's really no way to make scoring 100 million documents fast,
so in order to achieve acceptable performance (ideally, <100ms P99
latency), it's worth looking at ways to either avoid querying
Elasticsearch or reducing the scope of autocomplete queries.
Short autocomplete queries without a focus.point parameter can be
cached. There are only 47,000 possible 1-3 character alphanumerical
inputs. At this time, caching is outside the scope of Pelias itself but
can easily be implemented with Varnish, Nginx, Fastly, Cloudfront, and
lots of other tools and services.
Queries with a `focus.point` are effectively uncachable however, since
the coordinate chosen will often be unique.
This PR uses the `focus.point` coordinate to build a
hard filter limiting the search to documents only within a certain
radius of the coordinate. This can reduce the number of documents
searched and improve performance, while still returning results that are
useful.
It takes two parameters, driven by `pelias-config`:
- `api.autocomplete.focusHardLimitTextLength': the maximum length of text
for which a hard distance filter will be constructed
- `api.autocomplete.focusHardLimitMultiplier`: the length of the input
text will be multiplied by this number to get the total hard filter
radius in kilometers.
For example, with `focusHardLimitTextLength` 4, and
`focusHardLimitMultiplier` 50, the following hard filters would be
constructed:
| text length | max distance |
| ---- | ----|
| 1 | 50 |
| 2 | 100 |
| 3 | 150 |
| 4+ | unlimited |
By definition, all boundary.country query matches will either be
identical, or not a match. Thus, it does not make sense to put the query
clause for boundary.country in the `must` section of the query.
In theory, because our queries would generally combine this `must`
clause with others, there shouldn't be any performance improvement (or
regression) from this change.
However, semantically, this clause fits better as a `filter`, and in the
case of a bug causing a degenerate query with the `boundary.country`
query clause as the only one under the `must` section, this would have a
big impact.
This query extends the standard focus query view with hardcoded layers
for which the query applies. The intent was to apply the focus scoring
only to non-admin areas, but the list of layers was already out of date,
as it was missing streets.
The query is fundamentally problematic with custom layers as well.
This lowers the autocomplete focus boost from 40 to 15. The idea is
that, because the boost for population is 20, the focus can't possibly
override a popular city, but it can come close.
On our default dev build, no acceptance tests fail but the san francisco
autocomplete test now passes!
It's not clear if this fixes WOF venue issues yet, since the latest
build there failed :(
- created `/component` route
- broke out trimByGranularityComponent but could conceivably be combined with existing
- added `address` support to text_parser
- added `component` sanitizer wrapper