Short autocomplete inputs are very difficult to serve with low latency. The shorter the input, the more documents match just about any input string.
In our testing, one to three character input texts generally match up to
100 million documents out of a 560 million document full planet build.
There's really no way to make scoring 100 million documents fast,
so in order to achieve acceptable performance (ideally, <100ms P99
latency), it's worth looking at ways to either avoid querying
Elasticsearch entirely or reduce the scope of autocomplete queries.
Short autocomplete queries without a `focus.point` parameter can be
cached. There are only around 48,000 possible 1-3 character alphanumeric
inputs (36 + 36² + 36³). At this time, caching is outside the scope of
Pelias itself, but it can easily be implemented with Varnish, Nginx,
Fastly, CloudFront, and many other tools and services.
Queries with a `focus.point` are effectively uncacheable, however, since
the coordinate chosen will often be unique.
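As a rough sketch of the rule this implies, assuming the check lives in the caching layer (in practice it would be Varnish VCL, Nginx config, or similar rather than application code; `text` and the `focus.point.*` names are real /v1/autocomplete query parameters, but this helper is illustrative):

```ts
// Illustrative cacheability rule for an HTTP cache in front of
// /v1/autocomplete; a sketch, not actual Pelias code.
function isCacheableAutocomplete(params: URLSearchParams): boolean {
  const text = (params.get('text') ?? '').trim().toLowerCase();
  const hasFocus =
    params.has('focus.point.lat') || params.has('focus.point.lon');
  // Only short, focus-free queries have a small enough key space to cache
  return text.length > 0 && text.length <= 3 && !hasFocus;
}

isCacheableAutocomplete(new URLSearchParams('text=lo'));                    // true
isCacheableAutocomplete(new URLSearchParams('text=lo&focus.point.lat=40')); // false
```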
This PR uses the `focus.point` coordinate to build a
hard filter limiting the search to only those documents within a certain
radius of the coordinate. This can reduce the number of documents
searched and improve performance, while still returning useful results.
It takes two parameters, driven by `pelias-config`:
- `api.autocomplete.focusHardLimitTextLength`: input texts shorter than
  this length will have a hard distance filter constructed
- `api.autocomplete.focusHardLimitMultiplier`: the length of the input
  text is multiplied by this number to get the total hard filter
  radius in kilometers
For example, with `focusHardLimitTextLength` 4, and
`focusHardLimitMultiplier` 50, the following hard filters would be
constructed:
| text length | max distance (km) |
| ----------- | ----------------- |
| 1           | 50                |
| 2           | 100               |
| 3           | 150               |
| 4+          | unlimited         |
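A minimal sketch of the radius calculation these settings imply, matching the table above (illustrative only, not the actual Pelias query-building code):

```ts
// Values from the example above, as read via pelias-config
const focusHardLimitTextLength = 4;  // api.autocomplete.focusHardLimitTextLength
const focusHardLimitMultiplier = 50; // api.autocomplete.focusHardLimitMultiplier

// Returns the hard filter radius in kilometers, or undefined when the
// input is long enough that no hard filter is applied.
function hardFilterRadiusKm(text: string): number | undefined {
  if (text.length >= focusHardLimitTextLength) {
    return undefined; // 4+ characters: unlimited
  }
  return text.length * focusHardLimitMultiplier; // e.g. 2 chars -> 100 km
}
```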
This adds a structured and detailed log line for each Elasticsearch
query.
It includes information like the total number of Elasticsearch hits, how
long Elasticsearch took to process the request, query parameters, etc.
This is extremely useful for later analysis, as the structured nature of
the log line allows for powerful filtering.
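For illustration, such a log entry might look like the sketch below; the field names are hypothetical, chosen only to mirror the values mentioned above, not the exact fields Pelias emits:

```ts
// Hypothetical structured log entry for one Elasticsearch query
const exampleLogEntry = {
  message: 'elasticsearch_query',
  endpoint: '/v1/autocomplete',
  es_hits_total: 98034521,         // total number of Elasticsearch hits
  es_took_ms: 412,                 // time Elasticsearch reported for the query
  params: { text: 'lo', size: 10 } // query parameters
};

// One JSON object per line keeps the log easy to filter later,
// e.g. with jq or by indexing the log stream itself.
console.log(JSON.stringify(exampleLogEntry));
```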
It's possible for the `text` input to /v1/autocomplete to be of non-zero
length after trimming whitespace and quotes, but still be insufficient
to use for geocoding.
One common case is that it contains only commas, slashes, or other
delimiters.
Our query logic currently does not handle this case, and will generate
Elasticsearch queries that do not have a primary `must` clause and end
up searching every document in the index. These queries are slow, take
up cluster resources, and are not useful.
By detecting unsubstantial inputs, we can prevent this.
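A minimal sketch of such a check, assuming an input counts as substantial only if it contains at least one letter or digit (a hypothetical helper, not the actual Pelias sanitizer):

```ts
// Hypothetical helper: reject inputs that contain no letters or digits
// after trimming surrounding whitespace and quotes.
function isSubstantialInput(raw: string): boolean {
  const text = raw.trim().replace(/^["']+|["']+$/g, '');
  // \p{L} matches any Unicode letter, \p{N} any Unicode digit
  return /[\p{L}\p{N}]/u.test(text);
}

isSubstantialInput(',,//');        // false: delimiters only, skip the query
isSubstantialInput('"1 main st"'); // true
```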
By definition, all `boundary.country` query clause matches are
identical: a document either matches the requested country or it
doesn't, so the clause contributes nothing to relevance scoring. Thus,
it does not make sense to put the query clause for `boundary.country`
in the `must` section of the query.
In theory, because our queries generally combine this `must`
clause with others, there shouldn't be any performance improvement (or
regression) from this change.
However, semantically this clause fits better as a `filter`, and in the
case of a bug causing a degenerate query with the `boundary.country`
clause as the only one under the `must` section, moving it to `filter`
would have a big impact.
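The shape of the change, sketched on a simplified query (the field name and surrounding clauses are illustrative, not the exact Pelias query):

```ts
// Before: the country clause sits under `must` and participates in scoring
const before = {
  query: {
    bool: {
      must: [
        { match_phrase: { 'parent.country_a': 'USA' } } // boundary.country clause
        // ...other scoring clauses
      ]
    }
  }
};

// After: the same clause under `filter`, where matching is yes/no,
// no relevance score is computed, and Elasticsearch can cache the result
const after = {
  query: {
    bool: {
      must: [
        // ...other scoring clauses
      ],
      filter: [
        { match_phrase: { 'parent.country_a': 'USA' } }
      ]
    }
  }
};
```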
In the case where a min lat/lon is larger than a max lat/lon, the error
message was a bit confusing, as it did not show the actual property name
or the values that are causing errors.
This condition will cause Elasticsearch to throw an error, so we should
catch it ourselves first.
The error we return is friendlier than the one Elasticsearch throws when
min > max, but it is still an error.
Connects https://github.com/pelias/api/pull/1050
If bounding box lat/lon values are outside the correct range,
Elasticsearch throws very alarming errors.
With a little validation code we can provide more friendly and
actionable error messages.
Fixes https://github.com/pelias/pelias/issues/750
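A sketch of the kind of validation both of these bounding box changes describe; the `boundary.rect.*` names match the /v1 API parameters, but the helper itself is hypothetical, not the actual Pelias sanitizer:

```ts
// Hypothetical validator: return friendly messages that name the
// offending property and value instead of letting Elasticsearch throw.
function validateRect(rect: Record<string, number>): string[] {
  const errors: string[] = [];

  const checkRange = (name: string, min: number, max: number) => {
    const value = rect[name];
    if (value < min || value > max) {
      errors.push(`${name} value ${value} is outside the valid range of ${min} to ${max}`);
    }
  };

  checkRange('boundary.rect.min_lat', -90, 90);
  checkRange('boundary.rect.max_lat', -90, 90);
  checkRange('boundary.rect.min_lon', -180, 180);
  checkRange('boundary.rect.max_lon', -180, 180);

  if (rect['boundary.rect.min_lat'] > rect['boundary.rect.max_lat']) {
    errors.push('boundary.rect.min_lat must not be larger than boundary.rect.max_lat');
  }
  if (rect['boundary.rect.min_lon'] > rect['boundary.rect.max_lon']) {
    errors.push('boundary.rect.min_lon must not be larger than boundary.rect.max_lon');
  }
  return errors;
}

validateRect({
  'boundary.rect.min_lat': 95, // invalid: latitude cannot exceed 90
  'boundary.rect.max_lat': 40,
  'boundary.rect.min_lon': -74,
  'boundary.rect.max_lon': -73
});
```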