Peter Johnson
c0a0663e21
feat(dedupe): treat all non-canonical layers as analogous to a venue, prefer non-canonical records
6 years ago
Peter Johnson
14e6c6303f
refactor middleware dedupe for readability
6 years ago
Julian Simioni
800eb8ca03
Move lots of logging from info to debug
...
These are log lines that are not really useful in a production context,
and just create a lot of noise.
6 years ago
missinglink
7a01e794cf
support aliases for name fields
7 years ago
antoine-de
3489517da3
dedup osm/OA address
...
linked to https://github.com/pelias/pelias/issues/541
and maybe to https://github.com/pelias/pelias/issues/541 but I cannot
reproduce it.
and to the fact that there are lots of duplicates in France, like
[20 rue hector malot paris](https://mapzen.com/search/explorer/?query=search&text=20%20rue%20hector%20malot%20paris),
which returns 1 result from OpenAddresses and 1 result from OSM.
Now we first check which source has a zipcode and, if both have one, we
prefer OA over OSM.
7 years ago
Kevin Ennis
512cec9945
Prefer openaddresses results with zip code
8 years ago
Diana Shkolnikov
a3fa49ee52
remove splice and replace with direct assignment
8 years ago
Diana Shkolnikov
3fe113645f
feat: check for preferred record when dupe found
8 years ago
Diana Shkolnikov
a8e82b018d
Refactor deduper and write additional tests
8 years ago
Vesa Meskanen
73f64ce3e1
Cleanup: tabs -> spaces
9 years ago
Vesa Meskanen
e22b973cdf
Do not consider absence of an additional name as a difference
...
OSM data includes two almost identical 'Keskustori, Tampere' entries.
The second one does not have the additional 'name.ru' property. This is
no longer considered a difference in deduping.
9 years ago
Vesa Meskanen
00a4bd52a3
Bugfix: deduping caused an error if an array property was missing
...
Conversion from array to string should happen independently for
the compared properties, not only when both are arrays.
9 years ago
Vesa Meskanen
6ab44e0aa2
neighborhood -> neighbourhood
9 years ago
Diana Shkolnikov
4ed5296d79
Update middleware with new prop name
9 years ago
Vesa Meskanen
caaee361b2
Take absence of information as a difference, after all
...
Is '5 main street' the same as 'main street'? Probably not. It may be that
less detailed data is different data, not just bad data. Maybe this can
be changed if coordinates are considered, too.
9 years ago
Vesa Meskanen
0e6bf8ed00
Improve response deduping
...
Consider locality and neighborhood, too. Do not take absence of
an attribute as a difference.
9 years ago
Diana Shkolnikov
e9ceb25ca0
Fix crash when dedupe was comparing arrays as strings for parent properties
9 years ago
Diana Shkolnikov
83de24d3c4
Middleware: remove all use of alpha3/admin0/admin1
9 years ago
Diana Shkolnikov
df93be7543
Fix deduping for non-ascii strings
9 years ago
Diana Shkolnikov
9fa5fc5a77
calcSize became middleware (exposed and fixed bug in query defaults)
9 years ago
Diana Shkolnikov
42d940f8c8
Add simple normalizer (lowercase + remove punctuation)
9 years ago
Diana Shkolnikov
54187dde67
Add dedupe middleware
...
Dedupe middleware removes __exact__ dupes and truncates the results
to the specified size.
9 years ago