DB design:
CREATE TABLE dedup (
catchment_id varchar,
health_id1 varchar,
health_id2 varchar,
rule_name varchar,
rule_value varchar,
created_at timeuuid,
PRIMARY KEY (catchment_id, health_id1, health_id2, rule_name, rule_value));
Example:
Feed - p1 (create), p2 (create), p3 (create), p1 (updated occupation), p1 (updated nid2), p2 (voided, merged-with p3), h4 (create), p5 (create). pn represents HID.
All have same NID, nid1 but different catchments, c1, c2, c3, c4, c5. Only, h4, h5 have same phone #. c1 - 101112, c2 - 202122, c3 - 303133, so on.
Rules: same nid and same phone # are dedups. Dedups cannot be resolved for just one rule, but for all.
...
Catchment | HID1 | HID2 | Rule Name | Rule Value | Created At |
A30B31 | h3 | h5 | nid | nid1 | t.. |
A30B31C32 | h3 | h5 | nid | nid1 | t.. |
A40B41 | h4 | h5 | phone # | ph1 | t.. |
A40B41C42 | h4 | h5 | phone # | ph1 | t.. |
A40B41 | h4 | h5 | nid | nid1 | t.. |
A40B41C42 | h4 | h5 | nid | nid1 | t.. |
A50B51 | h5 | h3 | nid | nid1 | t.. |
A50B51C52 | h5 | h3 | nid | nid1 | t.. |
A50B51 | h5 | h4 | phone # | ph1 | t.. |
A50B51C52 | h5 | h4 | phone # | nid1 | t.. |
Find duplicates by catchment:
select * from dedup where catchment = 102030.
Then sort the result by timeuuid (time component only) in-memory and return.