Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

DB design: 

 CREATE TABLE dedup (
catchment_id varchar,
health_id1 varchar,
health_id2 varchar,
rule_name varchar,
rule_value varchar,
created_at timeuuid,
PRIMARY KEY (catchment_id, health_id1, health_id2, rule_name, rule_value));

Example:

Feed - p1 (create), p2 (create), p3 (create), p1 (updated occupation), p1 (updated nid2), p2 (voided, merged-with p3), h4 (create), p5 (create). pn represents HID.
All have same NID, nid1 but different catchments, c1, c2, c3, c4, c5. Only, h4, h5 have same phone #. c1 - 101112, c2 - 202122, c3 - 303133, so on.
Rules: same nid and same phone # are dedups. Dedups cannot be resolved for just one rule, but for all.

...

#4 Dedup db looks same as in scenario 3. 

Note: If catchment is updated, it will be similar to scenario 3.

Scenario 5: 

Similar to scenario 4. 

...

CatchmentHID1HID2Rule NameRule ValueCreated At
A30B31h3h5nidnid1t..
A30B31C32h3h5nidnid1t..
      
A40B41h4h5phone #ph1t..
A40B41C42h4h5phone #ph1t..
A40B41h4h5nidnid1t..
A40B41C42h4h5nidnid1t..
      
A50B51h5h3nidnid1t..
A50B51C52h5h3nidnid1t..
A50B51h5h4phone #ph1t..
A50B51C52h5h4phone #nid1t..

 

Find duplicates by catchment:

select * from dedup where catchment = 102030.

Then sort the result by timeuuid (time component only) in-memory and return.