Dedup
DB design:
CREATE TABLE dedup (
catchment_id varchar,
health_id1 varchar,
health_id2 varchar,
rule_name varchar,
rule_value varchar,
created_at timeuuid,
PRIMARY KEY (catchment_id, health_id1, health_id2, rule_name, rule_value));
Example:
Feed - p1 (create), p2 (create), p3 (create), p1 (updated occupation), p1 (updated nid2), p2 (voided, merged-with p3), h4 (create), p5 (create). pn represents HID.
All have same NID, nid1 but different catchments, c1, c2, c3, c4, c5. Only, h4, h5 have same phone #. c1 - 101112, c2 - 202122, c3 - 303133, so on.
Rules: same nid and same phone # are dedups. Dedups cannot be resolved for just one rule, but for all.
Scenarios | Action | Event in Feed | Scheduled job runs | Dupes in C1 | Dupes in C2 | Dupes in C3 | Dupes in C4 | Dupes in C5 |
1 | Patient created (P1) | created:p1 | No dupes found. |
|
|
|
|
|
2 | Patient created (P2) | created:p2 | Finds P1 with same NID | P2-P1 (NID) | P2-P1(NID) |
|
|
|
3 | Patient created (P3) | created:p3 | Finds that P3 shares NID with P1 and P2 | P2-P1(NID) | P2-P1(NID) | P3-P1(NID) + |
|
|
4 | Patient updated (P1) | updated:p1 (occupation) | All entries related to p1 are deleted and recreated. | P2-P1(NID) | P2-P1(NID) | P3-P1(NID) |
|
|
5 | Patient updated (P1) | updated:p1 (NID) | All entries related to p1 are deleted and recreated. | Empty | P3-P2(NID) | P3-P2(NID) |
|
|
6 | Resolution. P2-P3 conflict for all rules. | voided:P2 (merged with p3) | Since P2 does not exist anymore, we must remove all pairs with P2 from all dedup tables. | Empty | Empty | Empty |
|
|
7 |
| updated:P3 | Run deduplication rule anyway. Found no duplicates | Empty | Empty | Empty |
|
|
8 | Patient created (P4) | created: P4 | P4 matches P3 in NID. (P1's NID has changed in earlier step.) |
|
| P4-P3(NID) + |
|
|
9 | Patient created (P5) | created:P5 | P5 is detected for NID duplication, matching with P3, P4. And Also with P4 for Ph. num |
|
| P4-P3(NID) | P4-P3(NID) | P5-P3(NID) + |
10 | P3-P4 resolved for all rules, | ignore-duplicate: P3, P4 | As soon as approver says these must remain separate, we must go back and take them out of all catchments |
|
| P5-P3 (NID) | P5-P4 (NID+Ph.No) | P5-P3(NID) |
Scenario 2 (DB Calls):
#1 select * from nid_mapping where nid = nid1
#2 select catchment from patient where hid in (h1, h2)
#3. Dedup DB:
Catchment | HID1 | HID2 | Rule Name | Rule Value | Created At |
A10B11 | h1 | h2 | nid | nid1 | t1 |
A10B11C12 | h1 | h2 | nid | nid1 | t2 |
|
|
|
|
|
|
A20B21 | h2 | h1 | nid | nid1 | t3 |
A20B21C22 | h2 | h1 | nid | nid1 | t4 |
Searches all patients by phone # ph2. No matching patient. No action taken.
Scenario 3:
#1 select * from nid_mapping where nid = nid1
#2 select catchment from patient where hid in (h1, h2, h3)
#3. Dedup DB:
Catchment | HID1 | HID2 | Rule Name | Rule Value | Created At |
A10B11 | h1 | h2 | nid | nid1 | t1 |
A10B11C12 | h1 | h2 | nid | nid1 | t2 |
A10B11 | h1 | h3 | nid | nid1 | t9 |
A10B11C12 | h1 | h3 | nid | nid1 | t10 |
|
|
|
|
|
|
A20B21 | h2 | h1 | nid | nid1 | t3 |
A20B21C22 | h2 | h1 | nid | nid1 | t4 |
A20B21 | h2 | h3 | nid | nid1 | t11 |
A20B21C22 | h2 | h3 | nid | nid1 | t12 |
|
|
|
|
|
|
A30B31 | h3 | h1 | nid | nid1 | t5 |
A30B31C32 | h3 | h1 | nid | nid1 | t6 |
A30B31 | h3 | h2 | nid | nid1 | t7 |
A30B31C32 | h3 | h2 | nid | nid1 | t8 |
Searches all patients by phone # ph3. No matching patient. No action taken.
Scenario 4:
#1 select hid2 from dedup where catchment in (1011, 101112) and hid1 = h1.
#2 select catchment from patient where hid = .. (for each hid returned in #1)
#3
(Batch begin)
delete from dedup where catchment_id in (1011, 101112) and hid1=h1
delete from dedup where catchment_id in (...) and hid1=.. and hid2=h1 (repeated as many times required)
repeat scenario 2.
(Batch ends)
#4 Dedup db looks same as in scenario 3.
Note: If catchment is updated, it will be similar to scenario 3.
Scenario 5:
Similar to scenario 4.
Dedup db:
Catchment | HID1 | HID2 | Rule Name | Rule Value | Created At |
A20B21 | h2 | h3 | nid | nid1 | t11 |
A20B21C22 | h2 | h3 | nid | nid1 | t12 |
|
|
|
|
|
|
A30B31 | h3 | h2 | nid | nid1 | t7 |
A30B31C32 | h3 | h2 | nid | nid1 | t8 |
Scenario 6:
Similar to scenario 4. Dedup table becomes empty.
Scenario 8:
Similar to scenario 2.
Dedup table:
Catchment | HID1 | HID2 | Rule Name | Rule Value | Created At |
A30B31 | h3 | h4 | nid | nid1 | t.. |
A30B31C32 | h3 | h4 | nid | nid1 | t.. |
|