Dedup

DB design:

Example:

Feed - p1 (create), p2 (create), p3 (create), p1 (updated occupation), p1 (updated nid2), p2 (voided, merged-with p3), h4 (create), p5 (create). pn represents HID.
All have same NID, nid1 but different catchments, c1, c2, c3, c4, c5. Only, h4, h5 have same phone #. c1 - 101112, c2 - 202122, c3 - 303133, so on.
Rules: same nid and same phone # are dedups. Dedups cannot be resolved for just one rule, but for all.

Scenarios	Action	Event in Feed	Scheduled job runs	Dupes in C1	Dupes in C2	Dupes in C3	Dupes in C4	Dupes in C5
1	Patient created (P1)	created:p1	No dupes found.
2	Patient created (P2)	created:p2	Finds P1 with same NID	P2-P1 (NID)	P2-P1(NID)
3	Patient created (P3)	created:p3	Finds that P3 shares NID with P1 and P2	P2-P1(NID) P3-P1(NID) +	P2-P1(NID) P3-P2(NID) +	P3-P1(NID) + P3-P2(NID) +
4	Patient updated (P1)	updated:p1 (occupation)	All entries related to p1 are deleted and recreated.	P2-P1(NID) P3-P1(NID)	P2-P1(NID) P3-P2(NID)	P3-P1(NID) P3-P2(NID)
5	Patient updated (P1)	updated:p1 (NID)	Now then job runs, no dupes wil be found for P1. But we must also search the dedup tables to remove any pairs with P1 (reason NID match) that were earlier created.	Empty	P3-P2(NID)	P3-P2(NID)
6	Resolution. P2-P3 conflict for all rules. p2 - voided and merged with p3	voided:P2 (merged with p3)	Since P2 does not exist anymore, we must remove all pairs with P2 from all dedup tables.	Empty	Empty	Empty
7		updated:P3	Run deduplication rule anyway. Found no duplicates	Empty	Empty	Empty
8	Patient created (P4)	created: P4	P4 matches P3 in NID. P1's NID has changed in earlier step.			P4-P3(NID) +	P4-P3(NID) +
9	Patient created (P5)	created:P5	P5 is detected for NID duplication, matching with P3, P4. And Also with P4 for Ph. num			P4-P3(NID) P5-P3 (NID) +	P4-P3(NID) P5-P4 (NID+Ph.No) +	P5-P3(NID) + P5-P4 (NID+Ph.No) +
10	P3-P4 resolved for all rules, both are unique.	ignore-duplicate: P3, P4	As soon as approver says these must remain separate, we must go back and take them out of all catchments			P5-P3 (NID)	P5-P4 (NID+Ph.No)	P5-P3(NID) P5-P4 (NID+Ph.No)

Scenario 2 (DB Calls):

#1 Find by nid

#2 Find catchments where hid in (h1, h2)

#3. Dedup DB:

Catchment	HID1	HID2	Rule Name	Rule Value	Created At
A10B11	h1	h2	nid	nid1	t1
A10B11C12	h1	h2	nid	nid1	t2

A20B21	h2	h1	nid	nid1	t3
A20B21C22	h2	h1	nid	nid1	t4

Searches all patients by phone # ph1. No matching patient. No action taken.

Scenario 3:

#1 Find by nid

#2 Find catchments where hid in (h1, h2)

#3. Dedup DB:

Catchment	HID1	HID2	Rule Name	Rule Value	Created At
A10B11	h1	h2	nid	nid1	t1
A10B11C12	h1	h2	nid	nid1	t2
A10B11	h1	h3	nid	nid1	t9
A10B11C12	h1	h3	nid	nid1	t10

A20B21	h2	h1	nid	nid1	t3
A20B21C22	h2	h1	nid	nid1	t4
A20B21	h2	h3	nid	nid1	t11
A20B21C22	h2	h3	nid	nid1	t12

A30B31	h3	h1	nid	nid1	t5
A30B31C32	h3	h1	nid	nid1	t6
A30B31	h3	h2	nid	nid1	t7
A30B31C32	h3	h2	nid	nid1	t8

Scenario 4:

DB: