Implement code changes to enhance functionality and improve performance

Ireneusz Bachanowicz 2025-07-28 02:10:18 +02:00
parent c81ce7cc57
commit 0b9c53b7a3
5 changed files with 2667 additions and 1 deletions

5
bs.py

@@ -1,4 +1,5 @@
 import sys
+import re
 import json
 from markdownify import markdownify as md
@@ -15,7 +16,9 @@ def convert_html_to_markdown(input_file, output_file):
 html_content = item.get("content", "")
 # Convert HTML content to Markdown
-markdown_content = md(html_content)
+markdown_content = md(html_content, strip=['a'])
+# Remove unwanted image links (e.g., ![](...) )
+markdown_content = re.sub(r'!\[.*?\]\(.*?\)', '', markdown_content)
 # Prepend other attributes
 markdown_output.append(f"# {title}\n\n")
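
The image-stripping pattern added in this hunk can be exercised on its own. The sketch below applies the same `re.sub` pattern to a sample string; the helper name `strip_image_links` and the sample text are illustrative assumptions and are not part of `bs.py`.

```python
import re

# Same pattern as in the hunk: matches Markdown image links such as ![alt](url).
IMAGE_LINK = re.compile(r'!\[.*?\]\(.*?\)')


def strip_image_links(markdown_content: str) -> str:
    """Remove Markdown image links and leave all other text untouched."""
    return IMAGE_LINK.sub('', markdown_content)


sample = "Intro ![logo](https://example.com/logo.png) text [kept](https://example.com)"
cleaned = strip_image_links(sample)
print(cleaned)
```

Only constructs that start with `!` are removed, so ordinary links pass through this step unchanged.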

89
data/DCR_mappings_ALL.md Normal file

@@ -0,0 +1,89 @@
The `/dcr` API serves as a canonical interface for initiating data change requests. The JSON payload sent to this endpoint contains a standardized representation of the desired changes. The MDM HUB is then responsible for validating this request and mapping its attributes to the distinct models of the target systems.
***
### **`/dcr` API to Target System Attribute Mapping**
The following sections break down the mapping from the attributes in the `/dcr` request payload to their corresponding fields in OneKey and Veeva.
---
#### **1. Request-Level Attributes**
These attributes are defined at the top level of each object within the `DCRRequests` array in the JSON payload.
| HUB /dcr Attribute | OneKey Mapping | Veeva OpenData (VOD) Mapping | Notes / Description |
| :--- | :--- | :--- | :--- |
| **`extDCRRequestId`** | Used to populate the `DCRID` in the Reltio DCR tracking entity. OneKey's `validation.clientRequestId` is typically a HUB-generated internal ID, but `extDCRRequestId` is the primary key for client-side tracking. | **`dcr_key`** (in all CSV files: `change_request.csv`, `change_request_hco.csv`, etc.) | **This is the primary external identifier for the entire DCR.** It is crucial for clients like PforceRx to track the request's status and is used as the main correlation ID across all systems and files. |
| **`extDCRComment`** | **`validation.requestComment`** | **`description`** (in `change_request.csv`) | A free-text field for the requester to provide context or justification for the DCR. For OneKey, it has a special function: due to API limitations, requests to **remove** an attribute are specified in this comment field (e.g., "Please remove attributes: [Address: ...]"). |
| **`country`** | **`isoCod2`** | **`primary_country__v`** (in `change_request_hcp.csv` and `change_request_hco.csv`) and **`country__v`** (in `change_request_address.csv`) | The mandatory two-letter ISO country code. This is a critical routing attribute that determines which validator instance to use and, for Veeva, which S3/SFTP directory the DCR files are placed in. |
| **`action`** (within HCP/HCO object) | Determines how the request is processed. `update` and `insert` both map to a `submitVR` call. An `update` requires a valid `individual.individualEid` or `workplace.workplaceEid`. `delete` is handled by updating the entity's end date in Reltio. | **`change_request_type`** (in `change_request.csv`). Mapped to **`ADD_REQUEST`** for an `insert` action, and **`CHANGE_REQUEST`** for an `update` action. | This defines the fundamental operation being performed on the entity. |
| **`refId`** (within HCP/HCO object) | Used to query Reltio to find the OneKey crosswalk (`individual.individualEid` or `workplace.workplaceEid`) which is mandatory for update operations. | Used to query Reltio to find the Veeva crosswalk (`vid__v`), which is mandatory for update operations. The Reltio URI from the `refId` is also used to populate the **`entity_key`** field in the VOD CSVs. | This object contains the necessary identifiers (`CrosswalkTargetObjectId`, `EntityURITargetObjectId`, etc.) to locate the target entity in Reltio for an update or delete operation. |
---
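As a rough illustration of the `action` row in the request-level table, the sketch below maps a `/dcr` action to the Veeva `change_request_type` value. Only the `insert` and `update` pairs come from the table; the function name and the error handling for other actions are assumptions.

```python
# Only these two pairs are stated in the mapping table.
VEEVA_CHANGE_REQUEST_TYPE = {
    "insert": "ADD_REQUEST",
    "update": "CHANGE_REQUEST",
}


def veeva_change_request_type(action: str) -> str:
    """Map a /dcr `action` to the `change_request_type` column of change_request.csv."""
    try:
        return VEEVA_CHANGE_REQUEST_TYPE[action]
    except KeyError:
        # Assumption: actions without a documented Veeva mapping (e.g. delete) are rejected here.
        raise ValueError(f"no Veeva change_request_type documented for action {action!r}")
```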
#### **2. Healthcare Professional (HCP) Attributes**
These attributes are provided within the `HCP` object of a DCR request.
| HUB /dcr Attribute | OneKey Mapping | Veeva OpenData (VOD) Mapping | Notes / Description |
| :--- | :--- | :--- | :--- |
| **`firstName`** | `individual.firstName` | `first_name__v` | HCP's first name. |
| **`lastName`** | `individual.lastName` | `last_name__v` | HCP's last name. Mandatory for creating a new HCP in OneKey. |
| **`middleName`** | `individual.middleName` | `middle_name__v` | HCP's middle name. |
| **`prefix`** | `individual.prefixNameCode` | `prefix__v` | Name prefix (e.g., Mr., Dr.). Requires a lookup from the canonical code (`HCPPrefix`) to the target system's specific code. |
| **`title`** | `individual.titleCode` | `professional_title__v` | Professional title. Requires a lookup from the canonical code (`HCPTitle` or `HCPProfessionalTitle`) to the target system's specific code. |
| **`gender`** | `individual.genderCode` | `gender__v` | HCP's gender. Requires a lookup to map the canonical value (e.g., M, F) to the target system's code. |
| **`subTypeCode`** | `individual.typeCode` | `hcp_type__v` | The professional subtype of the HCP (e.g., Physician, Nurse). Requires a lookup from the canonical code (`HCPSubTypeCode` or `HCPType`). |
| **`specialties`** (List) | `individual.speciality1`, `individual.speciality2`, `individual.speciality3` | `specialty_1__v` through `specialty_10__v` | The list of specialties is **flattened**. OneKey accepts up to 3 ranked specialties. Veeva accepts up to 10. A lookup is required to map the canonical `HCPSpecialty` code to the target system's value. |
| **`emails`** (List) | `individual.email` | `email_1__v`, `email_2__v` | The list of emails is flattened. OneKey typically takes the highest-ranked email. Veeva takes the top two. |
| **`phones`** (List) | `individual.mobilePhone` | `phone_1__v` to `phone_3__v` (for office), `fax_1__v` to `fax_2__v` (for fax) | The list is filtered by type and ranked. OneKey maps the highest-ranked to `mobilePhone`. Veeva separates numbers into distinct `phone` and `fax` columns based on the Reltio phone type. |
---
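The flattening described for `specialties` in the HCP table could look roughly like the sketch below. The input shape (`code` and `rank` keys, lower rank meaning higher priority) and the helper name `flatten_ranked` are assumptions, and the lookup from canonical codes to target-system codes is left out.

```python
from typing import Dict, List


def flatten_ranked(items: List[dict], columns: List[str]) -> Dict[str, str]:
    """Assign the best-ranked values to the target columns, in column order.

    Assumes each item looks like {"code": "...", "rank": 1}, with lower rank
    meaning higher priority. Items beyond the available columns are dropped.
    """
    ordered = sorted(items, key=lambda item: item["rank"])
    return {column: item["code"] for column, item in zip(columns, ordered)}


# OneKey accepts up to 3 specialties, Veeva up to 10 (per the table above).
ONEKEY_SPECIALTY_FIELDS = [f"individual.speciality{i}" for i in range(1, 4)]
VEEVA_SPECIALTY_COLUMNS = [f"specialty_{i}__v" for i in range(1, 11)]

specialties = [
    {"code": "CARD", "rank": 2},
    {"code": "ONCO", "rank": 1},
    {"code": "DERM", "rank": 4},
    {"code": "NEUR", "rank": 3},
]
onekey = flatten_ranked(specialties, ONEKEY_SPECIALTY_FIELDS)
veeva = flatten_ranked(specialties, VEEVA_SPECIALTY_COLUMNS)
```

With four input specialties, the OneKey result keeps the top three and the Veeva result fills `specialty_1__v` through `specialty_4__v`.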
#### **3. Healthcare Organization (HCO) Attributes**
These attributes are provided within the `HCO` object of a DCR request.
| HUB /dcr Attribute | OneKey Mapping | Veeva OpenData (VOD) Mapping | Notes / Description |
| :--- | :--- | :--- | :--- |
| **`name`** | `workplace.usualName` / `workplace.officialName` | `corporate_name__v` | The primary, official name of the organization. |
| **`otherNames`** (List) | `workplace.usualName2` | `alternate_name_1__v` | The list of alternative names is flattened. Both systems typically take the first or highest-ranked alternative name. |
| **`subTypeCode`** | `workplace.typeCode` | `major_class_of_trade__v` | The HCO's subtype, often representing facility type. Requires a lookup from the canonical code (`COTFacilityType`). |
| **`typeCode`** | Not Mapped | `hco_type__v` | Maps to the HCO Type. Requires a lookup (`HCOType`). The OneKey mapping document indicates this is not used for their system. |
| **`websiteURL`** | `workplace.website` | `URL_1__v` | The official website of the organization. |
| **`specialties`** (List) | `workplace.speciality1` to `speciality3` | `specialty_1__v` to `specialty_10__v` | Similar to HCPs, the list of specialties is flattened and ranked. A lookup from the canonical `COTSpecialty` code is required. |
| **`emails`** (List) | `workplace.email` | `email_1__v`, `email_2__v` | List of emails is flattened, with the highest-ranked ones being used. |
| **`phones`** (List) | `workplace.telephone`, `workplace.fax` | `phone_1__v` to `phone_3__v`, `fax_1__v` to `fax_2__v` | Similar to HCPs, the list is filtered by type (`TEL.OFFICE` vs. `TEL.FAX`) and ranked before mapping. |
---
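The phone handling in the HCO table (filter by type, rank, then fill a fixed number of columns) can be sketched as follows for the Veeva side. The dictionary keys `type`, `number`, and `rank`, the assumption that a lower rank wins, and the function name are illustrative only.

```python
def split_phones(phones):
    """Build Veeva phone_*/fax_* columns from a list of phone dicts.

    Assumes each phone looks like {"type": "TEL.OFFICE", "number": "...", "rank": 1}.
    """
    def top(type_code, prefix, limit):
        # Keep only one phone type, order by rank, and cut to the column limit.
        matching = sorted(
            (p for p in phones if p["type"] == type_code),
            key=lambda p: p["rank"],
        )
        return {
            f"{prefix}_{i}__v": p["number"]
            for i, p in enumerate(matching[:limit], start=1)
        }

    row = top("TEL.OFFICE", "phone", 3)  # phone_1__v .. phone_3__v
    row.update(top("TEL.FAX", "fax", 2))  # fax_1__v .. fax_2__v
    return row


phones = [
    {"type": "TEL.FAX", "number": "+48 22 000 00 09", "rank": 1},
    {"type": "TEL.OFFICE", "number": "+48 22 000 00 02", "rank": 2},
    {"type": "TEL.OFFICE", "number": "+48 22 000 00 01", "rank": 1},
]
phone_columns = split_phones(phones)
```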
#### **4. Nested Object Mapping: Addresses**
Address information is provided as a list of objects within the `HCP` or `HCO` DCR payload.
| HUB /dcr `addresses` Attribute | OneKey Mapping | Veeva OpenData (VOD) Mapping | Notes / Description |
| :--- | :--- | :--- | :--- |
| **(Address Object)** | Mapped to a single `address` complex object in the JSON request. Only the primary address is sent. | Each address object is mapped to a **separate row** in the **`change_request_address.csv`** file. | OneKey's API takes a single address per DCR, while Veeva's file-based approach can handle multiple address changes in one DCR. |
| **`refId`** | Used to match and update an existing address. | **`address_key`** | This is the `PfizerAddressID`, a unique identifier for the address record in Reltio. |
| **`addressLine1`** | `address.longLabel` or `address.addressLine1` | `address_line_1__v` | The first line of the street address. |
| **`addressLine2`** | `address.longLabel2` or `address.addressLine2` | `address_line_2__v` | The second line of the street address. |
| **`city`** | `address.city` | `locality__v` | The city name. |
| **`stateProvince`** | `address.countyCode` | `administrative_area__v` | The state or province. Requires a lookup from the canonical value to the target system's code. |
| **`zip`** | `address.longPostalCode` or `address.Zip5` | `postal_code__v` | The postal or ZIP code. |
| **`addressType`** | Not explicitly mapped in `submitVR` request, but used in logic. | `address_type__v` | The type of address (e.g., Office, Home). Requires a lookup (`AddressType`). |
---
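To make the "one row per address" behaviour on the Veeva side concrete, here is a minimal sketch that turns a list of address objects into CSV text. The column names come from the address table; treating `refId` as a plain string, skipping the code lookups, and the helper names are assumptions.

```python
import csv
import io

# /dcr address attribute -> change_request_address.csv column (from the table above).
ADDRESS_COLUMNS = {
    "refId": "address_key",
    "addressLine1": "address_line_1__v",
    "addressLine2": "address_line_2__v",
    "city": "locality__v",
    "stateProvince": "administrative_area__v",
    "zip": "postal_code__v",
    "addressType": "address_type__v",
}


def address_rows(addresses):
    """Map each address object to one CSV row; missing attributes become empty strings."""
    return [
        {column: address.get(attr, "") for attr, column in ADDRESS_COLUMNS.items()}
        for address in addresses
    ]


def to_csv(rows):
    """Serialize rows with a header line, using the column order defined above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(ADDRESS_COLUMNS.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


addresses = [
    {"refId": "A1", "addressLine1": "1 Main St", "city": "Lyon", "zip": "69001"},
    {"refId": "A2", "addressLine1": "2 High St", "addressLine2": "Suite 5", "city": "Nice"},
]
csv_text = to_csv(address_rows(addresses))
```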
#### **5. Nested Object Mapping: Affiliations**
Affiliations are provided as a list of objects (`contactAffiliations` for HCP, `otherHCOAffiliations` for HCO) in the DCR payload.
| HUB /dcr `affiliations` Attribute | OneKey Mapping | Veeva OpenData (VOD) Mapping | Notes / Description |
| :--- | :--- | :--- | :--- |
| **(Affiliation Object)** | The `refId` of the target HCO is used to find its `workplace.workplaceEid`, which is then sent as part of the HCP or parent HCO record. | Each affiliation object is mapped to a **separate row** in the **`change_request_parenthco.csv`** file. | The mapping for affiliations is fundamentally different. OneKey handles it as an attribute of the child entity, while Veeva treats it as a distinct relationship record. |
| **Relation `refId`** | The `workplace.workplaceEid` of the target (parent) HCO is populated in the request. | **`parenthco_key`** (populated with the Reltio `relationUri`) | Veeva requires a unique key for the relationship itself. |
| **Start/Child Object** | The main body of the request contains the child's details (e.g., the HCP). | **`child_entity_key`** | This field is populated with the `entity_key` of the child entity (e.g., the HCP). |
| **End/Parent Object** | The `workplace.workplaceEid` of the parent HCO is sent. | **`parent_entity_key`** | This field is populated with the `entity_key` of the parent entity (e.g., the HCO). |
| **`type`** | `activity.role` (requires lookup, e.g., `TIH.W.*`) | **`relationship_type__v`** (requires lookup from `RelationType`) | The type or role of the affiliation (e.g., Employed, Manages). |
| **`primary`** | Not explicitly mapped. | **`is_primary_relationship__v`** | A boolean flag (`Y`/`N`) indicating if this is the primary affiliation. |
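
For the Veeva side of the affiliation mapping, a single `change_request_parenthco.csv` row might be assembled as in the sketch below. The function name and its parameters are hypothetical, and `relationship_type` is assumed to have already gone through the `RelationType` lookup.

```python
def parenthco_row(dcr_key, relation_uri, child_entity_key, parent_entity_key,
                  relationship_type, primary):
    """Build one change_request_parenthco.csv row for a single affiliation."""
    return {
        "dcr_key": dcr_key,
        "parenthco_key": relation_uri,           # Reltio relationUri
        "child_entity_key": child_entity_key,    # e.g. the HCP
        "parent_entity_key": parent_entity_key,  # e.g. the HCO
        "relationship_type__v": relationship_type,
        "is_primary_relationship__v": "Y" if primary else "N",
    }
```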

156
data/DCR_mappings_OK_VOD.md Normal file

@@ -0,0 +1,156 @@
***
### **Attribute Mapping to IQVIA OneKey**
The following mappings are used when creating or updating entities in OneKey via the `submitVR` operation. The data is compiled from the direct mapping table and the comparator rules for Data Steward-suggested changes.
#### **1. Common Validation Attributes (HCP & HCO)**
These attributes are part of the `validation` block in the OneKey request and are mandatory for all DCRs.
| OneKey Attribute | Value/Source | Description |
| :--- | :--- | :--- |
| `validation.clientRequestId` | HUB\_GENERATED\_ID | A unique identifier generated by the MDM HUB to trace the request. |
| `validation.process` | "Q" | A static value indicating the processing type. |
| `validation.requestDate` | Current Timestamp | The date the DCR was created. |
| `validation.callDate` | Current Timestamp | The date the API call was made. |
| `validation.requestProcess` | "I" | A static value indicating an "Insert" or "Update" process. |
| `validation.requestComment` | `extDCRComment` | A free-text comment provided in the initial DCR, often used for context or to specify removals. |
| `isoCod2` | `Country` from DCR | The two-letter ISO country code, which is mandatory for all requests. |
---
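The common validation attributes could be assembled along these lines. The static values `"Q"` and `"I"` come from the table; the nesting of the payload, the ISO 8601 timestamp format, and the function name are assumptions rather than the documented OneKey contract.

```python
from datetime import datetime, timezone


def build_validation_block(client_request_id, ext_dcr_comment, country, now=None):
    """Return the common part of a OneKey request as a plain dict."""
    # Assumption: timestamps are sent as ISO 8601 strings in UTC.
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "isoCod2": country,
        "validation": {
            "clientRequestId": client_request_id,  # HUB-generated ID
            "process": "Q",
            "requestDate": timestamp,
            "callDate": timestamp,
            "requestProcess": "I",
            "requestComment": ext_dcr_comment,
        },
    }
```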
#### **2. Healthcare Organization (HCO) Attribute Mapping**
| Reltio / HUB Attribute | OneKey Attribute | Mandatory | Description & Notes |
| :--- | :--- | :--- | :--- |
| **Entity Type** | `entityType` | **Yes** | Static value set to **`WORKPLACE`** to identify the entity as an HCO. |
| **OneKey ID** | `workplace.workplaceEid` | **Yes (for updates)** | The unique OneKey identifier for the HCO. It is sourced from the `ONEKEY` crosswalk on the Reltio entity. This is mandatory for any update operation. |
| **Name** | `workplace.usualName` / `workplace.officialName` | Optional | The primary name of the HCO. Both `usualName` and `officialName` are typically populated with the same value from the Reltio `Name` attribute. |
| **Other Names** | `workplace.usualName2` | Optional | Sourced from the `OtherNames.Name` nested attribute in Reltio. |
| **Type Code** | `workplace.typeCode` | Optional | Maps the HCO's `subTypeCode` from the HUB model. This requires a lookup from a canonical code (e.g., `COTFacilityType`, `TET.W.*`) to the OneKey specific code. |
| **Website** | `workplace.website` | Optional | The official website URL of the HCO, sourced from `WebsiteURL`. |
| **Parent HCO Affiliation** | `workplace.parentWorkplaceEid` | Optional | The OneKey ID of the parent HCO. This is sourced by looking up the `otherHCOAffiliations` relation in Reltio and finding the OneKey crosswalk of the referenced parent entity. |
| **Addresses** | `address.*` fields | **Yes** | A complex object representing the HCO's primary address. It is mandatory for creating a new HCO. <br> - `address.country` <br> - `address.city` <br> - `address.addressLine1` (from `addressLine1`) <br> - `address.addressLine2` (from `addressLine2`) <br> - `address.Zip5` (from `zip`) <br> - `address.countyCode` (from `stateProvince`, requires lookup) |
| **Specialties** | `workplace.speciality1` / `2` / `3` | Optional | Reltio's list of `Specialities` is flattened into three separate fields in the OneKey request, ranked by priority. |
| **Phone (Telephone)** | `workplace.telephone` | Optional | The primary telephone number. Sourced from the `Phone` list in Reltio where the type is **not** FAX. The number with the highest rank is chosen. |
| **Phone (Fax)** | `workplace.fax` | Optional | The primary fax number. Sourced from the `Phone` list where the type **is** FAX. The number with the highest rank is chosen. |
| **Email** | `workplace.email` | Optional | The primary email address of the HCO. The email with the highest rank from Reltio's `Email` list is used. |
---
#### **3. Healthcare Professional (HCP) Attribute Mapping**
| Reltio / HUB Attribute | OneKey Attribute | Mandatory | Description & Notes |
| :--- | :--- | :--- | :--- |
| **Entity Type** | `entityType` | **Yes** | Static value set to **`ACTIVITY`** to identify the entity as an HCP. |
| **OneKey ID** | `individual.individualEid` | **Yes (for updates)** | The unique OneKey identifier for the HCP, sourced from the `ONEKEY` crosswalk on the Reltio entity. Mandatory for updates. |
| **Last Name** | `individual.lastName` | **Yes** | The HCP's last name. This is a mandatory field for creating an HCP. |
| **First Name** | `individual.firstName` | Optional | The HCP's first name. |
| **Middle Name** | `individual.middleName` | Optional | The HCP's middle name. |
| **Type Code** | `individual.typeCode` | Optional | The HCP's subtype (e.g., Physician, Nurse). This is sourced from `subTypeCode` in the HUB model and requires a lookup (`HCPSubTypeCode`, `TYP.*`). |
| **Title** | `individual.titleCode` | Optional | The HCP's professional title (e.g., Dr.). Requires lookup (`HCPTitle`, `TIT.*`). |
| **Prefix** | `individual.prefixNameCode` | Optional | The HCP's name prefix (e.g., Mr., Mrs.). Requires lookup (`HCPPrefix`, `APP.*`). |
| **Gender** | `individual.genderCode` | Optional | The HCP's gender. Requires a lookup to map the Reltio value to the OneKey code. |
| **Year of Birth** | `individual.birthYear` | Optional | The four-digit year of birth of the HCP. |
| **Day of Birth** | `individual.birthDay` | Optional | The day of birth of the HCP. |
| **Language** | `individual.languageEid` | Optional | The preferred language of the HCP. |
| **Website** | `individual.website` | Optional | The personal or professional website of the HCP. |
| **Identifiers** | `individual.externalId1` / `externalId2` | Optional | Mapped from Reltio's `Identifier` values. |
| **Affiliation** | `workplace.workplaceEid` | Optional | The OneKey ID of the HCO to which the HCP is affiliated. This is sourced from the `contactAffiliations` relation in Reltio by finding the OneKey crosswalk of the referenced HCO. |
| **Addresses** | `address.*` fields | **Yes** | A complex object representing the HCP's primary address (often a workplace address). See HCO section for field details. |
| **Specialties** | `individual.speciality1` / `2` / `3` | Optional | Reltio's list of `Specialities` is flattened into three separate fields, ranked by priority. A lookup is required to map the canonical code (`HCPSpecialty`, `SP.W.*`) to the OneKey code. |
| **Phone (Mobile)** | `individual.mobilePhone` | Optional | The primary mobile phone number, sourced from the `Phone` list in Reltio. |
| **Email** | `individual.email` | Optional | The primary email address of the HCP. |
***
### **Attribute Mapping to Veeva OpenData (VOD)**
DCRs for Veeva are processed asynchronously. The HUB maps the DCR data into a series of CSV file lines, which are then zipped and sent to Veeva via S3/SFTP. The mapping connects Reltio/HUB concepts to specific columns in different Veeva CSV files.
#### **1. `change_request.csv` - Manifest File**
This file contains metadata for each DCR. A single line is created for each DCR.
| Veeva Field Name | Source / Logic | Required | Description |
| :--- | :--- | :--- | :--- |
| `dcr_key` | Mongo Generated DCR ID | **Yes** | A unique ID generated by the HUB for the DCR. This key is used to link all related records across the different CSV files. |
| `description` | `extDCRComment` | **Yes** | Free-text comments from the requester explaining the purpose of the DCR. |
| `created_by` | `createdBy` from DCR | **Yes** | Identifies the user or system that initiated the DCR. |
| `change_request_type` | Inferred | **Yes** | Set to **`ADD_REQUEST`** if a new profile is being created (no Veeva crosswalk exists) or **`CHANGE_REQUEST`** if an existing profile is being updated. |
| `entity_type` | Main DCR object type | **Yes** | Set to **`HCP`** or **`HCO`** depending on the primary subject of the DCR. |
#### **2. `change_request_hco.csv` - HCO Details**
This file contains the specific attribute changes for an HCO.
| Veeva Field Name | Reltio / HUB Attribute | Required (Add) | Description & Notes |
| :--- | :--- | :--- | :--- |
| `dcr_key` | Mongo Generated DCR ID | **Yes** | Links this record back to the manifest file. |
| `entity_key` | `refId.entityURI` or Reltio Crosswalk | **Yes** | The unique identifier for the HCO in the source system (Reltio). It is a concatenation of the source name and value (e.g., `Reltio:rvu44dm`). |
| `vid__v` | VEEVA Crosswalk Value | No (Yes for Change) | The unique Veeva ID for the HCO. This is required for `CHANGE_REQUEST` types to identify which Veeva profile to update. |
| `corporate_name__v` | `Name` | No | The official name of the HCO. |
| `alternate_name_1__v` | `OtherNames.Name` (first element) | Yes | An alternative name for the HCO. |
| `major_class_of_trade__v` | `HCO.subTypeCode` | No | Maps to Reltio's `FacilityType`. This requires a lookup from the `COTFacilityType` canonical code to the Veeva source code. |
| `hco_type__v` | `typeCode` | No | The type of the HCO. Requires lookup from `HCOType`. |
| `hco_status__v` | `StatusDetail` | No | The operational status of the HCO. Requires lookup from `HCOStatus`. |
| `specialty_1__v` to `specialty_10__v` | `specialties` | No | Reltio's list of HCO specialties is flattened into 10 separate columns, ranked by priority. |
| `email_1__v`, `email_2__v` | `emails` | No | The top two ranked email addresses from the Reltio `Email` list. |
| `phone_1__v`, `phone_2__v`, `phone_3__v` | `phones` | No | Top-ranked phone numbers where type is `TEL.OFFICE`. |
| `fax_1__v`, `fax_2__v` | `phones` | No | Top-ranked fax numbers where type is `TEL.FAX`. |
| `primary_country__v` | `DCRRequest.country` | No | The two-letter ISO country code. |
#### **3. `change_request_hcp.csv` - HCP Details**
This file contains the specific attribute changes for an HCP.
| Veeva Field Name | Reltio / HUB Attribute | Required (Add) | Description & Notes |
| :--- | :--- | :--- | :--- |
| `dcr_key` | Mongo Generated DCR ID | **Yes** | Links this record back to the manifest file. |
| `entity_key` | `refId.entityURI` or Reltio Crosswalk | **Yes** | The unique identifier for the HCP in Reltio. |
| `vid__v` | VEEVA Crosswalk Value | No (Yes for Change) | The unique Veeva ID for the HCP. Required for `CHANGE_REQUEST` types. |
| `first_name__v` | `firstName` | **Yes** | The HCP's first name. |
| `last_name__v` | `lastName` | **Yes** | The HCP's last name. |
| `middle_name__v` | `middleName` | No | The HCP's middle name. |
| `prefix__v` | `prefix` | No | Name prefix. Requires lookup (`HCPPrefix`). |
| `professional_title__v` | `title` | No | Professional title. Requires lookup (`HCPProfessionalTitle`). |
| `hcp_type__v` | `subTypeCode` | **Yes** | HCP subtype. Requires lookup (`HCPType`). |
| `hcp_status__v` | `StatusDetail` | No | Operational status of the HCP. Requires lookup (`HCPStatus`). |
| `gender__v` | `gender` | No | HCP's gender. Requires lookup (`HCPGender`). |
| `specialty_1__v` to `specialty_10__v` | `specialties` | **Yes (at least 1)** | Reltio's list of HCP specialties is flattened into 10 columns. At least one specialty is required for an `ADD_REQUEST`. |
| `medical_degree_1__v`, `medical_degree_2__v` | Not specified | No | HCP's medical degrees. Requires lookup (`HCPMedicalDegree`). |
| `primary_country__v` | `DCRRequest.country` | **Yes** | The two-letter ISO country code. |
#### **4. `change_request_address.csv` - Address Details**
This file contains address information, linked to either an HCP or an HCO.
| Veeva Field Name | Reltio / HUB Attribute | Required (Add) | Description & Notes |
| :--- | :--- | :--- | :--- |
| `dcr_key` | Mongo Generated DCR ID | **Yes** | Links this record back to the manifest file. |
| `entity_key` | `refId.entityURI` of HCP/HCO | **Yes** | The Reltio identifier of the parent HCP or HCO to which this address belongs. |
| `address_key` | `address.refId` (`PfizerAddressID`) | **Yes** | A unique identifier for the address itself within Reltio. |
| `address_line_1__v` | `addressLine1` | **Yes** | The first line of the street address. |
| `address_line_2__v` | `addressLine2` | No | The second line of the street address. |
| `locality__v` | `city` | **Yes** | The city. |
| `administrative_area__v` | `stateProvince` | **Yes** | The state or province. Requires lookup (`AddressAdminArea`). |
| `postal_code__v` | `zip` | **Yes** | The postal code. |
| `country__v` | `country` | **Yes** | The two-letter ISO country code. |
| `address_type__v` | `addressType` | **Yes** | The type of address (e.g., Office, Home). Requires lookup (`AddressType`). |
| `address_status__v` | Static "A" | No | The status of the address, typically set to "A" for Active. |
| `vid__v` | VEEVA `SourceAddressID` | No (Yes for Change) | The unique Veeva ID for an existing address being updated. |
#### **5. `change_request_parenthco.csv` - Affiliation/Relation Details**
This file is used to define or update relationships between entities (HCP-HCO or HCO-HCO).
| Veeva Field Name | Reltio / HUB Attribute | Required (Add) | Description & Notes |
| :--- | :--- | :--- | :--- |
| `dcr_key` | Mongo Generated DCR ID | **Yes** | Links this record back to the manifest file. |
| `parenthco_key` | Relation URI (`relationUri`) | **Yes** | The unique identifier for the relationship object in Reltio. |
| `child_entity_key` | Start Object's `entity_key` | **Yes** | The `entity_key` of the child entity in the relationship (e.g., the HCP). |
| `parent_entity_key` | End Object's `entity_key` | **Yes** | The `entity_key` of the parent entity in the relationship (e.g., the HCO). |
| `relationship_type__v` | Affiliation `type` | **Yes** | The type of relationship. This requires a lookup from the `RelationType` canonical values. |
| `is_primary_relationship__v` | Affiliation `primary` flag | No | A boolean flag indicating if this is the primary affiliation. |
| `vid__v` | VEEVA Relation ID | No (Yes for Change) | The unique Veeva ID for an existing relationship being updated. |


@@ -0,0 +1,191 @@
### **1. Overall DCR Process Overview**
* **Purpose of DCR Process:**
The Data Change Request (DCR) process is designed to improve and validate the quality of existing data in source systems. It provides a formal mechanism for proposing, validating, and applying data changes.
* **General DCR Process Flow:**
1. A source system creates a proposal for a data change, known as a DCR or Validation Request (VR).
2. The MDM HUB routes this request to the appropriate validation channel.
3. Validation is performed either by internal Data Stewards (DS) within Reltio or by external, third-party validator services like OneKey or Veeva OpenData.
4. A response is sent back, which includes metadata about the DCR's status (e.g., accepted, rejected) and the actual data profile update (payload) resulting from the processed DCR.
* **High-Level Solution Architecture:**
The architecture involves source systems initiating DCRs through the **MDM HUB**. The HUB acts as a central router, directing requests to either **Reltio** for internal data steward review or to **Third-Party Validators** (OneKey, Veeva). The HUB is also responsible for receiving responses and facilitating the update of data in Reltio, often through ETL processes that handle payload delivery (e.g., via S3).
***
### **2. System-Specific DCR Implementations**
Here are the details for each system involved in the DCR process.
---
#### **OneKey (OK)**
* **Role in DCR Process:** Functions as a third-party validator for DCRs. It receives validation requests, processes them, and returns the results.
* **Key Components:** DCR Service 2, OK DCR Service, OneKey Adapter, Publisher, Hub Store (MongoDB), Manager (Reltio Adapter).
* **Actors Involved:** PforceRX, Data Stewards (in Reltio), Reltio, HUB, OneKey.
* **Core Process Details:**
* **PforceRX-initiated:** DCRs are created via the HUB's API. The HUB integrates with OneKey's API to submit requests (`/vr/submit`) and periodically checks for status updates (`/vr/trace`).
* **Reltio-initiated:** Data Stewards can suggest changes in Reltio and use the "Send to Third Party Validation" feature, which triggers a flow to submit a validation request to OneKey. Singleton entities created in Reltio can also trigger an automatic validation request to OneKey.
* **Integration Methods:**
* **API:** Real-time integration with OneKey via REST APIs (`/vr/submit`, `/vr/trace`).
* **File Transfer:** Data profile updates (payload) are delivered back to Reltio via CSV files on an S3 bucket, which are then processed by an ETL job.
* **DCR Types Handled:** Create, update, delete operations for HCP/HCO profiles; validation of newly created singleton entities; changes suggested by Data Stewards.
---
#### **Veeva OpenData (VOD)**
* **Role in DCR Process:** Functions as a third-party validator, primarily handling DCRs initiated by Data Stewards from within Reltio.
* **Key Components:** DCR Service 2, Veeva DCR Service, Veeva Adapter, GMTF (Global Master Template & Foundation) jobs.
* **Actors Involved:** Data Stewards (in Reltio), HUB, Veeva OpenData.
* **Core Process Details:**
1. Data Stewards in Reltio create DCRs using the "Suggest / Send to 3rd Party Validation" functionality.
2. The HUB stores these requests in a Mongo collection (`DCRRegistryVeeva`).
3. A scheduled job gathers these DCRs, packages them into ZIP files containing multiple CSVs, and places them on an S3 bucket.
4. Files are synchronized from S3 to Veeva's SFTP server in batches (typically every 24 hours).
5. Veeva processes the files and returns response files to an inbound S3 directory, which the HUB traces to update DCR statuses.
* **Integration Methods:**
* **File Transfer:** Asynchronous, batch-based communication via ZIP/CSV files exchanged through S3 and SFTP.
* **DCR Types Handled:** Primarily handles changes suggested by Data Stewards for existing profiles that need external validation from Veeva.
---
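The packaging step of the Veeva flow (several CSVs bundled into one ZIP before upload) can be sketched with the standard library alone. The function name and the in-memory approach are assumptions; uploading to S3 or SFTP is out of scope here.

```python
import io
import zipfile


def package_dcr_files(csv_files):
    """Bundle {file name: CSV text} into a single in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in csv_files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


archive_bytes = package_dcr_files({
    "change_request.csv": "dcr_key,description\nd1,example\n",
    "change_request_hcp.csv": "dcr_key,entity_key\nd1,Reltio:abc\n",
})
```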
#### **IQVIA Highlander (HL) - *Decommissioned April 2025***
* **Role in DCR Process:** Acted as a wrapper to translate DCRs from a Veeva format into a format that could be loaded into Reltio for Data Steward review.
* **Key Components:** DCR Service (first version), IQVIA DCR Wrapper.
* **Actors Involved:** Veeva (on behalf of PforceRX), Reltio, HUB, IQVIA wrapper, Data Stewards.
* **Core Process Details:**
1. Veeva uploaded DCR requests as CSV files to an FTP location.
2. The HUB translated the Veeva CSV format into the IQVIA wrapper's CSV format.
3. The IQVIA wrapper processed this file and created DCRs directly in Reltio.
4. Data Stewards would then review, approve, or reject these DCRs within Reltio.
* **Integration Methods:**
* **File Transfer:** Communication was entirely file-based via S3 and SFTP.
* **DCR Types Handled:** Aggregated 21 specific use cases into six generic types: `NEW_HCP_GENERIC`, `UPDATE_HCP_GENERIC`, `DELETE_HCP_GENERIC`, `NEW_HCO_GENERIC`, `UPDATE_HCO_GENERIC`, `DELETE_HCO_GENERIC`.
***
### **3. Key DCR Operations and Workflows**
---
#### **Create DCR**
* **Description:** This is the main entry point for clients like PforceRx to create DCRs. The process validates the request, routes it to the correct target system (Reltio, OneKey, or Veeva), and creates the DCR.
* **Triggers:** An API call to `POST /dcr`.
* **Detailed Steps:**
1. The DCR service receives and validates the request (e.g., checks for duplicate IDs, existence of referenced objects for updates).
2. It uses a decision table to determine the target system based on attributes like country, source, and operation type.
3. It calls the appropriate internal method to create the DCR in the target system (Reltio, OneKey, or Veeva).
4. A corresponding DCR tracking entity is created in Reltio, and the state is saved in the Mongo DCR Registry.
5. For Reltio-targeted DCRs, a workflow is initiated for Data Steward review.
6. Pre-close logic may be applied to auto-accept or auto-reject the DCR based on the country.
* **Decision Logic/Rules:** A configurable decision table routes DCRs based on `userName`, `sourceName`, `country`, `operationType`, `affectedAttributes`, and `affectedObjects`.
---
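A decision table like the one used in the Create DCR flow can be modelled as an ordered list of rules where the first match wins. The rule contents below are invented for illustration; only the idea of routing on attributes such as `country` and `operationType` to Reltio, OneKey, or Veeva comes from the description.

```python
# Hypothetical rules: (conditions that must all match, target system).
DECISION_TABLE = [
    ({"country": "FR", "operationType": "insert"}, "OneKey"),
    ({"country": "US"}, "Veeva"),
    ({}, "Reltio"),  # empty conditions act as the fallback rule
]


def route(request):
    """Return the target system of the first rule whose conditions all match."""
    for conditions, target in DECISION_TABLE:
        if all(request.get(key) == value for key, value in conditions.items()):
            return target
    raise LookupError("no routing rule matched")
```

Keeping the rules as data rather than code is one way to make the table configurable, as the description requires.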
#### **Submit Validation Request**
* **Description:** This process submits validation requests for newly created "singleton" entities in Reltio to the OneKey service.
* **Triggers:** Reltio events (e.g., `HCP_CREATED`, `HCO_CREATED`) are aggregated in a time window (e.g., 4 hours).
* **Detailed Steps:**
1. After an event aggregation window closes, the process performs several checks (e.g., entity is active, no existing OneKey crosswalk, no potential matches found via Reltio's `getMatches` API).
2. If all checks pass, the entity data is mapped to a OneKey `submitVR` request.
3. The request is sent to OneKey via `POST /vr/submit`.
4. A DCR entity is created in Reltio to track the status, and the request is logged in Mongo.
---
#### **Trace Validation Request**
* **Description:** This scheduled process checks the status of pending validation requests that have been sent to an external validator like OneKey or Veeva.
* **Triggers:** A timed scheduler (cron job) that runs every N hours.
* **Detailed Steps (OneKey Example):**
1. The process queries the Mongo DCR cache for requests with a `SENT` status.
2. For each request, it calls the OneKey `POST /vr/trace` API.
3. It evaluates the `processStatus` and `responseStatus` from the OneKey response.
4. If the request is resolved (`VAS_FOUND`, `VAS_NOT_FOUND`, etc.), the DCR status in Reltio and Mongo is updated to `ACCEPTED` or `REJECTED`.
5. If the response indicates a match was found but a OneKey (OK) crosswalk does not yet exist in Reltio, a new workflow is triggered for manual Data Steward review (`DS_ACTION_REQUIRED`).
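
Steps 3 to 5 amount to a small decision function. The sketch below is one possible reading, not the HUB's code: it assumes `VAS_FOUND` leads to `ACCEPTED` and `VAS_NOT_FOUND` to `REJECTED`, and that "a match was found" corresponds to `VAS_FOUND`. Which OneKey response field carries these values is not stated here, so the function simply takes the value as a parameter.

```python
from typing import Optional

# Assumed mapping from a resolved OneKey outcome to the final DCR status.
RESOLVED_OUTCOMES = {
    "VAS_FOUND": "ACCEPTED",
    "VAS_NOT_FOUND": "REJECTED",
}


def evaluate_trace(resolution: str, has_onekey_crosswalk: bool) -> Optional[str]:
    """Return the new DCR status, or None if the request is still pending."""
    outcome = RESOLVED_OUTCOMES.get(resolution)
    if outcome is None:
        return None  # not resolved yet; check again on the next scheduled run
    if outcome == "ACCEPTED" and not has_onekey_crosswalk:
        # Match found, but no OneKey crosswalk in Reltio yet: manual review.
        return "DS_ACTION_REQUIRED"
    return outcome
```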
---
#### **Data Steward Response**
* **Description:** This process handles the final outcome of a DCR that was reviewed internally by a Data Steward in Reltio.
* **Triggers:** Reltio change request events (`CHANGE_REQUEST_CHANGED`, `CHANGE_REQUEST_REMOVED`) that **do not** have the `ThirdPartyValidation` flag.
* **Detailed Steps:**
1. The process consumes the event from Reltio.
2. It checks the `state` of the change request.
3. If the state is `APPLIED` or `REJECTED`, the corresponding DCR entity in Reltio and the record in Mongo are updated to a final status of `ACCEPTED` or `REJECTED`.
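
A handler for these events could look like the following sketch. The flat event shape (`ThirdPartyValidation` and `state` as top-level keys) and the function name are assumptions for illustration; the real Reltio event payload is likely more deeply nested.

```python
from typing import Optional

# Change request state -> final DCR status, as described in step 3.
FINAL_STATUS = {"APPLIED": "ACCEPTED", "REJECTED": "REJECTED"}


def handle_change_request_event(event: dict) -> Optional[str]:
    """Return the final DCR status for an internally reviewed change request.

    Returns None when the event belongs to the third-party validation flow
    or when the change request has not reached a final state yet.
    """
    if event.get("ThirdPartyValidation"):
        return None  # handled by the external validation flow instead
    return FINAL_STATUS.get(event.get("state"))
```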
---
#### **Data Steward OK Validation Request**
* **Description:** This process handles DCRs created by a Data Steward in Reltio using the "Suggest" and "Send to Third Party Validation" features, routing them to an external validator like OneKey.
* **Triggers:** Reltio change request events that **do** have the `ThirdPartyValidation` flag set to `true`.
* **Detailed Steps:**
1. The HUB retrieves the "preview" state of the entity from Reltio to see the suggested changes.
2. It compares the current entity with the preview to calculate the delta.
3. It maps these changes to a OneKey `submitVR` request. Attribute removals are sent as a comment due to API limitations.
4. The request is sent to OneKey.
5. Upon successful submission, the original change request in Reltio is programmatically rejected (since the validation is now happening externally), and a new DCR entity is created for tracking the OneKey validation.
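
Steps 2 and 3 (calculating the delta and turning removals into a comment) can be sketched as below. The flat dictionary shape of the entity, the `uri` key on nested objects, and the exact comment wording are assumptions; the real comparator works on Reltio's attribute model.

```python
def compute_delta(current: dict, preview: dict) -> dict:
    """Compare the current entity with its preview and collect the changes."""
    changes = {}
    for name, suggested in preview.items():
        if isinstance(suggested, list):
            # Nested attribute: match nested objects by their Reltio URI and
            # keep those that are new or differ from the current version.
            existing = {item.get("uri"): item for item in current.get(name, [])}
            delta = [item for item in suggested if existing.get(item.get("uri")) != item]
            if delta:
                changes[name] = delta
        elif current.get(name) != suggested:
            # Simple attribute: take the suggested value if it differs.
            changes[name] = suggested

    result = {"changes": changes}
    # Removals cannot be sent directly, so they are described in a comment.
    removed = [name for name in current if name not in preview]
    if removed:
        result["comment"] = "Please remove attributes: [" + ", ".join(removed) + "]"
    return result
```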
***
### **4. Data Comparison and Mapping Details**
* **OneKey Comparator:**
When a Data Steward suggests changes, the HUB compares the current Reltio entity with its "preview" state to determine what to send to OneKey.
* **Simple Attributes** (e.g., `FirstName`): Values are compared for equality. The suggested value is taken if different.
* **Complex Attributes** (e.g., `Addresses`, `Specialties`): Nested attributes are matched using their Reltio URI. New nested objects are added, and changes to existing ones are applied.
* **Mandatory Fields:** For HCP, `LastName` and `Country` are mandatory. For HCO, `Country` and `Addresses` are mandatory.
* **Attribute Removal:** Due to API limitations, removing an attribute is not done directly but by generating a comment in the request, e.g., "Please remove attributes: [Address: ...]".
* **Veeva Mapping:**
The process of mapping Reltio canonical codes to Veeva's source-specific codes is multi-layered.
1. **Veeva Defaults:** The system first checks custom CSV mapping files stored in configuration (`mdm-veeva-dcr-service/defaults`). These files define direct mappings for a specific country and canonical code (e.g., `IN;SP.PD;PD`).
2. **RDM Lookups:** If no default is found, it queries RDM (via a Mongo `LookupValues` collection) for the canonical code and looks for a `sourceMapping` where the source is "VOD".
3. **Veeva Fallback:** If no mapping is found, it consults fallback CSV files (`mdm-veeva-dcr-service/fallback`) for certain attributes (e.g., `hco-specialty.csv`). A regular expression is often used to extract the correct code. If all else fails, a question mark (`?`) is used as the default fallback.
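
The three-layer Veeva lookup can be sketched as a chain of dictionary lookups. This is an illustration under assumptions: the function names, the `sourceMappings`/`source`/`code` document shape for RDM, and the plain-dictionary fallback are hypothetical, and the regular-expression extraction mentioned above is left out.

```python
import csv
import io


def load_defaults(csv_text: str) -> dict:
    """Parse semicolon-separated default lines such as 'IN;SP.PD;PD'."""
    rows = csv.reader(io.StringIO(csv_text), delimiter=";")
    return {(row[0], row[1]): row[2] for row in rows if len(row) == 3}


def map_to_veeva_code(country: str, canonical_code: str,
                      defaults: dict, rdm_lookup: dict, fallback: dict) -> str:
    """Resolve a canonical code to a Veeva code: defaults, then RDM, then fallback."""
    # 1. Veeva defaults, keyed by (country, canonical code).
    code = defaults.get((country, canonical_code))
    if code is not None:
        return code
    # 2. RDM lookup: first source mapping whose source is "VOD".
    for mapping in rdm_lookup.get(canonical_code, {}).get("sourceMappings", []):
        if mapping.get("source") == "VOD":
            return mapping["code"]
    # 3. Fallback table, with "?" as the last resort.
    return fallback.get(canonical_code, "?")
```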
***
### **5. Status Management and Error Handling**
* **DCR Statuses:**
The system uses a combination of statuses to track the DCR lifecycle.
| RequestStatus | DCRStatus | Internal Cache Status | Description |
| :--- | :--- | :--- | :--- |
| REQUEST\_ACCEPTED | CREATED | SENT\_TO\_OK | DCR sent to OneKey, pending DS review. |
| REQUEST\_ACCEPTED | CREATED | SENT\_TO\_VEEVA | DCR sent to Veeva, pending DS review. |
| REQUEST\_ACCEPTED | CREATED | DS\_ACTION\_REQUIRED | DCR pending internal DS validation in Reltio. |
| REQUEST\_ACCEPTED | ACCEPTED | ACCEPTED | DS accepted the DCR; changes were applied. |
| REQUEST\_ACCEPTED | ACCEPTED | PRE\_ACCEPTED | Pre-close logic automatically accepted the DCR. |
| REQUEST\_REJECTED | REJECTED | REJECTED | DS rejected the DCR. |
| REQUEST\_REJECTED | REJECTED | PRE\_REJECTED | Pre-close logic automatically rejected the DCR. |
| REQUEST\_FAILED | - | FAILED | DCR failed due to a validation or system error. |
* **Error Codes:**
| Error Code | Description | HTTP Code |
| :--- | :--- | :--- |
| `DUPLICATE_REQUEST` | The `extDCRRequestId` has already been registered. | 403 |
| `NO_CHANGES_DETECTED` | The request contained no changes compared to the existing entity. | 400 |
| `VALIDATION_ERROR` | A referenced object (HCP/HCO) or attribute could not be found. | 404 / 400 |
* **DCR State Changes:**
A DCR begins in an `OPEN` state. It is then sent to a target system, moving to states like `SENT_TO_OK`, `SENT_TO_VEEVA`, or `DS_ACTION_REQUIRED`. Pre-close logic can immediately move it to `PRE_ACCEPTED` or `PRE_REJECTED`. Following data steward review (either internal or external), the DCR reaches a terminal state of `ACCEPTED` or `REJECTED`. If an error occurs, it moves to `FAILED`.
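
The lifecycle described above can be written down as a transition table. This sketch encodes one reading of the paragraph and is not taken from the HUB code: it assumes pre-close logic acts directly on an `OPEN` DCR, that `FAILED` is reachable from every non-terminal state, and that a DCR sent to OneKey may be handed over to `DS_ACTION_REQUIRED` (as in the trace workflow).

```python
# Outcomes reachable from any state in which the DCR is awaiting review.
REVIEW_OUTCOMES = {"ACCEPTED", "REJECTED", "FAILED"}

# Allowed transitions; states missing from this table are terminal.
TRANSITIONS = {
    "OPEN": {
        "SENT_TO_OK", "SENT_TO_VEEVA", "DS_ACTION_REQUIRED",
        "PRE_ACCEPTED", "PRE_REJECTED", "FAILED",
    },
    "SENT_TO_OK": REVIEW_OUTCOMES | {"DS_ACTION_REQUIRED"},
    "SENT_TO_VEEVA": REVIEW_OUTCOMES,
    "DS_ACTION_REQUIRED": REVIEW_OUTCOMES,
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from `current` to `new` is allowed by the table."""
    return new in TRANSITIONS.get(current, set())
```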
***
### **6. Technical Artifacts and Infrastructure**
* **Mongo Collections:**
* **`DCRRegistryONEKEY` / `DCRRegistryVeeva`:** System-specific collections that store and track the state of DCRs sent to OneKey and Veeva, respectively. They hold the mapped request data and trace the response. `DCRRegistryVeeva` additionally stores the raw CSV line content of Veeva-bound DCRs before they are submitted in a batch.
* **`DCRRequest` / `DCRRegistry`:** General-purpose collections for tracking DCRs, their metadata, and overall status within the HUB.
* **File Specifications:**
* **Veeva Integration:** Uses `ZIP` files containing multiple `CSV` files (`change_request.csv`, `change_request_hcp.csv`, `change_request_address.csv`, etc.). Response files follow a similar pattern and are named with the convention `<country>_DCR_Response_<Date>.zip`.
* **Highlander Integration:** Used `CSV` files for requests.
* **Event Models:**
The system uses internal Kafka events to communicate DCR status changes between components.
* **`OneKeyDCREvent`:** Published by the trace process after receiving a response from OneKey. It contains the DCR ID and the `OneKeyChangeRequest` details (`vrStatus`, `vrStatusDetail`, comments, validated IDs).
* **`VeevaDCREvent`:** Published by the Veeva trace process. It contains the DCR ID and `VeevaChangeRequestDetails` (`vrStatus`, `vrStatusDetail`, comments, new Veeva IDs).
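
As an illustration of the event shape, `OneKeyDCREvent` could be modelled as a pair of dataclasses serialized to JSON before being published. Only `vrStatus` and `vrStatusDetail` are field names taken from the description above; `dcrId`, `changeRequest`, `comments`, `validatedIds`, and the `to_json` helper are hypothetical names chosen for this sketch.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OneKeyChangeRequest:
    vrStatus: str
    vrStatusDetail: str
    comments: list = field(default_factory=list)      # hypothetical field name
    validatedIds: list = field(default_factory=list)  # hypothetical field name


@dataclass
class OneKeyDCREvent:
    dcrId: str                           # hypothetical field name
    changeRequest: OneKeyChangeRequest   # hypothetical field name


def to_json(event: OneKeyDCREvent) -> str:
    """Serialize the event, including the nested change request, to JSON."""
    return json.dumps(asdict(event))
```

A `VeevaDCREvent` would follow the same pattern, with new Veeva IDs in place of the validated IDs.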

2227
data/HUB_DCR_output_2.md Normal file

File diff suppressed because it is too large Load Diff