P03: A Quantitative and Metrics Driven Framework For Maximizing Data Utility While Balancing Re-Identification Risk
Chief Product Officer
Real Life Sciences, Inc. United States
To share best practices for using a structured, quantitative methodology for counterbalancing re-identification risk and data utility when performing dataset and document anonymization.
A quantitative, metrics-driven process was used to support risk and data utility modeling, document anonymization, and regulator reporting across five Health Canada Public Release of Clinical Information (PRCI) study submissions. The studies, all for newly approved therapeutics, covered conditions ranging from ultra-rare diseases to diseases with large patient populations.
Software-enabled quantitative approaches were used for both risk and data utility modeling across the studies. Over the course of these submissions, a standardized and repeatable risk and data utility modeling process that adheres to Health Canada PRCI guidance was developed as a guideline for sponsors. This Metrics Driven Data Utility Optimization (MD-DUO) process encompasses the following rules for preserving data utility while meeting risk thresholds:
1. Determine a set of possible transformations across all identifiers intended to meet, at a minimum, the 0.09 threshold, in line with the Health Canada (HC) guidance on anonymization techniques, while preserving the clinical value of the data after transformation. Transformations that retain a higher level of granularity are preferred over outright redaction. This step often yields multiple possible transformation options across identifiers.
2. Measure the ‘Risk of Re-identification’ (ROR) across all possible transformation scenarios. The ROR metric can then be used to retain only those transformation options that meet the risk threshold.
3. Prioritize transformation scenarios by using the ‘Information Loss’ (IL) metric to ensure the optimal anonymization solution with minimal loss of data quality. The IL metric is used to quantitatively rank the different transformation scenarios in terms of data utility/data loss.
4. Optimize transformation options in concert with clinical scientists, who independently and qualitatively prioritize identifiers and evaluate the Clinical Utility (CU) in the context of the drug and condition under study. The clinical scientists also review the risk associated with medical events found in the documents.
5. Select the transformation scenarios that offer the best trade-off among ROR, IL, and CU. If information that was previously missed comes to light, the process may be updated and/or repeated to incorporate new measurements.
In summary, the MD-DUO process ensures that multiple metrics are taken into account when choosing transformations.
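As an illustration only, the quantitative part of the rules above (enumerate scenarios, filter by ROR, rank by IL) can be sketched in a few lines of Python. Everything specific here is an assumption and not the MD-DUO implementation: the synthetic records, the three identifiers, the per-transformation loss scores, the stand-in ROR (one over the smallest equivalence class size), and the stand-in IL (mean of the per-identifier scores). Only the 0.09 threshold comes from the text.

```python
from collections import Counter
from itertools import product

RISK_THRESHOLD = 0.09  # threshold cited in the text

# Synthetic records (age, country, sex); illustrative only.
RECORDS = [(30 + i % 40, ("CA", "US")[i % 2], ("F", "M")[(i // 2) % 2])
           for i in range(240)]

# One list per identifier: (label, transform, hypothetical loss in [0, 1]).
OPTIONS = [
    [("exact", lambda v: v, 0.0),
     ("10-year band", lambda v: v // 10 * 10, 0.4),
     ("redact", lambda v: "*", 1.0)],
    [("exact", lambda v: v, 0.0), ("redact", lambda v: "*", 1.0)],
    [("exact", lambda v: v, 0.0), ("redact", lambda v: "*", 1.0)],
]


def risk_of_reidentification(records, scenario):
    """Stand-in ROR (assumption): 1 / size of the smallest equivalence class."""
    transformed = [tuple(opt[1](value) for opt, value in zip(scenario, rec))
                   for rec in records]
    return 1.0 / min(Counter(transformed).values())


def information_loss(scenario):
    """Stand-in IL (assumption): mean of the per-identifier loss scores."""
    return sum(opt[2] for opt in scenario) / len(scenario)


def rank_scenarios(records, options, threshold=RISK_THRESHOLD):
    results = []
    for scenario in product(*options):  # rule 1: enumerate scenarios
        ror = risk_of_reidentification(records, scenario)  # rule 2: measure
        if ror <= threshold:  # rule 2: keep scenarios meeting the threshold
            labels = tuple(opt[0] for opt in scenario)
            results.append((information_loss(scenario), ror, labels))
    return sorted(results)  # rule 3: lowest information loss first


ranked = rank_scenarios(RECORDS, OPTIONS)
for il, ror, labels in ranked:
    print(f"IL={il:.2f}  ROR={ror:.3f}  {labels}")
```

On this synthetic data, every scenario that keeps exact age fails the threshold, and the top-ranked survivor bands age into decades while leaving country and sex untouched. Rules 4 and 5 are deliberately left out of the sketch, since they depend on clinical judgment applied to the ranked list.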
Preserving data utility during the anonymization process must involve both quantitative measurements at the document and data level, as well as subjective input from medical experts to guide the process at the clinical level. Similarly, it must include a well-defined and precise implementation of the selected rules to prevent ‘over-redaction’ or ‘over-suppression’, and comply with regulatory guidelines.
The five-step MD-DUO process enables sponsors to iterate through different anonymization scenarios across identifiers and ‘pick and choose’ a set of identifier transformations that counterbalances risk and data utility. Metrics provide objective guidance on which identifiers to anonymize and how, and enable more streamlined decision making than purely qualitative approaches.
The process combines Information Loss metrics with Clinical Utility by first ranking a breadth of possible transformation scenarios by information loss, and then allowing clinical experts to filter and select the scenarios that are most in line with study outcomes. A special focus is placed on those identifiers that serve the purpose of Health Canada PRCI, which is to preserve the interpretation and evaluation of safety-related information, such as Serious Adverse Events.
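A minimal sketch of this combination, assuming the clinical input can be expressed as a set of acceptable transformations per safety-relevant identifier. The scenario names, identifiers, transformation labels, and IL scores below are all hypothetical, and the scenarios are assumed to have already met the risk threshold.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    transformations: dict    # identifier -> transformation label
    information_loss: float  # hypothetical IL score, lower is better


# Hypothetical scenarios, assumed to already meet the risk threshold.
SCENARIOS = [
    Scenario("A", {"sae_onset_date": "redact", "age": "exact"}, 0.20),
    Scenario("B", {"sae_onset_date": "relative day", "age": "10-year band"}, 0.25),
    Scenario("C", {"sae_onset_date": "relative day", "age": "redact"}, 0.45),
]

# Hypothetical clinical input: transformations that reviewers accept for
# identifiers needed to interpret safety-related information.
CLINICALLY_ACCEPTABLE = {
    "sae_onset_date": {"exact", "relative day"},
}


def shortlist(scenarios, acceptable):
    """Rank by information loss, then keep clinically acceptable scenarios."""
    ranked = sorted(scenarios, key=lambda s: s.information_loss)
    return [s for s in ranked
            if all(s.transformations.get(identifier) in allowed
                   for identifier, allowed in acceptable.items())]


chosen = shortlist(SCENARIOS, CLINICALLY_ACCEPTABLE)
print([s.name for s in chosen])
```

Scenario A has the lowest information loss but is dropped because it redacts the onset date of serious adverse events, which shows why the IL ranking alone is not treated as the final answer.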
Lastly, the MD-DUO process is fully transferable and scalable beyond mandatory disclosures for PRCI, and can also be used for voluntary and/or internal disclosure initiatives that are subject to other data privacy policies, such as the General Data Protection Regulation (GDPR).