Written by: Andre Sublett, VP Data and Advanced Analytics
Our patent, Automated Batch De-Identification of Unstructured Healthcare Documents, was awarded by the U.S. Patent and Trademark Office in November 2024. The patent recognizes Concord’s ability to ensure that no protected health information, or PHI, is used to train the parts of Concord’s Practical AI™ focused on intelligent document processing and data distribution. Along with me, the named inventors are Tim Osten, Principal Software Architect, and John Scott, Principal Engineer, Cloud Infrastructure.
A critical part of using AI is training the model to recognize specific information and then carry out tasks with it. The data used for that training must be de-identified, so that nothing in it could identify a patient, either on its own or when combined with other data.
Concord takes customer privacy very seriously, so we created a tool that meets the requirements of the HIPAA Privacy Rule’s Safe Harbor method, which specifies the identifiers that must be removed before health information counts as de-identified. HIPAA defines 18 such identifiers, including:
· Names, physical and email addresses, phone numbers, and ages over 89
· Social Security and medical records numbers
· Vehicle identifiers and serial numbers, including license plate numbers
· Biometric identifiers, including finger and voice prints
· Full face photographic images and any comparable image
Here’s how it works: Concord’s de-identification tool reviews a document and identifies the specific types of PHI it contains. Then it extracts them and generates replacements that are placed into a testing document. The new information is wholly synthetic but still follows the rules and formatting of the original. Those rules can include the numerical pattern of a specific state’s driver’s licenses, or how an electronic health record (EHR) system formats dates of birth. Another example? If the de-identification process changes the location data from Maine to Georgia, then the area code of the newly generated, synthetic phone number will match the new location.
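To make the idea concrete, here is a minimal Python sketch of format-preserving replacement. It is an illustration under stated assumptions, not Concord’s patented implementation: the regular expressions, the `AREA_CODES` table, and the function names `synth_like`, `synth_phone`, and `deidentify` are all hypothetical, and the sketch covers only two identifier types (Social Security numbers and phone numbers in one fixed layout).

```python
import random
import re

# Hypothetical patterns for two PHI types; a real tool would need far
# broader coverage of identifiers and layouts.
PHI_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(\d{3}\) \d{3}-\d{4}"),
}

# Small illustrative sample of area codes per state, not a complete list.
AREA_CODES = {"ME": ["207"], "GA": ["404", "678", "706", "770", "912"]}


def synth_like(text, rng):
    """Swap every digit for a random digit; keep punctuation and layout."""
    return "".join(str(rng.randrange(10)) if ch.isdigit() else ch for ch in text)


def synth_phone(original, state, rng):
    """Build a synthetic phone number whose area code fits the target state."""
    area = rng.choice(AREA_CODES[state])
    # original looks like "(207) 555-0142"; original[5:] is " 555-0142".
    return f"({area})" + synth_like(original[5:], rng)


def deidentify(text, target_state, seed=None):
    """Replace matched identifiers with synthetic values of the same format."""
    rng = random.Random(seed)
    text = PHI_PATTERNS["ssn"].sub(lambda m: synth_like(m.group(), rng), text)
    text = PHI_PATTERNS["phone"].sub(
        lambda m: synth_phone(m.group(), target_state, rng), text
    )
    return text


if __name__ == "__main__":
    sample = "SSN 123-45-6789, call (207) 555-0142."
    print(deidentify(sample, target_state="GA"))
```

The output keeps the sentence and the layout of each identifier, while the digits are random and the area code is drawn from the target state’s list. Pattern matching alone will miss identifiers, which is one reason human review of the results still matters.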
Another positive result from our testing is a reduction in inaccurate data processing. As data is de-identified, we can uncover patterns that lead to misidentified data, including ones we may not have seen before, and then create synthetic data to address them. For instance, automated data processing might read something like “Jack Smith Insurance” as a person rather than as an insurance company. Our team can then use those findings to improve the model, and our customers benefit from greater accuracy.
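One way to picture that synthetic data is as labeled pairs that contrast a personal name with a company name built from it. The sketch below is a simplified assumption of what such data could look like; the name lists, the labels `PERSON` and `ORGANIZATION`, and the function `synthetic_examples` are hypothetical and are not taken from Concord’s tooling.

```python
import random

# Hypothetical building blocks for synthetic names.
FIRST_NAMES = ["Jack", "Maria", "Lee"]
LAST_NAMES = ["Smith", "Okafor", "Nguyen"]
ORG_SUFFIXES = ["Insurance", "Health Plan", "Medical Group"]


def synthetic_examples(n, seed=None):
    """Return n (person, organization) pairs as labeled training examples."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        person = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
        org = f"{person} {rng.choice(ORG_SUFFIXES)}"
        # Each pair shows the model the same name in both roles.
        examples.append((person, "PERSON"))
        examples.append((org, "ORGANIZATION"))
    return examples


if __name__ == "__main__":
    for text, label in synthetic_examples(3):
        print(f"{label:12} {text}")
```

Pairing each name with an organization built from it gives a classifier direct examples of the distinction it previously got wrong.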
In every case, we have de-identification specialists on staff to supervise the process and make sure no PHI has slipped through. Our customers know the sensitive PHI they entrust us with is always safeguarded. That’s why, in addition to doing our own testing, we will be working with customers so they can train their own classification models as well.
This patent solidifies our position as a healthcare data company and underpins Concord’s Practical AI™ approach to enhancing data usage. New data assets about patient flow, for instance, can yield valuable insight that helps our customers manage costs and staffing. We are committed to building those assets in a way that complies with both our contracts and applicable regulations, and this patent reflects Concord’s unwavering commitment to data privacy and security.
To learn more about why Concord is trusted by some of the most security-conscious healthcare companies in the United States, visit our Security & Compliance information center.