Add remaining elements of protected health information #61

higgi13425 · 2018-02-20T21:16:11Z

Many of these are included already, but the full list is here:

https://medschool.duke.edu/research/clinical-and-translational-research/duke-office-clinical-research/irb-and-institutional-14

It would be nice to add:
- random street names
- random zip code
- random city name
- random email, perhaps related to name
- random county name
- random SSN

Name
Address (all geographic subdivisions smaller than state, including street address, city, county, and zip code)
All elements (except years) of dates related to an individual (including birthdate, admission date, discharge date, date of death, and exact age if over 89)
Telephone numbers
Fax number
Email address
Social Security Number
Medical record number
Health plan beneficiary number
Account number
Certificate or licence number
Any vehicle or other device serial number
Web URL
Internet Protocol (IP) Address
Finger or voice print
Photographic image - Photographic images are not limited to images of the face.
Any other characteristic that could uniquely identify the individual

sckott · 2018-02-20T22:31:52Z

thanks @higgi13425

Is the idea that people managing data under HIPAA will replace real data with fake data?

higgi13425 · 2018-02-20T22:54:13Z

Exactly. The goal is to deidentify a clinical dataset:
- zipcode replaced with deid_zipcode
- name replaced with deid_name
- street replaced with deid_street
- dob replaced with deid_dob
- etc.

Ideally, the date of birth (dob) would be the index date and could be assigned a random date in the year 1900. All other dates in the dataset could then be adjusted relative to deid_dob, preserving the sequence of events and the relative time between them while keeping the data deidentified. This would be really helpful for folks like me who face HIPAA issues with PHI-containing datasets.

Even cooler would be a function to:
1) add a deid_x version of each PHI variable in the dataset, then
2) split the dataset into two: one with the PHI plus a unique key (stored securely), and a second with the unique key plus the deid_x versions of the PHI data (plus all the other data).

You could then share the second data frame (on GitHub, etc.), but if you really needed to, you could merge the two to re-identify. Thanks for considering it.
Peter
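The two-step function described above could look roughly like this base-R sketch. `split_deidentified()`, the placeholder generators in `deid_fun`, the `key` column, and the example data are all hypothetical (none of them are charlatan functions); in practice the fake values could come from charlatan providers instead of the simple placeholders used here.

```r
# Sketch only: split_deidentified() and the deid generators below are
# hypothetical helpers, not part of charlatan.

# Hypothetical example data with two PHI columns (name, zipcode).
clinical <- data.frame(
  name    = c("Ann Smith", "Bob Jones", "Cara Lee"),
  zipcode = c("48105", "48109", "48104"),
  hba1c   = c(6.8, 7.4, 5.9),
  stringsAsFactors = FALSE
)

# Placeholder generators (random 5-digit strings are not guaranteed to be
# real zip codes); fake values from charlatan could be swapped in.
deid_fun <- list(
  name    = function(x) paste("Patient", seq_along(x)),
  zipcode = function(x) sprintf("%05d", sample(10000:99999, length(x)))
)

split_deidentified <- function(df, phi_cols, deid_fun) {
  # Unique key linking the two halves.
  df$key <- sprintf("ID%04d", seq_len(nrow(df)))
  # Step 1: add a deid_x version of each PHI column.
  for (col in phi_cols) {
    df[[paste0("deid_", col)]] <- deid_fun[[col]](df[[col]])
  }
  # Step 2: split into PHI + key (store securely) and
  # key + deid_x columns + all other data (shareable).
  list(
    key_table = df[, c("key", phi_cols), drop = FALSE],
    shareable = df[, setdiff(names(df), phi_cols), drop = FALSE]
  )
}

set.seed(2018)
parts <- split_deidentified(clinical, c("name", "zipcode"), deid_fun)

# Re-identify only when truly needed, by merging the halves on the key.
reidentified <- merge(parts$key_table, parts$shareable, by = "key")
```

Only `parts$shareable` would leave the secure environment; because the key is the sole column the two tables have in common, `merge()` restores the original rows.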

added in a bunch of locales that we had data for but were not using yet
added in many methods on PersonProvider for parts of names
tweaked internals of PersonProvider to work with names that have probabilities (so far only in en_GB) #62 #61
fixing AddressProvider, adding en_GB and en_US (not done yet)

sckott · 2018-02-23T01:20:38Z

higgi13425 · 2018-02-24T13:19:41Z

Birthdate: the idea was to randomly select a day/month and place the date of birth in a year that clearly is *not* the real date of birth, so that there is no confusion later between the true dob and deid_dob. 1900 is a reasonable year, since essentially no one born in 1900 is still alive.

County name: for my purposes, US counties only. I could imagine that if this becomes popular, the equivalent in other countries would be worthwhile.

I agree, most of the numbers can already be done. Fax number ~ phone number.

This sounds promising!
Peter
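The date-shifting idea (a random day/month in 1900 as the fake birth date, with every other date moved by the same per-patient offset) can be sketched in base R as below. `shift_dates_to_1900()` and the example columns are hypothetical, not part of charlatan; the random day could equally be drawn with charlatan's `DateTimeProvider`.

```r
# Sketch only: shift_dates_to_1900() is a hypothetical helper, not part of
# charlatan.

# Hypothetical per-patient data: true date of birth plus two event dates.
patients <- data.frame(
  dob       = as.Date(c("1956-03-14", "1971-11-02")),
  admit     = as.Date(c("2017-05-01", "2016-12-24")),
  discharge = as.Date(c("2017-05-09", "2017-01-03"))
)

shift_dates_to_1900 <- function(df, dob_col, date_cols) {
  # Random day/month in 1900 for each row (1900 is not a leap year: 365 days).
  deid_dob <- as.Date("1900-01-01") + sample(0:364, nrow(df), replace = TRUE)
  # Per-row offset (a difftime in days) between the real and fake birth dates.
  offset <- df[[dob_col]] - deid_dob
  df[[paste0("deid_", dob_col)]] <- deid_dob
  # Shift every other date by the same offset, so intervals are preserved.
  for (col in date_cols) {
    df[[paste0("deid_", col)]] <- df[[col]] - offset
  }
  df
}

set.seed(1900)
shifted <- shift_dates_to_1900(patients, "dob", c("admit", "discharge"))
shifted[, c("deid_dob", "deid_admit", "deid_discharge")]
```

Because each patient's dates move by one shared offset, the sequence of events and the time between them (for example, length of stay) are unchanged; the original date columns would then belong in the securely stored table, not the shared one.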

sckott · 2018-02-26T18:09:14Z

DOB: okay, I see now what you mean. You can do it like:

library(charlatan)
z <- DateTimeProvider$new()
z$date_time_between("1900-01-01", "1900-12-31")

Counties: thanks, my feeling is to only do US counties for now.

Fix #52 Fix #61

sckott added this to the v0.2 milestone Feb 23, 2018

sckott mentioned this issue Feb 23, 2018

function to prepare deidentified data.frame and split in two #65

Open

sckott modified the milestones: v0.2, v0.3 Jul 3, 2018

sckott removed this from the v0.3 milestone Oct 18, 2018

aalexandersson mentioned this issue Feb 15, 2022

Feature request: Add random U.S. Social Security Numbers #121

Closed

RMHogervorst added a commit that referenced this issue Oct 10, 2023

Add new vignette creating realistic data

abe58ca

Fix #52 Fix #61

RMHogervorst added this to the v0.6 milestone Oct 21, 2023

RMHogervorst closed this as completed in 47fcd5a Oct 21, 2023
