Class PseudonymizeAndSequester


  • public class PseudonymizeAndSequester
    extends java.lang.Object

    A class to implement bulk de-identification and pseudonymization of DICOM files, with sequestration of files that may pose a risk of identity leakage.

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      protected class  PseudonymizeAndSequester.OurMediaImporter
      A protected class that actually does all the work of finding and processing the files.
    • Constructor Summary

      Constructors 
      Constructor Description
      PseudonymizeAndSequester​(java.lang.String inputPathName, java.lang.String outputFolderCleanName, java.lang.String outputFolderDirtyName, java.lang.String pseudonymizationControlFileName, java.lang.String pseudonymizationResultByOriginalPatientIDFileName, java.lang.String pseudonymizationResultByOriginalStudyInstanceUIDFileName, java.lang.String failedFilesFileName, java.lang.String uidMapResultFileName, java.lang.String seed, boolean keepAllPrivate, boolean addContributingEquipmentSequence, boolean keepDescriptors, boolean keepSeriesDescriptors, boolean keepProtocolName, boolean keepPatientCharacteristics, boolean keepDeviceIdentity, boolean keepInstitutionIdentity, int handleDates, int handleStructuredContent)
      Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.
    • Field Detail

      • ourCalledAETitle

        protected static java.lang.String ourCalledAETitle
      • radixForRandomPseudonymousID

        protected static int radixForRandomPseudonymousID
      • epochForDateModification

        protected static java.util.Date epochForDateModification
      • defaultEarliestDateInSet

        protected static java.util.Date defaultEarliestDateInSet
      • newPatientIDByOriginalPatientID

        protected java.util.Map<java.lang.String,​java.lang.String> newPatientIDByOriginalPatientID
      • newPatientIDByOriginalStudyInstanceUID

        protected java.util.Map<java.lang.String,​java.lang.String> newPatientIDByOriginalStudyInstanceUID
      • newPatientNameByNewPatientID

        protected java.util.Map<java.lang.String,​java.lang.String> newPatientNameByNewPatientID
      • earliestDateByOrignalPatientID

        protected java.util.Map<java.lang.String,​java.util.Date> earliestDateByOrignalPatientID
      • random

        protected java.util.Random random
    • Constructor Detail

      • PseudonymizeAndSequester

        public PseudonymizeAndSequester​(java.lang.String inputPathName,
                                        java.lang.String outputFolderCleanName,
                                        java.lang.String outputFolderDirtyName,
                                        java.lang.String pseudonymizationControlFileName,
                                        java.lang.String pseudonymizationResultByOriginalPatientIDFileName,
                                        java.lang.String pseudonymizationResultByOriginalStudyInstanceUIDFileName,
                                        java.lang.String failedFilesFileName,
                                        java.lang.String uidMapResultFileName,
                                        java.lang.String seed,
                                        boolean keepAllPrivate,
                                        boolean addContributingEquipmentSequence,
                                        boolean keepDescriptors,
                                        boolean keepSeriesDescriptors,
                                        boolean keepProtocolName,
                                        boolean keepPatientCharacteristics,
                                        boolean keepDeviceIdentity,
                                        boolean keepInstitutionIdentity,
                                        int handleDates,
                                        int handleStructuredContent)
                                 throws DicomException,
                                        java.io.FileNotFoundException,
                                        java.io.IOException

        Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.

        Searches the specified input path recursively for suitable files.

        The pseudonymizationControlFileName file and the two pseudonymization result files each contain three columns of tab-delimited UTF-8 text: the original PatientID (or original StudyInstanceUID, depending on the file), the new PatientID and the new PatientName.

        Parameters:
        inputPathName - the path to search for DICOM files
        outputFolderCleanName - where to store all the low risk processed output files (must already exist)
        outputFolderDirtyName - where to store all the high risk processed output files (must already exist)
        pseudonymizationControlFileName - values to use for pseudonymization, may be null or empty in which case random values are used
        pseudonymizationResultByOriginalPatientIDFileName - file into which to store pseudonymization by original PatientID performed
        pseudonymizationResultByOriginalStudyInstanceUIDFileName - file into which to store pseudonymization by original StudyInstanceUID performed
        failedFilesFileName - file into which to store the paths of files that failed to process
        uidMapResultFileName - file into which to store the map of original to new UIDs
        seed - the initial seed used to generate random pseudonymous identifiers, supplied as a long integer in string form (for deterministic creation of pseudonyms), or null or zero length if none
        keepAllPrivate - retain all private attributes, not just known safe ones
        addContributingEquipmentSequence - whether or not to add ContributingEquipmentSequence
        keepDescriptors - if true, keep the text description and comment attributes
        keepSeriesDescriptors - if true, keep the series description even if all other descriptors are removed
        keepProtocolName - if true, keep protocol name even if all other descriptors are removed
        keepPatientCharacteristics - if true, keep patient characteristics (such as might be needed for PET SUV calculations)
        keepDeviceIdentity - if true, keep device identity
        keepInstitutionIdentity - if true, keep institution identity
        handleDates - keep, remove or modify dates and times
        handleStructuredContent - keep, remove or modify structured content
        Throws:
        DicomException
        java.io.IOException
        java.io.FileNotFoundException
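
        Because both output folders must already exist, a caller may want to check them before invoking the constructor. The following is a minimal sketch of such a pre-flight check using only the standard library; the class OutputFolderCheckSketch and the method requireExistingFolder are hypothetical names introduced for illustration and are not part of this class.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of PseudonymizeAndSequester.
final class OutputFolderCheckSketch {

    // Fails fast if the named folder is missing or is not a directory.
    static void requireExistingFolder(String folderName) {
        Path folder = Paths.get(folderName);
        if (!Files.isDirectory(folder)) {
            throw new IllegalArgumentException("Output folder does not exist: " + folderName);
        }
    }
}
```

        A caller would apply the check to both outputFolderCleanName and outputFolderDirtyName before constructing the object.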
    • Method Detail

      • makeOutputFileName

        protected static java.lang.String makeOutputFileName​(java.lang.String outputFolderName,
                                                             java.lang.String inputFileName,
                                                             java.lang.String sopInstanceUID)
                                                      throws java.io.IOException

        Make a suitable file name to use for a deidentified and redacted input file.

        The default is the UID plus "_Anon.dcm" in the outputFolderName (ignoring the inputFileName).

        Override this method in a subclass if a different file name is required.

        Parameters:
        outputFolderName - where to store all the processed output files
        inputFileName - the name of the input file (ignored by the default implementation)
        sopInstanceUID - the SOP Instance UID of the output file
        Throws:
        java.io.IOException - if a filename cannot be constructed
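
        The default naming rule described above can be sketched as follows. This is an illustration of the documented behaviour (UID plus "_Anon.dcm" in the output folder, input name ignored), not the actual implementation; the class name OutputFileNameSketch is hypothetical, and throwing when the UID is missing is an assumption about when a filename "cannot be constructed".

```java
import java.io.File;
import java.io.IOException;

// Hypothetical illustration, not the library's own code.
final class OutputFileNameSketch {

    // Builds <outputFolderName>/<sopInstanceUID>_Anon.dcm; inputFileName is deliberately unused.
    static String makeOutputFileName(String outputFolderName, String inputFileName, String sopInstanceUID)
            throws IOException {
        // Assumption: a missing UID is treated as a failure to construct a name.
        if (sopInstanceUID == null || sopInstanceUID.isEmpty()) {
            throw new IOException("Cannot construct a file name without a SOP Instance UID");
        }
        return new File(outputFolderName, sopInstanceUID + "_Anon.dcm").getPath();
    }
}
```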
      • readPseudonymizationControlFile

        protected void readPseudonymizationControlFile​(java.lang.String pseudonymizationControlFileName)
                                                throws java.io.IOException

        Read a file mapping original PatientID or StudyInstanceUID to new PatientID and PatientName and add them to the maps.

        The type of file is detected from its header line, which takes one of two forms: "originalPatientID newPatientID newPatientName" or "originalStudyInstanceUID newPatientID newPatientName".
        Parameters:
        pseudonymizationControlFileName - the control file, if any
        Throws:
        java.io.IOException
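
        The header-based detection described above can be sketched as follows. This is a minimal illustration that assumes a tab-delimited file with exactly the header names given in the description; the class ControlFileSketch and its read method are hypothetical, and the error handling for malformed lines is an assumption rather than documented behaviour. The map names mirror the fields listed in Field Detail.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not the library's own code.
final class ControlFileSketch {
    final Map<String, String> newPatientIDByOriginalPatientID = new HashMap<>();
    final Map<String, String> newPatientIDByOriginalStudyInstanceUID = new HashMap<>();
    final Map<String, String> newPatientNameByNewPatientID = new HashMap<>();

    // Reads a header line, picks the map keyed by the first column, then loads the rows.
    void read(BufferedReader reader) throws IOException {
        String header = reader.readLine();
        if (header == null) {
            throw new IOException("Missing header line");
        }
        String firstColumn = header.split("\t", -1)[0];
        Map<String, String> target;
        if ("originalPatientID".equals(firstColumn)) {
            target = newPatientIDByOriginalPatientID;
        } else if ("originalStudyInstanceUID".equals(firstColumn)) {
            target = newPatientIDByOriginalStudyInstanceUID;
        } else {
            throw new IOException("Unrecognized header: " + header);
        }
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // assumption: blank lines are skipped
            }
            String[] fields = line.split("\t", -1);
            if (fields.length < 3) {
                throw new IOException("Expected three tab-delimited columns: " + line);
            }
            target.put(fields[0], fields[1]);                       // original key -> new PatientID
            newPatientNameByNewPatientID.put(fields[1], fields[2]); // new PatientID -> new PatientName
        }
    }
}
```

        To read an actual file, the reader could be obtained with Files.newBufferedReader(Paths.get(fileName), StandardCharsets.UTF_8).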
      • createNewPseudonymousPatientAndAddToMaps

        protected java.lang.String createNewPseudonymousPatientAndAddToMaps​(java.lang.String originalPatientID,
                                                                            java.lang.String originalStudyInstanceUID)

        Create a new PatientID and PatientName and add them to the maps.

        Parameters:
        originalPatientID - the old PatientID
        originalStudyInstanceUID - the old StudyInstanceUID
        Returns:
        the new PatientID
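
        One way a seed can make pseudonym creation deterministic is sketched below. This is an illustration only: the class PseudonymSketch is hypothetical, the radix of 36 stands in for radixForRandomPseudonymousID (whose actual value is not documented here), and reusing the new PatientID as the new PatientName is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration, not the library's own code.
final class PseudonymSketch {
    static final int RADIX = 36; // assumption: stand-in for radixForRandomPseudonymousID

    final Random random;
    final Map<String, String> newPatientIDByOriginalPatientID = new HashMap<>();
    final Map<String, String> newPatientIDByOriginalStudyInstanceUID = new HashMap<>();
    final Map<String, String> newPatientNameByNewPatientID = new HashMap<>();

    // A non-empty seed (a long integer as a string) gives a repeatable sequence of pseudonyms.
    PseudonymSketch(String seed) {
        random = (seed == null || seed.isEmpty()) ? new Random() : new Random(Long.parseLong(seed));
    }

    String createNewPseudonymousPatientAndAddToMaps(String originalPatientID, String originalStudyInstanceUID) {
        String newPatientID;
        do {
            // Mask the sign bit so the value is non-negative before converting to the radix.
            newPatientID = Long.toString(random.nextLong() & Long.MAX_VALUE, RADIX);
        } while (newPatientNameByNewPatientID.containsKey(newPatientID)); // avoid reusing an ID
        if (originalPatientID != null && !originalPatientID.isEmpty()) {
            newPatientIDByOriginalPatientID.put(originalPatientID, newPatientID);
        }
        if (originalStudyInstanceUID != null && !originalStudyInstanceUID.isEmpty()) {
            newPatientIDByOriginalStudyInstanceUID.put(originalStudyInstanceUID, newPatientID);
        }
        newPatientNameByNewPatientID.put(newPatientID, newPatientID); // assumption: name equals ID
        return newPatientID;
    }
}
```

        With the same seed and the same order of calls, two runs of this sketch produce the same pseudonyms, which is the property the seed parameter is described as providing.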
      • writePseudonymizationResultByOriginalPatientID

        protected void writePseudonymizationResultByOriginalPatientID​(java.io.PrintWriter w)
      • writePseudonymizationResultByOriginalStudyInstanceUID

        protected void writePseudonymizationResultByOriginalStudyInstanceUID​(java.io.PrintWriter w)
      • writeUIDMapResult

        protected void writeUIDMapResult​(java.io.PrintWriter uidMapResultWriter)
      • containsOverlay

        protected static boolean containsOverlay​(AttributeList list)
      • isDirty

        protected static boolean isDirty​(AttributeList list)
      • main

        public static void main​(java.lang.String[] arg)

        Read DICOM format image files, de-identify and pseudonymize them and sequester any files that may have risk of identity leakage.

        Searches the specified input path recursively for suitable files.

        The pseudonymizationControlFile and the pseudonymization result files are tab delimited, with a header row containing either "originalPatientID newPatientID newPatientName" or "originalStudyInstanceUID newPatientID newPatientName".
        Parameters:
        arg - eight or nine parameters plus options: the inputPath (file or folder), outputFolderClean, outputFolderDirty, pseudonymizationControlFile, pseudonymizationResultByOriginalPatientIDFile, pseudonymizationResultByOriginalStudyInstanceUIDFile, failedFilesFile, uidMapResultFile, and optionally a random seed for deterministic creation of pseudonyms, followed by various options controlling de-identification