Package picard.illumina.parser
Class MultiTileBclParser
- java.lang.Object
  - picard.illumina.parser.MultiTileBclParser
-
Field Summary
Fields (Modifier and Type, Field, Description)
protected BclQualityEvaluationStrategy bclQualityEvaluationStrategy
protected int currentTile - The current tile number
static byte MASKING_QUALITY
-
Constructor Summary
Constructors (Constructor, Description)
MultiTileBclParser(File directory, int lane, picard.illumina.parser.CycleIlluminaFileMap tilesToCycleFiles, OutputMapping outputMapping, boolean applyEamssFilter, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, TileIndex tileIndex)
-
Method Summary
Methods (Modifier and Type, Method, Description)
void close()
int getTileOfNextCluster() - Returns the tile of the next cluster that will be returned by PerTilePerCycleParser; call it before next() if you need the tile for the data returned by next().
boolean hasNext()
void initialize()
protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files) - Create a Bcl parser for an individual cycle and wrap it with the CycleFilesParser interface, which populates the correct cycle in BclData.
protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files, picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> cycleFilesParser) - For a given cycle, return a CycleFilesParser.
BclData next() - Return the data for the next cluster.
void remove()
protected static void runEamssForReadInPlace(byte[] bases, byte[] qualities) - EAMSS is an Illumina-developed algorithm for detecting reads whose quality has deteriorated towards their end and revising the quality to the masking quality (2) if this is the case.
void seekToTile(int tile) - Clear the current set of cycleFileParsers and replace them with the ones for the tile indicated by the tile parameter.
Set<IlluminaDataType> supportedTypes()
void verifyData(List<Integer> tiles, int[] cycles)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface java.util.Iterator
forEachRemaining
-
-
-
-
Field Detail
-
MASKING_QUALITY
public static final byte MASKING_QUALITY
- See Also:
- Constant Field Values
-
bclQualityEvaluationStrategy
protected final BclQualityEvaluationStrategy bclQualityEvaluationStrategy
-
currentTile
protected int currentTile
The current tile number
-
-
Constructor Detail
-
MultiTileBclParser
public MultiTileBclParser(File directory, int lane, picard.illumina.parser.CycleIlluminaFileMap tilesToCycleFiles, OutputMapping outputMapping, boolean applyEamssFilter, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, TileIndex tileIndex)
-
-
Method Detail
-
initialize
public void initialize()
-
makeCycleFileParser
protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files, picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> cycleFilesParser)
For a given cycle, return a CycleFilesParser. Closes the supplied cycleFilesParser if it is not null.
- Parameters:
files
- The files to parse.
cycleFilesParser
- The previous cycle file parser, or null if there is none.
- Returns:
- A CycleFilesParser that will populate the correct position in the IlluminaData object with that cycle's data.
-
makeCycleFileParser
protected picard.illumina.parser.PerTileCycleParser.CycleFilesParser<BclData> makeCycleFileParser(List<File> files)
Create a Bcl parser for an individual cycle and wrap it with the CycleFilesParser interface, which populates the correct cycle in BclData.
- Parameters:
files
- The files to parse.
- Returns:
- A CycleFilesParser that populates a BclData object with data for a single cycle.
-
supportedTypes
public Set<IlluminaDataType> supportedTypes()
-
next
public BclData next()
Return the data for the next cluster by:
1. Advancing tiles if we reached the end of the current tile.
2. For each cycle, getting the appropriate parser and having it populate its data into the IlluminaData object.
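The two steps above can be illustrated with a toy model. This is only a sketch under stated assumptions: `NextClusterSketch` and its in-memory layout (one string of base calls per cycle, per tile) are hypothetical stand-ins, not Picard classes, and real BCL parsing reads binary files rather than strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NoSuchElementException;

/** Toy model of assembling clusters from per-cycle data, one tile at a time. */
final class NextClusterSketch {
    // Hypothetical layout: tiles.get(t)[c] holds the base call of every
    // cluster in tile t for cycle c. Each tile must have at least one cycle.
    private final List<String[]> tiles;
    private int tileIndex = 0;
    private int clusterInTile = 0;

    NextClusterSketch(List<String[]> tiles) {
        this.tiles = tiles;
    }

    /** Step 1: move on once every cluster of the current tile was returned. */
    private void advanceIfTileExhausted() {
        while (tileIndex < tiles.size()
                && clusterInTile >= tiles.get(tileIndex)[0].length()) {
            tileIndex++;
            clusterInTile = 0;
        }
    }

    boolean hasNext() {
        advanceIfTileExhausted();
        return tileIndex < tiles.size();
    }

    /** Step 2: each cycle contributes its value for the current cluster. */
    String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        String[] cycles = tiles.get(tileIndex);
        char[] bases = new char[cycles.length];
        for (int c = 0; c < cycles.length; c++) {
            bases[c] = cycles[c].charAt(clusterInTile);
        }
        clusterInTile++;
        return new String(bases);
    }

    /** Drains the sketch into a list, one string of bases per cluster. */
    static List<String> readAll(List<String[]> tiles) {
        NextClusterSketch parser = new NextClusterSketch(tiles);
        List<String> clusters = new ArrayList<>();
        while (parser.hasNext()) {
            clusters.add(parser.next());
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Tile 1 has two clusters and two cycles; tile 2 has one cluster.
        List<String[]> tiles = Arrays.asList(
                new String[] {"AC", "GT"},
                new String[] {"T", "A"});
        System.out.println(readAll(tiles));
    }
}
```

The point of the model is the ordering: the tile boundary is checked before any per-cycle data is read, so a cluster is never assembled from cycles belonging to two different tiles.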
-
runEamssForReadInPlace
protected static void runEamssForReadInPlace(byte[] bases, byte[] qualities)
EAMSS is an Illumina-developed algorithm for detecting reads whose quality has deteriorated towards their end and revising the quality to the masking quality (2) if this is the case.

This algorithm works as follows (with one exception):

Start at the end (high indices, at the right below) of the read and calculate an EAMSS tally at each location as follows:
    if(quality[i] < 15) tally += 1
    if(quality[i] >= 15 and < 30) tally = tally
    if(quality[i] >= 30) tally -= 2

For each location, keep track of this tally (e.g.):
    Read Starts at <- this end
    Cycle:         1   2   3   4   5   6   7   8   9
    Bases:         A   C   T   G   G   G   T   C   A
    Qualities:    32  32  16  15   8  10  32   2   2
    Cycle Score:  -2  -2   0   0   1   1  -2   1   1    //The EAMSS Score determined for this cycle alone
    EAMSS TALLY:   0   0   2   2   2   1   0   2   1
                           X - Earliest instance of Max-Score

You must keep track of the maximum EAMSS tally (in this case 2) and the earliest (lowest) cycle at which it occurs. If and only if the max EAMSS tally is >= 1, then from there until the end (highest cycle) of the read, reassign these qualities as 2 (the masking quality). The output qualities would therefore be transformed from:
    Original Qualities:   32  32  16  15   8  10  32   2   2
to
    Final Qualities:      32  32   2   2   2   2   2   2   2
                                   X - Earliest instance of max-tally/end of masking

IMPORTANT: The one exception is: if the max EAMSS tally is preceded by a long string of G basecalls (10 or more, with a single basecall exception per 10 bases), then the masking continues to the beginning of that string of G's.

E.g.:
    Cycle:         1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
    Bases:         C   T   A   C   A   G   A   G   G   G   G   G   G   G   G   C   A   T
    Qualities:    30  22  26  27  28  30   7  34  20  19  38  15  32  32  10   4   2   5
    Cycle Score:  -2   0   0   0   0  -2   1  -2   0   0  -2   0  -2  -2   1   1   1   1
    EAMSS TALLY:  -2  -5  -5  -5  -5  -5  -3  -4  -2  -2  -2   0   0   2   4   3   2   1
                                                                           X - Earliest instance of Max-Tally

Resulting Transformation:
    Bases:                 C   T   A   C   A   G   A   G   G   G   G   G   G   G   G   C   A   T
    Original Qualities:   30  22  26  27  28  30   7  34  20  19  38  15  32  32  10   4   2   5
    Final Qualities:      30  22  26  27  28   2   2   2   2   2   2   2   2   2   2   2   2   2
                                                                                   X - Earliest instance of Max-Tally
                                               X - Start of EAMSS masking due to G-Run

To further clarify the exception rule here are a few examples:

    A C G A C G G G G G G G G G G G G G G G G G G G G A C T
X - Earliest instance of Max-Tally
X - Start of EAMSS masking (with a two base call jump because we have 20 bases in the run already)

    T T G G A G G G G G G G G G G G G G G G G G G A G A C T
X - Earliest instance of Max-Tally
X - We can skip this A as well as the earlier A because we have 20 or more bases in the run already
X - Start of EAMSS masking (with a two base call jump because we have 20 bases in the run)

    T T G G G A A G G G G G G G G G G G G G G G G G G T T A T
X - Earliest instance of Max-Tally
X X - We can skip these bases because the first A counts as the first skip and, as far as the length of the string of G's is concerned, these are both counted like G's
X - This A is the 20th base in the string of G's and therefore can be skipped
X - Note that the A's previous to the G's are only included because there are G's further on that are within the number of allowable exceptions away (i.e. 2 in this instance). If there were NO G's after the A's, you CANNOT count the A's as part of the G strings (even if no exceptions have previously occurred). In other words, the string of G's MUST end in a G, not an "exception".

However, if the max-tally occurs to the right of the run of G's then this is still part of the string of G's but does count towards the number of exceptions allowable (e.g.):

    T T G G G G G G G G G G A C G
X - Earliest instance of Max-tally

The first index CAN be considered as an exception; the above would be masked to the following point:

    T T G G G G G G G G G G A C G
X - End of EAMSS masking due to G-Run

To sum up the final points, a string of G's CAN START with an exception but CANNOT END in an exception.
- Parameters:
bases
- Bases for a single read in the cluster (not the entire cluster)
qualities
- Qualities for a single read in the cluster (not the entire cluster)
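The basic tally rule described for this method can be sketched as follows. This is a minimal illustration, not Picard's implementation: the class `EamssSketch` and its method names are hypothetical, the masking quality of 2 is taken from the description, and the G-run exception is deliberately left out, so the result differs from the documented behaviour whenever a qualifying run of G's precedes the max tally.

```java
import java.util.Arrays;

/** Sketch of the basic EAMSS masking rule (G-run exception omitted). */
final class EamssSketch {
    // Assumed from the description: masked positions receive quality 2.
    static final byte MASKING_QUALITY = 2;

    /** Per-cycle score: +1 below Q15, 0 for Q15 to Q29, -2 at Q30 and above. */
    static int cycleScore(byte quality) {
        if (quality < 15) {
            return 1;
        }
        if (quality < 30) {
            return 0;
        }
        return -2;
    }

    /**
     * Masks qualities in place from the earliest position holding the
     * maximum tally to the end of the read, if that maximum is >= 1.
     *
     * @return zero-based index where masking starts, or -1 if nothing was masked
     */
    static int maskInPlace(byte[] qualities) {
        int tally = 0;
        int maxTally = Integer.MIN_VALUE;
        int maskStart = -1;
        // Walk from the end of the read towards its start.
        for (int i = qualities.length - 1; i >= 0; i--) {
            tally += cycleScore(qualities[i]);
            // ">=" keeps the earliest (lowest) index when the maximum repeats.
            if (tally >= maxTally) {
                maxTally = tally;
                maskStart = i;
            }
        }
        if (maskStart < 0 || maxTally < 1) {
            return -1;
        }
        Arrays.fill(qualities, maskStart, qualities.length, MASKING_QUALITY);
        return maskStart;
    }

    public static void main(String[] args) {
        // Qualities from the first worked example in the description.
        byte[] qualities = {32, 32, 16, 15, 8, 10, 32, 2, 2};
        int start = maskInPlace(qualities);
        System.out.println("masking starts at cycle " + (start + 1));
        System.out.println(Arrays.toString(qualities));
    }
}
```

For the first worked example the sketch masks from cycle 3 onward, matching the final qualities shown in the description. Handling the G-run exception would require a second backward scan from the masking start that counts G's and the allowed exceptions.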
-
seekToTile
public void seekToTile(int tile)
Clear the current set of cycleFileParsers and replace them with the ones for the tile indicated by the tile parameter.
- Parameters:
tile
- requested tile
-
hasNext
public boolean hasNext()
-
getTileOfNextCluster
public int getTileOfNextCluster()
Returns the tile of the next cluster that will be returned by PerTilePerCycleParser; call it before next() if you need the tile for the data returned by next().
- Returns:
- The tile number of the next ILLUMINA_DATA object to be returned
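The call order described here (query the tile first, then fetch the cluster) can be sketched as below. To keep the example self-contained, it uses a hypothetical stand-in interface, `TileAwareParser`, that exposes only the method names documented on this page (hasNext(), next(), getTileOfNextCluster(), close()); the in-memory implementation and the tile numbers are illustrative assumptions, not Picard code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch of the "ask for the tile, then call next()" iteration pattern. */
final class TileIterationSketch {

    /** Hypothetical stand-in for a parser with the documented methods. */
    interface TileAwareParser<T> extends Iterator<T> {
        int getTileOfNextCluster();

        void close();
    }

    /** Toy in-memory parser: clusters[i] belongs to tiles[i]. */
    static TileAwareParser<String> inMemory(int[] tiles, String[] clusters) {
        return new TileAwareParser<String>() {
            private int index = 0;

            @Override
            public boolean hasNext() {
                return index < clusters.length;
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return clusters[index++];
            }

            @Override
            public int getTileOfNextCluster() {
                return tiles[index];
            }

            @Override
            public void close() {
                // Nothing to release in this in-memory stand-in.
            }
        };
    }

    /** Pairs every cluster with its tile, querying the tile before next(). */
    static List<String> readAll(TileAwareParser<String> parser) {
        List<String> out = new ArrayList<>();
        try {
            while (parser.hasNext()) {
                // Must be read first: after next() it describes the following cluster.
                int tile = parser.getTileOfNextCluster();
                String cluster = parser.next();
                out.add(tile + ":" + cluster);
            }
        } finally {
            parser.close();
        }
        return out;
    }

    public static void main(String[] args) {
        TileAwareParser<String> parser = inMemory(
                new int[] {1101, 1101, 1102},
                new String[] {"ACGT", "TTGA", "GGCA"});
        System.out.println(readAll(parser));
    }
}
```

Reading the tile after next() would attribute each cluster to the tile of its successor, which is why the query comes first in the loop.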
-
remove
public void remove()
-
close
public void close()
-
-