Package htsjdk.samtools.reference
Class ReferenceSequenceFileFactory
- java.lang.Object
-
- htsjdk.samtools.reference.ReferenceSequenceFileFactory
-
public class ReferenceSequenceFileFactory extends Object
Factory class for creating ReferenceSequenceFile instances for reading reference sequences stored in various formats.
-
-
Field Summary
Fields
Modifier and Type, Field
static Set<String> FASTA_EXTENSIONS
-
Constructor Summary
Constructors
ReferenceSequenceFileFactory()
-
Method Summary
Modifier and Type, Method, Description
static boolean canCreateIndexedFastaReader(Path fastaFile)
Checks if the provided FASTA file can be opened as indexed.
static File getDefaultDictionaryForReferenceSequence(File file)
Returns the default dictionary name for a FASTA file.
static Path getDefaultDictionaryForReferenceSequence(Path path)
Returns the default dictionary name for a FASTA file.
static String getFastaExtension(Path path)
Returns the FASTA extension for the path.
static Path getFastaIndexFileName(Path fastaFile)
Returns the index name for a FASTA file.
static ReferenceSequenceFile getReferenceSequenceFile(File file)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace, boolean preferIndexed)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index)
Return an instance of ReferenceSequenceFile using the given fasta sequence file stream, optional index stream, and no sequence dictionary.
static ReferenceSequenceFile getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index, SAMSequenceDictionary dictionary, boolean truncateNamesAtWhitespace)
Return an instance of ReferenceSequenceFile using the given fasta sequence file stream and optional index stream and sequence dictionary.
static ReferenceSequenceFile getReferenceSequenceFile(Path path)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static ReferenceSequenceFile getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace, boolean preferIndexed)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
static SAMSequenceDictionary loadDictionary(InputStream in)
Loads the sequence dictionary from a FASTA file input stream.
-
-
-
Method Detail
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(File file)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it. Sequence names will be truncated at first whitespace, if any.
- Parameters:
file - the reference sequence file on disk
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
- Parameters:
file - the reference sequence file on disk
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
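The effect of truncateNamesAtWhitespace can be illustrated without the library. The following is a minimal sketch, assuming "first word" means everything before the first whitespace character of the FASTA header; the class and method names are hypothetical and are not part of htsjdk.

```java
// Hypothetical illustration class; not part of htsjdk.
final class SequenceNameSketch {

    /** Returns the part of a sequence name that precedes the first whitespace character. */
    static String truncateAtWhitespace(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (Character.isWhitespace(name.charAt(i))) {
                return name.substring(0, i);
            }
        }
        // No whitespace: the name is already a single word.
        return name;
    }

    public static void main(String[] args) {
        // prints "chr1"
        System.out.println(truncateAtWhitespace("chr1 Homo sapiens chromosome 1"));
    }
}
```

With truncation disabled, a reader would presumably report the full header text as the sequence name, so lookups by short contig name may not match.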
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(File file, boolean truncateNamesAtWhitespace, boolean preferIndexed)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
- Parameters:
file - the reference sequence file on disk
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
preferIndexed - if true, attempt to return an indexed reader that supports non-linear traversal; otherwise return the non-indexed reader
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(Path path)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it. Sequence names will be truncated at first whitespace, if any.
- Parameters:
path - the reference sequence file on disk
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
- Parameters:
path - the reference sequence file on disk
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(Path path, boolean truncateNamesAtWhitespace, boolean preferIndexed)
Attempts to determine the type of the reference file and return an instance of ReferenceSequenceFile that is appropriate to read it.
- Parameters:
path - the reference sequence file path
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
preferIndexed - if true, attempt to return an indexed reader that supports non-linear traversal; otherwise return the non-indexed reader
-
canCreateIndexedFastaReader
public static boolean canCreateIndexedFastaReader(Path fastaFile)
Checks if the provided FASTA file can be opened as indexed. For a FASTA file to be indexed, it must have:
- An associated .fai index (FastaSequenceIndex).
- An associated .gzi index if it is block-compressed (GZIIndex).
- Parameters:
fastaFile - the reference sequence file path.
- Returns:
true if the file can be opened as indexed; false otherwise.
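The two requirements listed for canCreateIndexedFastaReader can be sketched with the standard library alone. This is an illustration rather than the htsjdk implementation: it assumes each index sits next to the FASTA file and is named by appending ".fai" or ".gzi" to the full file name, and it takes block compression as a caller-supplied flag instead of inspecting the file. All names in the block are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of htsjdk.
final class IndexedFastaCheckSketch {

    /** Builds a sibling path by appending a suffix to the file name (assumed naming convention). */
    static Path sibling(Path file, String suffix) {
        return file.resolveSibling(file.getFileName().toString() + suffix);
    }

    /**
     * Returns true if a .fai file exists next to the FASTA and, when the FASTA
     * is block-compressed, a .gzi file exists as well.
     */
    static boolean looksIndexable(Path fasta, boolean blockCompressed) {
        if (!Files.exists(sibling(fasta, ".fai"))) {
            return false;
        }
        return !blockCompressed || Files.exists(sibling(fasta, ".gzi"));
    }
}
```

A caller could run a check of this kind before requesting an indexed reader, and fall back to linear reading when it returns false.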
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index)
Return an instance of ReferenceSequenceFile using the given fasta sequence file stream, optional index stream, and no sequence dictionary.
- Parameters:
source - The named source of the reference file (used in error messages).
in - The input stream to read the fasta file from.
index - The index, or null to return a non-indexed reader.
-
getReferenceSequenceFile
public static ReferenceSequenceFile getReferenceSequenceFile(String source, SeekableStream in, FastaSequenceIndex index, SAMSequenceDictionary dictionary, boolean truncateNamesAtWhitespace)
Return an instance of ReferenceSequenceFile using the given fasta sequence file stream and optional index stream and sequence dictionary.
- Parameters:
source - The named source of the reference file (used in error messages).
in - The input stream to read the fasta file from.
index - The index, or null to return a non-indexed reader.
dictionary - The sequence dictionary, or null if there isn't one.
truncateNamesAtWhitespace - if true, only include the first word of the sequence name
-
getDefaultDictionaryForReferenceSequence
public static File getDefaultDictionaryForReferenceSequence(File file)
Returns the default dictionary name for a FASTA file.
- Parameters:
file - the reference sequence file on disk.
-
getDefaultDictionaryForReferenceSequence
public static Path getDefaultDictionaryForReferenceSequence(Path path)
Returns the default dictionary name for a FASTA file.
- Parameters:
path - the reference sequence file path.
-
loadDictionary
public static SAMSequenceDictionary loadDictionary(InputStream in)
Loads the sequence dictionary from a FASTA file input stream.
- Parameters:
in - the FASTA file input stream.
- Returns:
the sequence dictionary, or null if the header has no dictionary or it was empty.
-
getFastaExtension
public static String getFastaExtension(Path path)
Returns the FASTA extension for the path.
- Parameters:
path - the reference sequence file path.
- Throws:
IllegalArgumentException - if the file is not a supported reference file.
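The contract of getFastaExtension (return the matching extension, or throw IllegalArgumentException for an unsupported file) can be sketched as a suffix lookup. The extension list below is an illustrative assumption, not the contents of FASTA_EXTENSIONS, and the class and method names are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of htsjdk.
final class FastaExtensionSketch {

    // Illustrative list only; the supported set is defined by the FASTA_EXTENSIONS field.
    static final List<String> EXTENSIONS = Arrays.asList(".fasta", ".fasta.gz", ".fa", ".fa.gz");

    /** Returns the listed extension that the file name ends with, or throws if none matches. */
    static String fastaExtension(Path path) {
        String name = path.getFileName().toString();
        for (String ext : EXTENSIONS) {
            if (name.endsWith(ext)) {
                return ext;
            }
        }
        throw new IllegalArgumentException("Not a supported reference file: " + path);
    }

    public static void main(String[] args) {
        // prints ".fasta.gz"
        System.out.println(fastaExtension(Paths.get("ref.fasta.gz")));
    }
}
```

Throwing instead of returning null keeps an unsupported path from propagating silently into later steps such as deriving a dictionary name.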
-
-