vg
tools for working with variation graphs
#include <gapless_extender.hpp>
Public Types | |
typedef GaplessExtension::seed_type | seed_type |
typedef pair_hash_set< seed_type > | cluster_type |
Public Member Functions | |
GaplessExtender () | |
Create an empty GaplessExtender. | |
GaplessExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner) | |
Create a GaplessExtender using the given GBWTGraph and Aligner objects. | |
std::vector< GaplessExtension > | extend (cluster_type &cluster, const std::string &sequence, const gbwtgraph::CachedGBWTGraph *cache=nullptr, size_t max_mismatches=MAX_MISMATCHES, double overlap_threshold=OVERLAP_THRESHOLD) const |
void | unfold_haplotypes (const std::unordered_set< nid_t > &subgraph, std::vector< std::vector< handle_t >> &haplotype_paths, bdsg::HashGraph &unfolded, const gbwtgraph::CachedGBWTGraph *cache=nullptr) const |
void | transform_alignment (Alignment &aln, const std::vector< std::vector< handle_t >> &haplotype_paths) const |
Static Public Member Functions | |
static seed_type | to_seed (pos_t pos, size_t read_offset) |
Convert (graph position, read offset) to a seed. | |
static pos_t | get_pos (seed_type seed) |
Get the graph position from a seed. | |
static handle_t | get_handle (seed_type seed) |
Get the handle from a seed. | |
static size_t | get_node_offset (seed_type seed) |
Get the node offset from a seed. | |
static size_t | get_read_offset (seed_type seed) |
Get the read offset from a seed. | |
static bool | full_length_extensions (const std::vector< GaplessExtension > &result, size_t max_mismatches=MAX_MISMATCHES) |
Public Attributes | |
const gbwtgraph::GBWTGraph * | graph |
const Aligner * | aligner |
Static Public Attributes | |
constexpr static size_t | MAX_MISMATCHES = 4 |
The default value for the maximum number of mismatches. | |
constexpr static double | OVERLAP_THRESHOLD = 0.8 |
A class that supports haplotype-consistent seed extension using GBWTGraph. Each seed is a pair of matching read/graph positions and each extension is a gapless alignment of an interval of the read to a haplotype. A cluster is an unordered set of distinct seeds. Seeds in the same node with the same (read_offset - node_offset) difference are considered equivalent. GaplessExtender also needs an Aligner object for scoring the extension candidates.
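The equivalence rule for seeds can be illustrated with a small standalone sketch. `ToySeed`, `diagonal_key`, and `count_distinct` are hypothetical names introduced here for illustration only; the real `seed_type` is defined by `GaplessExtension` and may be laid out differently. The sketch assumes only what the description states: seeds in the same node with the same (read_offset - node_offset) difference are equivalent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a seed: a node identifier plus the two offsets.
// This is NOT the real seed_type from GaplessExtension.
struct ToySeed {
    std::uint64_t node;       // stand-in for handle_t
    std::size_t node_offset;  // offset within the node
    std::size_t read_offset;  // offset within the read
};

// Seeds in the same node with the same (read_offset - node_offset)
// difference lie on the same diagonal and are treated as equivalent.
std::pair<std::uint64_t, std::int64_t> diagonal_key(const ToySeed& seed) {
    std::int64_t diff = static_cast<std::int64_t>(seed.read_offset)
                      - static_cast<std::int64_t>(seed.node_offset);
    return { seed.node, diff };
}

// Count the equivalence classes in a list of seeds.
std::size_t count_distinct(const std::vector<ToySeed>& seeds) {
    std::set<std::pair<std::uint64_t, std::int64_t>> keys;
    for (const ToySeed& seed : seeds) {
        keys.insert(diagonal_key(seed));
    }
    // There can never be more equivalence classes than seeds.
    assert(keys.size() <= seeds.size());
    return keys.size();
}
```

In this model, the seeds (node 7, node offset 2, read offset 10) and (node 7, node offset 5, read offset 13) share a key because both differences equal 8, so a cluster would only need one of them.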
vg::GaplessExtender::GaplessExtender ()
Create an empty GaplessExtender.
explicit vg::GaplessExtender::GaplessExtender (const gbwtgraph::GBWTGraph &graph, const Aligner &aligner)
Create a GaplessExtender using the given GBWTGraph and Aligner objects.
std::vector< GaplessExtension > vg::GaplessExtender::extend (cluster_type &cluster, const std::string &sequence, const gbwtgraph::CachedGBWTGraph *cache=nullptr, size_t max_mismatches=MAX_MISMATCHES, double overlap_threshold=OVERLAP_THRESHOLD) const
Find the highest-scoring extension for each seed in the cluster. If there are full-length extensions with at most max_mismatches mismatches, sort them in descending order by score and return the best non-overlapping full-length extensions. Two extensions overlap if the fraction of identical base mappings is greater than overlap_threshold. If there are no good enough full-length extensions, trim the extensions to maximize the score and remove duplicates. In this case, the extensions are sorted by read interval. Use full_length_extensions() to determine the type of the returned extension set. Allow any number of mismatches in the initial node, at least max_mismatches mismatches in the entire extension, and at least max_mismatches / 2 mismatches on each flank. Use the provided CachedGBWTGraph or allocate a new one.
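The selection of non-overlapping full-length extensions can be sketched as a greedy filter. This is a simplified model, not the vg implementation: `ToyExtension`, `overlap_fraction`, and `select_distinct` are hypothetical, each extension is assumed to cover the whole read, and the overlap fraction is assumed to be the number of read positions mapped to the same graph base divided by the read length.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a full-length extension: mapping[i] identifies the
// graph base aligned to read position i. NOT the real GaplessExtension.
struct ToyExtension {
    int score;
    std::vector<std::uint64_t> mapping;
};

// Fraction of read positions mapped to the same graph base in both
// extensions (assumed definition of "identical base mappings").
double overlap_fraction(const ToyExtension& a, const ToyExtension& b) {
    assert(a.mapping.size() == b.mapping.size() && !a.mapping.empty());
    std::size_t shared = 0;
    for (std::size_t i = 0; i < a.mapping.size(); i++) {
        if (a.mapping[i] == b.mapping[i]) { shared++; }
    }
    return static_cast<double>(shared) / static_cast<double>(a.mapping.size());
}

// Sort by descending score, then keep a candidate only if its overlap with
// every higher-scoring extension already kept is at most the threshold.
std::vector<ToyExtension> select_distinct(std::vector<ToyExtension> candidates,
                                          double overlap_threshold) {
    std::stable_sort(candidates.begin(), candidates.end(),
        [](const ToyExtension& a, const ToyExtension& b) {
            return a.score > b.score;
        });
    std::vector<ToyExtension> kept;
    for (const ToyExtension& candidate : candidates) {
        bool overlaps = false;
        for (const ToyExtension& other : kept) {
            if (overlap_fraction(candidate, other) > overlap_threshold) {
                overlaps = true;
                break;
            }
        }
        if (!overlaps) { kept.push_back(candidate); }
    }
    return kept;
}
```

With a threshold of 0.8 in this model, a candidate that shares 9 of 10 base mappings with a higher-scoring extension is dropped, while one that shares 5 of 10 is kept as a distinct alignment.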
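static bool vg::GaplessExtender::full_length_extensions (const std::vector< GaplessExtension > &result, size_t max_mismatches=MAX_MISMATCHES)
Determine whether the extension set contains non-overlapping full-length extensions sorted in descending order by score. Use the same value of max_mismatches as in extend().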
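static handle_t vg::GaplessExtender::get_handle (seed_type seed)
Get the handle from a seed.
inline static size_t vg::GaplessExtender::get_node_offset (seed_type seed)
Get the node offset from a seed.
static pos_t vg::GaplessExtender::get_pos (seed_type seed)
Get the graph position from a seed.
inline static size_t vg::GaplessExtender::get_read_offset (seed_type seed)
Get the read offset from a seed.
static seed_type vg::GaplessExtender::to_seed (pos_t pos, size_t read_offset)
Convert (graph position, read offset) to a seed.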
void vg::GaplessExtender::transform_alignment (Alignment &aln, const std::vector< std::vector< handle_t >> &haplotype_paths) const
Transform an alignment to a single node of the graph built by unfold_haplotypes() into an alignment to the corresponding path in the original graph.
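One ingredient of such a transformation is translating an offset on the single unfolded node into a position on the original path. The sketch below shows only that coordinate translation under simplifying assumptions: the path is represented by a hypothetical list of node lengths, and `locate` is an illustrative helper, not part of vg.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Map an offset in the concatenated path sequence to
// (index of the node within the path, offset within that node).
// node_lengths is a hypothetical stand-in for the lengths of the
// handles in one entry of haplotype_paths.
std::pair<std::size_t, std::size_t> locate(const std::vector<std::size_t>& node_lengths,
                                           std::size_t offset) {
    for (std::size_t i = 0; i < node_lengths.size(); i++) {
        if (offset < node_lengths[i]) {
            return { i, offset };
        }
        offset -= node_lengths[i];
    }
    // The offset lies past the end of the path.
    assert(false && "offset is past the end of the path");
    return { node_lengths.size(), 0 };
}
```

For a path whose nodes have lengths 4, 3, and 5, offset 6 on the unfolded node falls in the second node of the path at offset 2.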
void vg::GaplessExtender::unfold_haplotypes (const std::unordered_set< nid_t > &subgraph, std::vector< std::vector< handle_t >> &haplotype_paths, bdsg::HashGraph &unfolded, const gbwtgraph::CachedGBWTGraph *cache=nullptr) const
Find the distinct local haplotypes in the given subgraph and return the corresponding paths. For each path haplotype_paths[i], the output graph will contain node 2i + 1 with sequence corresponding to the path and node 2i + 2 with the reverse complement of the sequence. Use the provided CachedGBWTGraph or allocate a new one.
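The node numbering described above can be inverted to recover the path index and orientation from a node identifier in the unfolded graph. The helpers below (`forward_node`, `reverse_node`, `describe`, `reverse_complement`) are hypothetical and only restate the 2i + 1 / 2i + 2 convention; they do not use or reproduce the vg API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Which haplotype path an unfolded node represents, and in which orientation.
struct UnfoldedNode {
    std::size_t path_index;
    bool reverse_complement;
};

// Node holding the sequence of haplotype_paths[i].
std::uint64_t forward_node(std::size_t i) { return 2 * static_cast<std::uint64_t>(i) + 1; }

// Node holding the reverse complement of that sequence.
std::uint64_t reverse_node(std::size_t i) { return 2 * static_cast<std::uint64_t>(i) + 2; }

// Invert the numbering: odd identifiers are forward, even ones are reverse.
UnfoldedNode describe(std::uint64_t node_id) {
    assert(node_id >= 1);
    return { static_cast<std::size_t>((node_id - 1) / 2), node_id % 2 == 0 };
}

// Reverse complement of a DNA sequence; other characters become 'N'.
std::string reverse_complement(const std::string& sequence) {
    std::string result(sequence.rbegin(), sequence.rend());
    for (char& base : result) {
        switch (base) {
            case 'A': base = 'T'; break;
            case 'C': base = 'G'; break;
            case 'G': base = 'C'; break;
            case 'T': base = 'A'; break;
            default:  base = 'N'; break;
        }
    }
    return result;
}
```

Under this convention, nodes 5 and 6 both correspond to haplotype_paths[2], with node 6 carrying the reverse complement.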
const Aligner* vg::GaplessExtender::aligner |
const gbwtgraph::GBWTGraph* vg::GaplessExtender::graph |
constexpr static size_t vg::GaplessExtender::MAX_MISMATCHES = 4
The default value for the maximum number of mismatches.
constexpr static double vg::GaplessExtender::OVERLAP_THRESHOLD = 0.8
Two full-length alignments are distinct if the fraction of overlapping position pairs is at most this value.