vg
tools for working with variation graphs
Classes | Public Member Functions | Private Member Functions | Private Attributes | List of all members
vg::XdropAligner Class Reference

#include <xdrop_aligner.hpp>

Classes

struct  graph_pos_s
 
struct  OrderedGraph
 

Public Member Functions

 XdropAligner ()
 
 XdropAligner (XdropAligner const &)
 
XdropAligneroperator= (XdropAligner const &)
 
 XdropAligner (XdropAligner &&)
 
XdropAligneroperator= (XdropAligner &&)
 
 XdropAligner (int8_t _match, int8_t _mismatch, int8_t _gap_open, int8_t _gap_extension, int32_t _full_length_bonus, uint32_t _max_gap_length)
 
 XdropAligner (int8_t const *_score_matrix, int8_t _gap_open, int8_t _gap_extension, int32_t _full_length_bonus, uint32_t _max_gap_length)
 
 ~XdropAligner (void)
 
void align (Alignment &alignment, OrderedGraph const &graph, const vector< MaximalExactMatch > &mems, bool reverse_complemented)
 
void align (Alignment &alignment, Graph const &graph, const vector< MaximalExactMatch > &mems, bool reverse_complemented)
 Implementation of align() that automatically wraps up a topologically-ordered Protobuf graph as an OrderedGraph. More...
 
void align_pinned (Alignment &alignment, const HandleGraph &g, bool pin_left)
 
void align_pinned (Alignment &alignment, const HandleGraph &g, const vector< handle_t > &topological_order, bool pin_left)
 

Private Member Functions

void build_id_index_table (OrderedGraph const &graph)
 Fill in the id_to_index map. More...
 
void build_index_edge_table (OrderedGraph const &graph, uint32_t const seed_node_index, bool left_to_right)
 
graph_pos_s calculate_seed_position (OrderedGraph const &graph, vector< MaximalExactMatch > const &mems, size_t query_length, bool direction)
 
graph_pos_s calculate_max_position (OrderedGraph const &graph, graph_pos_s const &seed_pos, size_t max_node_index, bool direction)
 
graph_pos_s scan_seed_position (OrderedGraph const &graph, std::string const &query_seq, bool direction)
 
size_t push_edit (Mapping *mapping, uint8_t op, char const *alt, size_t len)
 
size_t extend (OrderedGraph const &graph, vector< uint64_t >::const_iterator begin, vector< uint64_t >::const_iterator end, dz_query_s const *packed_query, size_t seed_node_index, uint64_t seed_offset, bool right_to_left)
 
void calculate_and_save_alignment (Alignment &alignment, OrderedGraph const &graph, graph_pos_s const &head_pos, size_t tail_node_index, bool left_to_right)
 
void align_downward (Alignment &alignment, OrderedGraph const &graph, graph_pos_s const &head_pos, bool left_to_right)
 

Private Attributes

dz_s * dz
 This is the backing dozeu library problem instance. More...
 
int8_t aa_match
 
std::unordered_map< id_t, uint64_t > id_to_index
 Maps from node ID to the index in our internal subgraph storage at which that node occurs. More...
 
std::vector< uint64_t > index_edges
 
std::vector< uint64_t > index_edges_head
 TODO: what is this? More...
 
std::vector< struct dz_forefront_s const * > forefronts
 Stores all of the currently outstanding dozeu library forefronts. More...
 
const std::function< bool(uint64_t const &, uint64_t const &)> compare [2]
 
const std::function< uint64_t(uint64_t const &, uint64_t const &)> edge [2]
 

Detailed Description

Align to a graph using the xdrop algorithm, as implemented in dozeu.

Not thread-safe. Each align() call stores state in the object.

Can be re-used for multiple problems in a row.

The underlying Dozeu library is fundamentally based around semi-global alignment: extending an alignment from a known matching position (what in other parts of vg we call "pinned" alignment).

To simulate non-pinned alignment, we align in two passes in different directions. One from a guess of a pinning position, to get a more accurate "head" pinning position for the other end, and once back from where the previous pass ended up, to get an overall hopefully-optimal alignment.

If the input graph is not reverse-complemented, direction = false (reverse, right to left) on the first pass, and direction = true (forward, left to right) on the second. If it is reverse complemented, we flip them.

This won't actually work in theory to get the optimal local alignment in all cases, but it works well in practice.

Constructor & Destructor Documentation

◆ XdropAligner() [1/5]

XdropAligner::XdropAligner ( )

◆ XdropAligner() [2/5]

XdropAligner::XdropAligner ( XdropAligner const &  rhs)

◆ XdropAligner() [3/5]

XdropAligner::XdropAligner ( XdropAligner &&  rhs)

◆ XdropAligner() [4/5]

XdropAligner::XdropAligner ( int8_t  _match,
int8_t  _mismatch,
int8_t  _gap_open,
int8_t  _gap_extension,
int32_t  _full_length_bonus,
uint32_t  _max_gap_length 
)

◆ XdropAligner() [5/5]

XdropAligner::XdropAligner ( int8_t const *  _score_matrix,
int8_t  _gap_open,
int8_t  _gap_extension,
int32_t  _full_length_bonus,
uint32_t  _max_gap_length 
)

◆ ~XdropAligner()

XdropAligner::~XdropAligner ( void  )

Member Function Documentation

◆ align() [1/2]

void XdropAligner::align ( Alignment alignment,
Graph const &  graph,
const vector< MaximalExactMatch > &  mems,
bool  reverse_complemented 
)

Implementation of align() that automatically wraps up a topologically-ordered Protobuf graph as an OrderedGraph.

◆ align() [2/2]

void XdropAligner::align ( Alignment alignment,
OrderedGraph const &  graph,
const vector< MaximalExactMatch > &  mems,
bool  reverse_complemented 
)

align query: forward-backward banded alignment

Compute an alignment of the given Alignment's sequence against the given topologically sorted graph, using (one of) the given MEMs to seed the alignment.

reverse_complemented is true if the topologically sorted graph we have was reverse-complemented when extracted from a larger containing graph, and false if it is in the same orientation as it exists in the larger containing graph. The MEMs and the Alignment are interpreted as being against the forward strand of the passed subgraph no matter the value of this setting.

reverse_complemented true means we will compute the alignment forward in the topologically-sorted order of the given graph (anchoring to the first node if no MEMs are provided) and false if we want to compute the alignment backward in the topological order (anchoring to the last node).

All the graph edges must go from earlier to later nodes, and from_start and to_end must alsways be false.

First the head (the most upstream) seed in MEMs is selected and extended downward to detect the downstream breakpoint. Next the alignment path is generated by second upward extension from the downstream breakpoint.

The MEM list may be empty. If MEMs are provided, uses only the begin, end, and nodes fields of the MaximalExactMatch objects. It uses the first occurrence of the last MEM if reverse_complemented is true, and the last occurrence of the first MEM otherwise.

◆ align_downward()

void XdropAligner::align_downward ( Alignment alignment,
OrderedGraph const &  graph,
graph_pos_s const &  head_pos,
bool  left_to_right 
)
private

After doing the upward pass and finding head_pos to anchor from, do the downward alignment pass and traceback. If left_to_right is set, goes left to right and traces back the other way. If it is unset, goes right to left and traces back the other way.

◆ align_pinned() [1/2]

void XdropAligner::align_pinned ( Alignment alignment,
const HandleGraph g,
bool  pin_left 
)

Compute a pinned alignment, where the start (pin_left=true) or end (pin_left=false) end of the Alignment sequence is pinned to the start of the first (pin_left=true) or end of the last (pin_left=false) node in the graph's topological order.

Does not account for multiple sources/sinks in the topological order; whichever comes first/last ends up being used for the pin.

TODO: This should become const and the class should become thread safe.

◆ align_pinned() [2/2]

void XdropAligner::align_pinned ( Alignment alignment,
const HandleGraph g,
const vector< handle_t > &  topological_order,
bool  pin_left 
)

Version of align_pinned that allows you to pass your own topological order. The topological order MUST be left to right, no matter whether you are pinning left or right. If alignment needs to proceed backward, it will be reversed internally. TODO: This should become const and the class should become thread safe.

◆ build_id_index_table()

void XdropAligner::build_id_index_table ( OrderedGraph const &  graph)
private

Fill in the id_to_index map.

◆ build_index_edge_table()

void XdropAligner::build_index_edge_table ( OrderedGraph const &  graph,
uint32_t const  seed_node_index,
bool  left_to_right 
)
private

Fill in index_edges and index_edges_head. Needs to know the index of the "seed node" in our graph's list of nodes, and the direction of the pass we are setting up for (false = right to left, true = left to right)

◆ calculate_and_save_alignment()

void XdropAligner::calculate_and_save_alignment ( Alignment alignment,
OrderedGraph const &  graph,
graph_pos_s const &  head_pos,
size_t  tail_node_index,
bool  left_to_right 
)
private

After all the alignment work has been done, do the traceback and save into the given Alignment object.

If left_to_right is true, the nodes were filled left to right, and the internal traceback will come out in left to right order, so we can emit it as is. If it is false, the nodes were filled right to left, and the internal traceback comes out in right to left order, so we need to flip it.

◆ calculate_max_position()

XdropAligner::graph_pos_s XdropAligner::calculate_max_position ( OrderedGraph const &  graph,
graph_pos_s const &  seed_pos,
size_t  max_node_index,
bool  direction 
)
private

Given the index of the node at which the winning score occurs, find the position in the node and read sequence at which the winning match is found.

◆ calculate_seed_position()

XdropAligner::graph_pos_s XdropAligner::calculate_seed_position ( OrderedGraph const &  graph,
vector< MaximalExactMatch > const &  mems,
size_t  query_length,
bool  direction 
)
private

Given the subgraph we are aligning to, the MEM hist against it, the length of the query, and the direction we are aligning the query in (true = forward), select a single anchoring match between the graph and the query to align out from.

This replaces scan_seed_position for the case where we have MEMs.

◆ extend()

size_t XdropAligner::extend ( OrderedGraph const &  graph,
vector< uint64_t >::const_iterator  begin,
vector< uint64_t >::const_iterator  end,
dz_query_s const *  packed_query,
size_t  seed_node_index,
uint64_t  seed_offset,
bool  right_to_left 
)
private

Do alignment. Takes the graph, the sorted packed edges in ascending order for a forward pass or descending order for a reverse pass, the packed query sequence, the index of the seed node in the graph, the offset (TODO: in the read?) of the seed position, and the direction to traverse the graph topological order.

Note that we take our direction as right_to_left, whole many other functions take it as left_to_right.

If a MEM seed is provided, this is run in two passes. The first is left to right (right_to_left = false) if align did not have reverse_complement set and the second is right to left (right_to_left = true).

If we have no MEM seed, we only run one pass (the second one).

Returns the index in the topological order of the node with the highest scoring alignment.

Note that if no non-empty local alignment is found, it may not be safe to call dz_calc_max_qpos on the associated forefront!

◆ operator=() [1/2]

XdropAligner & XdropAligner::operator= ( XdropAligner &&  rhs)

◆ operator=() [2/2]

XdropAligner & XdropAligner::operator= ( XdropAligner const &  rhs)

◆ push_edit()

size_t XdropAligner::push_edit ( Mapping mapping,
uint8_t  op,
char const *  alt,
size_t  len 
)
private

Append an edit at the end of the current mapping array. Returns the length passed in.

◆ scan_seed_position()

XdropAligner::graph_pos_s XdropAligner::scan_seed_position ( OrderedGraph const &  graph,
std::string const &  query_seq,
bool  direction 
)
private

If no seeds are provided as alignment input, we need to compute our own starting anchor position. This function does that. Takes the topologically-sorted graph, the query sequence, and the direction. If direction is false, finds a seed hit on the first node of the graph. If it is true, finds a hit on the last node.

This replaces calculate_seed_position for the case where we have no MEMs.

Member Data Documentation

◆ aa_match

int8_t vg::XdropAligner::aa_match
private

◆ compare

const std::function<bool (uint64_t const &, uint64_t const &)> vg::XdropAligner::compare[2]
private
Initial value:
= {
[](uint64_t const &x, uint64_t const &y) -> bool { return((int64_t)x < (int64_t)y); },
[](uint64_t const &x, uint64_t const &y) -> bool { return((int64_t)x > (int64_t)y); }
}

Lookup table for forward- and reverse-sorting comparators, interpreting unsigned arguments as signed. Use [0] for forward and [1] for reverse (FIXME: can we embed them in the vtable?)

◆ dz

dz_s* vg::XdropAligner::dz
private

This is the backing dozeu library problem instance.

◆ edge

const std::function<uint64_t (uint64_t const &, uint64_t const &)> vg::XdropAligner::edge[2]
private
Initial value:
= {
[](uint64_t const &from, uint64_t const &to) -> uint64_t { return((to<<32) | from); },
[](uint64_t const &from, uint64_t const &to) -> uint64_t { return((from<<32) | to); }
}

Lookup table for functions to compose packed edge uint64_t values from the from and to indexes. Use [0] to put to in the high bits and [1] to put from in the high bits. TODO: When would you use either?

◆ forefronts

std::vector< struct dz_forefront_s const * > vg::XdropAligner::forefronts
private

Stores all of the currently outstanding dozeu library forefronts.

◆ id_to_index

std::unordered_map< id_t, uint64_t > vg::XdropAligner::id_to_index
private

Maps from node ID to the index in our internal subgraph storage at which that node occurs.

◆ index_edges

std::vector< uint64_t > vg::XdropAligner::index_edges
private

List of edges. Stored as two int32_ts packed together. TODO: what is the order of packing?

◆ index_edges_head

std::vector< uint64_t > vg::XdropAligner::index_edges_head
private

TODO: what is this?


The documentation for this class was generated from the following files: