vg
tools for working with variation graphs
vg::ReadFilter Class Reference

#include <readfilter.hpp>

Classes

struct  Counts
 

Public Member Functions

Counts filter_alignment (Alignment &aln)
 
int filter (istream *alignment_stream)
 
bool trim_ambiguous_ends (Alignment &alignment, int k)
 

Public Attributes

bool complement_filter = false
 Actually take the complement of the filter.
 
vector< string > name_prefixes
 
vector< regex > excluded_refpos_contigs
 Read must not have a refpos set with a contig name containing a match to any of these.
 
unordered_set< string > excluded_features
 
double min_secondary = numeric_limits<double>::lowest()
 
double min_primary = numeric_limits<double>::lowest()
 
bool rescore = false
 
bool frac_score = false
 
bool sub_score = false
 
int max_overhang = numeric_limits<int>::max() / 2
 
int min_end_matches = numeric_limits<int>::min() / 2
 
bool verbose = false
 
double min_mapq = numeric_limits<double>::lowest()
 
int repeat_size = 0
 
bool drop_split = false
 Should we drop split reads that follow edges not in the graph?
 
double downsample_probability = 1.0
 We can also pseudorandomly drop reads. What's the probability that we keep a read?
 
uint32_t downsample_seed_mask = 0
 
int defray_length = 0
 
int defray_count = 99999
 Limit defray recursion to visit this many nodes.
 
int threads = -1
 Number of threads from omp.
 
int buffer_size = 512
 GAM output buffer size.
 
bool write_output = true
 
const HandleGraph * graph = nullptr
 A HandleGraph is required for some filters (Note: ReadFilter doesn't own/free this)
 
bool interleaved = false
 Interleaved input.
 
bool filter_on_all = false
 
int min_base_quality = numeric_limits<int>::min() / 2
 
double min_base_quality_fraction = numeric_limits<double>::lowest()
 
bool append_regions = false
 

Private Member Functions

bool has_repeat (Alignment &aln, int k)
 
bool trim_ambiguous_end (Alignment &alignment, int k)
 
bool is_split (Alignment &alignment)
 
bool sample_read (const Alignment &read)
 

Member Function Documentation

◆ filter()

int vg::ReadFilter::filter ( istream *  alignment_stream)

Filter the alignments available from the given stream, placing them on standard output or in the appropriate file. Returns 0 on success, exit code to use on error.

◆ filter_alignment()

ReadFilter::Counts vg::ReadFilter::filter_alignment ( Alignment & aln)

Run all the filters on an alignment. The alignment may be modified in place by the defray filter.

◆ has_repeat()

bool vg::ReadFilter::has_repeat ( Alignment & aln,
int  k 
)
private

Quick-and-dirty filter to test whether removing reads that can slip around and still map perfectly helps vg call. Returns true if, at either end of the read sequence, at least k bases are repetitive, checking repeats of up to size 2k.
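
The exact definition of "repetitive" is not spelled out here. As a rough sketch only, the following hypothetical helper (`has_end_repeat` is not part of vg) treats an end as repetitive when at least k consecutive bases, counted from that end, equal the base one period further in, for some period from 1 to 2k. It works on a plain sequence string rather than an Alignment, and treating a non-positive k as "disabled" is also an assumption.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the end-repeat check; the real criterion may differ.
inline bool has_end_repeat(const std::string& seq, int k) {
    if (k <= 0) {
        return false;  // assumption: a non-positive k disables the check
    }
    const std::size_t n = seq.size();
    const std::size_t need = static_cast<std::size_t>(k);
    for (std::size_t period = 1; period <= 2 * need; ++period) {
        // Count bases from the start that match the base one period later.
        std::size_t from_start = 0;
        while (from_start + period < n &&
               seq[from_start] == seq[from_start + period]) {
            ++from_start;
        }
        // Count bases from the end that match the base one period earlier.
        std::size_t from_end = 0;
        while (from_end + period < n &&
               seq[n - 1 - from_end] == seq[n - 1 - from_end - period]) {
            ++from_end;
        }
        if (from_start >= need || from_end >= need) {
            return true;
        }
    }
    return false;
}
```

Under this definition, a read starting with `ACACACAC` or ending with `TTTTT` would be flagged for k = 4.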

◆ is_split()

bool vg::ReadFilter::is_split ( Alignment & alignment)
private

Return false if the read only follows edges in the graph, and true if the read is split (or just incorrect) and takes edges not in the index.

Throws an error if no graph is specified.

◆ sample_read()

bool vg::ReadFilter::sample_read ( const Alignment & read)
private

Based on the read name and paired-ness, compute the SAM-style QNAME and use that and the configured sampling probability and seed in the Samtools read sampling algorithm, to determine if the read should be kept. Returns true if the read should stay, and false if it should be removed. Always accepts or rejects paired reads together.
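
As an illustration of the idea, here is a minimal sketch of QNAME-based sampling in the style of samtools: hash the name, XOR it with the seed mask, scramble the result, and compare the low 24 bits against the keep probability. The helper names (`x31_hash`, `wang_hash`, `keep_read`) are hypothetical, and the specific hash functions are assumed to mirror those in klib/samtools rather than taken from the vg source.

```cpp
#include <cstdint>
#include <string>

// X31 string hash (assumed to match the klib string hash used by samtools).
inline uint32_t x31_hash(const std::string& s) {
    uint32_t h = 0;
    for (unsigned char c : s) {
        h = (h << 5) - h + c;  // h * 31 + c, with unsigned wraparound
    }
    return h;
}

// Wang integer hash (assumed to match the klib integer mixer).
inline uint32_t wang_hash(uint32_t key) {
    key += ~(key << 15);
    key ^= (key >> 10);
    key += (key << 3);
    key ^= (key >> 6);
    key += ~(key << 11);
    key ^= (key >> 16);
    return key;
}

// Returns true if the read with this QNAME should be kept.
inline bool keep_read(const std::string& qname, double keep_probability,
                      uint32_t seed_mask) {
    uint32_t k = wang_hash(x31_hash(qname) ^ seed_mask);
    // Map the low 24 bits to [0, 1) and keep the read if it falls below p.
    return static_cast<double>(k & 0xffffff) / 0x1000000 < keep_probability;
}
```

Because the decision depends only on the QNAME and the seed mask, both mates of a pair (which share a QNAME) get the same answer.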

◆ trim_ambiguous_end()

bool vg::ReadFilter::trim_ambiguous_end ( Alignment & alignment,
int  k 
)
private

Trim only the end of the given alignment, leaving the start alone. Two calls of this implement trim_ambiguous_ends.

◆ trim_ambiguous_ends()

bool vg::ReadFilter::trim_ambiguous_ends ( Alignment & alignment,
int  k 
)

Look at either end of the given alignment, up to k bases in from the end. See if that tail of the alignment is mapped such that another embedding in the given graph can produce the same sequence as the sequence along the embedding that the read actually has, and if so trim back the read.

In the case of softclips, the aligned portion of the read is considered, and if trimming is required, the softclips are hard-clipped off.

Returns true if the read had to be modified, and false otherwise.

MUST NOT be called with a null index.

Member Data Documentation

◆ append_regions

bool vg::ReadFilter::append_regions = false

◆ buffer_size

int vg::ReadFilter::buffer_size = 512

GAM output buffer size.

◆ complement_filter

bool vg::ReadFilter::complement_filter = false

Actually take the complement of the filter.

◆ defray_count

int vg::ReadFilter::defray_count = 99999

Limit defray recursion to visit this many nodes.

◆ defray_length

int vg::ReadFilter::defray_length = 0

How far in from the end should we look for ambiguous end alignment to clip off?

◆ downsample_probability

double vg::ReadFilter::downsample_probability = 1.0

We can also pseudorandomly drop reads. What's the probability that we keep a read?

◆ downsample_seed_mask

uint32_t vg::ReadFilter::downsample_seed_mask = 0

Samtools-compatible internal seed mask, for deciding which read pairs to keep. To be generated with rand() after srand() from the user-visible seed.
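
A minimal sketch of deriving the mask as described, assuming the user-visible seed is an unsigned integer; `make_seed_mask` is a hypothetical helper name, not a vg function.

```cpp
#include <cstdint>
#include <cstdlib>

// Derive the internal seed mask from a user-visible seed: seed the C PRNG,
// then take its first output. Note that this resets the global rand() state.
inline uint32_t make_seed_mask(unsigned int user_seed) {
    std::srand(user_seed);
    return static_cast<uint32_t>(std::rand());
}
```

The same user seed yields the same mask on a given platform, so a sampling run can be reproduced.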

◆ drop_split

bool vg::ReadFilter::drop_split = false

Should we drop split reads that follow edges not in the graph?

◆ excluded_features

unordered_set<string> vg::ReadFilter::excluded_features

If a read has one of the features in this set as annotations, the read is filtered out.

◆ excluded_refpos_contigs

vector<regex> vg::ReadFilter::excluded_refpos_contigs

Read must not have a refpos set with a contig name containing a match to any of these.
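
"Containing a match" suggests a search anywhere in the contig name rather than a full-string match. A small sketch under that assumption, using a hypothetical helper `contig_is_excluded` on plain strings instead of refpos records:

```cpp
#include <regex>
#include <string>
#include <vector>

// Returns true if any excluded pattern matches somewhere in the contig name.
inline bool contig_is_excluded(const std::string& contig,
                               const std::vector<std::regex>& excluded) {
    for (const auto& pattern : excluded) {
        if (std::regex_search(contig, pattern)) {
            return true;
        }
    }
    return false;
}
```

With patterns such as `_decoy$` and `^chrUn`, a contig named `chr1_decoy` would be excluded while `chr1` would not.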

◆ filter_on_all

bool vg::ReadFilter::filter_on_all = false

When outputting paired reads, fail the pair only if both (all) reads fail (true) instead of if either (any) read fails (false).
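
The two modes amount to combining the per-read results with AND or OR. A tiny sketch, where `pair_fails` is a hypothetical helper rather than a member of ReadFilter:

```cpp
// Decide whether a read pair fails, given each mate's individual result.
inline bool pair_fails(bool first_fails, bool second_fails, bool filter_on_all) {
    return filter_on_all ? (first_fails && second_fails)   // all must fail
                         : (first_fails || second_fails);  // any may fail
}
```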

◆ frac_score

bool vg::ReadFilter::frac_score = false

◆ graph

const HandleGraph* vg::ReadFilter::graph = nullptr

A HandleGraph is required for some filters (Note: ReadFilter doesn't own/free this)

◆ interleaved

bool vg::ReadFilter::interleaved = false

Interleaved input.

◆ max_overhang

int vg::ReadFilter::max_overhang = numeric_limits<int>::max() / 2

◆ min_base_quality

int vg::ReadFilter::min_base_quality = numeric_limits<int>::min() / 2

◆ min_base_quality_fraction

double vg::ReadFilter::min_base_quality_fraction = numeric_limits<double>::lowest()

◆ min_end_matches

int vg::ReadFilter::min_end_matches = numeric_limits<int>::min() / 2

◆ min_mapq

double vg::ReadFilter::min_mapq = numeric_limits<double>::lowest()

◆ min_primary

double vg::ReadFilter::min_primary = numeric_limits<double>::lowest()

◆ min_secondary

double vg::ReadFilter::min_secondary = numeric_limits<double>::lowest()

◆ name_prefixes

vector<string> vg::ReadFilter::name_prefixes

Read name must have one of these prefixes, if any are present. TODO: This should be a trie but I don't have one handy. Must be sorted for vaguely efficient search.
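
One way a sorted prefix list can be searched without a trie is to binary-search for each leading substring of the read name. This is only a sketch of that approach, not necessarily what vg does; `matches_any_prefix` is a hypothetical helper, and an empty list is taken to accept every name, following "if any are present".

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if the list is empty or some entry is a prefix of the name.
// Requires sorted_prefixes to be sorted in ascending order.
inline bool matches_any_prefix(const std::string& name,
                               const std::vector<std::string>& sorted_prefixes) {
    if (sorted_prefixes.empty()) {
        return true;  // no prefix restriction configured
    }
    // Try every leading substring of the name, shortest first.
    for (std::size_t len = 0; len <= name.size(); ++len) {
        if (std::binary_search(sorted_prefixes.begin(), sorted_prefixes.end(),
                               name.substr(0, len))) {
            return true;
        }
    }
    return false;
}
```

This costs one binary search per character of the name, which stays correct even when one configured prefix is itself a prefix of another.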

◆ repeat_size

int vg::ReadFilter::repeat_size = 0

◆ rescore

bool vg::ReadFilter::rescore = false

Should we rescore each alignment with default parameters and without extra information such as haplotype info?

◆ sub_score

bool vg::ReadFilter::sub_score = false

◆ threads

int vg::ReadFilter::threads = -1

Number of threads from omp.

◆ verbose

bool vg::ReadFilter::verbose = false

◆ write_output

bool vg::ReadFilter::write_output = true

Sometimes we only want a report, and not a filtered GAM. Toggling off output speeds things up considerably.

