#include <readfilter.hpp>
◆ filter()
int vg::ReadFilter::filter(istream* alignment_stream)
Filter the alignments available from the given stream, placing them on standard output or in the appropriate file. Returns 0 on success, or the exit code to use on error.
◆ filter_alignment()
Run all the filters on an alignment. The alignment may be modified in place by the defray filter.
◆ has_repeat()
bool vg::ReadFilter::has_repeat(Alignment& aln, int k) [private]
A quick-and-dirty filter, used to see whether removing reads that can slip around and still map perfectly helps vg call. Returns true if, at either end of the read sequence, at least k bases are repetitive, checking repeats of up to size 2k.
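The description above leaves the exact definition of "repetitive" open. The following is a minimal sketch of one possible reading, working on a bare sequence string rather than an Alignment: for every unit size from 1 to 2k, it counts how many bases at either end are covered by back-to-back exact copies of the terminal unit. The names tandem_bases_at_start and ends_look_repetitive are hypothetical, and the real has_repeat may count repeats differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of vg): number of bases at the start of `s`
// covered by back-to-back exact copies of its first `unit` bases, not
// counting the first copy itself.
std::size_t tandem_bases_at_start(const std::string& s, std::size_t unit) {
    std::size_t covered = 0;
    for (std::size_t pos = unit; pos + unit <= s.size(); pos += unit) {
        if (s.compare(pos, unit, s, 0, unit) != 0) {
            break;  // the run of identical units ends here
        }
        covered += unit;
    }
    return covered;
}

// Hypothetical stand-in for a has_repeat-style check on a bare sequence.
bool ends_look_repetitive(const std::string& seq, int k) {
    if (k <= 0) {
        return false;  // assumption: a non-positive k disables the check
    }
    const std::size_t need = static_cast<std::size_t>(k);
    // Reversing the read turns a repeat at its end into one at its start.
    const std::string reversed(seq.rbegin(), seq.rend());
    for (std::size_t unit = 1; unit <= 2 * need; ++unit) {
        if (tandem_bases_at_start(seq, unit) >= need ||
            tandem_bases_at_start(reversed, unit) >= need) {
            return true;
        }
    }
    return false;
}
```

Under this reading, a read that starts with ACACACAC is flagged for k = 4, because three extra copies of the unit AC cover six bases.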
◆ is_split()
bool vg::ReadFilter::is_split(Alignment& alignment) [private]
Return false if the read only follows edges in the graph, and true if the read is split (or just incorrect) and takes edges not in the index.
Throws an error if no graph is specified.
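The idea behind this check can be sketched on plain data. The example below is a simplified illustration, not vg's implementation: it represents the graph as a set of (from, to) node ID pairs, ignores node orientation, and treats a read as the list of node IDs it visits. EdgeSet and path_is_split are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, orientation-free stand-in for the graph's edge index.
using EdgeSet = std::set<std::pair<std::int64_t, std::int64_t>>;

// Returns true if any step between consecutive nodes of the read's path
// is missing from the edge set, and false if every step follows an edge.
bool path_is_split(const std::vector<std::int64_t>& node_path,
                   const EdgeSet* edges) {
    if (edges == nullptr) {
        // Mirrors the documented behavior of failing when no graph is given.
        throw std::runtime_error("a graph is required to detect split reads");
    }
    for (std::size_t i = 1; i < node_path.size(); ++i) {
        const auto step = std::make_pair(node_path[i - 1], node_path[i]);
        if (edges->count(step) == 0) {
            return true;  // the read takes an edge the graph does not have
        }
    }
    return false;
}
```

A read contained in a single node has no steps to check, so this sketch reports it as not split.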
◆ sample_read()
bool vg::ReadFilter::sample_read(const Alignment& read) [private]
Based on the read name and paired-ness, compute the SAM-style QNAME and use that and the configured sampling probability and seed in the Samtools read sampling algorithm, to determine if the read should be kept. Returns true if the read should stay, and false if it should be removed. Always accepts or rejects paired reads together.
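As a rough illustration of hash-based sampling in the Samtools style, the sketch below hashes the QNAME, mixes in the seed mask, and compares the low 24 bits against the keep probability. The function names are hypothetical, the hash functions are modeled on khash-style X31 and Wang hashes and are assumptions rather than a verified copy of what Samtools or vg uses, and the QNAME helper only handles a trailing "/1" or "/2".

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: drop a trailing "/1" or "/2" so both mates of a
// pair share one QNAME.
std::string sam_style_qname(const std::string& read_name) {
    const std::size_t n = read_name.size();
    if (n >= 2 && read_name[n - 2] == '/' &&
        (read_name[n - 1] == '1' || read_name[n - 1] == '2')) {
        return read_name.substr(0, n - 2);
    }
    return read_name;
}

// X31-style string hash (assumed, khash-like): h = h * 31 + c.
std::uint32_t x31_hash(const std::string& s) {
    std::uint32_t h = 0;
    for (unsigned char c : s) {
        h = (h << 5) - h + c;
    }
    return h;
}

// Wang-style 32-bit integer mixer (assumed, khash-like).
std::uint32_t wang_hash(std::uint32_t key) {
    key += ~(key << 15);
    key ^= (key >> 10);
    key += (key << 3);
    key ^= (key >> 6);
    key += ~(key << 11);
    key ^= (key >> 16);
    return key;
}

// Keep the read if the low 24 bits of the mixed hash, scaled to [0, 1),
// fall below the keep probability. Mates hash identically, so a pair is
// always kept or dropped together.
bool keep_read(const std::string& read_name, double keep_probability,
               std::uint32_t seed_mask) {
    const std::uint32_t h =
        wang_hash(x31_hash(sam_style_qname(read_name)) ^ seed_mask);
    return static_cast<double>(h & 0xffffffu) / 0x1000000 < keep_probability;
}
```

Because the decision depends only on the QNAME and the seed mask, it is reproducible across runs and independent of read order.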
◆ trim_ambiguous_end()
bool vg::ReadFilter::trim_ambiguous_end(Alignment& alignment, int k) [private]
Trim only the end of the given alignment, leaving the start alone. Two calls of this implement trim_ambiguous_ends.
◆ trim_ambiguous_ends()
bool vg::ReadFilter::trim_ambiguous_ends(Alignment& alignment, int k)
Look at either end of the given alignment, up to k bases in from the end. See if that tail of the alignment is mapped such that another embedding in the given graph can produce the same sequence as the sequence along the embedding that the read actually has, and if so trim back the read.
In the case of softclips, the aligned portion of the read is considered, and if trimming is required, the softclips are hard-clipped off.
Returns true if the read had to be modified, and false otherwise.
MUST NOT be called with a null index.
◆ append_regions
bool vg::ReadFilter::append_regions = false
◆ buffer_size
int vg::ReadFilter::buffer_size = 512
◆ complement_filter
bool vg::ReadFilter::complement_filter = false
Actually take the complement of the filter.
◆ defray_count
int vg::ReadFilter::defray_count = 99999
Limit defray recursion to visit this many nodes.
◆ defray_length
int vg::ReadFilter::defray_length = 0
How far in from the end should we look for ambiguous end alignment to clip off?
◆ downsample_probability
double vg::ReadFilter::downsample_probability = 1.0
We can also pseudorandomly drop reads. What's the probability that we keep a read?
◆ downsample_seed_mask
uint32_t vg::ReadFilter::downsample_seed_mask = 0
Samtools-compatible internal seed mask, for deciding which read pairs to keep. To be generated with rand() after srand() from the user-visible seed.
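The derivation described above amounts to seeding the C library generator and taking its first output. A minimal sketch follows, where make_seed_mask is a hypothetical name and the exact call site in vg is not shown in this documentation.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: turn a user-visible seed into an internal mask by
// seeding the C library generator and taking its first value.
// Note: std::srand and std::rand share global state, so this resets the
// sequence seen by any other caller of std::rand in the same process.
std::uint32_t make_seed_mask(unsigned int user_seed) {
    std::srand(user_seed);
    return static_cast<std::uint32_t>(std::rand());
}
```

The same seed yields the same mask on a given C library, but the value of std::rand is implementation-defined, so masks are not guaranteed to match across platforms.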
◆ drop_split
bool vg::ReadFilter::drop_split = false
Should we drop split reads that follow edges not in the graph?
◆ excluded_features
unordered_set<string> vg::ReadFilter::excluded_features
If a read has one of the features in this set as an annotation, the read is filtered out.
◆ excluded_refpos_contigs
vector<regex> vg::ReadFilter::excluded_refpos_contigs
Read must not have a refpos set with a contig name containing a match to any of these.
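"Containing a match" suggests a substring search rather than a whole-string match, which in standard C++ corresponds to std::regex_search rather than std::regex_match. A minimal sketch of that check on a bare contig name, with contig_is_excluded as a hypothetical helper:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true if any pattern matches anywhere inside the
// contig name. std::regex_search accepts partial matches, whereas
// std::regex_match would require the whole name to match.
bool contig_is_excluded(const std::string& contig,
                        const std::vector<std::regex>& excluded) {
    for (const std::regex& pattern : excluded) {
        if (std::regex_search(contig, pattern)) {
            return true;
        }
    }
    return false;
}
```

Patterns can still be anchored with ^ and $ when a whole-name match is wanted.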
◆ filter_on_all
bool vg::ReadFilter::filter_on_all = false
When outputting paired reads, fail the pair only if both (all) reads fail (true), instead of failing it if either (any) read fails (false).
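The two modes reduce to a logical AND versus a logical OR over the per-read results. A small sketch, with pair_fails as a hypothetical name:

```cpp
// Hypothetical helper: combine the per-read filter results of a pair.
// filter_on_all == true  -> the pair fails only if both reads fail.
// filter_on_all == false -> the pair fails if either read fails.
bool pair_fails(bool first_fails, bool second_fails, bool filter_on_all) {
    return filter_on_all ? (first_fails && second_fails)
                         : (first_fails || second_fails);
}
```

With the default (false), a single failing mate removes the whole pair.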
◆ frac_score
bool vg::ReadFilter::frac_score = false
◆ graph
A HandleGraph is required for some filters. Note that ReadFilter does not own or free it.
◆ interleaved
bool vg::ReadFilter::interleaved = false
◆ max_overhang
int vg::ReadFilter::max_overhang = numeric_limits<int>::max() / 2
◆ min_base_quality
int vg::ReadFilter::min_base_quality = numeric_limits<int>::min() / 2
◆ min_base_quality_fraction
double vg::ReadFilter::min_base_quality_fraction = numeric_limits<double>::lowest()
◆ min_end_matches
int vg::ReadFilter::min_end_matches = numeric_limits<int>::min() / 2
◆ min_mapq
double vg::ReadFilter::min_mapq = numeric_limits<double>::lowest()
◆ min_primary
double vg::ReadFilter::min_primary = numeric_limits<double>::lowest()
◆ min_secondary
double vg::ReadFilter::min_secondary = numeric_limits<double>::lowest()
◆ name_prefixes
vector<string> vg::ReadFilter::name_prefixes
Read name must have one of these prefixes, if any are present. TODO: This should be a trie but I don't have one handy. Must be sorted for vaguely efficient search.
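One simple way to exploit the sorted order, short of a trie, is to binary-search the list for each leading substring of the read name. The sketch below is an illustration of that idea and not necessarily the search vg performs; name_passes is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if the list is empty (no prefix filter) or if
// some entry of the sorted list is a prefix of `name`. Each leading
// substring of the name is looked up with a binary search, which costs
// O(|name| * log n) string comparisons and requires the list to be sorted.
bool name_passes(const std::string& name,
                 const std::vector<std::string>& sorted_prefixes) {
    if (sorted_prefixes.empty()) {
        return true;
    }
    for (std::size_t len = 0; len <= name.size(); ++len) {
        if (std::binary_search(sorted_prefixes.begin(), sorted_prefixes.end(),
                               name.substr(0, len))) {
            return true;
        }
    }
    return false;
}
```

If the list is not sorted, std::binary_search can miss entries, which is why the sortedness requirement matters.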
◆ repeat_size
int vg::ReadFilter::repeat_size = 0
◆ rescore
bool vg::ReadFilter::rescore = false
Should we rescore each alignment with default parameters and no e.g. haplotype info?
◆ sub_score
bool vg::ReadFilter::sub_score = false
◆ threads
int vg::ReadFilter::threads = -1
Number of threads, obtained from OpenMP.
◆ verbose
bool vg::ReadFilter::verbose = false
◆ write_output
bool vg::ReadFilter::write_output = true
Sometimes we only want a report, and not a filtered GAM. Toggling off output speeds things up considerably.