Defines the "vg minimizer" subcommand, which builds the experimental minimizer index.
For each window of w successive kmers and their reverse complements, the index stores the lexicographically smallest kmer (the minimizer). Kmers that contain characters other than A, C, G, and T are not indexed.
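The minimizer definition above can be illustrated on a plain string. The following is a minimal Python sketch, not the vg implementation: it works on a linear sequence rather than a graph, and the function names, the (kmer, offset, is_reverse) tuple layout, the tie-breaking by offset, and the choice to simply skip invalid kmers inside a window are all assumptions made for illustration.

```python
def reverse_complement(kmer):
    """Return the reverse complement of a kmer over A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[c] for c in reversed(kmer))


def minimizers(seq, k, w):
    """Return the set of (kmer, offset, is_reverse) minimizers of seq.

    Each window covers w successive kmers plus their reverse complements.
    Kmers with characters other than A, C, G, T are skipped (assumption:
    the remaining kmers of the window are still considered).
    """
    valid = set("ACGT")
    candidates = []  # candidates[i] holds both orientations of the kmer at offset i
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            candidates.append([(kmer, i, False),
                               (reverse_complement(kmer), i, True)])
        else:
            candidates.append([])  # invalid kmer: nothing to index

    result = set()
    for start in range(len(candidates) - w + 1):
        window = [c for cands in candidates[start:start + w] for c in cands]
        if window:
            # Tuple comparison picks the lexicographically smallest kmer;
            # ties are broken by offset, then orientation (assumption).
            result.add(min(window))
    return result


if __name__ == "__main__":
    for kmer, offset, is_reverse in sorted(minimizers("GATTACA", k=3, w=2)):
        print(kmer, offset, "reverse" if is_reverse else "forward")
```

Collecting the results in a set shows how adjacent windows often select the same minimizer: the four windows of "GATTACA" yield only three distinct minimizers.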
The index contains either all minimizers or only the haplotype-consistent ones. Indexing all minimizers from complex graph regions can take a long time (e.g. 65 hours for all vs. 10 minutes for haplotype-consistent minimizers with 1000GP), because many windows share the same minimizer. As the total number of minimizers is manageable (e.g. 2.1 billion vs. 1.4 billion for 1000GP), it should be possible to develop a better algorithm for finding the minimizers.
A quick idea:
- For each node v, extract the subgraph for the windows starting in v.
- Extract all k'-mers from the subgraph and use them to determine where the minimizers can start.