Class LogMergePolicy
- java.lang.Object
-
- org.apache.lucene.index.MergePolicy
-
- org.apache.lucene.index.LogMergePolicy
-
- All Implemented Interfaces:
Closeable, AutoCloseable
- Direct Known Subclasses:
LogByteSizeMergePolicy, LogDocMergePolicy
public abstract class LogMergePolicy extends MergePolicy
This class implements a MergePolicy that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using getMergeFactor() and setMergeFactor(int) respectively.
This class is abstract and requires a subclass to define the size(org.apache.lucene.index.SegmentInfo) method, which specifies how a segment's size is determined. LogDocMergePolicy is one subclass that measures size by document count in the segment. LogByteSizeMergePolicy is another subclass that measures size as the total byte size of the file(s) for the segment.
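The leveling behaviour described above can be illustrated with a toy model. This is a minimal sketch, not Lucene's implementation: it assumes every flush produces a one-document segment, measures size by document count, and treats segments of equal size as one level. The class and method names (`LogMergeSketch`, `simulate`, `tailIsFullLevel`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of level-based merging; a simplification, not Lucene's code. */
final class LogMergeSketch {

    /** Returns the segment sizes left after the given number of one-document flushes. */
    static List<Long> simulate(int flushes, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<Long> segments = new ArrayList<>();
        for (int f = 0; f < flushes; f++) {
            segments.add(1L); // a newly flushed one-document segment
            // Cascade: whenever a level fills up, merge it into one larger segment.
            while (tailIsFullLevel(segments, mergeFactor)) {
                long total = 0;
                for (int i = 0; i < mergeFactor; i++) {
                    total += segments.remove(segments.size() - 1);
                }
                segments.add(total);
            }
        }
        return segments;
    }

    /** True if the last mergeFactor segments all have the same size. */
    static boolean tailIsFullLevel(List<Long> segments, int mergeFactor) {
        int n = segments.size();
        if (n < mergeFactor) {
            return false;
        }
        long size = segments.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (segments.get(i) != size) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(simulate(8, 3));
        System.out.println(simulate(9, 3));
    }
}
```

In this model, with a merge factor of 3, eight flushes leave segments of sizes 3, 3, 1, 1, and the ninth flush cascades into a single segment of 9 documents, so no level ever holds as many segments as the merge factor.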
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.index.MergePolicy
MergePolicy.MergeAbortedException, MergePolicy.MergeException, MergePolicy.MergeSpecification, MergePolicy.OneMerge
-
-
Field Summary
protected boolean calibrateSizeByDeletes
static int DEFAULT_MAX_MERGE_DOCS - Default maximum segment size.
static int DEFAULT_MERGE_FACTOR - Default merge factor, which is how many segments are merged at a time.
static double DEFAULT_NO_CFS_RATIO - Default noCFSRatio.
static double LEVEL_LOG_SPAN - Defines the allowed range of log(size) for each level.
protected int maxMergeDocs
protected long maxMergeSize
protected long maxMergeSizeForForcedMerge
protected int mergeFactor
protected long minMergeSize
protected double noCFSRatio
protected boolean useCompoundFile
-
Fields inherited from class org.apache.lucene.index.MergePolicy
writer
-
-
Constructor Summary
LogMergePolicy()
-
Method Summary
void close() - Release all resources for the policy.
MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos) - Finds merges necessary to force-merge all deletes from the index.
MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxNumSegments, Map<SegmentInfo,Boolean> segmentsToMerge) - Returns the merges necessary to merge the index down to a specified number of segments.
MergePolicy.MergeSpecification findMerges(SegmentInfos infos) - Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so.
boolean getCalibrateSizeByDeletes() - Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.
int getMaxMergeDocs() - Returns the largest segment (measured by document count) that may be merged with other segments.
int getMergeFactor() - Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.
double getNoCFSRatio()
boolean getUseCompoundFile() - Returns true if newly flushed and newly merged segments are written in compound file format.
protected boolean isMerged(SegmentInfo info) - Returns true if this single info is already fully merged (has no pending norms or deletes, is in the same dir as the writer, and matches the current compound file setting).
protected boolean isMerged(SegmentInfos infos, int maxNumSegments, Map<SegmentInfo,Boolean> segmentsToMerge)
protected void message(String message)
void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes) - Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.
void setMaxMergeDocs(int maxMergeDocs) - Determines the largest segment (measured by document count) that may be merged with other segments.
void setMergeFactor(int mergeFactor) - Determines how often segment indices are merged by addDocument().
void setNoCFSRatio(double noCFSRatio) - If a merged segment will be larger than this fraction of the total size of the index, leave the segment in non-compound format even if compound file format is enabled.
void setUseCompoundFile(boolean useCompoundFile) - Sets whether compound file format should be used for newly flushed and newly merged segments.
protected abstract long size(SegmentInfo info)
protected long sizeBytes(SegmentInfo info)
protected long sizeDocs(SegmentInfo info)
String toString()
boolean useCompoundFile(SegmentInfos infos, SegmentInfo mergedInfo) - Returns true if a new segment (regardless of its origin) should use the compound file format.
protected boolean verbose()
-
Methods inherited from class org.apache.lucene.index.MergePolicy
setIndexWriter
-
-
-
-
Field Detail
-
LEVEL_LOG_SPAN
public static final double LEVEL_LOG_SPAN
Defines the allowed range of log(size) for each level. A level is computed by taking the max segment log size, minus LEVEL_LOG_SPAN, and finding all segments falling within that range.
- See Also:
Constant Field Values
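The level computation described for LEVEL_LOG_SPAN can be sketched as follows. This is a simplified illustration under stated assumptions: the logarithm is taken in base mergeFactor, sizes below 1 are clamped to 1, and the span is passed in as a parameter (the 0.75 used in `main` is an illustrative value; the actual constant is listed under Constant Field Values). It also ignores segment adjacency. `LevelSpanSketch` and `topLevel` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of grouping segments into the top level by log(size). */
final class LevelSpanSketch {

    /** Returns the indices of segments whose log-size is within levelLogSpan of the largest. */
    static List<Integer> topLevel(long[] sizes, int mergeFactor, double levelLogSpan) {
        double[] levels = new double[sizes.length];
        double maxLevel = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < sizes.length; i++) {
            // Assumption: log base is the merge factor; tiny sizes are clamped to 1.
            levels[i] = Math.log(Math.max(1L, sizes[i])) / Math.log(mergeFactor);
            maxLevel = Math.max(maxLevel, levels[i]);
        }
        // The level spans from the max log size down to (max - span).
        double levelBottom = maxLevel - levelLogSpan;
        List<Integer> members = new ArrayList<>();
        for (int i = 0; i < sizes.length; i++) {
            if (levels[i] >= levelBottom) {
                members.add(i);
            }
        }
        return members;
    }

    public static void main(String[] args) {
        long[] sizes = {1000, 900, 100, 10};
        System.out.println(topLevel(sizes, 10, 0.75));
    }
}
```

With these inputs the two largest segments share the top level, because their base-10 log sizes (about 3.0 and 2.95) both lie above the level bottom of about 2.25, while the segments of size 100 and 10 fall into lower levels.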
-
DEFAULT_MERGE_FACTOR
public static final int DEFAULT_MERGE_FACTOR
Default merge factor, which is how many segments are merged at a time.
- See Also:
Constant Field Values
-
DEFAULT_MAX_MERGE_DOCS
public static final int DEFAULT_MAX_MERGE_DOCS
Default maximum segment size. A segment of this size or larger will never be merged.
- See Also:
setMaxMergeDocs(int), Constant Field Values
-
DEFAULT_NO_CFS_RATIO
public static final double DEFAULT_NO_CFS_RATIO
Default noCFSRatio. If a merge's size is >= 10% of the index, then we disable compound file for it.
- See Also:
setNoCFSRatio(double), Constant Field Values
-
mergeFactor
protected int mergeFactor
-
minMergeSize
protected long minMergeSize
-
maxMergeSize
protected long maxMergeSize
-
maxMergeSizeForForcedMerge
protected long maxMergeSizeForForcedMerge
-
maxMergeDocs
protected int maxMergeDocs
-
noCFSRatio
protected double noCFSRatio
-
calibrateSizeByDeletes
protected boolean calibrateSizeByDeletes
-
useCompoundFile
protected boolean useCompoundFile
-
-
Method Detail
-
verbose
protected boolean verbose()
-
getNoCFSRatio
public double getNoCFSRatio()
- See Also:
setNoCFSRatio(double)
-
setNoCFSRatio
public void setNoCFSRatio(double noCFSRatio)
If a merged segment will be larger than this fraction of the total size of the index, leave the segment in non-compound format even if compound file format is enabled. The value is a ratio rather than a percentage: 0.1 corresponds to 10% of the index. Set to 1.0 to always use CFS regardless of merge size.
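The decision this setting controls can be sketched as a small predicate. This is an illustration under assumptions, not the library's code: sizes are plain long values in whatever unit the policy measures, and `NoCfsRatioSketch` and its parameter names are hypothetical.

```java
/** Sketch of the compound-file decision driven by noCFSRatio. */
final class NoCfsRatioSketch {

    /**
     * Returns true if the merged segment should be written in compound file format.
     * mergedSize and totalIndexSize must be in the same unit (documents or bytes).
     */
    static boolean useCompoundFile(boolean useCompoundFile, double noCFSRatio,
                                   long mergedSize, long totalIndexSize) {
        if (!useCompoundFile) {
            return false; // compound files are disabled altogether
        }
        if (noCFSRatio == 1.0) {
            return true; // 1.0 means: always use CFS regardless of merge size
        }
        // Segments larger than the given fraction of the index stay non-compound.
        return mergedSize <= noCFSRatio * totalIndexSize;
    }

    public static void main(String[] args) {
        System.out.println(useCompoundFile(true, 0.1, 50, 1000));
        System.out.println(useCompoundFile(true, 0.1, 200, 1000));
    }
}
```

With a ratio of 0.1 and an index of 1000 units, a merged segment of 50 units would still use the compound format, while one of 200 units would not.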
-
message
protected void message(String message)
-
getMergeFactor
public int getMergeFactor()
Returns the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index.
-
setMergeFactor
public void setMergeFactor(int mergeFactor)
Determines how often segment indices are merged by addDocument(). With smaller values, less RAM is used while indexing, and searches are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained.
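The trade-off can be made concrete with an idealized count. The sketch below assumes every flush produces an equal-sized segment and that exactly mergeFactor segments of one level are merged at a time, which is a simplification of what a real policy does. Under that model, the number of merges is the sum of floor(flushes / mergeFactor^k) for k >= 1, and the number of segments left is the digit sum of the flush count written in base mergeFactor. `MergeFactorTradeoff` and its methods are hypothetical names.

```java
/** Idealized merge and segment counts for a given merge factor. */
final class MergeFactorTradeoff {

    /** Number of merges performed after the given number of equal-sized flushes. */
    static long mergeCount(long flushes, int mergeFactor) {
        requireValid(mergeFactor);
        long merges = 0;
        // Each pass counts the merges at the next level up.
        for (long remaining = flushes / mergeFactor; remaining > 0; remaining /= mergeFactor) {
            merges += remaining;
        }
        return merges;
    }

    /** Number of segments left in the index: the digit sum of flushes in base mergeFactor. */
    static long segmentCount(long flushes, int mergeFactor) {
        requireValid(mergeFactor);
        long segments = 0;
        for (long remaining = flushes; remaining > 0; remaining /= mergeFactor) {
            segments += remaining % mergeFactor;
        }
        return segments;
    }

    private static void requireValid(int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
    }

    public static void main(String[] args) {
        for (int mergeFactor : new int[] {3, 10}) {
            System.out.println("mergeFactor=" + mergeFactor
                + " merges=" + mergeCount(99, mergeFactor)
                + " segments=" + segmentCount(99, mergeFactor));
        }
    }
}
```

For 99 flushes, this model gives 48 merges and 3 remaining segments with a merge factor of 3, against 9 merges and 18 remaining segments with a merge factor of 10: more merge work while indexing in exchange for fewer segments to search.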
-
useCompoundFile
public boolean useCompoundFile(SegmentInfos infos, SegmentInfo mergedInfo) throws IOException
Description copied from class: MergePolicy
Returns true if a new segment (regardless of its origin) should use the compound file format.
- Specified by:
useCompoundFile in class MergePolicy
- Throws:
IOException
-
setUseCompoundFile
public void setUseCompoundFile(boolean useCompoundFile)
Sets whether compound file format should be used for newly flushed and newly merged segments.
-
getUseCompoundFile
public boolean getUseCompoundFile()
Returns true if newly flushed and newly merged segments are written in compound file format.
- See Also:
setUseCompoundFile(boolean)
-
setCalibrateSizeByDeletes
public void setCalibrateSizeByDeletes(boolean calibrateSizeByDeletes)
Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge.
-
getCalibrateSizeByDeletes
public boolean getCalibrateSizeByDeletes()
Returns true if the segment size should be calibrated by the number of deletes when choosing segments for merge.
-
close
public void close()
Description copied from class: MergePolicy
Release all resources for the policy.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Specified by:
close in class MergePolicy
-
size
protected abstract long size(SegmentInfo info) throws IOException
- Throws:
IOException
-
sizeDocs
protected long sizeDocs(SegmentInfo info) throws IOException
- Throws:
IOException
-
sizeBytes
protected long sizeBytes(SegmentInfo info) throws IOException
- Throws:
IOException
-
isMerged
protected boolean isMerged(SegmentInfos infos, int maxNumSegments, Map<SegmentInfo,Boolean> segmentsToMerge) throws IOException
- Throws:
IOException
-
isMerged
protected boolean isMerged(SegmentInfo info) throws IOException
Returns true if this single info is already fully merged (has no pending norms or deletes, is in the same dir as the writer, and matches the current compound file setting).
- Throws:
IOException
-
findForcedMerges
public MergePolicy.MergeSpecification findForcedMerges(SegmentInfos infos, int maxNumSegments, Map<SegmentInfo,Boolean> segmentsToMerge) throws IOException
Returns the merges necessary to merge the index down to a specified number of segments. This respects the maxMergeSizeForForcedMerge setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index, where that segment has no deletions pending nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so the MergeScheduler in use may make use of concurrency.
- Specified by:
findForcedMerges in class MergePolicy
- Parameters:
infos - the total set of segments in the index
maxNumSegments - requested maximum number of segments in the index (currently this is always 1)
segmentsToMerge - contains the specific SegmentInfo instances that must be merged away. This may be a subset of all SegmentInfos. If the value is True for a given SegmentInfo, that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge.
- Throws:
IOException
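The idea of returning several merges of mergeFactor segments each, so that a scheduler can run them concurrently, can be sketched as a simple partitioning step. This is a simplified illustration, not the library's algorithm: it ignores maxMergeSizeForForcedMerge and the segmentsToMerge map, and only plans one round of merges over adjacent segments. `ForcedMergeBatches` and `firstRound` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of splitting a forced merge into independent batches. */
final class ForcedMergeBatches {

    /** Returns [start, end) index ranges, each covering at most mergeFactor adjacent segments. */
    static List<int[]> firstRound(int numSegments, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        List<int[]> batches = new ArrayList<>();
        for (int start = 0; start < numSegments; start += mergeFactor) {
            int end = Math.min(start + mergeFactor, numSegments);
            // A batch holding a single segment has nothing to merge with in this round.
            if (end - start > 1) {
                batches.add(new int[] {start, end});
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        for (int[] batch : firstRound(25, 10)) {
            System.out.println("merge segments [" + batch[0] + ", " + batch[1] + ")");
        }
    }
}
```

Because the batches cover disjoint segment ranges, they do not depend on one another and could run in parallel; the merged results would then be combined in a later, cascaded round.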
-
findForcedDeletesMerges
public MergePolicy.MergeSpecification findForcedDeletesMerges(SegmentInfos segmentInfos) throws CorruptIndexException, IOException
Finds merges necessary to force-merge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time.
- Specified by:
findForcedDeletesMerges in class MergePolicy
- Parameters:
segmentInfos - the total set of segments in the index
- Throws:
CorruptIndexException
IOException
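The rule of merging adjacent segments that have deletes, up to mergeFactor at a time, can be sketched with a single pass over the segments. This is an illustrative model only: segments are reduced to a boolean "has deletes" flag, and `ForcedDeletesSketch` and `plan` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of grouping adjacent segments with deletes into merges. */
final class ForcedDeletesSketch {

    /** Returns [start, end) index ranges of adjacent segments with deletes, at most mergeFactor each. */
    static List<int[]> plan(boolean[] hasDeletes, int mergeFactor) {
        List<int[]> merges = new ArrayList<>();
        int first = -1; // start of the current run of segments with deletes, or -1 if none
        for (int i = 0; i < hasDeletes.length; i++) {
            if (hasDeletes[i]) {
                if (first == -1) {
                    first = i;
                } else if (i - first == mergeFactor) {
                    // The run reached mergeFactor segments: emit it and start a new run here.
                    merges.add(new int[] {first, i});
                    first = i;
                }
            } else if (first != -1) {
                // A segment without deletes ends the current run.
                merges.add(new int[] {first, i});
                first = -1;
            }
        }
        if (first != -1) {
            merges.add(new int[] {first, hasDeletes.length});
        }
        return merges;
    }

    public static void main(String[] args) {
        boolean[] hasDeletes = {true, true, false, true, true, true, true, false};
        for (int[] merge : plan(hasDeletes, 3)) {
            System.out.println("merge segments [" + merge[0] + ", " + merge[1] + ")");
        }
    }
}
```

Note that a run may consist of a single segment; merging it alone still rewrites the segment without its deleted documents.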
-
findMerges
public MergePolicy.MergeSpecification findMerges(SegmentInfos infos) throws IOException
Checks if any merges are now necessary and returns a MergePolicy.MergeSpecification if so. A merge is necessary when there are more than setMergeFactor(int) segments at a given level. When multiple levels have too many segments, this method will return multiple merges, allowing the MergeScheduler to use concurrency.
- Specified by:
findMerges in class MergePolicy
- Parameters:
infos - the total set of segments in the index
- Throws:
IOException
-
setMaxMergeDocs
public void setMaxMergeDocs(int maxMergeDocs)
Determines the largest segment (measured by document count) that may be merged with other segments. Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches.
The default value is Integer.MAX_VALUE.
The default merge policy (LogByteSizeMergePolicy) also allows you to set this limit by net size (in MB) of the segment, using LogByteSizeMergePolicy.setMaxMergeMB(double).
-
getMaxMergeDocs
public int getMaxMergeDocs()
Returns the largest segment (measured by document count) that may be merged with other segments.
- See Also:
setMaxMergeDocs(int)
-
-