Marzullo's Algorithm

Marzullo's algorithm, invented by Keith Marzullo for his Ph.D. dissertation, is an agreement algorithm used to select sources for estimating accurate time from a number of noisy time sources. A refined version of it, renamed the "Intersection algorithm", forms part of the modern Network Time Protocol.

Purpose

Marzullo's algorithm efficiently finds an optimal interval from a set of estimates with confidence intervals, where the actual value may lie outside the confidence interval for some sources. In this case the best estimate is taken to be the smallest interval consistent with the largest number of sources.

If we have the estimates 10 +/- 2, 12 +/- 1 and 11 +/- 1, then these intervals are [8,12], [11,13] and [10,12], which intersect to form [11,12], or 11.5 +/- 0.5, as consistent with all the values.

If the ranges were instead [8,12], [11,13] and [14,15], then there is no interval consistent with all these values, but [11,12] is consistent with the largest number of sources (2).

Finally, if the ranges were [8,9], [8,12] and [10,12], then both the intervals [8,9] and [10,12] would be consistent with the largest number of sources.

This procedure determines an interval. If the desired result is a best value from that interval, a naive approach is to take the center of the interval as the value; this is what the original Marzullo algorithm specified. A more sophisticated approach would recognize that this could throw away useful information from the confidence intervals of the sources, and that a probabilistic model of the sources could return a value other than the center.

Method

Marzullo's algorithm begins by preparing a table of the sources, sorting it and then searching (efficiently) for the intersections of intervals. For each source there is a range [c-r, c+r] defined by c +/- r. For each range the table will have two tuples of the form <offset, type>.
One tuple will represent the beginning of the range, marked with type -1 as <c-r, -1>, and the other will represent the end, marked with type +1 as <c+r, +1>.

The description of the algorithm uses the following variables: best (largest number of overlapping intervals found), cnt (current number of overlapping intervals), beststart and bestend (the beginning and end of the best interval found so far), i (an index), and the table of tuples.

0) Build the table of tuples.
1) Sort the table by the offset. (If two tuples with the same offset but opposite types exist, indicating that one interval ends just as another begins, then a method of deciding which comes first is necessary. Such an occurrence can be considered an overlap with no duration, which the algorithm finds if type -1 is put before type +1. If such pathological overlaps are considered objectionable, they can be avoided by putting type +1 before type -1 in this case.)
2) initialize: best=0, cnt=0
3) loop: go through each tuple in the table in ascending order
- 4) current number of overlapping intervals: cnt = cnt - type[i]
- 5) if cnt > best then best = cnt, beststart = offset[i], bestend = offset[i+1]
- commentary: the next tuple, at i+1, will either be the end of an interval (type=+1), in which case it ends this best interval, or the beginning of an interval (type=-1), in which case the next step will replace best.
- ambiguity: it is unspecified what to do if best = cnt. This is a tie for greatest overlap. One can either take the smaller of bestend - beststart and offset[i+1] - offset[i], or just take an arbitrary one of the two equally good entries.
6) end loop: return [beststart, bestend] as the optimal interval.

The number of false sources (ones which do not overlap the optimal interval returned) is the number of sources minus the value of best.

Efficiency

Marzullo's algorithm is efficient in both space and time. Where m is the number of sources, the asymptotic space usage is O(m). The asymptotic time requirement can be broken into three phases: building the table, sorting it and searching it. Sorting can be done in O(m log m) time, and this dominates the building and searching phases, which can be performed in linear time. Therefore the time efficiency of Marzullo's algorithm is O(m log m).

Once the table has been built and sorted, it is possible to update the interval for one source (when new information is received) in linear time. Therefore, updating data for one source and finding the best interval can be done in O(m) time.

References

- K. A. Marzullo. Maintaining the Time in a Distributed System: An Example of a Loosely-Coupled Distributed Service. Ph.D. dissertation, Stanford University, Department of Electrical Engineering, February 1984.
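
Example

The method described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not a reference implementation: the function name marzullo and the input format (a list of (c, r) pairs, each meaning c +/- r with r >= 0) are chosen here for the example. Ties in the sort put type -1 before type +1, and a tie for greatest overlap keeps the first interval found.

```python
def marzullo(sources):
    """Return (beststart, bestend, best) for a list of (c, r) estimates.

    Each source is assumed to mean c +/- r with r >= 0 (an assumption of
    this sketch). Returns (None, None, 0) for an empty list.
    """
    # Step 0: build the table of <offset, type> tuples.
    table = []
    for c, r in sources:
        table.append((c - r, -1))  # beginning of the range
        table.append((c + r, +1))  # end of the range

    # Step 1: sort by offset. Python compares tuples element by element,
    # so at equal offsets type -1 sorts before type +1 (zero-length
    # overlaps are therefore counted as overlaps).
    table.sort()

    # Step 2: initialize.
    best = 0
    cnt = 0
    beststart = bestend = None

    # Steps 3 to 5: scan the table in ascending order.
    for i, (offset, typ) in enumerate(table):
        cnt -= typ  # a start (-1) raises the count, an end (+1) lowers it
        if cnt > best:
            best = cnt
            beststart = offset
            # cnt > 0 here, so at least one interval is still open and
            # a later tuple (its end) must exist at index i + 1.
            bestend = table[i + 1][0]

    # Step 6: return the optimal interval and the overlap count.
    return beststart, bestend, best


# The first example from the Purpose section: 10 +/- 2, 12 +/- 1, 11 +/- 1.
print(marzullo([(10, 2), (12, 1), (11, 1)]))  # prints (11, 12, 3)
```

The number of false sources is then len(sources) minus the third returned value. Replacing the strict comparison with a rule that prefers the shorter interval would be one way to resolve the tie case differently.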