Boyer-Moore Voting Algorithm
Links:
Questions
Variations
Find majority element (> n/2) : the classic.
Find elements > n/3 : extended version for more than one majority.
Majority in a stream : constant space, streaming data.
Weighted votes : when counts arenβt equal.
Verification step : avoid false positives when no majority exists.
Naive Algorithm
In a given array, if there exists a majority element find it i.e. n/2
The general idea of solving the problem is:
Create a HashMap which stores frequencies of each element
Iterate through the input array
Update the frequency of each element
Find the threshold i.e. n/2
Iterate over the HashMap and keep checking if any item's frequency is greater than the threshold
Return the item value whose frequency is greater than threshold
Code
Time and Space Complexity
n β Length of the array k β Size of the Hashmap Point to note: nβ₯k
Time Complexity: O(n)
Iterating over the array once - O(n)
Iterating over the hashmap - O(k)
Final = O(n)+O(k)=O(n)
Space Complexity: O(k)
Interview Setting Limitations
"Your solution is O(n) time and O(n) space. Can you do it in O(1) space?"
"What if the array is so large it doesn't fit in memory, but we can stream it? How would you find the majority element without storing every unique value?"
"Assume the majority element always exists in the input."
"Imagine you are processing a stream of data and can only keep one or two variables in memory at a time."
Boyers-Moore Voting Algorithm
These limitations can be solved using "Boyers-Moore Voting Algorithm"
Crux of the algorithm
Since the majority element appears more than half the time, it can 'outvote' all other elements combined. If we pair up two different elements and remove them, the majority element will still be the one remaining at the end.
The above algorithm is valid for majority element i.e. n/2
The Core Intuition: Pairing Off
The algorithm works by trying to pair up different elements and removing them from the total count.
If you have two different numbers, they cancel each other out.
Because the majority element appears more than 50% of the time, even if every other number in the array "sacrifices" itself to cancel out one instance of the majority element, there will still be at least one majority element left standing at the end.
The Logic of the "Counter"
The counter isn't actually counting the frequency of a number; it is counting the "surplus" of the current candidate.
When
count == 0: It means all previous elements have been paired off and cancelled out. We pick the next available number as our newcandidate.When we see the same number: We increment the count (the surplus increases).
When we see a different number: We decrement the count (one-to-one cancellation).
Basically
A HashMap tracks everything (full state).
Boyer-Moore tracks only the relationship between elements (minimal state).
You are essentially saying: "I don't need to know how many times every number appeared; I only need to know which number is currently 'winning' the cancellation war."
Crucial Note: This algorithm only guarantees finding the majority element if one actually exists. If there's a chance the array has no majority, you must perform one final pass through the array at the end to verify your candidate's total count is >n/2.
Code
Time and Space Complexity
n β Length of the array
Time Complexity: O(n)
Iterates twice over the input array O(n)+O(n)=O(2n)=O(n)
Space Complexity: O(1)
No extra space is used
Twist the scenarios
Streaming (single pass, no verification)
Use Misra-Gries (generalized BMV): maintain kβ1 candidates with counts.
Guarantees: any element with freq>(n/k) appears in candidates.
Trade-off: false positives possible; no strict majority guarantee without 2nd pass.
Big Data/Parallel Streaming
Local Sketching: Each machine independently computes a Misra-Gries summary (size kβ1) over its partition β capturing potential heavy hitters with bounded error and no false negatives.
Global Aggregation: All local summaries are merged (via shuffle/group-by) into a single multiset of candidateβcount pairs β total size is O(k Γ machines).
Global Sketching: Run Misra-Gries again on the merged candidate set to produce the final β€ kβ1 candidates β guaranteed to include any element with frequency > n/k.
Boyers-Moore Algorithm for n/3
It can be achieved by tracking two items and two counts
Time and Space Complexity
n β Length of the array
Time Complexity: O(n)
Iterates twice over the input array O(n)+O(n)=O(2n)=O(n)
Space Complexity: O(1)
No extra space is used
Misra-Gries Heavy Hitters vs Count-Min Sketch for n/k items
It is generally not expected to implement Misra-Gries algorithm
We can say:
For arbitrary k, Iβd switch to a parameterized Misra-Gries sketch β maintain a map of up to kβ1 candidates, decrement all on overflow. Itβs composable, so we can merge per-machine sketches in a reduce phase. BMV is just k=2
Misra-Gries: Better for identifying exact items (if a second pass is possible).
Count-Min Sketch: Better for frequency estimation and handling deletes (using a "Count-Min Sketch with conservative update" or "Longest Frequent Elements").
Technical Refinement:
Ensure you explicitly state that in the "reduce" phase, merging two sketches involves summing counts of common elements and then decrementing counts by the minimum shared values to maintain the 1/k property.
Last updated