Boyer-Moore Voting Algorithm

Questions

Variations

  1. Find majority element (> n/2) : the classic.

  2. Find elements > n/3 : extended version for more than one majority.

  3. Majority in a stream : constant space, streaming data.

  4. Weighted votes : when counts aren’t equal.

  5. Verification step : avoid false positives when no majority exists.


Naive Algorithm

circle-check

The general idea of solving the problem is:

  1. Create a HashMap which stores frequencies of each element

  2. Iterate through the input array

  3. Update the frequency of each element

  4. Find the threshold i.e. n/2n/2

  5. Iterate over the HashMap and keep checking if any item's frequency is greater than the threshold

  6. Return the item value whose frequency is greater than threshold

Code

Time and Space Complexity

circle-info

nn βž” Length of the array kk βž” Size of the Hashmap Point to note: nβ‰₯kn\ge k

  • Time Complexity: O(n)O(n)

    • Iterating over the array once - O(n)O(n)

    • Iterating over the hashmap - O(k)O(k)

    • Final = O(n)+O(k)=O(n)O(n) + O(k) = O(n)

  • Space Complexity: O(k)O(k)

Interview Setting Limitations

  • "Your solution is O(n)O(n) time and O(n)O(n) space. Can you do it in O(1)O(1) space?"

  • "What if the array is so large it doesn't fit in memory, but we can stream it? How would you find the majority element without storing every unique value?"

  • "Assume the majority element always exists in the input."

  • "Imagine you are processing a stream of data and can only keep one or two variables in memory at a time."


Boyers-Moore Voting Algorithm

These limitations can be solved using "Boyers-Moore Voting Algorithm"

Crux of the algorithm

Since the majority element appears more than half the time, it can 'outvote' all other elements combined. If we pair up two different elements and remove them, the majority element will still be the one remaining at the end.

circle-info

The above algorithm is valid for majority element i.e. n/2n/2

The Core Intuition: Pairing Off

The algorithm works by trying to pair up different elements and removing them from the total count.

  • If you have two different numbers, they cancel each other out.

  • Because the majority element appears more than 50% of the time, even if every other number in the array "sacrifices" itself to cancel out one instance of the majority element, there will still be at least one majority element left standing at the end.

The Logic of the "Counter"

The counter isn't actually counting the frequency of a number; it is counting the "surplus" of the current candidate.

  • When count == 0: It means all previous elements have been paired off and cancelled out. We pick the next available number as our new candidate.

  • When we see the same number: We increment the count (the surplus increases).

  • When we see a different number: We decrement the count (one-to-one cancellation).


Basically

  • A HashMap tracks everything (full state).

  • Boyer-Moore tracks only the relationship between elements (minimal state).

  • You are essentially saying: "I don't need to know how many times every number appeared; I only need to know which number is currently 'winning' the cancellation war."

Crucial Note: This algorithm only guarantees finding the majority element if one actually exists. If there's a chance the array has no majority, you must perform one final pass through the array at the end to verify your candidate's total count is >n/2> n/2.

Code

Time and Space Complexity

circle-info

nn βž” Length of the array

  • Time Complexity: O(n)O(n)

    • Iterates twice over the input array O(n)+O(n)=O(2n)=O(n)O(n) + O(n) = O(2n) = O(n)

  • Space Complexity: O(1)O(1)

    • No extra space is used


Twist the scenarios

  1. Streaming (single pass, no verification)

    • Use Misra-Gries (generalized BMV): maintain kβˆ’1k-1 candidates with counts.

    • Guarantees: any element with freq>(n/k)freq\gt (n/k) appears in candidates.

    • Trade-off: false positives possible; no strict majority guarantee without 2nd pass.

  2. Big Data/Parallel Streaming

    1. Local Sketching: Each machine independently computes a Misra-Gries summary (size kβˆ’1) over its partition β€” capturing potential heavy hitters with bounded error and no false negatives.

    2. Global Aggregation: All local summaries are merged (via shuffle/group-by) into a single multiset of candidate–count pairs β€” total size is O(k Γ— machines).

    3. Global Sketching: Run Misra-Gries again on the merged candidate set to produce the final ≀ kβˆ’1 candidates β€” guaranteed to include any element with frequency > n/k.


Boyers-Moore Algorithm for n/3n/3

It can be achieved by tracking two items and two counts

Time and Space Complexity

circle-info

nn βž” Length of the array

  • Time Complexity: O(n)O(n)

    • Iterates twice over the input array O(n)+O(n)=O(2n)=O(n)O(n) + O(n) = O(2n) = O(n)

  • Space Complexity: O(1)O(1)

    • No extra space is used


Misra-Gries Heavy Hitters vs Count-Min Sketch for n/kn/k items

It is generally not expected to implement Misra-Gries algorithm

We can say:

For arbitrary k, I’d switch to a parameterized Misra-Gries sketch β€” maintain a map of up to kβˆ’1 candidates, decrement all on overflow. It’s composable, so we can merge per-machine sketches in a reduce phase. BMV is just k=2

  • Misra-Gries: Better for identifying exact items (if a second pass is possible).

  • Count-Min Sketch: Better for frequency estimation and handling deletes (using a "Count-Min Sketch with conservative update" or "Longest Frequent Elements").

Technical Refinement:

Ensure you explicitly state that in the "reduce" phase, merging two sketches involves summing counts of common elements and then decrementing counts by the minimum shared values to maintain the 1/k1/k property.

Last updated