A Bloom filter is a data structure that can be used to inform the user whether a particular item is part of a set. Though it cannot say with certainty whether an element is in the set, it can say with certainty if the element is not.
Created by Burton Howard Bloom in 1970, Bloom filters are an attractive offering for a range of applications due to their efficiency in space usage. In some cryptocurrencies (most notably Bitcoin), they play an integral role in Simplified Payment Verification, or SPV.
In using an SPV client, users can interact with the Bitcoin network without running a full node. Full nodes come with certain storage and computational requirements that make them unwieldy to run on low-powered devices like smartphones. SPV clients, on the other hand, simply query full nodes for information relevant to the user’s wallet(s).
The most straightforward solution to relay this data to the user would be by making full nodes aware of the client’s keys, so that only pertinent transactions are sent to them. Evidently, this is a subpar solution as the client’s privacy would be sacrificed. On the other hand, downloading all transactions only to discard the majority of them would be a waste of bandwidth. Enter Bloom filters.
Let’s illustrate. Suppose that a client, Alice, has a high-value transaction she doesn’t want Bob, who runs a full node, to be aware of. She constructs a Bloom filter, which we’ll illustrate as a 10x1 grid:
She passes the transaction data she is interested in through two different hash functions, which outputs two numbers between 0 and 9. Let’s call them 4 and 7. She sends the filter to Bob.
Looking at this grid, you have no idea what data Alice has passed to the filter. If you had a set in which the data was contained, you would be able to hash it and compare it with the filter – if there’s a match, there’s a possibility that it was the information that Alice requested.
At the same time, there’s likely to be many inputs that will map to 4 and 7, so Bob cannot determine what part of the data Alice is interested in. He simply returns all of the matches, and Alice filters them on her end.
This is, of course, a gross oversimplification, but it demonstrates the concept: Bloom filters obfuscate the true interests of the client. It’s not a perfect method (there are still concerns over its privacy), but it’s an improvement over an unshielded request to a node.