Hashing is the process of generating a fixed-sized output from a variable-sized input. This process is accomplished through a mathematical formula called a "hash function" (implemented as a hashing algorithm).
Not all hash functions involve the use of cryptography, but "cryptographic hash functions" are at the core of cryptocurrencies. Thanks to cryptographic hash functions, blockchains and other distributed systems can achieve high levels of data integrity and security.
Both traditional hash functions and cryptographic hash functions are deterministic. Determinism refers to the fact that a hashing algorithm will always produce the same output (also called a "digest" or "hash value") as long as the input does not change.
Typically, cryptocurrency hashing algorithms are designed as one-way functions, meaning they cannot easily be reversed without large amounts of computing time and resources. In other words, it is easy to compute an output from an input, but very difficult to go in the opposite direction and recover an input from an output alone. In general, the harder the input is to find, the more secure the hashing algorithm is considered to be.
Different hash functions produce outputs of different sizes, but the output size of each hashing algorithm is always the same. For example, the SHA-256 algorithm can only produce a 256-bit output, while SHA-1 always produces a 160-bit digest.
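The fixed output size is easy to check in practice. The following is a minimal sketch using Python's standard `hashlib` module; the helper name `digest_bits` and the sample inputs are illustrative assumptions, not part of the original text.

```python
import hashlib

def digest_bits(algorithm: str, message: bytes) -> int:
    """Return the digest length in bits for one algorithm and one message."""
    return len(hashlib.new(algorithm, message).digest()) * 8

# Inputs ranging from empty to one million bytes (arbitrary sample values).
inputs = [b"", b"a", b"Binance", b"x" * 1_000_000]

sha256_sizes = [digest_bits("sha256", m) for m in inputs]
sha1_sizes = [digest_bits("sha1", m) for m in inputs]

print(sha256_sizes)  # [256, 256, 256, 256]
print(sha1_sizes)    # [160, 160, 160, 160]
```

Whatever the input length, each algorithm returns a digest of its own fixed size.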
To illustrate this, we ran the words "Binance" and "binance" through the SHA-256 hashing algorithm (the algorithm used in Bitcoin).
SHA-256 | |
Input | Output (256 bits) |
Binance | f1624fcc63b615ac0e95daf9ab78434ec2e8ffe402144dc631b055f711225191 |
binance | 59bba357145ca539dcd1ac957abc1ec5833319ddcae7f5e8b5da0c36624784b2 |
Note that a small change (the capitalization of the first letter) produces a completely different hash value. Regardless of the length of the input, the SHA-256 output always has a fixed length of 256 bits (64 hexadecimal characters). Furthermore, no matter how many times the two words are run through the algorithm, the two outputs remain the same.
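The experiment can be repeated with a few lines of Python. This is a sketch that assumes the words are encoded as UTF-8 before hashing; the helper names `sha256_hex` and `differing_bits` are hypothetical and introduced only for illustration.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """SHA-256 digest of a UTF-8 encoded string, as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two hex digests differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")

upper = sha256_hex("Binance")
lower = sha256_hex("binance")

print(upper)
print(lower)
print(len(upper), len(lower))          # 64 64 (hex characters, i.e. 256 bits)
print(upper == sha256_hex("Binance"))  # True: same input, same output
print(differing_bits(upper, lower))    # number of bits that changed
```

For a well-behaved hash function, changing one character of the input should flip roughly half of the output bits.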
In contrast, if you run the same input through the SHA-1 hashing algorithm, you get the following results:
SHA-1 | |
Input | Output (160 bits) |
Binance | 7f0dc9146570c608ac9d6e0d11f8d409a1ee6ed1 |
binance | e58605c14a76ff98679322cca0eae7b3c4e08936 |
Note that "SHA" stands for Secure Hash Algorithms. The name refers to a set of cryptographic hash functions that includes the SHA-0 and SHA-1 algorithms as well as the SHA-2 and SHA-3 groups. SHA-256, along with SHA-512 and other variants, is part of the SHA-2 group. Currently, only the SHA-2 and SHA-3 groups are considered secure.
Traditional hash functions have a variety of use cases, including database lookups, large file analysis, and data management. Cryptographic hash functions are widely used in information security applications such as message authentication and digital fingerprinting. In the case of Bitcoin, cryptographic hash functions are an integral part of the mining process and also play a role in generating new addresses and keys.
The real power of hashing shows when dealing with massive amounts of information. For example, you can run a large file or data set through a hash function and then use its output to quickly verify the accuracy and integrity of the data. This works because hash functions are deterministic: the same input always produces the same condensed output (the hash value). This removes the need to store and "remember" large amounts of data.
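As a rough illustration of integrity checking, the sketch below hashes a file in chunks and compares the result with a previously recorded digest. The function name `file_sha256`, the chunk size, and the temporary demo file are all assumptions made for this example.

```python
import hashlib
import os
import tempfile

def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Hash a file piece by piece so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.bin")
    with open(path, "wb") as handle:
        handle.write(b"example payload " * 10_000)

    recorded = file_sha256(path)              # digest noted when the file was created
    unchanged = file_sha256(path) == recorded

    with open(path, "ab") as handle:          # simulate corruption: append one byte
        handle.write(b"!")
    tampering_detected = file_sha256(path) != recorded

print(unchanged, tampering_detected)  # True True
```

Only the 64-character digest has to be kept to detect later changes to the file.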
Hashing is particularly useful in blockchain technology. The Bitcoin blockchain involves many hashing operations, most of them as part of the mining process. In fact, almost all cryptocurrency protocols rely on hashing to link groups of transactions and condense them into blocks, and to create cryptographic links between individual blocks, which is what effectively forms a blockchain.
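The idea of cryptographic links between blocks can be sketched as a toy chain in which each block stores the hash of the block before it. This is a deliberately simplified model, not Bitcoin's actual block format: the dictionary layout, the JSON serialization, and the function names are assumptions for illustration.

```python
import hashlib
import json

GENESIS_PREVIOUS = "0" * 64  # placeholder "previous hash" for the first block

def block_hash(block: dict) -> str:
    """Hash a block's contents using a stable JSON encoding."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()

def build_chain(batches: list) -> list:
    """Create one block per batch, each pointing at its predecessor's hash."""
    chain, previous = [], GENESIS_PREVIOUS
    for index, transactions in enumerate(batches):
        block = {"index": index, "previous_hash": previous,
                 "transactions": transactions}
        chain.append(block)
        previous = block_hash(block)
    return chain

def chain_is_valid(chain: list) -> bool:
    """Check that every stored link matches the recomputed hash."""
    expected = GENESIS_PREVIOUS
    for block in chain:
        if block["previous_hash"] != expected:
            return False
        expected = block_hash(block)
    return True

chain = build_chain([["alice->bob:1"], ["bob->carol:2"], ["carol->dave:3"]])
print(chain_is_valid(chain))  # True
chain[0]["transactions"] = ["alice->mallory:100"]  # rewrite history
print(chain_is_valid(chain))  # False
```

Changing an early block changes its hash, so the link stored in the next block no longer matches.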
Again, a hash function that deploys cryptographic techniques can be defined as a cryptographic hash function. Generally speaking, breaking a cryptographic hash function requires an enormous number of brute-force attempts. To "reverse" a cryptographic hash function, an attacker has to guess the input by trial and error until the corresponding output is produced. However, it is also possible for different inputs to produce exactly the same output, in which case a "collision" occurs.
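Trial-and-error guessing can be sketched as follows. The search is only feasible here because the example assumes the secret input is at most three lowercase letters; the function name `brute_force_preimage` and the sample word are hypothetical.

```python
import hashlib
import itertools
import string

def brute_force_preimage(target_hex: str, max_length: int = 3):
    """Try every lowercase string up to max_length until the hash matches."""
    attempts = 0
    for length in range(1, max_length + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            attempts += 1
            guess = "".join(letters)
            if hashlib.sha256(guess.encode("utf-8")).hexdigest() == target_hex:
                return guess, attempts
    return None, attempts

target = hashlib.sha256(b"cat").hexdigest()  # pretend only this digest is known
found, attempts = brute_force_preimage(target)
print(found, attempts)
```

Each additional character multiplies the search space by 26 in this toy setting, which is why realistic inputs cannot be recovered this way.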
Technically speaking, a cryptographic hash function needs to have three properties to be considered effectively secure: collision resistance, preimage resistance, and second-preimage resistance.
Before discussing each property individually, let's briefly summarize the logic of each one.
Collision resistance: It is infeasible to find any two different inputs that produce the same hash value.
Preimage resistance: It is infeasible to "reverse" the hash function (i.e., to find the input from a given output).
Second-preimage resistance: It is infeasible to find a second input that collides with a specific, given input.
As mentioned before, a collision occurs when different inputs produce the exact same hash value. A hash function is therefore considered collision-resistant until someone finds a collision. Note that collisions always exist for any hash function, because the possible inputs are infinite while the possible outputs are finite.
Put another way, a hash function is collision-resistant when the probability of finding a collision is so low that it would take millions of years of computation. So, although no collision-free hash function exists, some of them (such as SHA-256) are strong enough to be considered collision-resistant.
Among the various SHA algorithms, SHA-0 and SHA-1 are no longer secure because collisions have been found for them. Currently, the SHA-2 and SHA-3 groups are considered collision-resistant.
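To see why collisions must exist, it helps to shrink the output space. The sketch below keeps only the first 16 bits of SHA-256, an artificial weakening assumed purely for demonstration, and searches until two different inputs share the same truncated value. The function names and the counter-based inputs are illustrative.

```python
import hashlib

def truncated_hash(message: bytes, bits: int = 16) -> int:
    """First `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_collision(bits: int = 16):
    """Hash successive counters until two of them share a truncated value."""
    seen = {}
    counter = 0
    while True:
        message = str(counter).encode("utf-8")
        value = truncated_hash(message, bits)
        if value in seen:
            return seen[value], message, counter + 1
        seen[value] = message
        counter += 1

first, second, tries = find_collision(16)
print(first, second, tries)
```

With only 65,536 possible outputs, a collision is guaranteed after at most 65,537 inputs and usually appears after a few hundred. The full 256-bit output makes the same search impractical.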
Preimage resistance is related to the concept of one-way functions. A hash function is considered preimage-resistant when the probability of finding an input that produces a specific output is extremely low.
Note that this property differs from collision resistance: here an attacker looks at a given output and tries to guess the input that produced it. A collision, by contrast, occurs when someone finds two different inputs that produce the same output, and it does not matter which inputs those are.
Preimage resistance is valuable for protecting data, because a simple hash of a message can prove its authenticity without disclosing the message itself. In practice, many service providers and web applications store and use hashes generated from passwords rather than the passwords in plaintext.
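Storing a hash instead of a password might look like the sketch below. It goes slightly beyond the text by adding a random salt and using PBKDF2 (`hashlib.pbkdf2_hmac`), a common hardening step assumed here rather than taken from the article; the function names and iteration count are illustrative choices.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor, not a recommendation

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, derived_hash); only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, derived

def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derive the hash from the attempt and compare in constant time."""
    _, derived = hash_password(password, salt)
    return hmac.compare_digest(derived, stored)

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

The service can check a login attempt without ever keeping the plaintext password.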
In short, second-preimage resistance lies somewhere between the other two properties. A second-preimage attack occurs when someone is able to find a specific input that produces the same output as another input they already know.
In other words, a second-preimage attack involves finding a collision, but instead of searching for any two inputs that generate the same hash value, the attacker is given a specific input and searches for another input that generates the same hash value.
A second-preimage attack therefore always implies a collision, which means that any collision-resistant hash function is also resistant to second-preimage attacks. However, collision resistance does not imply preimage resistance: a preimage attack only requires finding a single input from a single output, so an attacker may still be able to mount a preimage attack against a collision-resistant function.
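The difference between "any collision" and "a collision with one specific input" can be made concrete with an artificially weakened hash. The sketch below assumes a 16-bit truncation of SHA-256 and hypothetical names (`truncated_hash`, `find_second_preimage`); the known input and candidate format are arbitrary.

```python
import hashlib

def truncated_hash(message: bytes, bits: int = 16) -> int:
    """First `bits` bits of the SHA-256 digest, as an integer."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def find_second_preimage(known: bytes, bits: int = 16, limit: int = 5_000_000):
    """Search for a different input with the same truncated hash as `known`."""
    target = truncated_hash(known, bits)
    for counter in range(limit):
        candidate = b"candidate-" + str(counter).encode("utf-8")
        if candidate != known and truncated_hash(candidate, bits) == target:
            return candidate, counter + 1
    return None, limit

known_input = b"pay alice 10 BTC"
other, tries = find_second_preimage(known_input)
print(other, tries)
```

Because the target is fixed, the search needs on the order of 2^16 attempts here, far more than the few hundred typically needed to find an arbitrary colliding pair at the same output size.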
Hash functions are used at multiple steps of Bitcoin mining, such as checking balances, linking transaction inputs and outputs, and hashing the transactions within a block to form a Merkle tree. However, one of the main reasons the Bitcoin blockchain is secure is that miners must perform an enormous number of hashing operations before they eventually find a valid solution for the next block.
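A Merkle tree condenses all transactions in a block into a single root hash. The sketch below follows the general Bitcoin-style construction (double SHA-256, duplicating the last hash when a level has an odd number of entries) but simplifies it: plain byte strings stand in for serialized transactions and byte-order details are ignored. The function names are illustrative.

```python
import hashlib

def sha256d(data: bytes) -> bytes:
    """Double SHA-256: hash the data, then hash the resulting digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def merkle_root(transactions: list) -> bytes:
    """Hash pairs of nodes level by level until a single root remains."""
    if not transactions:
        raise ValueError("at least one transaction is required")
    level = [sha256d(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [sha256d(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

txs = [b"alice->bob:1", b"bob->carol:2", b"carol->dave:3"]
root = merkle_root(txs)
print(root.hex())

altered = merkle_root([b"alice->bob:9", b"bob->carol:2", b"carol->dave:3"])
print(root != altered)  # True: changing one transaction changes the root
```

Since the root depends on every transaction, a block header only needs to commit to this one 32-byte value.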
Specifically, miners must try many different inputs when creating a hash for their candidate block. In essence, they can only validate the block if the resulting output hash starts with a certain number of zeros. The number of zeros determines the mining difficulty, and it varies according to the hash rate devoted to the network.
In this case, the hash rate represents the amount of computing power invested in Bitcoin mining. If the network's hash rate increases, the Bitcoin protocol automatically adjusts the mining difficulty so that the average time required to generate a block remains close to 10 minutes. Conversely, if many miners decide to stop mining and the hash rate drops significantly, the mining difficulty is lowered until the average block time returns to 10 minutes.
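The adjustment logic can be approximated with a simple proportional rule. The sketch below is a simplification of Bitcoin's retargeting, which recalculates every 2,016 blocks and limits each change to a factor of four. The function name and the use of a floating-point "difficulty" are assumptions, since the real protocol works with an integer target.

```python
TARGET_BLOCK_SECONDS = 600   # 10 minutes per block
RETARGET_INTERVAL = 2016     # blocks between adjustments
EXPECTED_SECONDS = TARGET_BLOCK_SECONDS * RETARGET_INTERVAL

def adjust_difficulty(current_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty by how fast the last interval was mined."""
    # Clamp the measured time so one adjustment changes difficulty by at most 4x.
    clamped = min(max(actual_seconds, EXPECTED_SECONDS / 4), EXPECTED_SECONDS * 4)
    return current_difficulty * EXPECTED_SECONDS / clamped

print(adjust_difficulty(100.0, EXPECTED_SECONDS))      # 100.0: on schedule
print(adjust_difficulty(100.0, EXPECTED_SECONDS / 2))  # 200.0: blocks came twice as fast
print(adjust_difficulty(100.0, EXPECTED_SECONDS * 2))  # 50.0: blocks came half as fast
```

Blocks arriving faster than planned raise the difficulty, and slower blocks lower it, which pulls the average block time back toward 10 minutes.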
Note that miners do not need to find collisions, because many different hashes qualify as valid output (any hash starting with the required number of zeros). A block therefore has multiple possible solutions, and miners only have to find one of them, as determined by the mining difficulty threshold.
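A toy version of this search is sketched below. It assumes a single SHA-256 over a text string plus a counter (a "nonce") and counts leading zeros in hexadecimal, whereas Bitcoin hashes a binary block header twice and compares the result against a numeric target. The function name `mine` and the block contents are hypothetical.

```python
import hashlib

def mine(block_data: str, zero_hex_digits: int, start_nonce: int = 0):
    """Increment a nonce until the hash starts with the required zeros."""
    prefix = "0" * zero_hex_digits
    nonce = start_nonce
    while True:
        candidate = f"{block_data}|{nonce}".encode("utf-8")
        digest = hashlib.sha256(candidate).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

# First valid solution for a hypothetical candidate block.
nonce, digest = mine("candidate block", zero_hex_digits=4)
print(nonce, digest)

# Continuing the search finds a second, different valid solution.
second_nonce, second_digest = mine("candidate block", 4, start_nonce=nonce + 1)
print(second_nonce, second_digest)
```

Both digests satisfy the difficulty condition even though they are different hashes. Each extra required zero digit multiplies the expected number of attempts by 16.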
Bitcoin mining is a cost-intensive task, so miners have no incentive to cheat the system: doing so would cause them significant financial losses. The more miners join a blockchain, the larger and stronger it becomes.
There is no doubt that hash functions are an indispensable tool in computer science, especially when dealing with massive amounts of data. Combined with cryptography, hashing algorithms can be used in many different ways to provide security and authentication. For almost all cryptocurrency networks, cryptographic hash functions are crucial. Therefore, if you are interested in blockchain technology, it is well worth understanding the properties and working mechanisms of cryptographic hash functions.