Hash Algorithms: What They Are, Security, Uses and Operation

A cryptographic hash function is a mathematical algorithm that transforms input data of any size into a string of output characters whose length is fixed or variable, depending on the algorithm. In hashing algorithms with a fixed output length, that length is the same regardless of the size of the input data. Algorithms designed specifically to protect passwords often allow a variable output length. In this article we explain what you need to know about hashes.

What are hashes used for?

Cryptographic hashes are best known for protecting passwords, so that they are not saved in clear text in a database. If you have ever read anything about hash functions, it was most likely about this use. Imagine that cybercriminals breach a service and steal its database: if the passwords were not hashed, the users' credentials would be exposed immediately.

The database stores the hash of the password, not the password itself. To verify that a password has been entered correctly, the hash algorithm is applied to the entered password and the result is compared with the stored hash: if they are the same, the password is correct; if they differ, it is wrong. This procedure is used in all operating systems, in websites with user/password authentication, and so on.
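The comparison described above can be sketched in Python. This is a minimal sketch rather than a definitive implementation: the function names `hash_password` and `verify_password` are hypothetical, and the use of PBKDF2 with a random salt and 200,000 iterations is an assumption made for illustration, since the text does not name a specific algorithm here.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor (assumption, tune for your hardware)


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive the value that is stored instead of the clear-text password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def verify_password(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """Hash the entered password the same way and compare it with the stored hash."""
    candidate_hash = hash_password(candidate, salt)
    # compare_digest compares in constant time, so it does not leak timing information
    return hmac.compare_digest(candidate_hash, stored_hash)


# Registration: only the salt and the hash are stored, never the password itself.
salt = os.urandom(16)
stored = hash_password("correct horse battery staple", salt)

print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("123456", salt, stored))                        # False
```

Because only `salt` and `stored` are kept, the service has nothing from which it could show the original password again.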

If you ever have to recover your password for an online service, you will have to reset it, because the service itself cannot give you the password in clear text: it only stores the password hash. If a service offers you your password in plain text when you ask to recover it, that means passwords are stored that way, and it is not safe to use that service. Typical passwords such as 123456 have well-known hashes, but a robust password will not appear in any online hash lookup database, and an attacker would have to crack it with tools such as Hashcat.

Not all uses of hashing algorithms involve passwords. Cryptographic hash functions are also used to detect malware, to identify songs or movies protected by copyright, and to create blacklists. There are public lists of malware, known as malware signatures, which are made up of hash values of complete files or of small parts of malware. On the one hand, a user who finds a suspicious file can consult these public hash databases to learn whether it is malicious or harmless. On the other hand, antivirus software uses them to detect and block malware by comparing file hashes against its own databases and the public ones.
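A lookup against such a signature list can be sketched as follows. The blocklist here is a hypothetical stand-in containing a single made-up entry, and the names `KNOWN_MALWARE_SHA256`, `sha256_of` and `is_known_malware` are assumptions for illustration; real signature databases are far larger and are queried through their own tools.

```python
import hashlib

# Hypothetical signature list: in practice this would come from a malware hash database.
KNOWN_MALWARE_SHA256 = {
    hashlib.sha256(b"stand-in for a malicious file").hexdigest(),
}


def sha256_of(data: bytes) -> str:
    """Return the SHA2-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_known_malware(data: bytes) -> bool:
    """Check whether the digest of the data appears in the signature list."""
    return sha256_of(data) in KNOWN_MALWARE_SHA256


print(is_known_malware(b"stand-in for a malicious file"))  # True
print(is_known_malware(b"an ordinary document"))           # False
```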

Another important use of cryptographic hash functions is to ensure the integrity of messages. To do this, the hash computed before the data transmission is compared with the hash computed after it. If the two hashes are identical, the data has not been altered. Otherwise, something has gone wrong and the data received at the end of the communication is not the same as the data that was sent.
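The before-and-after comparison can be sketched like this. The message contents and the helper names `sha256_hex` and `is_intact` are hypothetical. The sketch also assumes the sender's digest reaches the receiver over a channel the attacker cannot modify; without that, a plain hash only detects accidental corruption.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA2-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


def is_intact(received: bytes, expected_digest: str) -> bool:
    """Recompute the hash on the receiving side and compare it with the sender's."""
    return hmac.compare_digest(sha256_hex(received), expected_digest)


message = b"transfer 100 EUR to account 42"
digest_before = sha256_hex(message)  # computed by the sender before transmission

print(is_intact(b"transfer 100 EUR to account 42", digest_before))  # True
print(is_intact(b"transfer 900 EUR to account 42", digest_before))  # False
```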

Now that we know what hash functions are used for, let’s see which ones are the most used today.

SHA2

The SHA algorithm (Secure Hash Algorithm) was originally created by the NSA and NIST with the aim of generating hashes, or unique codes, based on a standard. The first SHA version, now called SHA-0, appeared in 1993, but it was hardly used and had little impact. A couple of years later a more robust and secure variant, SHA-1, was released, and for many years it was used to sign SSL/TLS digital certificates for millions of websites. A few years later SHA-2 was created, which has four main variants named after their number of output bits: SHA2-224, SHA2-256, SHA2-384 and SHA2-512. Today SHA-1 is no longer considered secure and should not be used; SHA2 or SHA3 are recommended instead (within the SHA family).

How SHA2 works

Hashing algorithms only work in one direction: we can generate the hash, or fingerprint, of any content, but there is no way to regenerate the original content from the hash. The only way to recover it is a dictionary or brute-force attack, which with current technology could take thousands of years.

Among the many ways to create hashes, the SHA2-256 algorithm is one of the most used thanks to its balance between security and speed. It is a very efficient algorithm and has high resistance to collisions, which is essential to the security of a hashing algorithm: for a hashing algorithm to be considered secure, no collisions must be known. Bitcoin's proof of work, for example, is based on SHA2-256.

Characteristics of the different types of SHA2

Output size: the number of bits in the resulting hash.
Internal state size: the size of the internal hash sum kept after each compression of a block of data.
Block size: the size of the block of data that the algorithm processes at a time.
Maximum message size: the maximum size of the message to which the algorithm can be applied.
Word length: the length in bits of the words the algorithm operates on in each round.
Iterations or rounds: the number of rounds of operations the algorithm performs to obtain the final hash.
Supported operations: the operations the algorithm uses to obtain the final hash.

SHA2-256

It has a 256-bit output size, a 256-bit internal state size and a 512-bit block size. The maximum message size it can handle is 2⁶⁴ – 1 bits, the word length is 32 bits and the number of rounds applied is 64. The operations applied are +, and, or, xor, shr and rot. The length of the hash is always the same, no matter how large the input is: whether it is a single letter or a 4 GB ISO image, the result will always be a sequence of 64 hexadecimal characters (256 bits).
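The fixed output length is easy to check with Python's standard `hashlib` module. A SHA2-256 digest is 256 bits, which is 64 characters when written in hexadecimal. The helper name `sha256_hex` is hypothetical, and a one-megabyte input stands in for a large file to keep the sketch quick.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA2-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()


one_letter = sha256_hex(b"a")
one_megabyte = sha256_hex(b"a" * 1_000_000)

# Both digests have the same length even though the inputs differ hugely in size.
print(len(one_letter), len(one_megabyte))  # 64 64
print(one_letter == one_megabyte)          # False
```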

SHA2-384

This algorithm has different characteristics, but it operates in the same way. It has a 384-bit output size, a 512-bit internal state size and a 1024-bit block size. The maximum message size it can handle is 2¹²⁸ – 1 bits, the word length is 64 bits and the number of rounds applied is 80. The operations applied are +, and, or, xor, shr and rot. This algorithm is a more secure version than SHA2-256, since more rounds of operations are applied and it can be applied to larger messages. It is commonly used to verify message integrity and authenticity in virtual private networks. On the negative side, it is somewhat slower than SHA2-256, but in certain circumstances it can be a very good option.

SHA2-512

As with all SHA2 variants, the operation is the same, and compared with SHA2-384 only one characteristic changes: the output size, which is 512 bits. All other features are the same as in SHA2-384: 512 bits of internal state, a 1024-bit block size, a maximum message size of 2¹²⁸ – 1 bits, a 64-bit word length and 80 rounds. This algorithm also applies the same operations in each round: +, and, or, xor, shr and rot.

SHA2-224

We have not treated this algorithm as one of the main ones because its older brother, SHA2-256, is used much more: the computational difference between the two is negligible and SHA2-256 is much more widely standardized. We mention it because, at least so far, no collisions have been found for this algorithm, making it a safe and usable option.
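The output and block sizes quoted in the sections above can be read directly from Python's standard `hashlib` module, which reports both values in bytes. This is only a quick check of those two characteristics. The module does not expose the internal state size, word length or number of rounds.

```python
import hashlib

sizes = {}
for name in ("sha224", "sha256", "sha384", "sha512"):
    h = hashlib.new(name)
    # digest_size and block_size are given in bytes, so multiply by 8 to get bits
    sizes[name] = (h.digest_size * 8, h.block_size * 8)
    print(name, sizes[name][0], sizes[name][1])

# Prints:
# sha224 224 512
# sha256 256 512
# sha384 384 1024
# sha512 512 1024
```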

The following table makes the differences between all the algorithms easier to compare, based on their characteristics.

You will see that the MD5, SHA-0 and SHA-1 hash algorithms also appear in the table. We have not covered them because, although they were widely used in the past, collisions have been found for them and they are no longer safe to use. SHA2, in all its variants, and SHA3 are the ones currently used.

To clarify the concept of a collision: in computing, a hash collision occurs when two different inputs to a hash function produce the same output.
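To make the idea concrete, the sketch below searches for a collision in a deliberately weakened hash that keeps only the first 16 bits of SHA2-256. The functions `tiny_hash` and `find_collision` are hypothetical names for this demonstration. With only 65,536 possible outputs a collision turns up quickly, whereas no collision is known for the full 256-bit output.

```python
import hashlib
from itertools import count


def tiny_hash(data: bytes) -> bytes:
    """First 2 bytes (16 bits) of SHA2-256: deliberately weak, for demonstration only."""
    return hashlib.sha256(data).digest()[:2]


def find_collision() -> tuple:
    """Hash the strings '0', '1', '2', ... until two of them share a tiny_hash value."""
    seen = {}
    for i in count():
        message = str(i).encode()
        digest = tiny_hash(message)
        if digest in seen:
            return seen[digest], message
        seen[digest] = message


first, second = find_collision()
print(first != second)                        # True: the inputs are different
print(tiny_hash(first) == tiny_hash(second))  # True: the truncated hashes collide
```

The two inputs still have different full SHA2-256 digests. Only the truncated 16-bit value collides.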

SHA-3

SHA3 is the newest hashing algorithm in the SHA family. It was published by NIST in 2015, but it is not yet widely used. Although it is part of the same family, its internal structure is quite different: it is based on the “sponge construction”. The sponge is built on a random function or random permutation of data. It accepts any amount of input data and can generate any amount of output data, and the function is pseudo-random with respect to all previous inputs. This gives SHA-3 great flexibility. The goal is for it to replace SHA2 in the typical TLS or VPN protocols that use a hashing algorithm to verify the integrity and authenticity of data.

SHA-3 was born as an alternative to SHA2, not because SHA-2 is unsafe, but to have a plan B in case of a successful attack against SHA2. Both SHA-2 and SHA-3 will therefore coexist for many years. In fact, SHA-3 is not yet used on the same scale as SHA-2.

Operation and characteristics

SHA-3 uses a “sponge” construction: the data is “absorbed” and processed, and then an output of the desired length is “squeezed” out. In the absorption phase, each block of input is combined with the internal state using the XOR operation, and the state is then transformed by a permutation function. SHA-3 keeps additional bits of internal state that are never output, which protects the hash function from length extension attacks, something that affects MD5, SHA-1 and SHA-2. Another important feature is its flexibility, which makes it possible to test cryptanalytic attacks and to use it in lightweight applications. Currently SHA2-512 is about twice as fast as SHA3-512 in software, but the latter can be implemented in hardware, where it could be just as fast or even faster.
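Python's standard `hashlib` module exposes both the fixed-length SHA-3 functions and SHAKE, the extendable-output function defined in the same standard, which shows the "output of the desired length" property of the sponge. A minimal sketch, with an arbitrary example input:

```python
import hashlib

data = b"hello sponge"

# Fixed-length SHA-3 variant: always 256 bits (64 hexadecimal characters).
fixed = hashlib.sha3_256(data).hexdigest()

# SHAKE256 lets the caller choose how many bytes to squeeze out of the sponge.
short_output = hashlib.shake_256(data).digest(16)
long_output = hashlib.shake_256(data).digest(64)

print(len(fixed))                           # 64
print(len(short_output), len(long_output))  # 16 64
# Squeezing more bytes continues the same output stream.
print(long_output[:16] == short_output)     # True
```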

KDF hash algorithms

The difference between a KDF (Key Derivation Function) and a password hashing function is that a KDF can produce output of arbitrary length, whereas a password hashing function always has the same output length. Depending on whether we are deriving encryption keys or hashing passwords stored in a database, some algorithms are more advisable than others. For stored passwords, for example, it is recommended that the hashing algorithm take a deliberately long time to compute (the figures used later in this article are 0.5 seconds for an interactive login and 5 seconds for a password-based encryption key), so that it is very robust and very expensive to crack.

Less experienced developers who do not know all the possibilities of KDF hashing algorithms may think that generic one-way, fixed-length, collision-resistant cryptographic hash functions such as SHA2-256 or SHA2-512 are the better choice, without considering the problem they bring. The problem with these fixed-length hashes is that they are fast, which allows an attacker with a powerful computer to try password guesses very quickly. Variable-length password hashing algorithms are deliberately slower, which is ideal because password crackers take much longer to recover each password.
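The speed gap can be measured with a few lines of Python. This is a rough sketch, not a benchmark: the password, the fixed demo salt, the helper name `time_call` and the choice of 200,000 PBKDF2 iterations are all assumptions for illustration, and the absolute timings depend entirely on the machine.

```python
import hashlib
import time


def time_call(func) -> float:
    """Return the wall-clock time in seconds taken by one call to func."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


password = b"hunter2"
salt = b"demo-salt-16byte"  # fixed only to keep the demo simple; use a random salt in practice

fast = time_call(lambda: hashlib.sha256(salt + password).digest())
slow = time_call(lambda: hashlib.pbkdf2_hmac("sha256", password, salt, 200_000))

print(f"single SHA2-256:            {fast * 1e6:12.1f} microseconds")
print(f"PBKDF2, 200,000 iterations: {slow * 1e6:12.1f} microseconds")
```

Every guess an attacker makes costs the same amount of work as one legitimate login, so the slower function multiplies the cost of a cracking run by the same factor.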

The crypto community came together to introduce hashing functions designed specifically for passwords, which include a “cost”. Key derivation functions were also designed with a “cost”. Building on password-based key derivation functions and on hashing functions designed specifically for passwords, the community created various algorithms for password protection.

The most popular algorithms for protecting passwords are:

Argon2 (KDF)
scrypt (KDF)
bcrypt
PBKDF2 (KDF)

The main difference between a KDF and a password hashing function is that the output length of a KDF is arbitrary, while hashes typically used for passwords, such as MD5, SHA-1, SHA2-256 and SHA2-512, have a fixed output length.
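The arbitrary output length can be seen with PBKDF2 from Python's standard `hashlib` module, where the `dklen` argument sets the length of the derived key. The password, salt and iteration count below are placeholder values chosen for illustration only.

```python
import hashlib

password = b"example password"
salt = b"example salt"  # placeholder; a real application would use a random salt

derived = {}
for dklen in (16, 32, 64):
    # dklen is the requested length of the derived key in bytes
    derived[dklen] = hashlib.pbkdf2_hmac("sha256", password, salt, 10_000, dklen=dklen)
    print(dklen, len(derived[dklen]))

# A plain hash, by contrast, always returns the same number of bytes.
print(len(hashlib.sha256(password).digest()))  # 32
```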

For password storage, the threat is that the password database is leaked to the Internet and password crackers around the world work on its hashes to recover the passwords.

Take the storage of passwords in a database as an example. When we log in to a website, the password must be hashed quickly so that we do not have to wait to gain access. The problem is that a hash that is quick to compute can also be cracked faster, especially when the power of GPUs is combined with Hashcat.

bcrypt, sha256crypt, sha512crypt and PBKDF2

The following table compares several widely used hashing algorithms and their corresponding costs. The row highlighted in green shows a possible work factor that means spending 0.5 seconds hashing the password, which is a good trade-off for logins. The row highlighted in red shows a possible work factor that means spending a full 5 seconds creating a password-based encryption key, which costs more in efficiency.

Note that for bcrypt this means a cost factor of 13 takes approximately 0.5 seconds to hash the password, while a factor of 16 takes approximately 5 seconds to create a password-based key. For sha256crypt, sha512crypt and PBKDF2, the corresponding values appear to be roughly 640,000 and 5,120,000 iterations respectively.
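These figures fit together because bcrypt's cost factor is a base-2 logarithm: each increment doubles the work. The short calculation below is only an estimate built on the single reference point quoted above (factor 13 at about 0.5 seconds), and the function name `estimated_seconds` is hypothetical.

```python
def estimated_seconds(cost: int, base_cost: int = 13, base_seconds: float = 0.5) -> float:
    """Estimate bcrypt time for a cost factor, given one measured reference point."""
    # Each +1 in the cost factor doubles the number of iterations.
    return base_seconds * 2 ** (cost - base_cost)


for cost in (13, 14, 15, 16):
    print(cost, estimated_seconds(cost))
# 13 0.5
# 14 1.0
# 15 2.0
# 16 4.0

# The iteration counts quoted for the other algorithms differ by the same factor of 8.
print(5_120_000 // 640_000)  # 8
```

The estimate of about 4 seconds for factor 16 is in line with the rough 5 seconds quoted. The remaining difference depends on the hardware used for the measurement.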

scrypt

We start to think about moving to scrypt when things get a bit more difficult. With bcrypt, sha256crypt, sha512crypt and PBKDF2, the cost is entirely a CPU load factor: the more processing power is available, the faster the algorithm runs. The bad part is that these algorithms still fall victim to algorithm-specific FPGAs and ASICs. To combat this, a memory cost can be included. With scrypt we have a cost in both CPU and RAM.

In the following table you can see a comparison with different cost values.

These tests were carried out on a single quad-core CPU, and the “p” cost was limited to 1, 2 and 4. RAM use was also limited so as not to interrupt the other tasks running on the machine. The “r” cost was therefore limited to 4, 8 and 16, each multiplied by 128 bytes (512 bytes, 1024 bytes and 2048 bytes).
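Python's standard `hashlib` module provides scrypt (when it is built against a recent OpenSSL), which makes it easy to see how the parameters interact. scrypt needs roughly 128 × N × r bytes of memory, so the block size r and the CPU/memory cost N together set the RAM cost. The parameter values, password and salt below are illustrative assumptions, not the values used in the tests described above, and `scrypt_memory_bytes` is a hypothetical helper.

```python
import hashlib


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate memory needed by scrypt: 128 bytes * N * r."""
    return 128 * n * r


# Block sizes for the r values mentioned in the text (r * 128 bytes).
for r in (4, 8, 16):
    print(r, r * 128)  # 4 512, 8 1024, 16 2048

n, r, p = 2**14, 8, 1
print(scrypt_memory_bytes(n, r) // (1024 * 1024), "MiB")  # 16 MiB

key = hashlib.scrypt(
    b"example password",
    salt=b"example salt",      # placeholder; use a random salt in practice
    n=n, r=r, p=p,
    maxmem=64 * 1024 * 1024,   # allow up to 64 MiB so the 16 MiB working set fits
    dklen=32,
)
print(len(key))  # 32
```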

Argon2

Argon2 has two main versions, Argon2d and Argon2i: the first is data-dependent (d) and the second is data-independent (i). A hybrid variant, Argon2id, combines the two. Argon2d is designed to be resistant to GPU cracking, while Argon2i is designed to be resistant to side-channel attacks. In other words, Argon2d would be suitable for hashing passwords, while Argon2i would be suitable for deriving encryption keys.

Argon2 has a CPU cost and a RAM cost, and the two are handled separately. The CPU cost is set through standard iterations, as with bcrypt or PBKDF2, and the RAM cost is set by explicitly increasing the memory used. When testing with this algorithm began, it was found that manipulating only the iterations made it behave much like bcrypt, but the total time needed to calculate the hash could also be changed by manipulating only the memory. When the two were combined, iterations were found to affect CPU cost more than RAM cost, but both accounted for a significant share of the computation time, as can be seen in the tables below. As with scrypt, there is also a parallelization cost, which defines the number of threads that work on the problem.

The point to keep in mind in this parameterization process is that the RAM cost varies between 256 KiB and 16 MiB, alongside the number of iterations and the processor count (parallelism) cost. As we increase the RAM used, we can reduce the iteration cost, and as more threads work on the hash, we can reduce the iterations further. In every case the target is the same: about 0.5 seconds for an interactive password login, and a full 5 seconds for deriving a password-based encryption key.

Conclusion

We can summarize the use of these hashing algorithms as follows: when hashing passwords, either to store them on disk or to create encryption keys, use password hashing functions or password-based key derivation functions designed specifically for the problem at hand. General-purpose hash functions of any kind should not be used, because they are too fast. Nor should you implement your own “key stretching” algorithm, such as recursively hashing the password digest together with additional output.

Therefore, if the algorithm was designed specifically to handle passwords, and the cost is sufficient for our needs, threat model and adversary, we can say that we are doing it well. We will not go wrong by choosing any of them. We simply have to be clear about how we are going to use it and avoid any algorithm that is not specifically designed for passwords, which will strengthen their security.

You should now have a clear idea of which algorithms are used today. We have explained how each algorithm operates, and also its processing costs, so that it is clear which one to use in each situation. All of them serve a common objective, our protection: both the fixed-length hash algorithms and the variable-length ones are used to protect information, since, as you know, information is power. Thanks to them, our passwords, files and data transmissions are protected from outsiders who want to read them.