Area, Pipelining, Integration: A Comparison of SHA-2 and SHA-3 for Embedded Systems

Written by Editorial Team | Apr 9, 2026 6:15:00 AM

Contents

1. Introduction

2. Common Misconceptions

3. SHA-2 vs. SHA-3 in Embedded Systems

4. Underlying Design Differences

4.1 SHA-2: Iterative Compression over Fixed-Size Blocks

4.2 SHA-3: Sponge Construction with Large Internal State

4.3 Key Consequences for System Design

5. When to use SHA-2 vs. SHA-3

6. Conclusion

1. Introduction

For years, SHA-2 has been the most common choice for cryptographic hashing in embedded systems. It is an important part of protocols and architectures such as TLS, secure boot chains, firmware validation, and software update systems. For many developers, SHA-2 is not a design choice; it is part of the baseline.

At the same time, SHA-3 is becoming more important. It appears in newer security architectures, in designs that aim for long-lifecycle robustness, and in situations where post-quantum considerations already affect system design choices. SHA-3 is not directly tied to post-quantum cryptography, but the two are often evaluated together as part of broader efforts to modernize cryptography.

This leads developers to a practical question: Which one should I use in my system? The answer isn't just about cryptography. It depends on the architecture, integration constraints, hardware resources, and long-term maintainability. This article explains the technical differences and their real-world effects in embedded systems.

2. Common Misconceptions

There are several common misconceptions when it comes to comparing SHA-2 and SHA-3.

“SHA-3 replaces SHA-2”: SHA-3 was designed as an alternative, not a successor. Both are standardized and widely accepted.
“SHA-3 is more secure”: Both options offer robust security when used correctly. The distinction lies in their construction and approach, rather than in a simple “stronger vs. weaker” comparison.
“Hashing is easy to implement”: In embedded systems, hashing affects datapath design, memory usage, latency, and interface behavior. Getting the implementation details right matters, especially when resources are limited.

KiviHash IP Core Family

If you want to explore how these concepts translate into real, integration-ready IP cores, take a look at the KiviHash product family. It includes compact, hardware-efficient implementations of SHA-2 and SHA-3 for FPGA and ASIC integration, with a focus on clear interfaces and fast time-to-integration.

3. SHA-2 vs. SHA-3 in Embedded Systems

For SoC developers, the differences between SHA-2 and SHA-3 become particularly visible when looking at hardware implementation, performance, and system integration.

SHA-2 is generally straightforward to implement due to its regular datapath structure, predictable control flow, and efficient pipelining capabilities, combined with a moderate internal state size. This makes it well suited for high-throughput designs, low-complexity control logic, and area-constrained implementations.

In contrast, SHA-3 often introduces additional complexity through its large 1600-bit internal state, its reliance on permutation rounds, and more demanding routing caused by bitwise transformations. As a result, SHA-3 typically requires more complex control logic, offers less straightforward pipelining, and increases pressure on routing resources.

These architectural differences also impact performance. In software, SHA-2 is often faster on general-purpose CPUs, as it benefits from decades of optimization and existing hardware acceleration support. In hardware, SHA-2 enables efficient pipelining with high throughput and relatively low implementation overhead. SHA-3, on the other hand, is more flexible but also more resource-intensive. Its performance depends heavily on how the permutation is implemented, both in software and hardware, making it more sensitive to design quality and optimization effort.

From a system integration perspective, SHA-2 follows a block-based data handling model with fixed-size input chunks, which aligns naturally with typical embedded architectures. Both SHA-2 and SHA-3 can be exposed through streaming APIs, but internally SHA-2 processes fixed-size message blocks with a compression function, whereas SHA-3 uses sponge absorb/squeeze phases with a rate/capacity split. In systems with CPUs, hardware accelerators, and memory subsystems (e.g., DMA or FIFO), this leads to different requirements in terms of AXI/FIFO interfaces, data alignment, and backpressure handling. In many embedded designs, SHA-2 may be easier to integrate and pipeline, but area, throughput, routing, and control complexity are strongly implementation- and technology-dependent.
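The streaming behavior mentioned above can be illustrated in software. The following is a minimal sketch using Python's standard hashlib module; the helper name hash_in_chunks and the 48-byte chunk size are assumptions chosen for illustration, standing in for whatever transfer size a DMA engine or FIFO would deliver.

```python
import hashlib

def hash_in_chunks(algorithm: str, data: bytes, chunk_size: int) -> bytes:
    """Feed data to the hash in fixed-size chunks, as a DMA or FIFO might."""
    h = hashlib.new(algorithm)
    for offset in range(0, len(data), chunk_size):
        h.update(data[offset:offset + chunk_size])
    return h.digest()

message = bytes(range(256)) * 10  # 2560 bytes of example data

for algorithm in ("sha256", "sha3_256"):
    one_shot = hashlib.new(algorithm, message).digest()
    # 48 bytes lines up with neither the 64-byte SHA-256 block
    # nor the 136-byte SHA3-256 rate, so the library must buffer internally.
    streamed = hash_in_chunks(algorithm, message, chunk_size=48)
    assert streamed == one_shot
    print(algorithm, "streaming digest matches one-shot digest")
```

In a hardware accelerator, the buffering that the library hides here becomes explicit design work: the input interface has to collect bytes until a full block (SHA-2) or a full rate's worth of data (SHA-3) is available, and apply backpressure while a block is being processed.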

Aspect	SHA-2	SHA-3
Datapath Structure	Regular	More complex (permutation-based)
Control Logic	Simple	More complex
State Size	Moderate	Large (1600 bits)
Area	Typically smaller	Larger
Software Performance	Highly optimized, often faster	Implementation-dependent
Flexibility	Limited	High (XOF, variable output)
Accelerator Design	Easier	More complex

In software on mainstream CPUs, SHA-2 is often faster because of hardware instructions and mature optimization. In hardware, speed depends on the implementation. Under a simple one-round-per-cycle model, SHA-3 can process more input bits per cycle than SHA-2: SHA-256 needs 64 rounds per 512-bit block (SHA-512 needs 80 rounds per 1024-bit block), whereas SHA-3 needs only 24 rounds per block, and that block holds 1088 message bits in the case of SHA3-256. However, real results depend on clock frequency, area, pipelining, and whether you care about latency, absolute throughput, or throughput per area.
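The one-round-per-cycle comparison can be worked through as simple arithmetic. The sketch below is an idealized model only: it assumes exactly one round per clock cycle and ignores padding, I/O, and clock frequency. The class name HashCore is hypothetical; the block sizes and round counts are those of FIPS 180-4 and FIPS 202.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HashCore:
    name: str
    block_bits: int  # message bits consumed per compression / permutation call
    rounds: int      # rounds per call

    def bits_per_cycle(self) -> float:
        # Idealized: one round per clock cycle, no I/O or padding overhead.
        return self.block_bits / self.rounds

CORES = [
    HashCore("SHA-256", 512, 64),
    HashCore("SHA-512", 1024, 80),
    HashCore("SHA3-256", 1088, 24),  # rate = 1600 - 2 * 256
    HashCore("SHA3-512", 576, 24),   # rate = 1600 - 2 * 512
]

for core in CORES:
    print(f"{core.name:9s} {core.bits_per_cycle():6.2f} bits/cycle")
```

Under this model, SHA-256 absorbs 8 bits per cycle and SHA3-256 roughly 45 bits per cycle. The gap narrows for SHA3-512, whose smaller rate yields 24 bits per cycle, and none of these figures account for the larger round logic and achievable clock frequency of a Keccak round.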

4. Underlying Design Differences

4.1 SHA-2: Iterative Compression over Fixed-Size Blocks

SHA-2, specified in NIST FIPS 180-4, processes input data in fixed-size blocks (512 bits for SHA-224/256, 1024 bits for SHA-384/512 and the SHA-512/t variants) using an iterative compression function. Each block updates an internal state, and the final state, truncated for the shorter variants, becomes the hash output. This construction is commonly referred to as a Merkle–Damgård design.

From a system perspective, this leads to several practical properties:

Predictable control flow: Each block is processed in the same sequence of steps
Regular datapath structure: Operations such as additions, rotations, and XORs map well to standard hardware
Efficient pipelining: The round-based structure allows straightforward pipeline designs
Moderate state size: Internal state is relatively small compared to SHA-3

These characteristics make SHA-2 well suited for high-throughput hardware implementations and systems where simplicity and predictability are important.
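Because SHA-2 works on fixed-size blocks, the number of compression-function calls for a given message length is easy to predict. The sketch below computes it for SHA-256 from the FIPS 180-4 padding rule; the function names and the one-round-per-cycle cycle estimate are illustrative assumptions, not properties of any particular core.

```python
import hashlib

def md_block_count(message_len: int, block_bytes: int, length_field_bytes: int) -> int:
    """Number of compression-function calls for a message of message_len bytes.

    Padding per FIPS 180-4: one 0x80 byte, zero bytes as needed, then the
    message length in a fixed-width field, filling a whole number of blocks.
    """
    padded_min = message_len + 1 + length_field_bytes
    return -(-padded_min // block_bytes)  # ceiling division

def sha256_cycles(message_len: int, rounds_per_block: int = 64) -> int:
    """Cycle estimate for a hypothetical one-round-per-cycle SHA-256 core."""
    blocks = md_block_count(message_len, hashlib.sha256().block_size, 8)
    return blocks * rounds_per_block

for n in (0, 55, 56, 64, 1024):
    print(n, "bytes ->", md_block_count(n, 64, 8), "block(s),", sha256_cycles(n), "cycles")
```

The step from 55 to 56 bytes shows the padding boundary: a 55-byte message still fits into one 64-byte block together with its padding, while a 56-byte message needs a second block. For short messages such as firmware headers or nonces, this can double the processing latency.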

4.2 SHA-3: Sponge Construction with Large Internal State

SHA-3, specified in NIST FIPS 202, is based on the sponge construction, where input data is absorbed into a large internal state (1600 bits in Keccak-f[1600]) and output is generated by extracting data from that state. The same underlying permutation is reused throughout the process. FIPS 202 defines both fixed-length hash functions (SHA3-224/256/384/512) and extendable-output functions such as SHAKE128 and SHAKE256.

From a system perspective, this results in different trade-offs:

Flexible output generation: Supports both fixed-length hashes and extendable-output functions (XOFs)
Large internal state: Increases storage requirements and routing complexity in hardware
Permutation-based processing: Relies on repeated application of the Keccak-f permutation
More involved control logic: Absorb and squeeze phases introduce additional control complexity

These properties make SHA-3 more flexible, but often more complex to implement and integrate, especially in resource-constrained embedded systems.
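The rate/capacity split and the XOF behavior described above can be made concrete with a short sketch. It relies on the FIPS 202 rule that SHA3-n uses a capacity of 2n bits, and on the shake_256 function in Python's hashlib; the helper name sha3_rate_bits is an assumption for illustration.

```python
import hashlib

STATE_BITS = 1600  # Keccak-f[1600] state width

def sha3_rate_bits(digest_bits: int) -> int:
    """Rate of SHA3-n: the capacity is 2*n bits, the rate is the remainder."""
    capacity = 2 * digest_bits
    return STATE_BITS - capacity

for n in (224, 256, 384, 512):
    print(f"SHA3-{n}: rate {sha3_rate_bits(n)} bits, capacity {2 * n} bits")

# An XOF produces a single output stream, so a shorter request
# is a prefix of a longer request over the same input.
msg = b"embedded"
shorter = hashlib.shake_256(msg).digest(32)
longer = hashlib.shake_256(msg).digest(64)
assert longer[:32] == shorter
```

For a hardware design, the rate determines how many input bits are XORed into the state before each permutation call, so a single Keccak-f[1600] core can serve all four SHA-3 variants and both SHAKE functions if the absorb width is made configurable.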

4.3 Key Consequences for System Design

The different constructions defined in FIPS 180-4 (SHA-2) and FIPS 202 (SHA-3) lead to distinct system-level behaviors:

Area and routing: SHA-2 typically uses smaller state and simpler data paths; SHA-3’s 1600-bit state increases routing and storage demands
Performance characteristics: SHA-2 often achieves high throughput with straightforward designs; SHA-3 performance depends more strongly on implementation strategy and design choices
Functional flexibility: SHA-2 provides fixed-length hashes as defined in FIPS 180-4; SHA-3 additionally enables XOFs and related functions defined in FIPS 202

In summary, SHA-2 emphasizes regularity and efficiency, while SHA-3 emphasizes flexibility and extensibility. These differences, rooted in their respective NIST-standardized constructions, explain the trade-offs observed in hardware implementation, software performance, and system integration.

5. When to use SHA-2 vs. SHA-3

The choice between SHA-2 and SHA-3 depends strongly on system requirements, existing constraints, and long-term design goals, which is also reflected in their typical use cases. SHA-2 is generally the preferred option when compatibility with established protocols and ecosystems is required, as it is widely deployed in applications such as secure boot, firmware integrity verification, TLS and IPsec stacks, and code signing. Its standardization, proven interoperability, and broad tooling support make it a reliable and low-risk choice. From a technical perspective, it is well suited for designs with limited hardware resources, offering efficient implementations with low area and control complexity. In many embedded projects, its predictable behavior and alignment with block-based system architectures also result in faster and simpler integration.

In contrast, SHA-3 is attractive in some new designs that want Keccak-based primitives, XOF support, or cryptographic diversity, but actual adoption is application- and ecosystem-dependent. Its sponge construction enables variable output lengths through extendable-output functions (XOFs), which can simplify or enhance protocol design. This makes SHA-3 particularly relevant for flexible architectures and for modern cryptographic systems such as post-quantum cryptography (PQC): ML-KEM, ML-DSA, and FrodoKEM, for example, all build on SHA-3 family functions. While this flexibility comes with higher implementation and integration complexity, it positions SHA-3 as a strong candidate for future-oriented embedded systems.
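As a rough illustration of how variable output length can simplify a design, the sketch below derives several values of different lengths from one seed using SHAKE256. The derive function, its labels, and the length-prefix encoding are hypothetical and are not taken from any standard; production designs would more likely use a standardized construction such as cSHAKE or KMAC from NIST SP 800-185.

```python
import hashlib

def derive(seed: bytes, label: bytes, length: int) -> bytes:
    """Illustrative only: SHAKE256 over a length-prefixed label and the seed."""
    if len(label) > 255:
        raise ValueError("label too long for a one-byte length prefix")
    xof = hashlib.shake_256()
    xof.update(bytes([len(label)]) + label)  # length prefix keeps labels unambiguous
    xof.update(seed)
    return xof.digest(length)

seed = bytes(32)  # placeholder; a real seed would come from a TRNG or key exchange

enc_key = derive(seed, b"enc", 16)  # 128-bit key
mac_key = derive(seed, b"mac", 32)  # 256-bit key
mask = derive(seed, b"mask", 200)   # arbitrary-length mask

print(len(enc_key), len(mac_key), len(mask))
```

With a fixed-output hash such as SHA-256, producing the 200-byte mask would require an additional counter-mode or HKDF-style expansion layer; with an XOF, the same primitive and the same hardware core cover all three cases.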

6. Conclusion

SHA-2 and SHA-3 are both robust, standardized hash function families, but they reflect fundamentally different design philosophies.

SHA-2 remains the pragmatic choice for many systems. It is widely deployed, well understood, and aligns naturally with block-based architectures. Its regular datapath and efficient pipelining make it particularly suitable for area-constrained designs and high-throughput implementations. For applications that require compatibility with existing ecosystems such as secure boot, TLS, or IPsec, SHA-2 is often the lowest-risk option.

SHA-3, in contrast, introduces a different model. Its sponge construction avoids structural properties such as length extension that are inherent to Merkle–Damgård designs, simplifying certain use cases and reducing the risk of incorrect usage. In addition, the SHA-3 standard defines extendable-output functions (XOFs) such as SHAKE128 and SHAKE256, which enable flexible output lengths and allow multiple cryptographic functions such as hashing, key derivation, and masking to be built from a single primitive. This makes SHA-3 attractive for new designs that prioritize flexibility, cryptographic diversity, and long-term robustness, although it typically comes with higher implementation complexity and integration effort.

For SoC and FPGA developers, the decision is not about “which is better,” but about system context:

Use SHA-2 when compatibility, simplicity, and efficiency are the primary drivers
Use SHA-3 when flexibility, future-oriented design, or integration with modern cryptographic schemes (e.g. PQC) is required

In practice, many systems will continue to rely on SHA-2 while selectively adopting SHA-3 where its properties provide clear architectural advantages.
