Definitions of DSP devices and general-purpose processors (GPPs), and the differences between them

Consider an example of digital signal processing, the finite impulse response (FIR) filter. In mathematical terms, an FIR filter is a series of dot products: take an input vector and a coefficient vector, multiply the coefficients by a sliding window of input samples, and add all the products together to form one output sample.
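The dot-product description above can be sketched in a few lines of Python. This is only an illustration, not how a DSP implements the filter: the function name fir_filter is hypothetical, and the sketch assumes that samples before the start of the input are zero. Each output is y[n] = h[0]*x[n] + h[1]*x[n-1] + ... + h[N-1]*x[n-N+1].

```python
def fir_filter(samples, coeffs):
    """Return one output sample per input sample.

    Assumption for this sketch: the input history before the
    first sample is all zeros.
    """
    window = [0.0] * len(coeffs)      # sliding window, newest sample first
    outputs = []
    for x in samples:
        window = [x] + window[:-1]    # slide the window by one sample
        acc = 0.0
        for c, w in zip(coeffs, window):
            acc += c * w              # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs
```

Feeding a unit impulse through the filter returns the coefficients themselves, which is where the name "impulse response" comes from. The inner loop is the multiply-accumulate step that the rest of this article is concerned with.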

Operations like this are repeated in enormous numbers during digital signal processing, so devices designed for the purpose must provide specialized support. This requirement is what drove DSP devices and general-purpose processors (GPPs) apart:

1 Support for multiplication-intensive operations

GPPs are not designed for multiplication-intensive tasks. Even some modern GPPs need multiple instruction cycles to complete one multiplication. A DSP processor uses specialized hardware to multiply in a single cycle, and it adds an accumulator register to hold the running sum of many products. The accumulator is typically wider than the other registers; the extra bits, called guard bits, prevent overflow. To take full advantage of this specialized multiply-accumulate hardware, almost all DSP instruction sets contain an explicit MAC instruction.
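The effect of the extra accumulator bits can be illustrated with a small simulation. The widths used here (16-bit operands, a 32-bit accumulator without guard bits, a 40-bit accumulator with 8 guard bits) are assumptions chosen for the example, and the helper names wrap_signed and mac are hypothetical.

```python
def wrap_signed(value, bits):
    """Wrap an integer into a two's-complement register of the given width."""
    value &= (1 << bits) - 1
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


def mac(samples, coeffs, acc_bits):
    """Sum the products in an accumulator that is acc_bits wide."""
    acc = 0
    for x, c in zip(samples, coeffs):
        acc = wrap_signed(acc + x * c, acc_bits)   # multiply-accumulate
    return acc


samples = [32767] * 4    # largest positive 16-bit value
coeffs = [32767] * 4

no_guard = mac(samples, coeffs, 32)     # overflows and wraps negative
with_guard = mac(samples, coeffs, 40)   # 8 guard bits keep the true sum
```

With these inputs the 32-bit accumulator wraps around to -262140, while the 40-bit accumulator holds the correct sum 4294705156. Eight guard bits leave room for at least 256 such worst-case products before overflow becomes possible.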

2 Memory structure

Traditionally, GPPs use the von Neumann memory structure. In this configuration there is a single memory space, connected to the processor core through one set of buses (an address bus and a data bus), so the core can make only one memory access at a time. A multiplication step typically involves four memory accesses and therefore takes at least four instruction cycles.

Most DSPs use a Harvard architecture, which divides the memory space in two and stores programs and data separately. Two sets of buses connect the memories to the processor core, so both can be accessed at the same time. This arrangement doubles the processor's memory bandwidth and, more importantly, delivers data and instructions to the core simultaneously. With this layout, the DSP can implement a single-cycle MAC instruction.

A further complication is that typical high-performance GPPs now contain two on-chip caches, one for data and one for instructions, connected directly to the processor core to speed up access at run time. Physically, this on-chip dual memory and bus structure is almost identical to the Harvard architecture. Logically, however, there are still important differences between the two.

A GPP uses control logic to decide which data and instruction words are held in the on-chip caches; the programmer does not specify this (and may not even know). In contrast, a DSP uses multiple on-chip memories and multiple sets of buses to guarantee several memory accesses per instruction cycle. On a DSP, the programmer must explicitly control which data and instructions are placed in on-chip memory, and must write the program so that the processor makes effective use of its dual buses.

In addition, DSP processors have almost no data cache. This is because typical DSP data is a stream: once the processor has finished computing with a data sample, it discards the sample and almost never uses it again.

3 Zero-overhead loops

A common feature of DSP algorithms is that most of the processing time is spent executing small loops, which explains why most DSPs have dedicated hardware for zero-overhead loops. A zero-overhead loop is one in which the processor spends no time checking the value of the loop counter, branching back to the top of the loop, or decrementing the loop counter.

In contrast, loops on a GPP are implemented in software. Some high-performance GPPs use branch-prediction hardware to achieve almost the same effect as hardware-supported zero-overhead loops.

4 Fixed-point arithmetic

Most DSPs use fixed-point rather than floating-point arithmetic. DSP applications must pay close attention to numeric accuracy, which would be much easier to achieve with floating point, but low cost is also very important for a DSP, and fixed-point machines are cheaper (and faster) than comparable floating-point machines. To preserve numeric accuracy without floating-point hardware, DSP processors support saturating arithmetic, rounding, and shifting in both the instruction set and the hardware.
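As a rough illustration of saturation, rounding, and shifting, the sketch below models 16-bit fixed-point arithmetic in the common Q15 format, where the integer v stands for the fraction v / 32768. The choice of Q15 and the function names are assumptions made for this example; a real DSP performs these steps in hardware.

```python
Q15_MAX = 32767     # largest Q15 value, just under +1.0
Q15_MIN = -32768    # smallest Q15 value, exactly -1.0


def saturate16(value):
    """Clamp to the 16-bit range instead of wrapping around."""
    return max(Q15_MIN, min(Q15_MAX, value))


def q15_add(a, b):
    """Saturating addition of two Q15 values."""
    return saturate16(a + b)


def q15_mul(a, b):
    """Multiply two Q15 values with rounding and saturation."""
    product = a * b          # full-precision result in Q30 format
    product += 1 << 14       # add half an output LSB to round
    return saturate16(product >> 15)   # shift back to Q15, then clamp
```

For example, q15_mul(16384, 16384) multiplies 0.5 by 0.5 and returns 8192, which is 0.25 in Q15. Adding 30000 and 10000 with q15_add returns 32767 instead of wrapping to a negative number, which is usually the less harmful error in a signal path.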

On the surface, a DSP has much in common with a standard microprocessor: a processor built around an ALU, address and data buses, RAM, ROM, and I/O ports. Broadly speaking, DSPs, microprocessors, and microcontrollers are all processors, so a DSP can be regarded as a kind of CPU. But a DSP differs from a general-purpose CPU in several ways:

Architecture: the CPU has a von Neumann structure, whereas the DSP has separate code and data buses (the "Harvard structure"), so that multiple memory accesses can be made in the same clock cycle. This is possible because there are often several data buses. With this architecture, the DSP can fetch an instruction and one or two (or more) operands in a single clock cycle.

Standardization and versatility: CPUs are highly standardized and versatile and support operating systems, so a CPU-centric system makes human-computer interaction and communication with standard interface devices very convenient and requires no hardware development; on the other hand, the CPU's peripheral interface circuitry is more complicated. The DSP is mainly used to build embedded signal processing systems. It does not emphasize human-computer interaction and generally does not need many communication interfaces, so its structure is relatively simple and it is easy to develop with. Looking only at embedded applications, the difference between an embedded CPU and a DSP is mainly one of emphasis: the former leans toward control, the latter toward computation.

Pipeline structure: Most DSPs are pipelined, that is, each instruction is completed in stages by multiple functional units on the chip, such as instruction fetch, decode, operand fetch, and execute, which can greatly improve execution efficiency. However, pipelining also makes software design harder, because designers must take the pipeline into account when writing programs.

Fast multipliers: Signal processing algorithms often use a large number of multiply-accumulate (MAC) operations. The DSP has a dedicated hardware multiplier that can perform a MAC operation in one clock cycle. The hardware multiplier takes up a large portion of the DSP chip area. (In contrast, general-purpose CPUs use a slower, iterative multiplication technique that needs multiple clock cycles per multiplication but takes up less silicon area.)

Address generator: The DSP has a dedicated hardware address generation unit that supports the specific data addressing patterns required by many signal processing algorithms. These include pre- and post-increment (or decrement), modulo addressing for circular data buffers, and bit-reversed addressing for the FFT. The address generation unit works in parallel with the main ALU and the multiplier, which further increases the amount of work the DSP can do in one clock cycle.
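To make two of these addressing patterns concrete, here is a software sketch of what the address generator computes. On a DSP these addresses come from hardware at no cost in instructions; the names below are hypothetical, and bit_reversed_order assumes the length is a power of two.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(n):
    """Bit-reversed index sequence for a length-n FFT (n a power of two)."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


class CircularBuffer:
    """Fixed-size delay line addressed with modulo arithmetic."""

    def __init__(self, size):
        self.data = [0] * size
        self.pos = 0                  # next write position

    def push(self, sample):
        self.data[self.pos] = sample
        # post-increment with modulo wrap-around
        self.pos = (self.pos + 1) % len(self.data)

    def recent(self, k):
        """Sample written k pushes ago (k = 0 is the newest)."""
        return self.data[(self.pos - 1 - k) % len(self.data)]
```

For an 8-point FFT, bit_reversed_order(8) gives 0, 4, 2, 6, 1, 5, 3, 7. The circular buffer keeps the most recent samples of a stream without ever moving data in memory; only the position wraps around.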

Hardware-assisted loops: Signal processing algorithms often need to perform tight instruction loops. Support for hardware-assisted loops allows the DSP to efficiently execute code blocks without having to stall the pipeline or let the software test the loop termination condition.

Low power consumption: DSPs consume little power, usually 0.5 W to 4 W, and low-power DSPs as little as 0.05 W, so they can run on batteries and are well suited to embedded systems; CPU power consumption is usually above 20 W.
