From Tanenbaum’s Structured Computer Organization:
A vector processor is very efficient at executing a sequence of operations on pairs of data elements. All of the operations are performed in a single, heavily pipelined functional unit. Vector processors work on arrays of data, and execute single instructions that, for example, add the elements together pairwise for two vectors.

The vector processor has the concept of a vector register, which consists of a set of conventional registers that can be loaded from memory in a single instruction, which actually loads them from memory serially.

Then a vector addition instruction performs the pairwise addition of the elements of two such vectors by feeding them to a pipelined adder from the two vector registers. The result from the adder is another vector, which can either be stored into a vector register or used directly as an operand for another vector operation.

The SSE (Streaming SIMD Extension) instructions available on the Intel Core architecture use this execution model to speed up highly regular computation, such as multimedia and scientific software.
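For concreteness, by "pairwise addition" I mean something like the following sketch, using the SSE intrinsics the excerpt mentions (this is my own illustration, not from the book; the names A, B, C and add_arrays are just placeholders):

    #include <xmmintrin.h>  /* SSE intrinsics */

    /* Pairwise addition C[i] = A[i] + B[i], four floats per SSE instruction.
       Assumes n is a multiple of 4; the array names are only illustrative. */
    void add_arrays(const float *A, const float *B, float *C, int n)
    {
        for (int i = 0; i < n; i += 4) {
            __m128 va = _mm_loadu_ps(&A[i]);   /* load 4 scalars from A */
            __m128 vb = _mm_loadu_ps(&B[i]);   /* load 4 scalars from B */
            __m128 vc = _mm_add_ps(va, vb);    /* one instruction adds all 4 pairs */
            _mm_storeu_ps(&C[i], vc);          /* store the 4 results */
        }
    }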
As the passage says, a vector processor has a single functional unit (e.g. a single adder, which, it seems, adds one pair of scalars at a time, not a whole pair of vectors?), and the scalars in an array are loaded into a vector register from memory serially. Is there any parallelism inside a vector processor?
For example, to add two vectors stored as two arrays of scalars, A and B, does it work in parallel like this:

- the adder adds a pair of scalars, A[i] and B[i], which have already been stored in two vector registers,
- and at the same time, a later (j > i) pair of scalars, A[j] and B[j], is loaded from memory into the vector registers?

Or does the addition take place only after the two arrays A and B have been completely loaded into two vector registers? Doesn't that leave the adder idle while the scalars in each array are being loaded?
Thanks.