When I studied computer architecture in college, I learned that the width of the data bus connecting the CPU and the L1 cache matches the register width. For example, in a 64-bit system, my understanding is that the data bus between the CPU and the L1 cache is 64 bits wide.
However, in modern embedded systems, the internal cache typically uses a banked structure, which allows loads and stores to proceed in parallel across banks. Even if more than 64 bits of data are fetched from the banks in parallel, the data bus is still only 64 bits wide. Consequently, the fetched values have to be transferred to the registers sequentially, 64 bits at a time.
So, to avoid this sequential transfer, is widening the data bus from the conventional 64 bits to something like 64 * N bits being considered?
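
To make the concern concrete, here is a rough sketch of the arithmetic I have in mind. It assumes a very simplified model in which every transfer moves exactly one bus width of data and transfers happen one after another. The function name `transfer_beats` and the figure of 4 banks are hypothetical and chosen only for illustration; real cores may behave differently.

```python
def transfer_beats(total_bits: int, bus_width_bits: int) -> int:
    """Sequential transfers needed to move total_bits over a bus of the given width.

    Simplified model (an assumption): each transfer moves at most
    bus_width_bits, and transfers do not overlap.
    """
    # Integer ceiling division, so a partial last transfer still counts as one.
    return (total_bits + bus_width_bits - 1) // bus_width_bits


# Hypothetical example: 4 banks, each delivering 64 bits in parallel.
banks = 4
bits_per_bank = 64
fetched_bits = banks * bits_per_bank  # total bits available at once

narrow = transfer_beats(fetched_bits, 64)        # conventional 64-bit bus
wide = transfer_beats(fetched_bits, 64 * banks)  # widened 64 * N bus, N = banks

print(f"64-bit bus: {narrow} transfers, {64 * banks}-bit bus: {wide} transfer(s)")
```

Under this model, the 64-bit bus needs N sequential transfers to drain N banks, while a 64 * N bit bus would need only one.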