Hello all,
It’s me again, back to ask yet another probably obscure question :).
I’ve been studying the V810 CPU and the Seminar slides that NEC did to help Virtual Boy programmers get started. I know a little bit about pipelining from taking a course that studied MIPS, but I’m a little confused about the pipelining hazards that exist in the V810.
For example, the slides say a hazard exists when performing a LOAD after a STORE, which stalls the pipeline. Assuming a classic RISC pipeline like MIPS, yes, LDs and STs may need the bus at different stages, but I’m not sure how a consecutive ST and LD would attempt to use the bus simultaneously and cause a stall (the LD lags behind the ST by one stage, and a single instruction shouldn’t access the bus in two consecutive stages).
(LD/ST assumed from here on) Another hazard listed on the slides is consecutive stores. Since a store should need only one memory access, which presumably takes one pipeline stage (unless it’s a 32-bit quantity), I’m not sure why a stall is needed: the stores should be able to happen in succession, since each store will be in a different pipeline stage on any given cycle (barring other hazards). Likewise, I do not see how placing loads in succession speeds up the CPU.
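The reasoning above can be sketched as a toy schedule. This assumes an ideal classic 5-stage pipeline (IF, ID, EX, MEM, WB) where only the MEM stage touches the data bus; it is not a model of the real V810, and the function names are made up for illustration.

```python
# Toy model of a classic 5-stage RISC pipeline (IF, ID, EX, MEM, WB).
# Assumptions: one instruction issues per cycle, nothing stalls, and only
# the MEM stage uses the data bus. This is NOT the documented V810 pipeline.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(program):
    """Return {cycle: [(instr, stage), ...]} for an ideal, stall-free pipeline."""
    table = {}
    for issue_cycle, instr in enumerate(program):
        for offset, stage in enumerate(STAGES):
            table.setdefault(issue_cycle + offset, []).append((instr, stage))
    return table

def mem_conflicts(table):
    """Return the cycles in which more than one instruction occupies MEM."""
    return [cycle for cycle, slots in sorted(table.items())
            if sum(1 for _, stage in slots if stage == "MEM") > 1]

table = schedule(["ST", "LD"])
for cycle in sorted(table):
    print(cycle, table[cycle])
print("MEM conflicts:", mem_conflicts(table))
```

In this idealized model the ST and the LD never sit in MEM on the same cycle, which is why the stall on the slides is surprising if the V810 really works like this.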
Does anyone have a reference on how the pipeline in the V810 was implemented?
The V810 manual, PDF page 11, says the V810 has a one-clock pitch pipeline (whatever that means); another source I’ve read states that the V810 used a 5-stage pipeline. If the latter is true, then I can’t really see how some of these hazards occur. The term pipeline is mentioned exactly once in the V810 User’s Manual, and the timing diagrams in the Datasheet don’t give many hints either. The Hardware Manual describes the bus cycle, but not the pipeline.
Perhaps the V810 summary pamphlet, which states that the V810 pipeline has interlock, is a hint?
… Why do I care about this stuff? 🙁 LOL! Well, at least this CPU has somewhat modern design choices… 🙂
I think it’s just a typical pipeline data hazard situation. Wikipedia has a pretty good description ( http://en.wikipedia.org/wiki/Hazard_%28computer_architecture%29 ). Consecutive loads are good, since there’s no hazard… but writing and then reading does have a hazard since it can’t read until it’s been written, or it’d possibly pull stale data. So it stalls the pipeline (bubbles) to verify the data has been stored before allowing the load.
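As a rough illustration of the stale-data idea, here is a toy sketch. It assumes a store's write reaches memory only after a delay, modeled as a single pending write; that delay, the `ToyMemory` class, and the address used are all hypothetical and not taken from any V810 document.

```python
# Toy illustration of a store-then-load hazard.
# Assumption: a store's write stays "pending" until something drains it.
# This delay is invented for illustration, not a documented V810 timing.
class ToyMemory:
    def __init__(self):
        self.cells = {}
        self.pending = None  # (address, value) that has not reached memory yet

    def drain(self):
        """Complete the outstanding write, if there is one."""
        if self.pending is not None:
            address, value = self.pending
            self.cells[address] = value
            self.pending = None

    def store(self, address, value):
        self.drain()                    # finish any older write first
        self.pending = (address, value)

    def load(self, address, stall):
        if stall:
            self.drain()                # interlock: wait for the write to land
        return self.cells.get(address, 0)

no_stall = ToyMemory()
no_stall.store(0x100, 42)
print(no_stall.load(0x100, stall=False))   # reads before the write lands

with_stall = ToyMemory()
with_stall.store(0x100, 42)
print(with_stall.load(0x100, stall=True))  # reads after the write lands
```

Without the stall the load sees the old contents (0 here); with the stall it sees 42.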
While I don’t think the pipeline is described in detail anywhere… I have a feeling that it’s similar to the standard 5 stage RISC pipeline. The V810 seminar shows a generic diagram (looks just like: http://en.wikipedia.org/wiki/Classic_RISC_pipeline ). And Pg. 7 of the programming document gives a hint on the operation: “Load invokes data read bus cycle before execution”, and “Store invokes data write bus cycle after execution”… you can also get some hints from the notes on the execution cycles on pg. 108 of the V810 Architecture Manual.
I believe the interlock comment is just saying that it automatically avoids hazards by having the hardware interlock (on architectures without it, you’d need to add NOPs to avoid hazards). But of course due to the interlock, the performance is affected.
DogP
In the example on Slide 7 of the programming seminar, the memory location being stored to does not depend on the address from which a value should be loaded. That’s why it confuses me, unless the V810 can’t detect that r10 != r11 ahead of time.
Yeah, I’ve never seen anything that says the pipeline stall only occurs if there is actually a hazard, so I assumed they didn’t put the logic in to detect dependencies… basically just blindly preventing the worst case. In the case of something like MIPS, the assembler detects dependencies and selectively adds NOPs, but doing that logic in hardware would probably have taken up a good chunk of real estate.
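To show the difference between the two policies, here's a toy sketch that counts stalls for a LD right after a ST, once blindly and once with an address comparison. The one-cycle penalty, the register values, and the `count_stalls` helper are all assumptions made up for illustration; nothing here comes from the V810 documentation.

```python
# Toy comparison of two hypothetical interlock policies for a LD that
# immediately follows a ST. STALL_CYCLES is an assumed placeholder value.
STALL_CYCLES = 1

def count_stalls(program, regs, compare_addresses):
    """program is a list of (op, base_reg, displacement) tuples.

    compare_addresses=False: stall on every LD that directly follows a ST.
    compare_addresses=True:  stall only if both access the same address.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "ST" and cur[0] == "LD":
            same_address = regs[prev[1]] + prev[2] == regs[cur[1]] + cur[2]
            if not compare_addresses or same_address:
                stalls += STALL_CYCLES
    return stalls

# Hypothetical register contents with r10 != r11, as in the case discussed above.
regs = {"r10": 0x1000, "r11": 0x2000}
program = [("ST", "r10", 0), ("LD", "r11", 0)]

print("blind interlock:", count_stalls(program, regs, compare_addresses=False))
print("address-aware:  ", count_stalls(program, regs, compare_addresses=True))
```

The blind policy pays the stall even though the addresses differ, while the address-aware one doesn't, at the cost of needing a comparator in hardware.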
DogP