***UPDATE*** I updated the routine and the zip file. The array was making the compiler bloat and slow down the code - fixed.
***CAUTION***
Don't use this routine for transfers of more than 32 bytes while the display is on. A block-transfer instruction can't be interrupted, so it delays all interrupts until it finishes!
[original]
Here's a small DMA routine for use with HuC. The PCE's CPU has a total of five block-transfer ("DMA") instructions: TII, TDD, TIN, TIA and TAI. I actually didn't notice the fifth one until today.

I made the interface C-friendly. Instructions on its use are inside the DMA_RTN.h file. The destination can be anywhere: VDC, VCE, memory, the SGX data port ($0012), etc. You're not limited to global variables when assigning the parameters, but globals are preferred for speed when used in loops.
The transfer itself runs at full ASM speed since it's a single block-transfer instruction, but the argument setup is handled by the HuC compiler for flexibility and ease of use. If you need faster parameter setup in loops with tight clock-cycle restrictions, let me know and I'll see what I can come up with.
Here's the file: http://pcedev.net/HuC/Dma_rtn.zip
I'll also post it on the MagicEngine forums, although the dev section looks to be moving at a crawl at the moment.
-Rich