I had a (very) quick look. As the comments in the code already state, only a scatterlist with 1 segment is allowed. A workaround could be:
copy the entire scatterlist to a buffer, do the transformation and copy the resulting buffer back to a scatterlist. The larger the buffer needed, the more overhead. Given that most packets are around 1500 bytes (MTU) or maybe 4k using LUKS with large sectors, you have to think about what you want to achieve.
Alternatively, create a transformation for each scatterlist segment. Keep in mind that the "Source" and "Destination" scatterlists might not be segmented the same way. This is what I'm trying to correct in my code. The only problem (and I didn't look long enough at your code for this) is whether you can use the IV from the previous transformation for the next one. This is not the same as programming a "new" IV.
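To make the IV question concrete, here is a rough userspace sketch, not driver code. It assumes CBC mode, where the IV for the next chunk is simply the last ciphertext block of the previous chunk. The block cipher is a toy stand-in (not secure), and `struct seg`, `cbc_encrypt` and `cbc_encrypt_segments` are names I made up for illustration. It also assumes every chunk boundary falls on a block boundary, which real scatterlists do not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK 8 /* toy block size, assumption for this sketch */

/* Hypothetical stand-in for one scatterlist segment. */
struct seg { uint8_t *ptr; size_t len; };

/* Toy "cipher": NOT secure, only here so the example runs. */
static void toy_encrypt_block(uint8_t *out, const uint8_t *in, const uint8_t *key)
{
    for (size_t i = 0; i < BLK; i++)
        out[i] = (uint8_t)((in[i] ^ key[i]) + i + 1);
}

/* CBC over one linear chunk. On return, iv holds the last ciphertext
 * block, i.e. the IV to chain into the next chunk. */
static void cbc_encrypt(uint8_t *dst, const uint8_t *src, size_t len,
                        uint8_t iv[BLK], const uint8_t key[BLK])
{
    uint8_t tmp[BLK];

    assert(len % BLK == 0); /* sketch only handles block-aligned chunks */
    for (size_t off = 0; off < len; off += BLK) {
        for (size_t i = 0; i < BLK; i++)
            tmp[i] = src[off + i] ^ iv[i];
        toy_encrypt_block(dst + off, tmp, key);
        memcpy(iv, dst + off, BLK);
    }
}

/* Walk source and destination together; each step handles the largest
 * chunk that is contiguous in BOTH lists, carrying the IV across. */
static void cbc_encrypt_segments(const struct seg *src, size_t nsrc,
                                 const struct seg *dst, size_t ndst,
                                 uint8_t iv[BLK], const uint8_t key[BLK])
{
    size_t si = 0, di = 0, soff = 0, doff = 0;

    while (si < nsrc && di < ndst) {
        size_t srem = src[si].len - soff;
        size_t drem = dst[di].len - doff;
        size_t n = srem < drem ? srem : drem;

        cbc_encrypt(dst[di].ptr + doff, src[si].ptr + soff, n, iv, key);
        soff += n;
        doff += n;
        if (soff == src[si].len) { si++; soff = 0; }
        if (doff == dst[di].len) { di++; doff = 0; }
    }
}
```

With a chained IV the per-chunk result should match a single pass over the whole buffer. Whether your hardware lets you keep or read back the IV between transformations is exactly the open question.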
Most solutions I've looked at for different hardware queue in software, meaning they wait for each full request to complete before dequeuing the next. In this case you could use the "operation complete" interrupt, which fires when the hardware has no more transformations to complete.
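A minimal sketch of that software queueing, again in plain userspace C and under my own assumptions: `struct engine`, `submit` and `on_operation_complete` are hypothetical names, `hw_start` stands in for programming the hardware, and locking is left out entirely. The completion handler plays the role of the "operation complete" interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QLEN 8 /* arbitrary queue depth for this sketch */

struct request { int id; bool done; };

struct engine {
    struct request *queue[QLEN]; /* ring buffer of pending requests */
    size_t head, count;
    struct request *active;      /* the one request currently in hardware */
};

/* Stand-in for handing a request to the hardware. */
static void hw_start(struct engine *e, struct request *r)
{
    e->active = r;
}

/* Start the next pending request, but only if the hardware is idle. */
static void kick(struct engine *e)
{
    struct request *r;

    if (e->active || e->count == 0)
        return;
    r = e->queue[e->head];
    e->head = (e->head + 1) % QLEN;
    e->count--;
    hw_start(e, r);
}

/* Queue a request; returns false when the queue is full. */
static bool submit(struct engine *e, struct request *r)
{
    if (e->count == QLEN)
        return false;
    e->queue[(e->head + e->count) % QLEN] = r;
    e->count++;
    kick(e);
    return true;
}

/* What the "operation complete" interrupt handler would do:
 * finish the active request, then dequeue the next one. */
static void on_operation_complete(struct engine *e)
{
    assert(e->active != NULL);
    e->active->done = true;
    e->active = NULL;
    kick(e);
}
```

In a real driver the queue and the completion path would need locking, and the kernel's own queueing helpers would probably replace the hand-rolled ring.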
The first option is the easiest to implement: copy the scatterlist to a buffer, create a single-segment "scatterlist" from this buffer, and pass that to the rest of the driver. Combine that with allowing only one request in the engine at a time and you could even pre-allocate the buffer instead of dynamically allocating and freeing memory.
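Roughly, the bounce-buffer approach looks like the userspace sketch below. Everything here is illustrative: `struct seg` stands in for a scatterlist segment, `hw_transform_single` is a dummy XOR in place of the existing single-segment hardware path, and the 4096-byte size is just my assumption based on the sector sizes mentioned above. In the kernel I would expect `sg_copy_to_buffer()`, `sg_copy_from_buffer()` and `sg_init_one()` to do the copying and wrapping, but I have not checked how they fit your driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE 4096 /* assumed upper bound on request size */

/* Hypothetical stand-in for one scatterlist segment. */
struct seg { uint8_t *ptr; size_t len; };

/* Pre-allocated once; safe only because one request runs at a time. */
static uint8_t bounce[BOUNCE_SIZE];

static size_t segs_total(const struct seg *s, size_t n)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++)
        total += s[i].len;
    return total;
}

/* Copy all segments into one linear buffer. */
static void gather(uint8_t *buf, const struct seg *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        memcpy(buf, s[i].ptr, s[i].len);
        buf += s[i].len;
    }
}

/* Copy len bytes from the linear buffer back out to the segments. */
static void scatter(const struct seg *s, size_t n, const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < n && len > 0; i++) {
        size_t chunk = s[i].len < len ? s[i].len : len;

        memcpy(s[i].ptr, buf, chunk);
        buf += chunk;
        len -= chunk;
    }
}

/* Dummy for the existing single-segment path (XOR, not a real cipher). */
static void hw_transform_single(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= 0x5a;
}

/* Returns false if the request does not fit the bounce buffer
 * or the destination is too small. */
static bool process_request(const struct seg *src, size_t nsrc,
                            const struct seg *dst, size_t ndst)
{
    size_t len = segs_total(src, nsrc);

    if (len > BOUNCE_SIZE || segs_total(dst, ndst) < len)
        return false;
    gather(bounce, src, nsrc);
    hw_transform_single(bounce, len);
    scatter(dst, ndst, bounce, len);
    return true;
}
```

The two extra copies are the overhead mentioned earlier, so this only pays off if the requests stay small.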