I have thought of a way to make use of GPGPU: have a complex delayed model computed on the GPU and a simple instant model computed on the CPU.

For example, consider the following scheme (substeps a and b of each step are done in parallel):

1. Code the first megabyte using the instant model.

2a. Code the second megabyte using the instant model.

2b. Feed the first megabyte to the delayed model.

3a. Code the third megabyte using the instant model and the delayed model (the delayed model has by now been fed the first megabyte).

3b. Feed the second megabyte to the delayed model.

4a. Code the fourth megabyte using the instant model and the delayed model (the delayed model has by now been fed the first and second megabytes).

4b. Feed the third megabyte to the delayed model.
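To make the timing of the steps above concrete, here is a rough Python sketch of just the schedule. Everything in it is an assumption for illustration: a worker thread stands in for the GPU, both models are plain order-0 byte counts, the two predictions are mixed 50/50, and the names `train_delayed`, `code_block` and `run_pipeline` are made up. It says nothing about how a real delayed model should be built; it only shows that block k is coded with a delayed model that has seen blocks 0..k-2.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def train_delayed(counts, block):
    """Placeholder for the expensive GPU update.

    Returns a NEW count table, so the table being read by the coder
    is never modified while the update runs in parallel.
    """
    new_counts = list(counts)
    for byte in block:
        new_counts[byte] += 1
    return new_counts


def code_block(block, instant, delayed, use_delayed):
    """Return the ideal code length in bits for one block.

    The instant model adapts after every byte; the delayed model is
    read-only here. The 50/50 mix is an arbitrary assumption.
    """
    bits = 0.0
    for byte in block:
        p = instant[byte] / sum(instant)
        if use_delayed:
            p = 0.5 * p + 0.5 * delayed[byte] / sum(delayed)
        bits -= math.log2(p)
        instant[byte] += 1  # instant model updates immediately
    return bits


def run_pipeline(blocks):
    """Run the scheme; return (block index, blocks seen by delayed model, bits)."""
    instant = [1] * 256   # CPU-side model
    delayed = [1] * 256   # delayed model state, has seen nothing yet
    seen = 0              # how many blocks the delayed model has been fed
    log = []
    with ThreadPoolExecutor(max_workers=1) as gpu:  # stand-in for the GPU
        for k, block in enumerate(blocks):
            pending = None
            if k >= 1:
                # Substep b: start feeding the previous block to the delayed model.
                pending = gpu.submit(train_delayed, delayed, blocks[k - 1])
            # Substep a: code block k with the delayed state as of the last step.
            log.append((k, seen, code_block(block, instant, delayed, seen > 0)))
            if pending is not None:
                # End of step: the updated delayed model becomes available.
                delayed = pending.result()
                seen += 1
    return log
```

Calling `run_pipeline` on four blocks gives "blocks seen by the delayed model" values of 0, 0, 1, 2, matching steps 1 to 4a. A decoder can follow the same schedule, because the delayed state used for block k depends only on blocks it has already decoded.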

However, I have no idea how to build the delayed model.

Does anyone have any ideas?