The Wikipedia article links to a range of hardware implementations of Deflate:
https://en.wikipedia.org/wiki/DEFLATE
I also recently came across IBM's patches to some code I work with (Samtools) that add an interface to their hardware zlib:
https://github.com/ibm-genwqe/genwqe-user
Google finds a bunch of hits about Deflate and GPU, but I haven't checked the state of affairs. Also try searching for RFC 1951 and RFC 1952, in case "deflate" or "gzip" don't turn up the right hits. I don't see why it wouldn't be feasible to implement, although it may be hard to get the higher-end compression levels out of it. Basically you want a parallel way to find all the text matches, as this is the slow part of gzip compression.
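To make the "find all the text matches" step concrete, here is a minimal CPU sketch in Python. It is a brute-force search, not how zlib does it (zlib uses hash chains), and the function names are made up for illustration. The point is only that the search at each position reads the input and writes nothing shared, so every position could in principle be handled by its own GPU thread. The constants are the Deflate limits from RFC 1951.

```python
from typing import List, Tuple

MIN_MATCH = 3      # shortest match Deflate can encode (RFC 1951)
MAX_MATCH = 258    # longest match Deflate can encode
WINDOW = 32768     # largest back-reference distance

def longest_match(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (length, distance) of the longest earlier match for data[pos:].

    Hypothetical helper: brute force over the whole window.
    Returns (0, 0) when no match of at least MIN_MATCH bytes exists.
    """
    best_len, best_dist = 0, 0
    limit = min(MAX_MATCH, len(data) - pos)
    for start in range(max(0, pos - WINDOW), pos):
        length = 0
        # The match may run past pos (overlap), which Deflate allows.
        while length < limit and data[start + length] == data[pos + length]:
            length += 1
        # ">=" prefers the nearest candidate among equally long matches.
        if length >= best_len:
            best_len, best_dist = length, pos - start
    if best_len < MIN_MATCH:
        return (0, 0)
    return (best_len, best_dist)

def all_matches(data: bytes) -> List[Tuple[int, int]]:
    # Each call is independent of the others: this loop is the part
    # that would be spread across GPU threads.
    return [longest_match(data, p) for p in range(len(data))]
```

Choosing which of these matches to actually emit (greedy or lazy parsing) and the Huffman coding afterwards are still sequential steps in this sketch.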
Edit: In that regard, using a GPU suffix array or even the ST-8 construction used by BSC may work as an initial step to a Deflate-on-GPU algorithm. (No one said you *have* to process the data as a stream.) I've no idea how much time this would save, though.
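As a rough illustration of the suffix array idea, here is a CPU sketch in Python. It is not what BSC does and not a GPU implementation. The construction is deliberately naive and all names are hypothetical. It relies on the fact that the longest earlier match for a position is found at the nearest suffix, above or below it in sorted order, that starts earlier in the input. It ignores Deflate's 32 KiB window and 258-byte length cap, which a real encoder would have to enforce afterwards.

```python
from typing import List, Tuple

def suffix_array(data: bytes) -> List[int]:
    # Naive construction by sorting suffixes; for illustration only.
    # A GPU version would use a parallel sort-based construction instead.
    return sorted(range(len(data)), key=lambda i: data[i:])

def common_prefix(data: bytes, a: int, b: int) -> int:
    # Length of the common prefix of data[a:] and data[b:].
    n = 0
    while a + n < len(data) and b + n < len(data) and data[a + n] == data[b + n]:
        n += 1
    return n

def previous_matches(data: bytes) -> List[Tuple[int, int]]:
    """For each position, return (length, source) of the longest match
    that starts earlier in the input, or (0, -1) if there is none.

    Assumption: no window or maximum-length limit is applied here.
    """
    sa = suffix_array(data)
    result = [(0, -1)] * len(data)
    for rank, pos in enumerate(sa):
        best = (0, -1)
        for step in (-1, 1):
            # Walk in suffix order to the nearest suffix starting before pos.
            r = rank + step
            while 0 <= r < len(sa) and sa[r] >= pos:
                r += step
            if 0 <= r < len(sa):
                length = common_prefix(data, sa[r], pos)
                if length > best[0]:
                    best = (length, sa[r])
        result[pos] = best
    return result
```

The linear walk to the nearest earlier suffix is the simple version; it can degrade badly on repetitive input, and the known linear-time methods for this step are more involved.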