While very slow, it may be interesting: http://habrahabr.ru/blogs/hi/124210/

Another, faster implementation: the "cuda" branch of https://github.com/Kentzo/phuffman (they say that it's 20% faster than fast arithmetic coding, FastAC: www.cipr.rpi.edu/~said/FastAC.html )