Hi all, when compressors use a sliding window, it's typically one monolithic window of a fixed size (e.g. 32 KiB for DEFLATE) that slides from the start of the file to the end.

Such approaches take no account of the type of file or data, its size, or its structure. Would it make sense to break the window into several windows, perhaps of different sizes? For example, we could use a 64 KiB window for the first 128 KiB of a file, then switch to a 1 MiB window. We could have several windows (and static dictionaries) chosen according to the nature and size of the file, which we'll often know in advance.
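To make the "different windows for different parts of the file" idea concrete, here is a toy greedy LZ77 parser where the maximum match distance depends on the current position. The schedule format (a list of `(segment_end, window_size)` pairs), and the names `window_for` and `lz77_tokens`, are hypothetical. It is a brute-force sketch under those assumptions: it ignores DEFLATE's own 32 KiB distance limit and all entropy coding, and is far too slow for real files.

```python
from typing import List, Optional, Tuple, Union

KiB = 1024
MiB = 1024 * KiB

# Hypothetical schedule format: (end_of_segment, window_size) pairs in increasing
# order of end offset; None means "until the end of the file".
Schedule = List[Tuple[Optional[int], int]]

# The example from the post: 64 KiB window for the first 128 KiB, then 1 MiB.
EXAMPLE_SCHEDULE: Schedule = [(128 * KiB, 64 * KiB), (None, 1 * MiB)]

# ("lit", byte) or ("match", distance, length)
Token = Union[Tuple[str, int], Tuple[str, int, int]]


def window_for(pos: int, schedule: Schedule) -> int:
    """Return the window size in force at byte offset `pos`."""
    for end, size in schedule:
        if end is None or pos < end:
            return size
    raise ValueError("schedule must end with an open-ended (None) segment")


def lz77_tokens(data: bytes, schedule: Schedule,
                min_len: int = 3, max_len: int = 258) -> List[Token]:
    """Greedy LZ77 parse; matches may only reach back window_for(i) bytes."""
    out: List[Token] = []
    i, n = 0, len(data)
    while i < n:
        start = max(0, i - window_for(i, schedule))
        best_len, best_dist = 0, 0
        for j in range(start, i):  # brute-force search of the current window
            length = 0
            while (length < max_len and i + length < n
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            out.append(("match", best_dist, best_len))
            i += best_len
        else:
            out.append(("lit", data[i]))
            i += 1
    return out
```

For example, `lz77_tokens(b"abcabcabc", [(None, 32)])` gives three literals followed by one `("match", 3, 6)`, while `lz77_tokens(b"abcabc", [(None, 2)])` gives six literals because the earlier `abc` has already slid out of the 2-byte window. A decoder would only need to keep as much history as the largest window in the schedule.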

One minor example is an HTML file. A lot of the text at the top of the file will never be seen again. For example, we probably won't get matches on common head strings like these anywhere in the body of the document, not even a sizable partial match:

  • <meta property="og:title" content=
  • <meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">

We could probably break the window after a good chunk of the head section and start a new window for the body. Alternatively, the new window could start with any CSS in the head (as in AMP pages, where the CSS is inlined in the head) combined with the body, since the selectors' class and element names will yield matches in the body. I don't expect a huge payoff for HTML files, but broken-window compression could significantly improve compression in other contexts. Note that we could also carry part of the previous window over into the new one if that were beneficial.
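A rough way to try both HTML variants with an existing DEFLATE implementation is Python's standard `zlib` module. Flushing with `Z_FULL_FLUSH` resets the compressor state mid-stream, so nothing after that point can reference the head; a preset dictionary (`zdict`) lets a fresh stream for the body start with a chosen slice of earlier data, such as the head's CSS. The function names are hypothetical, and this only approximates the idea, since DEFLATE's window stays at 32 KiB throughout.

```python
import zlib


def compress_broken_window(head: bytes, body: bytes) -> bytes:
    """One DEFLATE stream; Z_FULL_FLUSH after the head resets the compressor
    state, so no match in the body can point back into the head."""
    c = zlib.compressobj(level=9)
    return (c.compress(head) + c.flush(zlib.Z_FULL_FLUSH)
            + c.compress(body) + c.flush())


def compress_body_with_carryover(body: bytes, carry: bytes) -> bytes:
    """Separate stream for the body, primed with a chosen slice of earlier
    data (e.g. the CSS from the head) as a preset dictionary."""
    c = zlib.compressobj(level=9, zdict=carry)
    return c.compress(body) + c.flush()


def decompress_body_with_carryover(blob: bytes, carry: bytes) -> bytes:
    # The decoder must be given the same `carry` bytes the encoder used.
    d = zlib.decompressobj(zdict=carry)
    return d.decompress(blob) + d.flush()
```

Whether either variant beats a plain `zlib.compress(head + body)` is an empirical question; comparing `len()` of the outputs on real pages would be a first test. The carry-over variant also requires the decoder to have `carry` before it decodes the body.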

We could break static dictionaries into several smaller dictionaries depending on the nature and structure of the file. There's a lot more to say about that, but it will take careful research to discover if and when this approach pays dividends.
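Splitting static dictionaries could be prototyped with `zlib` preset dictionaries (`zdict`) in Python: keep several small dictionaries, try each on a segment, and record which one won so the decoder can use the same one. The dictionary contents below are placeholders (real ones would be built from a corpus), and picking the smallest output by brute force is just one simple policy, an assumption rather than an established scheme.

```python
import zlib
from typing import Dict, Optional, Tuple

# Hypothetical, hand-written dictionaries; real ones would be built from a corpus.
DICTS: Dict[str, bytes] = {
    "none": b"",
    "html_head": (b'<meta property="og:title" content='
                  b'<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">'),
    "css": b"{display:flex;margin:0;padding:0;}",
}


def _compressor(zdict: bytes):
    # Only pass zdict when non-empty; "none" means a plain zlib stream.
    return zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)


def compress_with_best_dict(segment: bytes) -> Tuple[str, bytes]:
    """Try each small dictionary on one segment and keep the smallest output.
    The winning key must be stored so the decoder can pick the same dictionary."""
    best: Optional[Tuple[str, bytes]] = None
    for name, zdict in DICTS.items():
        c = _compressor(zdict)
        out = c.compress(segment) + c.flush()
        if best is None or len(out) < len(best[1]):
            best = (name, out)
    assert best is not None
    return best


def decompress_with_dict(name: str, blob: bytes) -> bytes:
    zdict = DICTS[name]
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()
```

In a real format the per-segment key would be a small index stored alongside each segment, and segment boundaries would come from the file's structure (e.g. the end of the HTML head).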

Does this sound promising? Has it already been done?