Clearly it is possible, because we could write a program that prints the data out and is itself even smaller.
Likely this compressor has some form of string delta which produces lots of +1 values (probably not handling the carry-over case of az to ba, but that's still a good start), and then those deltas get compressed again.
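To picture where the +1 values come from, here is a minimal sketch of one possible per-character delta. This is only my guess at the idea, not what that compressor actually does, and string_delta is a made-up helper name.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption, not taken from any real compressor):
 * writes cur[i] - prev[i] for each of the n positions into out. */
static void string_delta(const char *prev, const char *cur, size_t n,
                         signed char *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = (signed char)(cur[i] - prev[i]);
}
```

With this, "aaaa" followed by "aaab" gives 0 0 0 +1, while the carry-over step "aaaz" to "aaba" gives 0 0 +1 -25, which is the case that breaks the run of +1 values.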
E.g. a really basic string delta:
Code:
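#include <stdio.h>
#include <string.h>

// Really basic string delta: each line becomes one byte holding the length of
// the prefix shared with the previous line, followed by the remaining suffix.
// No arguments encodes stdin to stdout, -d decodes. Lines are assumed to be
// shorter than 8192 bytes.
int main(int argc, char **argv) {
    char last_a[8192], *last = last_a;
    char curr_a[8192], *curr = curr_a;
    char *tmp;
    *last = 0;
    if (argc > 1 && strcmp(argv[1], "-d") == 0) {
        while (fgets(curr, 8192, stdin)) {
            // First byte is the prefix length; read it unsigned so that
            // lengths above 127 do not go negative
            size_t d = (unsigned char)*curr;
            size_t s = strlen(curr+1);      // Suffix length
            if (s > 0 && curr[s] == '\n')
                s--;                        // Chomp nl
            if (d + s >= 8192)
                return 1;                   // Decoded line would not fit
            memmove(curr+d, curr+1, s);     // Shift the suffix into place
            memcpy(curr, last, d);          // Prefix comes from the previous line
            curr[d+s] = '\0';
            puts(curr);
            tmp = last; last = curr; curr = tmp;
        }
    } else {
        while (fgets(curr, 8192, stdin)) {
            char *cp1, *cp2;
            size_t n;
            // Chomp nl (also copes with a final line that has none)
            curr[strcspn(curr, "\n")] = 0;
            // Find delta point
            for (cp1 = last, cp2 = curr; *cp1 && *cp1 == *cp2; cp1++, cp2++)
                ;
            // The length has to fit in one byte and must not be '\n' (10),
            // which the decoder's fgets would take as the end of the line
            n = cp2 - curr;
            if (n > 255) n = 255;
            if (n == '\n') n--;
            // Encode
            putchar((int)n);
            puts(curr+n);
            tmp = last; last = curr; curr = tmp;
        }
    }
    return 0;
}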
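To see what the filter above does to a sorted list, the same two steps can be pulled out into small functions. This is just a sketch for illustration; shared_prefix_len and front_decode are hypothetical names and not part of the program above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of leading bytes that prev and cur share. */
static size_t shared_prefix_len(const char *prev, const char *cur) {
    size_t n = 0;
    while (prev[n] != '\0' && prev[n] == cur[n])
        n++;
    return n;
}

/* Hypothetical helper: rebuild a line from the previous line, the shared
 * prefix length and the stored suffix. The caller must make sure out can
 * hold n + strlen(suffix) + 1 bytes. */
static void front_decode(const char *prev, size_t n, const char *suffix,
                         char *out) {
    memcpy(out, prev, n);      /* prefix taken from the previous line */
    strcpy(out + n, suffix);   /* then the stored suffix and its NUL */
}
```

For the lines aaaa, aaab, aaac the encoder stores (0, "aaaa"), (3, "b"), (3, "c"), so nearly every record is the same length byte plus one changing letter, which is exactly the kind of regular stream a second compression pass does well on.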
Running bsc -m0e2b1000M5 on the output of this then gets you to 712 bytes! On the original file, bsc was much slower and gave around 100Mb.
(Amazingly, 712 bytes is less than twice the size of the trivial little program I wrote to create the input data.)
Edit: tweaked bsc params.