Here is the source code to one of the other original versions I did of this that is faster and has a few other slight changes to it. Releasing as public domain.
This one is in QB64 / Quickbasic 64 which is a 64 bit version of Basic that compiles for Mac, Windows, and Linux. It will also compile and run on Chromebook and (possibly) traditional Android devices.
The first version was done in pascal/delphi, but I doubt anyone will use that one. This can be easily ported to VB6 by adding a form and replacing write/locate with .caption or .text on a textbox or Label to update data positions.
You can download QB64 for free to compile and run it. http://qb64.org/
It should be easy to convert this to Python, PHP, or any other language you fancy.
TOP4 COMPRESS:
Code:
'TOP4 COMPRESS BASIC/Quickbasic/QB64 implementation by James Wasil 2018
'TOP4 is a compression method I came up with that can replace Huffman for some post-LZ and other final pass compression methods. It can be used that way or standalone.
'It's easy to use, easy to implement, easy to convert to other programming languages, straightforward, and treeless.
'This version will work with the 32 or 64 bit version of the QB64 compiler for Windows, Mac OS X, and Linux
'freely available at http://www.qb64.net or http://www.qb64.org
'It can work with classic Quickbasic or Qbasic from DOSBOX.
'Top4 can be made to work with Visual Basic 6.0 or higher by adding a form and placing this code in the main. The locate/write visual updates may be replaced with
'Caption changes to a .Text of a TextBox if you like. The rest should work as-is.
'This is released as public domain to use, improve upon, or do as you like with it for data compression.
'If you find this helpful or useful, please give credit or leave my name in the code. That's all I ask. :)
'Feel free to send feedback to: james.wasil@gmail.com
'Enjoy!
'
'Array definitions:
'P1$() = ASCII symbols from 0 to 255
'P2() = Numeric array that holds count of symbols
'P3$() = Binary string array from 0 to 255 that holds variable-sized bit patterns
'BUFFER$ = Temporary file buffer of up to 32767 bytes
'Z2$ = Working string buffer of data read from BUFFER$ out of the file
'OUTP$ = Output string for binary patterns. These get sent to the disk after there are enough to convert to an 8 bit byte.
DIM P1$(256), P2(256), P3$(256): P = P - 1
FOR T = 0 TO 3: P = P + 1: TEMP$ = CHR$(T): CALL Ascii.To.Bynary(TEMP$, OUT1$): P3$(P) = RIGHT$(OUT1$, 7): OUT1$ = "": NEXT T: 'ADD FIRST 4 PATTERNS THAT ARE 7 BITS TO P3$(P), WHERE P IS ALWAYS +1.
FOR T = 8 TO 249: P = P + 1: TEMP$ = CHR$(T): CALL Ascii.To.Bynary(TEMP$, OUT1$): P3$(P) = OUT1$: OUT1$ = "": NEXT T: 'ADD NEXT 248 PATTERNS THAT ARE 8 BITS TO P3$(P), WHERE P IS ALWAYS +1.
FOR T = 250 TO 254: P = P + 1: TEMP$ = CHR$(T): CALL Ascii.To.Bynary(TEMP$, OUT1$): P3$(P) = OUT1$ + "0": P = P + 1: P3$(P) = OUT1$ + "1": OUT1$ = "": NEXT T: 'ADD LAST 8 PATTERNS THAT ARE 9 BITS TO P3$(P), WHERE P IS ALWAYS +1.
'SLIGHTLY MODIFIED FROM ORIGINAL TOP4 PATTERN TO MAKE SPACE FOR AN EOF SYMBOL.
'TEMP$=CHR$(255): P=P+1: CALL Ascii.To.Bynary(TEMP$, OUT1$): P3$(P) = OUT1$ + "0": P = P + 1: P3$(P) = OUT1$ + "1": OUT1$ = "": 'ADD STOP/EOF SYMBOL AND WILDCARD$
FOR T = 0 TO 255: P1$(T) = CHR$(T): NEXT T: 'CREATE STANDARD ASCII TABLE
LINE INPUT "File to read:", FILE1$ :'FILE TO READ FROM
LINE INPUT "File to write to:", FILE2$ :'FILE TO OUTPUT TO
OPEN FILE1$ FOR BINARY AS #1
OPEN FILE2$ FOR OUTPUT AS #2: CLOSE #2: OPEN FILE2$ FOR BINARY AS #2
DO
IF LEN(Z2$)<=1 THEN
DO
IF LOF(1)-LOC(1)=>32767 THEN BUFFER$=STRING$(32767,"0") ELSE BUFFER$=STRING$(LOF(1)-LOC(1),"0")
IF LOF(1)-LOC(1)=>0 THEN GET #1,,BUFFER$:Z2$=Z2$+BUFFER$
LOOP UNTIL LOF(1)-LOC(1)=<0 OR LEN(Z2$)>=1
ELSE
END IF
Z$ = LEFT$(Z2$,1):Z2$=RIGHT$(Z2$,LEN(Z2$)-1)
IF LEN(Z$)=>1 THEN P2(ASC(Z$)) = P2(ASC(Z$)) + 1
LOCATE 1, 1: WRITE LOF(1) - LOC(1)
LOOP UNTIL Z$=""
DO
FOUND = 0
FOR T = 0 TO 254
IF P2(T) < P2(T + 1) THEN TEMP1 = P2(T): TEMP2$ = P1$(T): P2(T) = P2(T + 1): P2(T + 1) = TEMP1: P1$(T) = P1$(T + 1): P1$(T + 1) = TEMP2$: FOUND = 1
NEXT T
LOOP UNTIL FOUND = 0
'MAKE STATIC BYTE HEADER HERE. NOW THAT IT IS ARRANGED STATISTICALLY, IT CORRESPONDS FROM GREATEST TO LEAST OCCURENCES WITH THE P3$() ARRAY BIT PATTERNS.
for T=0 to 255:HEADER$=HEADER$+P1$(T):NEXT T:PUT #2,,HEADER$:' This adds an easy to restore 256 byte header. It is possible to reduce this to 12 bytes and build around it to restore.
'ADD THE FILESIZE NEXT. The file size is added to the second part of the header after the 256 byte table. For easy implementation, file size digits are stored as raw
'8 bit bytes. The file size can be any size this way, but if you want to save space this can be a static 4 byte file header to support up to 4gb, even though the overhead isn't
'that large with this dynamic size header, it may make a difference of a few bytes for smaller files:
FILESIZE$=LTRIM$(STR$(LOF(1)))+"E":'E FOR END
PUT #2,,FILESIZE$
'Set the file pointer to the first position. Z$ is the same as BUFFER$ here, but needs to be initialized with any symbol of at least 1 byte. It can be up to 32767 bytes or more, but gets slower if too large
'of a string is used.
SEEK #1, 1: Z$ = "P"
CLS
'Get up to 32767 bytes, use Z3$ as a BUFFER$, add the new bytes to Z2$
DO
IF LOF(1)-LOC(1)=>32767 AND LEN(Z2$)<2 THEN Z3$ = STRING$(32767,"0")
IF LOF(1)-LOC(1)<32767 AND LEN(Z2$)<2 THEN Z3$="0"
IF LEN(Z2$)<2 AND LOF(1)-LOC(1)=>1 THEN
DO
GET #1, , Z3$:Z2$=Z2$+Z3$
LOOP UNTIL LEN(Z2$)=>1 OR LOF(1)-LOC(1)<1
END IF
LOCATE 1, 1: WRITE LOC(1):WRITE LOF(2):'Write the position we're at from FILE1$ and the size of the new FILE2$ and update it. Compression goes slightly faster without this visual update.
'A few modifications:
'Modification 1: If ASCII symbol 0+0 is seen, then compress each to 4.5 bits and group the occurence together as 1 9 bit pattern.
IF LEFT$(Z2$,2)=CHR$(0)+CHR$(0) THEN
OUTP$=OUTP$+"111111110"
Z2$=RIGHT$(Z2$,LEN(Z2$)-2):'Remove the 2 bytes from the input file stream after we've output the binary result
'Modification 2: If we see the last symbol output appear as the next 2 symbols, we are able to group those 2 symbols together as 4.5 bits and output as 1 9 bit pattern:
ELSEIF Z2$=LAST$+LAST$ THEN
OUTP$=OUTP$+"111111111"
Z2$=RIGHT$(Z2$,LEN(Z2$)-2):'Remove the 2 bytes from the input file stream after we've output the binary result
'Proceed with normal ASCII to binary lookup table here:
ELSE
Z$=LEFT$(Z2$,1):Z2$=RIGHT$(Z2$,LEN(Z2$)-1)
FOR PP = 0 TO 255: IF P1$(PP) = Z$ THEN OUTP$ = OUTP$ + P3$(PP):LAST$=Z$:Z$="":EXIT FOR: 'ADD THE BINARY REFERENCE PATTERN BY ASCII VALUE, BASED ON THE STATISTICAL ARRANGEMENT OF THE SYMBOLS HERE.
NEXT PP
END IF
'If we're not at the end of the file yet
IF LOF(1)-LOC(1)=<0 THEN
'See if we are able to output a byte to the disk from OUTP$. If it's at least 8 bits, convert to a byte and send it out:
IF LEN(OUTP$) >= 8 THEN TEMP$ = LEFT$(OUTP$, 8): OUTP$ = RIGHT$(OUTP$, LEN(OUTP$) - 8): CALL Bynary.To.Ascii(TEMP$, OUTP2$): PUT #2, , OUTP2$: OUTP2$ = ""
EXIT DO
ELSE
END IF
'IF WE HAVE 8 BITS, THEN OUTPUT A BYTE FROM THE LEFT. SAME AS ABOVE, BUT OUTSIDE OF THE MAIN LOOP:
IF LEN(OUTP$) >= 8 THEN
DO
TEMP$ = LEFT$(OUTP$, 8): OUTP$ = RIGHT$(OUTP$, LEN(OUTP$) - 8): CALL Bynary.To.Ascii(TEMP$, OUTP2$): PUT #2, , OUTP2$: OUTP2$ = ""
LOOP UNTIL LEN(OUTP$)<8
END IF
LOOP UNTIL LOF(1)-LOC(1)=<0 AND Z2$="" AND LEN(OUTP$)<8
'OUTSIDE OF THE LOOP AND READY TO FINISH:
'IF THERE ARE STILL BITS PRESENT WITH OUTP$, THEN PAD IT TO 8 BITS WITH EXTRA ZEROS THEN OUTPUT THE LAST BYTE.
IF LEN(OUTP$) <> 0 THEN OUTP$ = OUTP$ + STRING$(8 - LEN(OUTP$), "0"): OUTP$ = RIGHT$(OUTP$, LEN(OUTP$) - 8): CALL Bynary.To.Ascii(TEMP$, OUTP2$): PUT #2, , OUTP2$: OUTP2$ = ""
WRITE "Original File:", LOF(1)
WRITE "Compressed Output:", LOF(2)
WRITE "Difference:", LOF(1) - LOF(2)
CLOSE #1
CLOSE #2
END
'Functions / Subs area.
SUB Ascii.To.Bynary (X$, OUTZX$)
FOR K = 1 TO LEN(X$)
KXZZ# = ASC(MID$(X$, K, 1))
IF KXZZ# - 128 >= 0 THEN KXZZ# = KXZZ# - 128: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 64 >= 0 THEN KXZZ# = KXZZ# - 64: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 32 >= 0 THEN KXZZ# = KXZZ# - 32: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 16 >= 0 THEN KXZZ# = KXZZ# - 16: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 8 >= 0 THEN KXZZ# = KXZZ# - 8: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 4 >= 0 THEN KXZZ# = KXZZ# - 4: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 2 >= 0 THEN KXZZ# = KXZZ# - 2: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 1 >= 0 THEN KXZZ# = KXZZ# - 1: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
NEXT K
END SUB
SUB Bynary.To.Ascii (X$, KXX$)
KXX$ = ""
FOR X = 1 TO LEN(X$) STEP 8
IF MID$(X$, X, 1) = "1" THEN XXZ = XXZ + 128
IF MID$(X$, X + 1, 1) = "1" THEN XXZ = XXZ + 64
IF MID$(X$, X + 2, 1) = "1" THEN XXZ = XXZ + 32
IF MID$(X$, X + 3, 1) = "1" THEN XXZ = XXZ + 16
IF MID$(X$, X + 4, 1) = "1" THEN XXZ = XXZ + 8
IF MID$(X$, X + 5, 1) = "1" THEN XXZ = XXZ + 4
IF MID$(X$, X + 6, 1) = "1" THEN XXZ = XXZ + 2
IF MID$(X$, X + 7, 1) = "1" THEN XXZ = XXZ + 1
KXX$ = KXX$ + CHR$(XXZ): XXZ = 0
NEXT X
END SUB
TOP4 DECOMPRESS (this can be merged or made a function/sub of the above to consolidate it as one program if desired)
Code:
'TOP4 DECOMPRESS BASIC/Quickbasic/QB64 implementation by James Wasil 2018
'TOP4 is a compression method I came up with that can replace Huffman for some post-LZ and other final pass compression methods. It can be used that way or standalone.
'It's easy to use, easy to implement, easy to convert to other programming languages, straightforward, and treeless.
'This version will work with the 32 or 64 bit version of the QB64 compiler for Windows, Mac OS X, and Linux
'freely available at http://www.qb64.net or http://www.qb64.org
'It can work with classic Quickbasic or Qbasic from DOSBOX.
'Top4 can be made to work with Visual Basic 6.0 or higher by adding a form and placing this code in the main. The locate/write visual updates may be replaced with
'Caption changes to a .Text of a TextBox if you like. The rest should work as-is.
'This is released as public domain to use, improve upon, or do as you like with it for data compression.
'If you find this helpful or useful, please give credit or leave my name in the code. That's all I ask. :)
'Feel free to send feedback to: james.wasil@gmail.com
'Enjoy!
DIM P1$(256), P2(256), P3$(256): P = P - 1
FOR T = 0 TO 3: P = P + 1: TEMP$ = CHR$(T): CALL Ascii.To.Bynary(TEMP$, OUT1$): P3$(P) = RIGHT$(OUT1$, 7): OUT1$ = "": NEXT T: 'ADD FIRST 4 PATTERNS THAT ARE 7 BITS TO P3$(P), WHERE P IS ALWAYS +1.
FOR T = 8 TO 249: P = P + 1: TEMP$ = CHR$(T): CALL Ascii.To.Bynary(TEMP$, OUT1$): P3$(P) = OUT1$: OUT1$ = "": NEXT T: 'ADD NEXT 248 PATTERNS THAT ARE 8 BITS TO P3$(P), WHERE P IS ALWAYS +1.
FOR T = 250 TO 254: P = P + 1: TEMP$ = CHR$(T): CALL Ascii.To.Bynary(TEMP$, OUT1$): P3$(P) = OUT1$ + "0": P = P + 1: P3$(P) = OUT1$ + "1": OUT1$ = "": NEXT T: 'ADD LAST 8 PATTERNS THAT ARE 9 BITS TO P3$(P), WHERE P IS ALWAYS +1.
FOR T = 0 TO 255: P1$(T) = CHR$(T): NEXT T: 'CREATE STANDARD ASCII TABLE
LINE INPUT "File to read:", FILE1$
LINE INPUT "File to write to:", FILE2$
OPEN FILE1$ FOR BINARY AS #1
OPEN FILE2$ FOR OUTPUT AS #2: CLOSE #2: OPEN FILE2$ FOR BINARY AS #2
Z$ = STRING$(256, "0"): GET #1, , Z$: HEADER$ = Z$: Z$ = "P"
FOR T = 1 TO 256: P1$(T - 1) = MID$(HEADER$, T, 1): NEXT T: 'Load HEADER$ to P1$() array.
'GET FILESIZE NEXT
Z$="G"
DO
GET #1,,Z$:IF Z$="E" THEN EXIT DO ELSE FILESIZE$=FILESIZE$+Z$
LOOP
SIZEFILE#=VAL(FILESIZE$)
DO
'ENSURE WE HAVE ENOUGH BYTES
DO
Z$=STRING$(1024,"0"):IF LOF(1)-LOC(1)<32767 THEN Z$="0"
IF LEN(OUTP$)<16 AND LOF(1)-LOC(1)>0 THEN GET #1,,Z$
'LOCATE 1, 1: WRITE LOC(1)
IF LEN(OUTP$) < 16 THEN CALL Ascii.To.Bynary(Z$, OUTP1$): OUTP$ = OUTP$ + OUTP1$: OUTP1$ = ""
LOOP UNTIL LEN(OUTP$)=>16 OR LOF(1)-LOC(1)=<0
LOCATE 1,1:WRITE LOF(1)-LOC(1)
IF LEFT$(OUTP$,9)="111111110" THEN OUTP$ = RIGHT$(OUTP$, LEN(OUTP$) - 9): OUTP2$ = CHR$(0)+CHR$(0): PUT #2, , OUTP2$
IF LEFT$(OUTP$,9)="111111111" THEN OUTP$ = RIGHT$(OUTP$, LEN(OUTP$) - 9): OUTP2$ = LAST$+LAST$: PUT #2, , OUTP2$
FOR PP = 0 TO 255: IF LEFT$(OUTP$, LEN(P3$(PP))) = P3$(PP) THEN OUTP$ = RIGHT$(OUTP$, LEN(OUTP$) - LEN(P3$(PP))): OUTP2$ = P1$(PP): LAST$=OUTP2$:PUT #2, , OUTP2$: EXIT FOR: 'ADD THE BINARY REFERENCE PATTERN BY ASCII VALUE, BASED ON THE STATISTICAL ARRANGEMENT OF THE SYMBOLS HERE.
NEXT PP
'WRITE OUTP$
LOOP UNTIL LOF(2)=>SIZEFILE# AND LOF(1)-LOC(1)=<0
WRITE "Compressed File:", LOF(1)
WRITE "Original File:", LOF(2)
WRITE "Difference:", LOF(1) - LOF(2)
CLOSE #1
CLOSE #2
END
SUB Ascii.To.Bynary (X$, OUTZX$)
FOR K = 1 TO LEN(X$)
KXZZ# = ASC(MID$(X$, K, 1))
IF KXZZ# - 128 >= 0 THEN KXZZ# = KXZZ# - 128: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 64 >= 0 THEN KXZZ# = KXZZ# - 64: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 32 >= 0 THEN KXZZ# = KXZZ# - 32: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 16 >= 0 THEN KXZZ# = KXZZ# - 16: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 8 >= 0 THEN KXZZ# = KXZZ# - 8: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 4 >= 0 THEN KXZZ# = KXZZ# - 4: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 2 >= 0 THEN KXZZ# = KXZZ# - 2: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
IF KXZZ# - 1 >= 0 THEN KXZZ# = KXZZ# - 1: OUTZX$ = OUTZX$ + "1" ELSE OUTZX$ = OUTZX$ + "0"
NEXT K
END SUB
SUB Bynary.To.Ascii (X$, KXX$)
KXX$ = ""
FOR X = 1 TO LEN(X$) STEP 8
IF MID$(X$, X, 1) = "1" THEN XXZ = XXZ + 128
IF MID$(X$, X + 1, 1) = "1" THEN XXZ = XXZ + 64
IF MID$(X$, X + 2, 1) = "1" THEN XXZ = XXZ + 32
IF MID$(X$, X + 3, 1) = "1" THEN XXZ = XXZ + 16
IF MID$(X$, X + 4, 1) = "1" THEN XXZ = XXZ + 8
IF MID$(X$, X + 5, 1) = "1" THEN XXZ = XXZ + 4
IF MID$(X$, X + 6, 1) = "1" THEN XXZ = XXZ + 2
IF MID$(X$, X + 7, 1) = "1" THEN XXZ = XXZ + 1
KXX$ = KXX$ + CHR$(XXZ): XXZ = 0
NEXT X
END SUB