Hi!
I'd just like to show you my new idea for a code-optimizing algorithm.
For now it is only in test form, and I still have some problems turning the algorithm into a really usable automatic application. But I don't think it is a difficult algorithm, and I'm still an amateur at programming, so I hope somebody can help me or write a program for the idea.
I did write a C# implementation, but it has a bug, and because of that I could only get it to run with pauses.
Also, the code only works on command-line programs that manipulate data in files, and it is very, very slow.
And sorry for my English and any wrong phrases.
First, about the main idea:
Imagine a simple native Win32 program (in principle it could be a .NET application as well, though not with my program; I will explain that later). It contains loads of "strange" and "baffling" blocks of bytes (we can see them with a hex editor, for example), which are "only" clear to the machine (that is why we say "machine code"). But in most cases (in my manual tests, in 100% of the tested programs) there are some blocks of bytes which aren't important (or aren't important in every case) while the program runs, so they can be changed without the program crashing. In most cases they can't simply be deleted from the program's bytes, because the program has a fixed layout with several jump points (or whatever), so after deleting bytes directly it crashes terribly again. But as I said, they can be changed, and for example all of them can be changed to zero bytes (00), so that afterwards the program can be compressed with a slightly better ratio.
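To make the compression claim concrete, here is a minimal sketch that measures how much smaller a buffer compresses once a range of it is zeroed. It uses .NET's DeflateStream as a stand-in for an EXE packer such as UPX, which is an assumption made only for illustration; the class and method names (ZeroFillDemo, CompressedSize, ZeroRange) are hypothetical and not part of the program in this post.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class ZeroFillDemo
{
    // Size of the data after Deflate compression (a stand-in for UPX here).
    public static long CompressedSize(byte[] data)
    {
        using (var buffer = new MemoryStream())
        {
            // leaveOpen = true so the MemoryStream can still be read after
            // the DeflateStream has been disposed (which flushes it).
            using (var deflate = new DeflateStream(buffer, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }
            return buffer.Length;
        }
    }

    // Returns a copy of the data with the bytes [start, start + count) set to zero.
    public static byte[] ZeroRange(byte[] data, int start, int count)
    {
        var copy = (byte[])data.Clone();
        Array.Clear(copy, start, count);
        return copy;
    }
}
```

Comparing CompressedSize(File.ReadAllBytes("original.exe")) with the same call on the zeroed copy gives a rough idea of the gain, although a real packer will produce different absolute numbers.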
For example:
In almost every native Win32 EXE, at the start of the file there is a small block which prints 'This program cannot be run in DOS mode' when we try to run the program in pure DOS. But every normal user knows that trying to run the program in DOS mode is pointless, so this block is unnecessary. We can fill its bytes with zero bytes (00), so that the program compresses to a smaller size and still runs on Windows.
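The DOS example can be sketched in a few lines. This assumes the standard PE layout: a 64-byte DOS header starting with "MZ", whose 4-byte little-endian field at offset 0x3C gives the file offset of the PE header, with the DOS block sitting in between. As far as I know the Windows loader does not use that region, but treat this as an assumption and test the result; it will also invalidate a digital signature if the file has one. The names DosStub and ZeroStub are hypothetical.

```csharp
using System;

static class DosStub
{
    const int DosHeaderSize = 64;    // size of the fixed DOS header
    const int PeOffsetField = 0x3C;  // holds the file offset of the PE header

    // Zeroes every byte between the DOS header and the PE header, in place,
    // and returns how many nonzero bytes were cleared.
    public static int ZeroStub(byte[] image)
    {
        if (image.Length < DosHeaderSize || image[0] != (byte)'M' || image[1] != (byte)'Z')
            throw new ArgumentException("Not an MZ executable.");

        // Read the 4-byte little-endian offset by hand, independent of host byte order.
        int peOffset = image[PeOffsetField]
                     | image[PeOffsetField + 1] << 8
                     | image[PeOffsetField + 2] << 16
                     | image[PeOffsetField + 3] << 24;
        if (peOffset < DosHeaderSize || peOffset > image.Length)
            throw new ArgumentException("Implausible PE header offset.");

        int cleared = 0;
        for (int i = DosHeaderSize; i < peOffset; i++)
        {
            if (image[i] != 0) { image[i] = 0; cleared++; }
        }
        return cleared;
    }
}
```

A typical use would be to load the EXE with File.ReadAllBytes, call DosStub.ZeroStub on the array, and write it back under a new name before packing it.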
In my tests, I made a small native program in C++ which produces a sine-wave sound at a frequency that I type into the console window. I compiled it with all of the size optimizations in Visual C++ 2008, and it came to about 7 KB (or a bit more). Then I manually searched for all of the bytes which could be cleared to zero and weren't necessary for the program to run correctly (it was very hard and boring work, even on 7000 bytes, with the constant testing). When I was done, I packed both my version and the original one with UPX. The original one, which wasn't modified manually, was about 5.5-6 KB after UPX compression, and my modified version was almost 4 KB. Admittedly, MS Visual Studio's compiler and code optimizer aren't the best, but I think this shows that the idea would be useful on bigger native programs.
But there are some problems and limitations in practice:
First, doing it manually would be an extremely long and huge job. Doing it on that 7 KB native program in the test without any mistake took me about 3 hours, and real applications are megabytes in size, so optimizing them manually with this technique would be about 1000 times more work.
On the other hand, we could implement the algorithm as a program, but then there is a problem with testing: in bigger programs (I mean ones with more than about 10 commands, so really all of the potential programs) there are several blocks which can be zeroed so that the program still starts, but it will then have crazy bugs, e.g. in its appearance. For example, I have a command-line audio equalizer whose native byte code I modified, and now it produces a strange BitMachine effect on the audio. That happens to be a "good" bug for me, but in general people want to use applications in their original form.
So to make a usable implementation of the idea, we must find, programmatically, a check during testing which shows reliably that the tested part of the modified application (the full part that the user would use) works 100% correctly and safely.
This is the reason why my code works only with command-line applications that manipulate files/data.
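For file-manipulating programs, that check can be as simple as comparing the output of the modified EXE byte for byte with an output produced by the original one. A minimal sketch of such a comparison follows; the names OutputCheck and FilesEqual are hypothetical, and the check only proves correctness for the one input and parameter set that was tested.

```csharp
using System.IO;

static class OutputCheck
{
    // True only if both files exist, have the same length and identical bytes.
    public static bool FilesEqual(string pathA, string pathB)
    {
        if (!File.Exists(pathA) || !File.Exists(pathB)) return false;
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length) return false;

        // 'using' closes both files even if an exception is thrown, so no
        // handle is left open that could block a later delete or overwrite.
        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            int x;
            while ((x = a.ReadByte()) != -1)
            {
                if (x != b.ReadByte()) return false;
            }
            return true;
        }
    }
}
```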
Here is the C# code:
Code:
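using System;
using System.IO;
using System.Diagnostics;

class a
{
    // m = name of the temporary modified EXE, n = output file written by each test run,
    // o = reference output file from a correct run, p = command-line parameters
    static string m = "modify.exe", n, o, p;
    static Process e = new Process();

    // Force-kills every running instance of the modified EXE.
    static void t()
    {
        e = Process.Start(new ProcessStartInfo("taskkill", "/IM " + m + " /T /F"));
        e.WaitForExit();
    }

    static void w(string z) { Console.Write(z); }

    static void Main()
    {
        w("Sample of original program-work : "); o = Console.ReadLine();
        w("Sample of test-work : "); n = Console.ReadLine();
        w("Command-Line parameters (write it carefully!) : "); p = Console.ReadLine();
        w("Maximum time to wait for the executable (in milliseconds): "); var q = int.Parse(Console.ReadLine());
        w("Original EXE (extension is given) : ");
        // a = bytes of the EXE being optimized, j = bytes of the reference output
        byte[] a = File.ReadAllBytes(Console.ReadLine() + ".exe"), j = File.ReadAllBytes(o);
        byte d;
        // b = EXE size, c = current offset (starts at 64), g = reference output size
        long b = a.LongLength, c = 64, g = j.LongLength, h;
        var e = new Process();
        for (; c < b; c++) if (a[c] != 0)
        {
            var f = true;           // stays true only if this test run passes
            d = a[c]; a[c] = 0;     // remember the byte, then zero it

            // Write the modified EXE; if the file is locked, kill leftovers and retry once.
            try { File.WriteAllBytes(m, a); }
            catch
            {
                t();
                e = Process.Start(new ProcessStartInfo("taskkill", "/IM WerFault.exe /T /F"));
                e.WaitForExit();
                File.WriteAllBytes(m, a);
            }

            // Run the modified EXE hidden and wait at most q milliseconds.
            var l = new ProcessStartInfo(m, p);
            l.WindowStyle = ProcessWindowStyle.Hidden;
            try
            {
                e = Process.Start(l);
                // Note: if the timeout expires, the process is still running after this call.
                e.WaitForExit(q);
            }
            catch { f = false; }

            // The run passes only if the output matches the reference byte for byte.
            if (f && File.Exists(n) && new FileInfo(n).Length == g)
            {
                BinaryReader i = new BinaryReader(File.OpenRead(n)), k = new BinaryReader(File.OpenRead(o));
                for (h = 0; h < g; h++) if (i.ReadByte() != k.ReadByte()) { f = false; break; }
                i.Close(); k.Close();
            }
            else f = false;

            if (!f) a[c] = d;       // test failed: restore the original byte

            // Remove the test output before the next run.
            try { File.Delete(n); }
            catch
            {
                t();
                File.Delete(n);
            }

            // Progress: current offset / EXE size [result of this test]
            Console.SetCursorPosition(0, 6); Console.Write(c + " / " + b + " [" + f + "]");
        }
    }
}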
It works as follows:
Imagine that we would like to optimize the native data-compression application of one of the PAQ8 versions with this program. Later we would always use the application at its maximum level, so the parameter (apart from the file names) would always be '-9'.
The optimizer program has to test for differences after every cleared byte, and it must use reference data which was produced by a correct run. So we take a small file (say about 10 KB) called 'testfile.dat' and compress it with the PAQ8 application, so the outcome will be 'testfile.paq8'. As everybody knows, in this case we used the 'paq8 -9 testfile.dat' command in the command-line window. Then we keep both the original ('testfile.dat') and the compressed ('testfile.paq8') test file, because we will need them later.
Then we run the optimizer code. First it asks for the sample output of the original program, which in this case is 'testfile.paq8', the compressed test file. It will be used to check whether the modified binary works correctly. Secondly, we have to give a file name for the repeated tests. Pay attention: later (in the next step) we have to use the same name as this output file name. This is the file which the modified version will produce on every run, and which will be compared with the static test file ('testfile.paq8'). For example, give 'testfile2.paq8'.
Next, we have to give the command-line parameters which are used to run the application. Pay attention, as I wrote above: if a file name has to be given in the parameters for the application's output, it must be the same as the sample of test-work from the second input. So in this case we have to input something like this: "-9 testfile2.paq8 testfile.dat". After that the program asks for the maximum time to wait for the executable, in milliseconds.
And lastly, we have to give the name of the application which we would like to "optimize" (without the '.exe' extension, which the program adds itself). Then the program shows how many bytes have been checked for replacement with a zero byte, and whether the last replacement/clearing produced a "positive" test.
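Stripped of the file and process handling, the search itself is a short loop. The sketch below is a simplified restatement of the idea and not the program above: the test is passed in as a delegate (here called stillWorks, a hypothetical name), which in the real tool would write the EXE, run it and compare the output, but which can also be a fake check for trying out the loop.

```csharp
using System;

static class ZeroSearch
{
    // Tries to zero each nonzero byte from 'start' onwards, in place.
    // A byte stays zero only if 'stillWorks' accepts the modified image;
    // otherwise it is restored. Returns the number of bytes kept at zero.
    public static int Run(byte[] image, int start, Func<byte[], bool> stillWorks)
    {
        int kept = 0;
        for (int i = start; i < image.Length; i++)
        {
            if (image[i] == 0) continue;   // already zero, nothing to gain
            byte saved = image[i];
            image[i] = 0;
            if (stillWorks(image)) kept++;
            else image[i] = saved;         // the program needs this byte
        }
        return kept;
    }
}
```

Because each byte needs one full run of the tested program, the total time grows with the size of the EXE multiplied by the run time, which matches how slow the tool is in practice.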
The main problem now (apart from the speed) is that in most cases, if a test run makes the program crash, the modified application/EXE must be closed properly before the next step.
As you can see, I've also tried Windows' taskkill command for this, but sometimes an exception is still thrown saying that the program is in use, and the code stops. I think that maybe it sometimes takes a little more time to unload the EXE from memory.
But WHAT SHOULD I DO?
Any ideas on how to proceed? What next? Is what I did right?
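One thing that may help, offered as a sketch and not as a confirmed fix: WaitForExit with a timeout returns false when the time runs out and leaves the process running, so the EXE file can still be locked on the next write. The code below kills the process object directly in that case and waits until it is really gone, and it retries the file write a few times with a short pause. The names SafeRun, RunWithTimeout and WriteWithRetry and the retry parameters are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

static class SafeRun
{
    // Runs the EXE hidden; returns true if it exited by itself within the limit.
    public static bool RunWithTimeout(string exe, string args, int timeoutMs)
    {
        var info = new ProcessStartInfo(exe, args) { WindowStyle = ProcessWindowStyle.Hidden };
        using (var proc = Process.Start(info))
        {
            if (proc == null) return false;
            if (proc.WaitForExit(timeoutMs)) return true;
            // Timed out: kill this exact process instead of searching by image name.
            try { proc.Kill(); }
            catch (InvalidOperationException) { /* it exited in the meantime */ }
            proc.WaitForExit();   // block until the process has really ended
            return false;
        }
    }

    // Writes the file, retrying after a pause while it is still locked.
    public static void WriteWithRetry(string path, byte[] data, int attempts, int delayMs)
    {
        for (int i = 1; ; i++)
        {
            try { File.WriteAllBytes(path, data); return; }
            catch (IOException)
            {
                if (i >= attempts) throw;   // give up after the last attempt
                Thread.Sleep(delayMs);
            }
        }
    }
}
```

For example, WriteWithRetry("modify.exe", bytes, 10, 200) would try for about two seconds before giving up. If a crash dialog keeps the crashed process alive, the kill in RunWithTimeout should still end it once the timeout expires, but that is an assumption to verify on the target Windows version.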