
Thread: Using paq8 to measure image similarity

  1. #1 Shelwien (Administrator)

    I tried using paq8's RGB image model to measure symbol similarity:
    http://nishi.dreamhosters.com/u/ocr_test_1.zip

    The script:
    Code:
    rem removing bmp headers
    cutbmp 00000009.bmp 1
    cutbmp 0000017C.bmp 2
    cutbmp 000001A2.bmp 3
    
    rem concatenating images
    copy /b 1+2 12
    copy /b 1+3 13
    copy /b 2+1 21
    copy /b 2+3 23
    copy /b 3+1 31
    copy /b 3+2 32
    
    for %%f in (12,13,21,23,31,32) do (
      bin2bmp24r.exe 28 %%f %%f.bmp
      paq8px_v69.exe -6 %%f
      paq8px_v69.exe -6 %%f.bmp
    )
    The results:
    Code:
    // paq8px_v69
    502 12.bmp.paq8px  562 12.paq8px  498 21.bmp.paq8px  560 21.paq8px
    521 13.bmp.paq8px  586 13.paq8px  518 31.bmp.paq8px  590 31.paq8px
    494 23.bmp.paq8px  552 23.paq8px  496 32.bmp.paq8px  559 32.paq8px
    
    // BMF 2.01
    528 12.bmf  524 21.bmf
    544 13.bmf  548 31.bmf
    524 23.bmf  524 32.bmf
    
    // gralic 1.8 demo
    644 12.gra  643 21.gra
    648 13.gra  649 31.gra
    637 23.gra  639 32.gra
    
    // png+pngout+deflopt
    669 12.png  670 21.png
    678 13.png  676 31.png
    657 23.png  661 32.png
    0000017C.bmp and 000001A2.bmp contain different scans
    of the same symbol, and 00000009.bmp is a different symbol.

    In theory, compressing an image in the context of another image is
    the best way to measure image similarity.

    But apparently we don't really have a model good enough for that:
    a (1-494/502)*100 = 1.59% smaller description for the similar pair
    doesn't seem convincing enough to declare the symbols similar based
    only on that.
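
    As a reference point, the usual way to turn such concatenation sizes into a
    score is the normalized compression distance. A minimal sketch; the
    single-image sizes in main() are hypothetical placeholders, since only the
    pair sizes were measured above:
    Code:
    #include <algorithm>
    #include <cstdio>
    
    // Normalized compression distance from compressed sizes:
    //   NCD(x,y) = (C(xy) - min(C(x),C(y))) / max(C(x),C(y))
    // ~0 means y adds almost nothing given x; ~1 means the pair is unrelated.
    double ncd(double cx, double cy, double cxy) {
      return (cxy - std::min(cx, cy)) / std::max(cx, cy);
    }
    
    int main() {
      // hypothetical single-image sizes C(2)=280, C(3)=282; measured C(23)=494
      printf("NCD = %.3f\n", ncd(280.0, 282.0, 494.0));
      return 0;
    }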

  2. #2 Matt Mahoney (Expert)
    Yeah, that's a problem, because most of the information content in images is low-level noise that you can't compress and can't see. If we knew how to keep just the meaningful information in images, video, and audio, we could compress any of them to just a few bits per second.

  3. #3 Shelwien (Administrator)
    I actually mean that, unfortunately, there are no models that can detect
    and use such dependencies.

    For example, these are the contexts in the RGB model of paq8px_v69 (which is noticeably better than paq8p):
    Code:
    buf(1) buf(2) buf(3) buf(4) buf(6)
    buf(w-2) buf(w-3) buf(w) buf(w+1) buf(w+3)
    buf(w*2-6) buf(w*2-3) buf(w*2) buf(w*2+6)
    
    *** *
     ***
    **X
    With only that, it's hardly possible to find a matching fragment in the
    already processed part of the picture; it can be compared to an order-2 CM on text.
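
    For comparison, a single combined neighborhood context in the same style as
    the paq8 source might look like this (a hypothetical illustration of the
    idea, not an actual line from paq8px_v69):
    Code:
    // hash the causal same-channel neighbors (left, above, above-left)
    // into one context - still only a tiny local window, hence the
    // order-2 analogy
    cm.set(hash(++i, buf(3), buf(w), buf(w+3)));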

    But actually it's even worse than it seems, because pixels from the listed
    locations are never used as context independently, and half of them only
    appear in secondary contexts.

    So instead of pattern matching, paq8 basically only does some noise shaping
    (and other image coders still show worse compression) - no wonder it
    doesn't care whether we repeat the same picture in the stream or different
    ones with similar statistics.

    Sure, a large part of a losslessly compressed representation of an image is noise.
    But I still think that a real image model could be used for recognition in cases like
    my example - similar images would compress better together than different ones,
    even if half of the output would still be compressed noise.

  4. #4 Member (France)
    There are several theses currently in progress on patch-based image & video compression.
    They expressly use similarities between the source image and a patch database to approximate a result (lossy compression).
    You may find some interesting formulas for evaluating similarity in them.

  5. #5 Shelwien (Administrator)
    I don't think that match detection methods from video coding would be relevant for compression of 28x28 kanji scans - afaik
    they mostly work with fixed-size blocks, like 8x8.
    But if you still think that I missed something, please post some links.

  6. #6 Matt Mahoney (Expert)
    The paq8px_v69 BMP contexts are not all used at once. They are mostly used in small groups like **X from different angles, so there is no high-level feature detection and no long-range contexts. It would be very slow, and it wouldn't help compression much to add them, because what you see in an image is such a tiny fraction of the information content.

    I think image comparison needs to be based on visible feature detection. Maybe you could try concatenating the images, converting to low-quality JPEG, and then doing the compression distance measure with JPEG-capable compressors. Or, probably better, align the images and subtract them.
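
    A minimal sketch of the align-and-subtract idea, with integer pixel shifts
    only (grayscale w*h buffers; all names here are illustrative, not from any
    existing tool):
    Code:
    #include <algorithm>
    #include <climits>
    #include <cstdlib>
    
    // Sum of absolute differences between image a and image b shifted by
    // (dx,dy); pixels shifted outside the frame are ignored.
    long sad(const unsigned char* a, const unsigned char* b,
             int w, int h, int dx, int dy) {
      long s = 0;
      for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
          int bx = x + dx, by = y + dy;
          if (bx < 0 || bx >= w || by < 0 || by >= h) continue;
          s += abs(int(a[y*w + x]) - int(b[by*w + bx]));
        }
      return s;
    }
    
    // Try small integer shifts and keep the best (lowest) difference score.
    long bestSad(const unsigned char* a, const unsigned char* b, int w, int h) {
      long best = LONG_MAX;
      for (int dy = -2; dy <= 2; dy++)
        for (int dx = -2; dx <= 2; dx++)
          best = std::min(best, sad(a, b, w, h, dx, dy));
      return best;
    }
    Note that the subpixel shift/rescale/rotation compensation discussed below is exactly what this simple version does not handle.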

  7. #7 Shelwien (Administrator)
    > They are mostly used in small groups like **X from different angles

    Sure, deltas and extrapolations, like
    Code:
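        // gradient/plane extrapolation: the classic left + above - above-left
        // predictor, applied to the byte stream of the image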
        cm.set(hash(++i, buf(w)+buf(1)-buf(w+1)));
    are necessary too.
    But it's a fact that it can't find a matching symbol in a book scan.

    > It would be very slow

    There's not much sense in discussing the speed of paq8 - imho its
    only real application is redundancy measurement for evaluating
    faster coders.
    But here it looks like I can't do even that.

    > and it wouldn't help compression much to add them because
    > what you see in an image is such a tiny fraction of the
    > information content.

    But it would, at least for some types of images:
    http://en.wikipedia.org/wiki/Motion_...trated_example

    Also this:
    Quote Originally Posted by wikipedia
    WebP is based on block prediction. Each block is predicted on the
    values from three blocks above it and from one block left to it
    (block decoding is done in raster-scan order: left to right and top
    to bottom). There are four basic modes of block prediction:
    horizontal, vertical, DC (one color) and TrueMotion. Mispredicted
    data and non-predicted blocks are compressed in a 4x4 pixel
    sub-block
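
    To illustrate the quoted modes, here is a simplified sketch of the DC and
    TrueMotion predictors for a 4x4 block (illustrative only, not actual
    libwebp/VP8 code):
    Code:
    // left[4], top[4]: reconstructed neighbor pixels; topleft: corner pixel.
    void predictDC(unsigned char p[4][4],
                   const unsigned char left[4], const unsigned char top[4]) {
      int sum = 0;
      for (int i = 0; i < 4; i++) sum += left[i] + top[i];
      unsigned char dc = (unsigned char)((sum + 4) >> 3); // mean of 8 neighbors
      for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) p[y][x] = dc;         // fill with one color
    }
    
    void predictTrueMotion(unsigned char p[4][4],
                           const unsigned char left[4], const unsigned char top[4],
                           unsigned char topleft) {
      for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
          // same left + above - above-left gradient idea as the paq8 context above
          int v = left[y] + top[x] - topleft;
          p[y][x] = (unsigned char)(v < 0 ? 0 : v > 255 ? 255 : v);
        }
    }
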
    > I think image comparison needs to be based on visible feature detection.

    Sure, in a way, but here I don't know how to extract "visible features"
    without losing information.
    The previous post about this - http://encode.su/threads/623-Japanese-OCR-evaluation -
    shows that recognition quality is too low even with commercial OCR apps,
    so I think that the common approach ("approximate everything with 8x8x1bit pictures,
    then find matches") won't give 100% results here either (it would work
    with my example, but there are more complex cases).

    > Maybe you could try concatenating the images, converting to low
    > quality JPEG, and then do the compression distance measure with JPEG
    > capable compressors.

    I considered that, but jpeg adds too much extra data - with huffman
    and quantization tables, and low quality, the actual image would be
    like 10% or less of the file size.
    It might kinda work with upscaled pictures (I tried an image comparison
    app which worked that way), but first I'd need to implement something like
    http://en.wikipedia.org/wiki/Hqx (statistical upscaling) for my type of data.

    > Or probably better, align the images and subtract them.

    That looked obvious to me as well, but compensating for shift+rescale+rotation
    at subpixel precision is not very easy, so I hoped that it
    would just magically work with paq8 and tested that instead.
    I actually expected paq8 to do much better though...
    (exactly because these are small images where even a 3x3 context could work)

    Anyway, I'm attaching an upscaled picture here, where you can see
    why there's no sense in subtracting them without some anti-antialiasing upscale.
    Attached: 01a.png (4.0 KB)

  8. #8 Piotr Tarsa (Member)
    I've read that more than 60% of what we see (the details) is generated by the brain. Sometimes people think they see some shape, but at a closer range it turns out to be a wrong belief. Also, eyes have low sensitivity to high frequencies (in images, at least), and high frequencies are the hardest to compress. Additionally, most monitors are TN-based, which means 18-bit color + dithering, so many details are lost when displayed.

    Shelwien:
    Have you tried neural networks for character recognition? Especially those that use multistep downscaling and convolution.

  9. #9 Shelwien (Administrator)
    > I've read that more than 60% of what we see (details) is generated by brain.

    Sure (though 60% is too much), but here all the differences are caused by technical issues.
    There's a Japanese tradition of posting book scans as archives of low-res jpg/png pages
    (something like 832x1200) - OCR fails badly there (kinda as expected).
    So samples #2 and #3 in my example are occurrences of the same symbol from the same book page,
    and it's not really a psychovisual problem to determine whether they're similar.

    > Have you tried neural networks for character recognition? Especially those that use multistep downscaling and convolution.

    No, but I think that modern CM models do that better anyway. In fact, the paq ancestor was based on a NN.
    Also, the main problem here is "alignment" - it's hard to match even a single straight line if it's rotated and
    downscaled with antialiasing.

  10. #10 Piotr Tarsa (Member)
    I bet PAQ didn't perform convolution.

    Here's an outdated version of LeNet:
    http://yann.lecun.com/exdb/lenet/index.html

    And here are two PDFs I've presented at seminars; they have results for newer versions of LeNet:
    http://yann.lecun.com/exdb/publis/pd...o-lecun-07.pdf
    http://jmlr.csail.mit.edu/papers/vol...ochelle09a.pdf

    Results are quite impressive.
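
    For reference, the core of one such "downscaling and convolution" stage can
    be sketched as below (one 3x3 kernel with 2x2 average pooling; real
    LeNet-style nets stack many trained feature maps and add nonlinearities):
    Code:
    #include <vector>
    
    // Valid 3x3 convolution of a w*h image, followed by 2x2 average pooling.
    std::vector<float> convolveAndPool(const std::vector<float>& img, int w, int h,
                                       const float k[3][3]) {
      int cw = w - 2, ch = h - 2;                 // size after valid convolution
      std::vector<float> conv(cw * ch);
      for (int y = 0; y < ch; y++)
        for (int x = 0; x < cw; x++) {
          float s = 0;
          for (int ky = 0; ky < 3; ky++)
            for (int kx = 0; kx < 3; kx++)
              s += k[ky][kx] * img[(y + ky) * w + (x + kx)];
          conv[y * cw + x] = s;
        }
      int pw = cw / 2, ph = ch / 2;               // size after 2x2 pooling
      std::vector<float> pooled(pw * ph);
      for (int y = 0; y < ph; y++)
        for (int x = 0; x < pw; x++)
          pooled[y * pw + x] = (conv[(2*y)*cw + 2*x]   + conv[(2*y)*cw + 2*x+1] +
                                conv[(2*y+1)*cw + 2*x] + conv[(2*y+1)*cw + 2*x+1]) / 4;
      return pooled;
    }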

  11. #11 Shelwien (Administrator)
    Thanks, I'll look into it.

  12. #12 Member (El Salvador)
    Quote Originally Posted by Piotr Tarsa
    I bet PAQ didn't perform convolution.
    P5 didn't have a perceptron, nor multiple outputs. It was "just" a statistical NN with 2 layers and 1 output.

  13. #13 Matt Mahoney (Expert)
    > There's not much sense to discuss speed of paq8

    Yes, it is slow enough. But what you want here is a model of the human
    visual system, which would be even slower. It is massively parallel,
    though, so you could probably accelerate it over multiple cores or on a GPU.

    It is basically a neural network with many layers, trained bottom-up 1 or 2
    layers at a time. The lower layers detect simple features like lines, edges,
    and corners with various rotations over various small regions. The higher
    layers detect whole symbols over larger regions using learned relations
    between the lower-level features. The lower-level weights can probably be
    fixed or hand-tuned and replicated over a range of locations, rotations, and
    scale factors. The higher-level features should be trained on your image
    data and a large text corpus. If you were doing this in English, for
    example, the top-level neurons would detect whole words or phrases
    independent of font, scale, color, etc. as you scanned a window simulating
    the movement of your eyes across the text (which actually jumps from word
    to word).

    There really is a lot of computation in doing this. The visual cortex makes
    up maybe 10% of the human brain, which would be something like 10^10 neurons
    and 10^14 synapses with 100 ms response time - something like 1000
    teraflops. You could probably get by with less, but you won't match human
    performance. That's why OCR hasn't caught up to your eyes.
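
    Roughly reconstructing the arithmetic behind that estimate:
    Code:
    // 10^14 synapses, each touched about once per 100 ms response cycle:
    //   10^14 synapse-ops / 0.1 s = 10^15 ops/s
    // at ~1 flop per synapse event: 10^15 flop/s = 1000 teraflops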

  14. #14 Piotr Tarsa (Member)
    Realtime processing is not a requirement for researchers. If a computation requires 1000 TeraFLOP of work, you can do it on a 1 TeraFLOPS computer in 1000 seconds.

  15. #15 Matt Mahoney (Expert)
    True. But you still need about 100 TB of memory to simulate 10^14 trainable synapses.

  16. #16 Shelwien (Administrator)
    I didn't say anything about a full psychovisual model being necessary here :)
    And I still think that matching these symbols with paq8 was a reasonable idea... if only its im24 model were a little better... :)
