Thread: Search text in a compressed TXT file without unpacking it completely

  1. #1
    Member
    Join Date
    May 2012
    Location
    usa
    Posts
    24
    Thanks
    5
    Thanked 0 Times in 0 Posts

    Hi there,
    I have a huge .TXT file that would be better stored compressed.
    Situation 1: A huge MD5 hash list (containing words and their hashes).
    Situation 2: A huge English dictionary (containing words and their meanings).

    Now my question is: is there any GOOD archive format that can compress a huge .txt file and later let you search for a given string of text without unpacking the entire .txt file?

    Which archive do you recommend for Situation 1?
    Which archive do you recommend for Situation 2?

    Something like stream-viewing a compressed text file?
    Waiting for your replies.

  2. #2
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,976
    Thanks
    296
    Thanked 1,303 Times in 740 Posts
    1) Rainbow Table

    2) Bitcode compression (e.g. Huffman) would allow searching in compressed form.
    It's also possible to build an "ordered bitcode", which would let you use binary search.
    Another approach is to use BWT+bitcode. It's quite possible to look up strings in the transformed data,
    but that would require either an index (which can be temporary) or multiple scans of the whole BWT file.
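
To make the "ordered bitcode" idea concrete for Situation 1, here is a minimal sketch. It is not a Huffman coder: it assumes the simplest possible order-preserving code, packing each hex digit of an MD5 into 4 bits, so every hash becomes a fixed 16-byte record that sorts the same way as its hex string. The names `build_table` and `lookup` are made up for illustration, and a real table would live in a file (read with seeks) rather than in memory.

```python
import hashlib
from binascii import unhexlify

RECORD = 16  # an MD5 digest packed at 4 bits per hex digit is 16 bytes


def build_table(words):
    """Return (packed, sorted_words); packed holds the sorted 16-byte digests."""
    pairs = sorted((hashlib.md5(w.encode("utf-8")).digest(), w) for w in words)
    packed = b"".join(digest for digest, _ in pairs)
    return packed, [w for _, w in pairs]


def lookup(packed, words, hex_digest):
    """Binary-search the packed records directly; nothing is ever unpacked."""
    key = unhexlify(hex_digest)
    lo, hi = 0, len(words)
    while lo < hi:
        mid = (lo + hi) // 2
        # Byte-wise comparison matches hex-string order because the code
        # preserves ordering.
        if packed[mid * RECORD:(mid + 1) * RECORD] < key:
            lo = mid + 1
        else:
            hi = mid
    if lo < len(words) and packed[lo * RECORD:(lo + 1) * RECORD] == key:
        return words[lo]
    return None


if __name__ == "__main__":
    packed, words = build_table(["apple", "banana", "cherry"])
    print(lookup(packed, words, hashlib.md5(b"banana").hexdigest()))
```

The packed table is half the size of the hex text, and each lookup touches only about log2(N) records. A variable-length ordered code would compress better but needs a record index or resynchronisation points to support the same binary search.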

    Also this may be related: https://encode.su/threads/3025-Achie...-short-strings


  4. #3
    Member
    Join Date
    Dec 2011
    Location
    Cambridge, UK
    Posts
    506
    Thanks
    187
    Thanked 177 Times in 120 Posts
    You may want to read up on the FM-index and compressed suffix arrays. See also the Succinct Data Structure Library (SDSL) implementations, which may help.

    Basically, these are BWT-style transforms with enough indexing to permit random-access searching.
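
As a rough illustration of the BWT-plus-index approach, here is a toy sketch of FM-index backward search that counts how often a pattern occurs using only the BWT string and two small tables. It is not SDSL's API. The function names are invented for this example, the BWT is built naively by sorting rotations, and the `occ` table is stored in full, whereas a practical index would sample it and keep the BWT itself entropy-coded.

```python
def bwt(text):
    """Naive BWT via sorted rotations; text must not contain the '\0' sentinel."""
    s = text + "\0"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def build_fm(bwt_str):
    """Build C (count of smaller characters) and occ (prefix counts)."""
    counts = {}
    for ch in bwt_str:
        counts[ch] = counts.get(ch, 0) + 1
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count(pattern, C, occ, n):
    """Backward search: narrow the suffix range one pattern character at a time."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("banana")
    C, occ = build_fm(transformed)
    print(count("ana", C, occ, len(transformed)))
```

Each pattern character costs two table lookups, so the search time depends on the pattern length and not on the size of the text. Reporting match positions (not just counts) additionally needs a sampled suffix array, which is omitted here.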


  6. #4
    Member
    Join Date
    May 2012
    Location
    usa
    Posts
    24
    Thanks
    5
    Thanked 0 Times in 0 Posts
    Thank you all. Are there any good implementations on GitHub, though?

  7. #5
    Administrator Shelwien's Avatar
    Join Date
    May 2008
    Location
    Kharkov, Ukraine
    Posts
    3,976
    Thanks
    296
    Thanked 1,303 Times in 740 Posts
