
Thread: File "Type" identification tool

  1. #1
    Member, joined Jun 2011, Germany, 1 post

    File "Type" identification tool

    Hi,
    I have developed a tool to identify file types, or rather files with a similar structure. For example, it may classify PPM files as BMP, MAP or RAW (all images), or ZIP, RAR and compressed EXE files as JPEG (all compressed data), so that optimized models can be built per structure. Because I have rather limited training data, I would like to hear about your results when identifying files in some of your test data sets.

    I hope someone finds this useful (and I hope it works on your data; my test data is rather "skewed", mostly software projects and resources from video games).
    Attached Files
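The post does not say how the tool groups files by structure. As a purely illustrative assumption (not the author's method), one crude signal that separates already-compressed data such as ZIP, RAR or JPEG from text and many other formats is order-0 byte entropy, since compressed data comes close to 8 bits per byte. The threshold `COMPRESSED_THRESHOLD = 7.5` and the function names `order0_entropy` and `structure_class` are hypothetical, chosen only for this sketch.

```python
import math
from collections import Counter

# Hypothetical cut-off in bits per byte; 7.5 is an illustrative assumption,
# not a value taken from the tool discussed in this thread.
COMPRESSED_THRESHOLD = 7.5


def order0_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # sum of p * log2(1/p) over all byte values that occur
    return sum(c / n * math.log2(n / c) for c in Counter(data).values())


def structure_class(data: bytes) -> str:
    """Coarse grouping: near-random bytes are treated as 'compressed'."""
    if order0_entropy(data) >= COMPRESSED_THRESHOLD:
        return "compressed"
    return "other"


# Uniformly distributed bytes give exactly 8.0 bits per byte.
print(structure_class(bytes(range(256)) * 16))  # compressed
print(structure_class(b"hello, world " * 100))  # other
```

A single number like this is only a starting point: noisy raw photographic images can also come close to 8 bits per byte, so a real classifier would need more features.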

  2. #2
    VoLT (Member), joined Mar 2010, Moscow, Russia, 20 posts
    This is a method by Luigi Auriemma...
    Attached Files
    Last edited by VoLT; 5th June 2011 at 09:14.

  3. #3
    m^2 (Member), joined Sep 2008, Ślůnsk, PL, 1,610 posts

  4. #4
    VoLT (Member), joined Mar 2010, Moscow, Russia, 20 posts
    m^2: in the method by Luigi Auriemma, the information was retrieved from TrID and from http://toorcon.techpathways.com/uploads/headersig.txt

    by Luigi Auriemma

    information retrieved from the following resources:
    - http://mark0.net/soft-trid-e.html (a very stripped-down version of it)
    - http://toorcon.techpathways.com/uploads/headersig.txt

    consider that I need only a quick check based on the first bytes because
    this function is used only rarely and needs to be fast and simple
    Last edited by VoLT; 5th June 2011 at 10:05.
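Auriemma's comment describes a quick check of a file's first bytes against known signatures. A minimal sketch of that idea, assuming a small hand-picked table: the `SIGNATURES` list and the function names `identify_header` and `quick_type` are hypothetical and not taken from his code or from TrID, while the magic values themselves are the standard ones for these formats.

```python
# Hypothetical signature table: (magic prefix, label). Longer, more specific
# prefixes come first so they are tried before short ones such as b"BM".
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"Rar!\x1a\x07", "rar"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"PK\x03\x04", "zip"),
    (b"%PDF", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x1f\x8b", "gzip"),
    (b"BM", "bmp"),
    (b"MZ", "exe"),
]

# Only this many leading bytes are ever read from a file.
MAX_MAGIC = max(len(magic) for magic, _ in SIGNATURES)


def identify_header(head: bytes) -> str:
    """Return the label of the first signature that prefixes `head`."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown"


def quick_type(path: str) -> str:
    """Read just the first few bytes of a file and match them."""
    with open(path, "rb") as f:
        return identify_header(f.read(MAX_MAGIC))


print(identify_header(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # jpeg
print(identify_header(b"PK\x03\x04\x14\x00"))            # zip
print(identify_header(b"plain text"))                    # unknown
```

Short signatures such as `BM` or `MZ` also match unrelated files that happen to start with those bytes, which is the trade-off of a fast, prefix-only check.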

  5. #5
    Matt Mahoney (Expert), joined May 2008, Melbourne, Florida, USA, 3,257 posts
    Quote (originally posted by m^2):
    Sometimes...

    Code:
    C:\res\maxcomp>trid *
    
    TrID/32 - File Identifier v2.10 - (C) 2003-11 By M.Pontello
    Definitions found:  4320
    Analyzing...
    
    File: a10.jpg
     50.0% (.JPG) JFIF JPEG Bitmap (4003/3)
    
    File: acrord32.exe
     71.8% (.OCX) Windows OCX File (123521/4/18)
    
    File: english.dic
           Unknown!
    
    File: FlashMX.pdf
    100.0% (.PDF) Adobe Portable Document Format (5000/1)
    
    File: fp.log
           Unknown!
    
    File: mso97.dll
     46.6% (.AX) DirectShow filter (201555/2/20)
    
    File: ohs.doc
     36.2% (.FLO) iGrafx FlowCharter document (33500/1/5)
    
    File: rafale.bmp
    100.0% (.BMP) Windows Bitmap (2000/1)
    
    File: vcfiu.hlp
           Unknown!
    
    File: world95.txt
           Unknown!
    
    C:\res\maxcomp>cd ..\calgary
    
    C:\res\calgary>trid *
    
    TrID/32 - File Identifier v2.10 - (C) 2003-11 By M.Pontello
    Definitions found:  4320
    Analyzing...
    
    File: BIB
           Unknown!
    
    File: BOOK1
    100.0% (.AIML) Artificial Intelligence Markup Language (14500/1/4)
    
    File: BOOK2
           Unknown!
    
    File: GEO
           Unknown!
    
    File: NEWS
    100.0% (.PL) Perl script (4000/1/1)
    
    File: OBJ1
     44.3% (.CEL) Lumena CEL bitmap (63/63)
    
    File: OBJ2
    100.0% (.MPG/MPEG) MPEG Video (3000/1)
    
    File: PAPER1
           Unknown!
    
    File: PAPER2
           Unknown!
    
    File: PIC
     23.0% (.WK*) Lotus 123 Worksheet (generic) (2005/4)
    
    File: PROGC
           Unknown!
    
    File: PROGL
    100.0% (.DII) Summation Document Image Information Load File (4000/1/1)
    
    File: PROGP
    100.0% (.DPR) Delphi project source (16000/1/3)
    
    File: TRANS
           Unknown!
    
    C:\res\calgary>cd ..
    
    C:\res>trid enwik8
    
    TrID/32 - File Identifier v2.10 - (C) 2003-11 By M.Pontello
    Definitions found:  4320
    Analyzing...
    
    Collecting data from file: enwik8
    
    Warning: file seems to be plain text/ASCII
             TrID is best suited to analyze binary files!
    
     39.0% (.XML/ATOM) Atom web feed (35500/1/16)
     21.4% (.KML) Google Earth placemark (19500/1/6)
     20.3% (.SVG) Scalable Vector Graphics (18500/1/6)
     15.9% (.AIML) Artificial Intelligence Markup Language (14500/1/4)
      3.2% (.HTML) HyperText Markup Language (3000/1/1)
    
    C:\res>trid enwik9
    
    TrID/32 - File Identifier v2.10 - (C) 2003-11 By M.Pontello
    Definitions found:  4320
    Analyzing...
    
    Collecting data from file: enwik9
    
    Warning: file seems to be plain text/ASCII
             TrID is best suited to analyze binary files!
    
     49.6% (.XML/ATOM) Atom web feed (35500/1/16)
     25.8% (.SVG) Scalable Vector Graphics (18500/1/6)
     20.2% (.AIML) Artificial Intelligence Markup Language (14500/1/4)
      4.1% (.HTML) HyperText Markup Language (3000/1/1)
    
    C:\res>
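The percentages in the enwik8 and enwik9 runs can be reproduced from the first number in parentheses: each candidate's value divided by the sum over all listed candidates, truncated (not rounded) to one decimal. For enwik8 the sum is 35500 + 19500 + 18500 + 14500 + 3000 = 91000, and 3000 / 91000 = 3.297%, which is printed as 3.2%; for enwik9, 35500 / 71500 = 49.65% is printed as 49.6%. This rule is inferred from the output above, not from TrID's documentation, and the function name `trid_style_percentages` is hypothetical.

```python
def trid_style_percentages(scores: dict[str, int]) -> dict[str, float]:
    """Each candidate's share of the total score as a percentage,
    truncated to one decimal place (rule inferred from the output above)."""
    total = sum(scores.values())
    if total == 0:
        return {}
    # Integer floor division yields the share in tenths of a percent,
    # truncating instead of rounding; dividing by 10 converts to percent.
    return {name: (1000 * s) // total / 10 for name, s in scores.items()}


# Scores for enwik8, copied from the TrID output above.
enwik8 = {"XML/ATOM": 35500, "KML": 19500, "SVG": 18500, "AIML": 14500, "HTML": 3000}
print(trid_style_percentages(enwik8))
# {'XML/ATOM': 39.0, 'KML': 21.4, 'SVG': 20.3, 'AIML': 15.9, 'HTML': 3.2}
```

Single-line results such as the 50.0% for a10.jpg cannot be checked this way, because the other candidates are not shown.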

Similar Threads

  1. The lie of "The world is a globe"
    By Vacon in forum The Off-Topic Lounge
    Replies: 2
    Last Post: 14th December 2009, 15:58
  2. PAQ8 C++ precedence bug (or "-Wparentheses is annoying")
    By Rugxulo in forum Data Compression
    Replies: 13
    Last Post: 21st August 2009, 20:36
  3. "decompilling" iso
    By SvenBent in forum Forum Archive
    Replies: 5
    Last Post: 1st April 2008, 00:18
  4. LZ77 speed optimization, 2 mem accesses per "round"
    By Lasse Reinhold in forum Forum Archive
    Replies: 4
    Last Post: 11th June 2007, 21:53
  5. Freeware "Send To" interface for CCM and QUAD
    By LovePimple in forum Forum Archive
    Replies: 2
    Last Post: 20th March 2007, 17:22
