Linux Roadmap
Many people just assume the computer will do the right thing with their files, without understanding how the computer could possibly know what the right thing may be. Double-click a file or a web link and it magically opens the correct program to load that data. How does a computer associate a given file with a given program?
There are three basic approaches used on today's computers to associate various files with application programs.
- explicit external type declarations (scripts and MIME)
- implicit naming type declarations (filename extensions)
- implicit data analysis declarations (magic and shebang)
An explicit external type declaration is formed when some external piece of information explicitly describes the type of data, such as a command script or a MIME header like "text/plain".
Web browsers often rely on explicit types like MIME headers to know how to render a given web page or image. The web server prefaces every file it sends with a few lines of header text, and the MIME type is one of the things included in that header. The web browser reads those lines but does not display them. Email attachments also usually require MIME types to make sense of them.
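As a rough illustration of how a program might pick the MIME type out of such a header, here is a minimal Python sketch. The sample response text and the function name declared_type() are invented for this example, and it leans on Python's standard email parser because these headers follow the same "Name: value" layout; real browsers do considerably more.

```python
from email.parser import Parser

# A made-up response: header lines, a blank line, then the data itself.
SAMPLE_RESPONSE = (
    "Content-Type: text/plain; charset=utf-8\n"
    "Content-Length: 12\n"
    "\n"
    "Hello world\n"
)


def declared_type(raw_response):
    """Return (mime_type, body) for a block of headers plus data."""
    message = Parser().parsestr(raw_response)
    # get_content_type() returns only the type itself, without
    # parameters such as charset.
    return message.get_content_type(), message.get_payload()


if __name__ == "__main__":
    mime_type, body = declared_type(SAMPLE_RESPONSE)
    print(mime_type)      # the explicitly declared type
    print(body, end="")   # the data, with the header lines stripped away
```

The point to notice is that the type comes entirely from the header text. The data itself is never inspected, and no filename is involved.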
An implicit naming type declaration is managed through an external association table that is based on the naming conventions for the file, such as a filename's ending extension like ".bat" or ".jpg".
MS-DOS and Windows are heavily dependent upon the filename extension to know what application should be run. MS-DOS knows that .BAT, .COM and .EXE files are executable. Windows maintains a table of "file associations" that says .TXT files should be opened with Notepad, .XLS files should be opened with Microsoft Excel, and so on.
A typical Windows system may have a few hundred file types listed in that file association table. When developers create new software programs, they have to choose their filename extensions carefully to avoid conflicting with other existing programs. Indeed, some extensions may be ambiguous, and some applications clobber that table's entries for accidental or competitive reasons.
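The association table idea can be sketched in a few lines of Python. This is only an illustration: the table contents, the set of executable extensions, and the function name program_for() are assumptions made up for the example, not how Windows actually stores its associations.

```python
import os.path

# Hypothetical association table: extension -> program name.
FILE_ASSOCIATIONS = {
    ".txt": "Notepad",
    ".xls": "Microsoft Excel",
}

# Extensions treated as directly executable, as in MS-DOS.
EXECUTABLE_EXTENSIONS = {".bat", ".com", ".exe"}


def program_for(filename):
    """Decide how to open a file by looking only at its name."""
    _, extension = os.path.splitext(filename)
    extension = extension.lower()  # .TXT and .txt are treated alike
    if extension in EXECUTABLE_EXTENSIONS:
        return "run directly"
    return FILE_ASSOCIATIONS.get(extension, "unknown type")


if __name__ == "__main__":
    for name in ("README.TXT", "budget.xls", "AUTOEXEC.BAT", "photo.jpg"):
        print(name, "->", program_for(name))
```

Note that the lookup never opens the file. Only the characters after the last dot in the name decide the outcome.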
Additionally, Windows often tries to hide file extensions from users. Trojan and virus programs often exploit this inconsistency: the filename BritneySpears.jpg.exe looks to the user like a harmless JPEG image, but the system treats it as an executable program.
Some web server applications internally rely on such implicit type associations as well, when explicit type data are not available.
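A web server's fallback from explicit to implicit typing might look something like the following Python sketch. The function name content_type_for() and its "declared" parameter are hypothetical. The guess comes from Python's standard mimetypes module, whose answers can vary a little with the tables installed on a given system.

```python
import mimetypes


def content_type_for(filename, declared=None):
    """Prefer an explicit type; otherwise guess from the filename."""
    if declared:
        return declared
    guessed, _encoding = mimetypes.guess_type(filename)
    # Fall back to a generic binary type when the extension is unknown.
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    print(content_type_for("notes.txt"))
    print(content_type_for("photo.dat", declared="image/jpeg"))
    print(content_type_for("mystery.zzqq"))
```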
An implicit data analysis declaration is an internal or external association that is based on peeking at the actual data in the file, such as magic numbers or the shebang notation like "#!/usr/bin/perl". More about this shebang notation is in the following section.
Unix and Linux rely mostly on this scheme for file type determination. Any file may contain any kind of data, and the name itself doesn't matter. (In fact, most commands are executable files, many of them scripts, that have no filename extension.) When a new file data type becomes somewhat widespread, a developer adds some tips and heuristics to "fingerprint" the data so that files of that type are recognized correctly in the future. A command called file can apply those heuristics to any given file and report back the file's data type, if known. Since most of the heuristics rely on the first two or four bytes of a file, the check is quick and pretty reliable, it can't be fooled by just renaming the file, and it allows far more possibilities than three or four letters in a filename. These identifying bytes are often called the file's 'magic number'.
The file command can be used at any time to find out the sort of data that is found in a given file. Just provide the name of the file to examine. Here is the output when given the bash executable file for a test.
    $ file /bin/bash
    /bin/bash: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically linked (uses shared libs), stripped
To demonstrate that the result depends on the contents and not the filename, try something like the following.
    $ file britney.jpg
    britney.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), 72 x 72
    $ mv britney.jpg britney.exe
    $ file britney.exe
    britney.exe: JPEG image data, JFIF standard 1.01, resolution (DPI), 72 x 72
Some helpful manual pages on your Linux system may be (man ls), (man file), and (man chmod).
Some helpful Google searches may be "linux mime file types" and "linux file formats and magic numbers".
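For readers who want to see the magic number idea in code, here is a minimal Python sketch of the kind of check the file command performs. It is only an approximation: the table below is a tiny hand-picked sample, the function name sniff() is made up for this example, and the real command consults a much larger database with more elaborate rules.

```python
import os
import tempfile

# A tiny, hand-picked table of leading bytes and what they suggest.
MAGIC_NUMBERS = [
    (b"\xff\xd8\xff", "JPEG image data"),
    (b"\x89PNG\r\n\x1a\n", "PNG image data"),
    (b"\x7fELF", "ELF executable or object file"),
    (b"#!", "script with a shebang line"),
]


def sniff(path):
    """Guess a file's type from its first bytes, ignoring its name."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, description in MAGIC_NUMBERS:
        if head.startswith(magic):
            return description
    return "data"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        original = os.path.join(folder, "britney.jpg")
        renamed = os.path.join(folder, "britney.exe")
        with open(original, "wb") as handle:
            # Not a real picture, just bytes that begin like a JPEG.
            handle.write(b"\xff\xd8\xff\xe0" + b"\x00" * 16)
        print(sniff(original))
        os.rename(original, renamed)
        print(sniff(renamed))  # same answer: the name is never consulted
```

Because sniff() reads only the leading bytes, renaming the file changes nothing about the answer, which mirrors the britney.jpg experiment above.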
Contact Ed Halley by email at ed@halley.cc. Text, code, layout and artwork are Copyright © 1996-2005 Ed Halley. Copying in whole or in part, with author attribution, is expressly allowed. Any references to trademarks are illustrative and are controlled by their respective owners.