Tar (File Format)

In computing, tar (short for Tape ARchive) is an archive file format. Tar files are produced by the Unix command tar, and the format was standardized by POSIX.1-1988 and later by POSIX.1-2001. It is widely used to collect many files into a single archive file while preserving file system information such as user and group permissions, dates, and directory structures. A tar file is commonly referred to as a tarball.

tar was originally developed for use with sequential access devices such as tape drives, specifically for backup purposes. The -f argument, which names an archive file, was added later, and tar is now more frequently used as a general archive utility. Its linear roots still show in slow partial extraction: to extract only the final file, tar has to read through the whole archive.

As is common for Unix utilities, tar is a single specialist program. It follows the Unix philosophy in that it does "only one thing" (archiving) but does it well. It has no built-in data compression facilities, so it is most commonly used in tandem with an external compression utility such as gzip, bzip2 or, formerly, compress. These utilities generally compress only a single file, hence the pairing with tar, which can produce a single file from many files.

Although this might seem to require more steps, a Unix pipe can combine the two steps into a single command. In addition, the GNU version of tar supports the command line options -z (gzip), -j (bzip2), and -Z (compress), which compress or decompress the archive file it is working with.

Usage

  • To pack tar files, use the following commands:
    • for an uncompressed tar file:
        tar -cf packed_files.tar file_to_pack1 file_to_pack2 ...
    • to pack and compress (one step at a time):
        tar -cf packed_files.tar file_to_pack1 file_to_pack2 ...
        gzip packed_files.tar
    • to pack and compress all at once:
        tar -cf - file_to_pack1 file_to_pack2 ... | gzip -c > packed_files.tar.gz
  • To unpack tar files, use the following commands:
    • for an uncompressed tar file:
        tar -xf file_to_unpack.tar
    • to decompress and unpack one step at a time:
        gunzip packed_files.tar.gz
        tar -xf packed_files.tar
    • to decompress and unpack all at once:
        gunzip -c packed_files.tar.gz | tar -xf -
To use bzip2 instead of gzip, replace gzip with bzip2 and gunzip with bunzip2 in the commands above.
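
As a quick check of the commands above, the following sketch packs two throwaway files, lists the archive, and unpacks it into a separate directory. The file and directory names are made up for illustration. It also assumes that mktemp and the tar options -t (list) and -C (change directory) are available, as they are in GNU and BSD tar.

```shell
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two small sample files (hypothetical names).
printf 'hello\n' > a.txt
printf 'world\n' > b.txt

# Pack and compress in one pipeline, as in the commands above.
tar -cf - a.txt b.txt | gzip -c > packed_files.tar.gz

# List the archive members without extracting them.
listing=$(gunzip -c packed_files.tar.gz | tar -tf -)
echo "$listing"

# Decompress and unpack into a separate directory.
mkdir unpacked
gunzip -c packed_files.tar.gz | tar -xf - -C unpacked

# The unpacked copies should be identical to the originals.
cmp a.txt unpacked/a.txt && cmp b.txt unpacked/b.txt && echo "round trip OK"
```

Unpacking into a separate directory keeps the originals untouched, which makes the final comparison meaningful.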

GNU tar only

GNU tar (from the FSF) has compression flags that make it easier to create and extract gzip- or bzip2-compressed tarballs in one step. The following commands take advantage of this:
  • To pack and compress:
    • with gzip:
        tar -czf packed_files.tgz file_to_pack1 file_to_pack2 ...
    • with bzip2:
        tar -cjf packed_files.tbz2 file_to_pack1 file_to_pack2 ...
    • with compress:
        tar -cZf packed_files.tar.Z file_to_pack1 file_to_pack2 ...
    • using some other arbitrary compression utility that works as a filter:
        tar --use-compress-program=name_of_program -cf packed_files.tar.XXX file_to_pack1 file_to_pack2 ...
  • To uncompress and unpack:
    • a gzip compressed tar file:
        tar -xzf file_to_unpack.tar.gz
    • a bzip2 compressed tar file:
        tar -xjf file_to_unpack.tar.bz2
    • a compress compressed tar file:
        tar -xZf file_to_unpack.tar.Z
    • an arbitrary-compression-utility-compressed tar file:
        tar --use-compress-program=name_of_program -xf file_to_unpack.tar.XXX

Filename extensions

The following is a list of common file extensions for uncompressed and compressed tar archives:
  • tar file:
    • .tar
  • gzipped tar file:
    • .tar.gz
    • .tgz
    • .tar.gzip
  • bzipped tar file:
    • .tar.bz2
    • .tar.bzip2
    • .tbz2
    • .tbz
  • tar file compressed with compress:
    • .tar.Z
    • .taz
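
The extensions listed above can be mapped to the GNU tar compression flags described earlier. The function below is only a sketch: tar_compression_flag is a hypothetical helper name, and it assumes archives are named according to the conventions in this list.

```shell
# Hypothetical helper: print the GNU tar compression flag implied by a file
# name, or an empty string for a plain .tar file.
tar_compression_flag() {
    case "$1" in
        *.tar.gz|*.tgz|*.tar.gzip)           echo z ;;
        *.tar.bz2|*.tar.bzip2|*.tbz2|*.tbz)  echo j ;;
        *.tar.Z|*.taz)                       echo Z ;;
        *.tar)                               echo "" ;;
        *) echo "unrecognised extension: $1" >&2; return 1 ;;
    esac
}

# Example use (commented out because it needs an existing archive):
# flag=$(tar_compression_flag file_to_unpack.tgz) && tar -x${flag}f file_to_unpack.tgz
```

A file name is only a hint about its contents, so a mismatch between extension and actual compression will still make tar fail.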

MIME-Type

  • application/x-tar

File format details

A limitation of early tape drives was that data could only be written to them in 512 byte blocks. As a result data in tar files is arranged in 512 byte blocks. A tar file is the concatenation of one or more files. Each file is preceded by a header block. The file data is written unaltered except that its length is rounded up to a multiple of 512 bytes and the extra space is zero filled. The end of an archive is marked by at least two consecutive zero-filled blocks.
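
The block layout described above implies some simple arithmetic, sketched below. The helper names padded_size and min_archive_size are hypothetical, and the result is only a lower bound on the archive size.

```shell
# Hypothetical helper: bytes occupied by a file's data once padded to a
# whole number of 512-byte blocks.
padded_size() {
    echo $(( ($1 + 511) / 512 * 512 ))
}

# Hypothetical helper: minimum archive size for the given file sizes:
# one header block plus padded data per file, plus two end-of-archive blocks.
min_archive_size() {
    total=0
    for size in "$@"; do
        total=$(( total + 512 + $(padded_size "$size") ))
    done
    echo $(( total + 2 * 512 ))
}

padded_size 1            # 512
padded_size 512          # 512
padded_size 513          # 1024
min_archive_size 1 513   # (512+512) + (512+1024) + 1024 = 3584
```

Real implementations may write more than this minimum. GNU tar, for instance, pads the archive to a multiple of its record size.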

File header

The file header block contains metadata about a file. To ensure portability across different architectures with different byte orderings, the information in the header block is encoded in ASCII. Thus if all the files in an archive are text files, then the archive is essentially an ASCII file. The fields defined by the original Unix tar format are listed in the table below. When a field is unused it is zero filled. The header is padded with zero bytes to make it up to a 512 byte block.
  Offset   Size   Field
  0        100    File name
  100      8      File mode
  108      8      Owner user ID
  116      8      Group ID
  124      12     File size in bytes
  136      12     Last modification time
  148      8      Check sum for header block
  156      1      Link indicator
  157      100    Name of linked file
For historical reasons, numerical values are encoded as octal digits with leading zeroes. The final character of each numeric field is either a NUL or a space. Thus, although 12 bytes are reserved for the file size, only 11 octal digits can be stored, which limits an archived file to just under 8 gigabytes (8^11 - 1 bytes). To overcome this limitation some versions of tar, including the GNU implementation, support an extension in which the file size is encoded in binary.

The checksum is calculated by summing the byte values of the header block, with the eight checksum bytes taken to be ASCII spaces (value 32). It is stored as a six-digit octal number with leading zeroes, followed by a NUL and then a space.
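
The octal encoding and the checksum rule can be checked against the first header block of an archive. The sketch below uses only dd, tr, od and awk. The helper names tar_octal_field and tar_header_checksum are hypothetical, and the decoder assumes plain octal fields, so it does not handle the binary size extension mentioned above.

```shell
# Hypothetical helper: decode the octal field at OFFSET (LENGTH bytes) of the
# first header block and print it in decimal.
# usage: tar_octal_field ARCHIVE OFFSET LENGTH
tar_octal_field() {
    # Strip the NUL/space terminators, leaving only the octal digits.
    digits=$(dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr -d '\000 ')
    # A leading 0 makes printf read the number as octal.
    printf '%d\n' "0$digits"
}

# Hypothetical helper: recompute the checksum of the first header block.
tar_header_checksum() {
    dd if="$1" bs=512 count=1 2>/dev/null | od -A n -v -t u1 | awk '
        {
            for (i = 1; i <= NF; i++) {
                n++
                # Offsets 148..155 (the checksum field) count as spaces.
                sum += (n > 148 && n <= 156) ? 32 : $i
            }
        }
        END { printf "%d\n", sum }'
}

# Largest size that fits in 11 octal digits.
printf '%d\n' 077777777777   # 8589934591
```

For an intact archive, `tar_octal_field archive.tar 148 8` and `tar_header_checksum archive.tar` should print the same number, and `tar_octal_field archive.tar 124 12` should print the size of the first member (archive.tar being a placeholder name).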

USTAR format

Most modern tar programs read and write archives in the new USTAR format, which has an extended header definition. Older tar programs will ignore the extra information, while newer programs will test for the presence of the "ustar" string to determine if the new format is in use. The USTAR format allows for longer file names and stores extra information about each file.
  Offset   Size   Field
  0        156    (as in old format)
  156      1      Type flag
  157      100    (as in old format)
  257      6      USTAR indicator
  263      2      USTAR version
  265      32     Owner user name
  297      32     Owner group name
  329      8      Device major number
  337      8      Device minor number
  345      155    Filename prefix
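
The test for the "ustar" string described above can be sketched with dd, using the offsets from the table. The helper names are hypothetical. Joining the prefix and the name with a "/" follows the POSIX USTAR rule and is only meaningful when the archive really is in that format.

```shell
# Hypothetical helper: print a text field of the first header block, up to
# its first NUL. usage: tar_text_field ARCHIVE OFFSET LENGTH
tar_text_field() {
    dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | tr '\000' '\n' | head -n 1
}

# Hypothetical helper: succeed if the USTAR indicator at offset 257 starts
# with "ustar".
is_ustar() {
    [ "$(tar_text_field "$1" 257 5)" = "ustar" ]
}

# Hypothetical helper: full member name, built from the filename prefix
# (offset 345) and the file name (offset 0) when a prefix is present.
tar_member_name() {
    name=$(tar_text_field "$1" 0 100)
    prefix=$(tar_text_field "$1" 345 155)
    if [ -n "$prefix" ]; then
        echo "$prefix/$name"
    else
        echo "$name"
    fi
}

# Example use (commented out because it needs an existing archive):
# is_ustar archive.tar && tar_member_name archive.tar
```

Only the first five bytes of the indicator are compared, because implementations differ in what follows the letters.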

Example

The example below shows a character dump of a header block from a tar file created using the GNU tar program. It was dumped with the od program. The "ustar" magic string can be seen at offset 0401 (octal, 257 decimal), meaning that the tar file is in USTAR format.
  0000000   e   t   c   /   p   a   s   s   w   d nul nul nul nul nul nul
  0000020 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
  *
  0000140 nul nul nul nul   0   1   0   0   6   4   4 nul   0   0   0   0
  0000160   0   0   0 nul   0   0   0   0   0   0   0 nul   0   0   0   0
  0000200   0   0   4   1   3   5   5 nul   1   0   1   5   5   0   6   1
  0000220   1   0   5 nul   0   1   1   5   5   6 nul  sp   0 nul nul nul
  0000240 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
  *
  0000400 nul   u   s   t   a   r  sp  sp nul   r   o   o   t nul nul nul
  0000420 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
  0000440 nul nul nul nul nul nul nul nul nul   r   o   o   t nul nul nul
  0000460 nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul nul
  *
  0001000
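
A dump of this shape can be reproduced for any archive. The sketch below assumes an od that supports the -a option (named characters), which is what prints nul and sp. dump_header is a hypothetical helper name.

```shell
# Hypothetical helper: dump the first 512-byte header block of an archive
# using named characters, with octal offsets in the left column.
dump_header() {
    dd if="$1" bs=512 count=1 2>/dev/null | od -a
}

# Example use (commented out because it needs an existing archive):
# dump_header archive.tar
```

Runs of identical lines are collapsed by od into a single `*`, which is why the offsets in the dump are not contiguous.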
