When browsing open-source projects that are primarily developed for Linux systems and downloading the latest packages, the source code is always stored in a .tar.gz or .tar.bz2 file.
Is there any reason for using .tar.gz or .tar.bz2 rather than something like .zip or .rar or some other compression algorithm (or even leaving it uncompressed if the project is small enough)?
To answer the question in the heading: compressed tar archives became the standard for distributing source code on Unix a very long time ago, well over two decades and probably a couple more. That is significantly before Linux even came into existence.
In fact, tar stands for (t)ape (ar)chive. Think reel hard, and you’ll get an idea how old it is. ba-dum-bump.
Before people had CD burners, software distributions were put out on 1.44 MB floppy disks. The compressed tar file (the tarball) was chopped into floppy-sized pieces with the split command. You’d join the pieces back together with cat and extract the archive.
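The split-and-rejoin step can be sketched in a few lines of Python. This is only an illustration of the idea, not what split and cat do internally; the function names are made up for the example, and the piece size assumes the usual 1440 KiB capacity of a "1.44 MB" floppy.

```python
FLOPPY_BYTES = 1440 * 1024  # capacity of a "1.44 MB" floppy (assumed here)


def split_bytes(data: bytes, piece_size: int = FLOPPY_BYTES) -> list[bytes]:
    """Cut data into consecutive pieces of at most piece_size bytes (like split -b)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def join_pieces(pieces: list[bytes]) -> bytes:
    """Concatenate the pieces in order (like cat piece1 piece2 ... > archive)."""
    return b"".join(pieces)


# A stand-in for a 3,000,000-byte compressed tar file.
archive = bytes(range(256)) * 11718 + bytes(192)
pieces = split_bytes(archive)
restored = join_pieces(pieces)
```

The pieces carry no headers of their own, which is why plain concatenation in the right order is enough to get the original archive back.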
To answer the other question of why not Zip or Rar, that’s an easy one. The tar archiver comes from Unix, while the other two come from MS-DOS/Windows. Tar handles Unix file metadata (permissions, times, etc.), while zip and rar did not until very recently (they stored MS-DOS file data). In fact, zip took a while before it started storing NTFS metadata (alternate streams, security descriptors, etc.) properly.
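To make the metadata point concrete, here is a minimal sketch using Python's standard tarfile module. The file name, owner name, and helper function are invented for the example; it only shows that the permission bits, modification time, and owner name travel inside the archive header.

```python
import io
import tarfile


def roundtrip_metadata(mode: int, mtime: int) -> tuple:
    """Store one file with the given Unix metadata in a tar archive and read it back."""
    payload = b"#!/bin/sh\necho hello\n"
    info = tarfile.TarInfo(name="configure")  # hypothetical file name
    info.size = len(payload)
    info.mode = mode          # Unix permission bits
    info.mtime = mtime        # modification time, seconds since the epoch
    info.uname = "builder"    # hypothetical owner name

    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.addfile(info, io.BytesIO(payload))

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.getmember("configure")
        return member.mode, member.mtime, member.uname


stored = roundtrip_metadata(0o755, 1_000_000_000)
```

The executable bit (0o755) survives the round trip, which matters for things like configure scripts in a source release.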
Many of the compression algorithms in PKZip are proprietary to the original maker, and the final one added to the DOS/Windows versions was Deflate (RFC 1951), which performed a little better than Implode, the proprietary algorithm that had produced the best general compression until then. Gzip uses the Deflate algorithm.
The RAR compression algorithm is proprietary, but there is a gratis open source implementation of the decompressor. Official releases of RAR and WinRAR from RARlab are not gratis.
Gzip uses the deflate algorithm, and so is no worse than PKZip. Bzip2 gets slightly better compression ratios.
TL;DR version:
tar.gz and tar.bz2 are from Unix, so Unix people use them. Zip and Rar are from the DOS/Windows world, so DOS/Windows people use them. tar has been the standard for bundling archives of stuff in *nix for several decades.
I don’t know about when, but I imagine the reason why it’s used is a combination of: tar being traditional (it’s very old); easy management from a command line; tar preserving file system info that ZIP or RAR may not; and the two-pass process (archive first, then compress), which means that compression is more efficient (one big file compresses better than many little files).
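The "one big file compresses better" claim is easy to try out. The sketch below is a rough comparison under stated assumptions: the file names and the shared header text are invented, and raw zlib output stands in for per-file compression, so it leaves out the per-entry headers a real zip file would add.

```python
import io
import tarfile
import zlib

# Hypothetical header shared by every file, as in many source trees.
BOILERPLATE = (
    "/* Hypothetical project header, repeated at the top of every file.\n"
    " * Permission is granted to copy, modify and redistribute this file\n"
    " * provided that the notice above is kept intact in all copies.\n"
    " * This software comes with no warranty of any kind whatsoever. */\n"
)


def make_files(count: int) -> dict:
    """Build many small, similar source files in memory."""
    return {
        f"src/file{i:03d}.c":
            (BOILERPLATE + f"int value_{i}(void) {{ return {i}; }}\n").encode()
        for i in range(count)
    }


def size_compressed_separately(files: dict) -> int:
    """Total size when every file is deflated on its own."""
    return sum(len(zlib.compress(data, 9)) for data in files.values())


def size_as_tar_gz(files: dict) -> int:
    """Size when the files are bundled with tar first and gzipped as one stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


files = make_files(100)
separate_total = size_compressed_separately(files)
bundled_total = size_as_tar_gz(files)
```

With files this similar, the single stream should come out much smaller, because the compressor can refer back to the header text it has already seen in earlier files instead of encoding it again each time.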
bzip2 (.bz2) seems to be displacing gzip (.gz) as it provides better compression, in much the same way that gzip itself displaced the earlier compress (.Z).
In essence, archiving and compressing are two different operations. The name tar.gz shows the intention very clearly: a compressed archive, whereas .zip or .rar just shows it’s some compressed stuff.
tar is traditional in Unix; it combines files but doesn’t necessarily compress them.
Compressing them afterwards into .gz or .bz2 is just as easy.
Zip and rar are more common in the Windows world, and rar is proprietary.
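The two separate operations can be shown as two separate steps. This is a minimal sketch with Python's standard tarfile and gzip modules; the function names and sample files are made up for the example.

```python
import gzip
import io
import tarfile


def make_tar(files: dict) -> bytes:
    """Step 1, archiving: bundle the files into one uncompressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def make_tar_gz(files: dict) -> bytes:
    """Step 2, compressing: gzip the tar stream as a single blob."""
    return gzip.compress(make_tar(files))


def read_tar_gz(blob: bytes) -> dict:
    """Undo the steps in reverse order: decompress first, then unpack."""
    raw = gzip.decompress(blob)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
        return {m.name: tar.extractfile(m).read() for m in tar.getmembers()}


sample = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
blob = make_tar_gz(sample)
unpacked = read_tar_gz(blob)
```

Because the compressor only ever sees one byte stream, swapping gzip for bzip2 (or anything newer) changes step 2 and leaves step 1 alone.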
It’s traditional, ubiquitous, and it works. Plus I thought it was somewhat self-evident.
Update
My apologies, I forget that most people don’t know what I know or don’t have experience as an administrator in heterogeneous environments.
Tradition as in a custom or practice ingrained over time. We know it has a basis in history because tar derives from Tape ARchive, referencing the old tape backup technology. It has a long history in the various Unix operating systems, dating back to 1979 in 7th Edition Unix, where it replaced tp. Linux systems are usually an amalgamation of the Linux kernel and GNU software, of which GNU tar is a part. All this tar history means a majority of experienced technical people know how to use it without having to refer to documentation, because it’s been ingrained. For newer users there is plenty of documentation because the software has been around for so long.
Ubiquitous as in appearing or found everywhere. A somewhat accepted misuse is where the appearance isn’t universal, but covers a large enough percentage of the population to be accepted as ubiquitous. 7th Edition Unix is the ancestor of the biggest versions of Unix, including SunOS/Solaris, AIX, HP-UX, BSD, etc. There is also a high degree of cross-compatibility across the different implementations of tar on Unix. Since Mac OS (from OS X onward) has been based on BSD, it also has tar. Linux uses GNU software, which includes GNU tar, so tar is available on all flavors of Linux. And, while not available as a builtin, there are many implementations of tar available on Windows, including GNU tar both through Cygwin and natively. GNU tar in particular is available on most Unices and Windows, making it a good choice for file migrations across OSes.
Works as in it’s been functioning for a long time without major modifications. It’s available on all major platforms out of the box (except for Windows, where it’s available as additional software). The format is also supported on all major platforms, which facilitates interchange between platforms. Not only is it still used as a way to make easily portable archives, but a tar-pipe is a standard Unix idiom for copying directory trees, especially across heterogeneous environments. In short, it’s been around and is still in heavy use because it does what it does well.
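For anyone who hasn't seen a tar-pipe: one tar writes an archive to a pipe while a second tar reads from the pipe and unpacks, so no archive file ever lands on disk. Below is a rough in-process imitation in Python, assuming a POSIX-style os.pipe; the function name and the sample directory layout are invented, and a real tar-pipe would of course be two tar processes joined by the shell.

```python
import os
import tarfile
import tempfile
import threading


def tar_pipe_copy(src_dir: str, dst_dir: str) -> None:
    """Copy the contents of src_dir into dst_dir through a streamed tar archive."""
    read_fd, write_fd = os.pipe()

    def producer() -> None:
        # Writer side: archive every entry of src_dir into the pipe.
        with os.fdopen(write_fd, "wb") as sink:
            with tarfile.open(fileobj=sink, mode="w|") as tar:
                for name in sorted(os.listdir(src_dir)):
                    tar.add(os.path.join(src_dir, name), arcname=name)

    writer = threading.Thread(target=producer)
    writer.start()
    # Reader side: unpack the stream into dst_dir as it arrives.
    with os.fdopen(read_fd, "rb") as source:
        with tarfile.open(fileobj=source, mode="r|") as tar:
            tar.extractall(dst_dir)
        # Drain any trailing padding so the writer never sees a closed pipe.
        while source.read(65536):
            pass
    writer.join()


with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    os.mkdir(os.path.join(src, "docs"))
    with open(os.path.join(src, "docs", "README"), "w") as fh:
        fh.write("hello\n")
    tar_pipe_copy(src, dst)
    with open(os.path.join(dst, "docs", "README")) as fh:
        copied = fh.read()
```

The appeal of the idiom is that the reading end can sit on the other side of a network connection or on a different OS, and the directory tree still arrives with its structure intact.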