image rars

more complicated than necessary

what is an image rar

An image RAR is a RAR archive containing images. More generally, an image archive is an archive (RAR, ZIP, etc.) containing images. However, this article also applies to archives containing any kind of non-compressible files. A non-compressible file is a file which does not become significantly smaller when compressed, regardless of the compression method used. The most common use of image archives is to store all pages of a manga or comic, or all photos of a set, in one file. Usually, image archives are released on the internet for others to download. In practice, almost all image archives made by amateurs are in RAR format.

compression, theory

The compression used in archives is so-called "lossless compression", which means decompression gives the exact original file back. This requires the compressed data to contain at least the same amount of information as the original file; no information is lost. Lossy compression discards information from the original file, so the decompressed file can only be an approximation of the original. One can say any file contains an amount of "information", or "entropy", and an amount of "air", or "redundancy": data which is not necessary to represent the information. One can call the amount of information in a file, divided by its size, its "information density".
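As a rough illustration of "information density", the sketch below estimates it from how evenly the byte values of a file are distributed. This is only a crude stand-in: it ignores the order of the bytes, so it can badly overestimate the real information content. The function names `byte_entropy` and `density_estimate` are made up for this example.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Zero-order Shannon entropy of the byte values, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    # Each distinct byte value with count c contributes p * log2(1/p), p = c/n.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def density_estimate(data: bytes) -> float:
    """Crude 'information density': estimated information divided by size (0 to 1)."""
    return byte_entropy(data) / 8.0


print(density_estimate(b"aaaaaaaa"))        # 0.0: one repeated byte is all "air"
print(density_estimate(bytes(range(256))))  # 1.0: every byte value equally frequent
```

The second result shows the limit of the estimate: the bytes 0 to 255 in order are very predictable, but a count of byte values cannot see that.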

a graphical representation of compression

(figure: bars showing the share of information and air in each case)

original, compressible file: information + air
compressed with bad lossless compression: information + less air
compressed with good lossless compression: information + even less air
compressed with lossy compression: part of the information only
lossy compressed file, decompressed: information + air

solid archives

One reason why RAR and 7-Zip make smaller archives than ZIP is that they support solid archiving. A non-solid archive compresses each file separately, then puts the results together. A solid archive first puts all files together, then compresses them as one stream. So if the same information repeats in every file, it only has to appear once in the archive:

original file A: common, unique, air
original file B: common, unique, air
original file C: common, unique, air
non-solid archive: common A, common B, common C
solid archive: common, A, B, C, air
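The effect can be tried out with a small sketch. It uses Python's zlib (deflate) as a stand-in compressor, not the actual RAR or 7-Zip solid mode, and the three "files" are artificial: random bytes with a shared part and a unique part. The sizes 2000 and 500 are arbitrary choices, kept small because deflate only looks back 32 KB.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable


def noise(n: int) -> bytes:
    """Return n pseudo-random bytes (incompressible on their own)."""
    return bytes(rng.getrandbits(8) for _ in range(n))


common = noise(2000)                             # part shared by every file
files = [common + noise(500) for _ in range(3)]  # shared part + unique part

# Non-solid: compress each file on its own, then add up the sizes.
non_solid = sum(len(zlib.compress(f, 9)) for f in files)

# Solid: join the files first, then compress the result as one stream.
solid = len(zlib.compress(b"".join(files), 9))

print("non-solid:", non_solid, "bytes")
print("solid:    ", solid, "bytes")
```

In the non-solid case the shared part is stored three times; in the solid case the second and third copies can be replaced by references to the first.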

compression in files other than archives

Many file formats used on the internet have their own compression methods, so the files transfer quickly over slow connections, cause less traffic, and take less space on disk, while they can still be used conveniently without having to "unpack" them all the time. Some examples of formats using lossy compression are MP3, JPEG, OGG, and MPG. Some examples of formats using lossless compression are PNG, GIF, FLAC, and archives. Such files are already compressed, so they won't become significantly smaller when compressed again in an archive.
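A quick way to see this is to compress something twice. The sketch below uses zlib as the compressor and an artificial text made of random words (the word list is made up for the example); the already compressed result plays the role of a PNG or JPEG being put into an archive.

```python
import random
import zlib

rng = random.Random(1)  # fixed seed so the run is repeatable
words = ["image", "archive", "compress", "file", "page",
         "manga", "photo", "zip", "rar", "data"]
text = " ".join(rng.choice(words) for _ in range(20000)).encode("ascii")

once = zlib.compress(text, 9)   # first pass removes most of the "air"
twice = zlib.compress(once, 9)  # second pass has almost nothing left to remove

print("original:        ", len(text), "bytes")
print("compressed once: ", len(once), "bytes")
print("compressed twice:", len(twice), "bytes")
```

The first pass should shrink the text considerably, while the second pass should leave the size about the same.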

a mathematical approach to non-compressible files

To put it very simply, a file can be seen as a positive integer. A lossless compression algorithm is a function which maps it to another, preferably smaller, number. This function has to be invertible (decompression is the inverse), so it is injective: no two different original files may give the same compressed file. Given a number, there is only a limited amount of smaller numbers: fewer than the value of the number itself. So not every number can be turned into a smaller number by the compression function; if some numbers become smaller, others have to become larger.
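The counting argument can be checked by brute force for small files. The sketch below uses bit strings instead of integers, which is the same idea in a different notation; the helper names are made up for this example.

```python
from itertools import product


def bit_strings(length: int) -> list[str]:
    """All possible 'files' of exactly `length` bits."""
    return ["".join(bits) for bits in product("01", repeat=length)]


def count_shorter(length: int) -> int:
    """Number of distinct files that are strictly shorter than `length` bits."""
    return sum(len(bit_strings(k)) for k in range(length))


def max_fraction_shrinking(n: int, k: int) -> float:
    """Upper bound on the fraction of n-bit files that can shrink by at least k bits."""
    return count_shorter(n - k + 1) / 2 ** n


n = 8
print(len(bit_strings(n)), count_shorter(n))  # 256 255

# At most this fraction of all 12-bit files can become 4 or more bits shorter.
print(max_fraction_shrinking(12, 4))
```

There are 256 files of 8 bits but only 255 shorter files to map them to, so no injective function can shrink all of them. The second number shows that only a small fraction of all possible files can shrink by a meaningful amount.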

archives used as "resource"

If a program, such as a game, a web browser, or an image viewer, needs to load files on the fly (for example textures, sounds, or images), and these do not necessarily come from individual files on the local disk, I like to refer to them as "resources". Examples:

StepMania (a Dance Dance Revolution simulator/game) can load its data (music, graphics, etc.) from either disk files or a zip file (with a .smzip extension).

Quake 3 Arena uses a data file with a .pk3 extension, which is in fact a zip archive.

Many games, maybe mostly older ones, load their data from custom archive formats made for that game: WAD for Doom, PAK for Quake, HOG for Descent, GRP for the Build engine, etc.

Image viewers exist for viewing comic books, manga, photo sets, etc., loading the images on the fly from a zip or rar archive, sometimes renamed to .cbz or .cbr.

Web browsers load web pages and images on the fly using the HTTP protocol.

Unreal Tournament can use the HTTP protocol to load needed maps and other data files when playing online.

Preferable properties of a data source for this kind of use are that it is easy to implement, and that individual "files" can be obtained in any order (random access).
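As an illustration of both properties, here is a minimal sketch of loading one "resource" directly out of a zip archive with Python's standard zipfile module. The archive is built in memory so the example is self-contained; the file names and contents are placeholders, and `load_resource` is a made-up helper name.

```python
import io
import zipfile
from typing import BinaryIO

# Build a small archive in memory with three placeholder "pages".
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    for i in range(1, 4):
        zf.writestr(f"page{i:03d}.jpg", b"fake image data %d" % i)


def load_resource(archive: BinaryIO, name: str) -> bytes:
    """Read a single member by name, without unpacking the other members."""
    with zipfile.ZipFile(archive, "r") as zf:
        return zf.read(name)


# Members can be fetched in any order, here the last page first.
print(load_resource(buf, "page003.jpg"))  # b'fake image data 3'
print(load_resource(buf, "page001.jpg"))  # b'fake image data 1'
```

This works because a zip archive keeps a directory of its members, so a reader can jump straight to the one it needs.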

what is a ZIP archive?

I define a zip as a PKZIP 2.0 compatible, or "classic", zip archive, with only "deflate" (method 8) or "store" (method 0) compression. Practically all zip archives on the internet are of this format, and it is the format which many archivers can open. I do not consider the newer, enhanced, and incompatible PKZIP archives to be zip files.
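Part of this definition can be checked mechanically. The sketch below looks only at the compression method of each member, using Python's zipfile module; it does not check other newer features such as encryption or ZIP64, so it is a partial test at best. The helper names are made up for this example.

```python
import io
import zipfile

# Method numbers from the definition above: 0 = store, 8 = deflate.
CLASSIC_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def is_classic_method(method: int) -> bool:
    """True if the zip compression method number is store or deflate."""
    return method in CLASSIC_METHODS


def is_classic_zip(archive) -> bool:
    """True if every member of the archive uses store or deflate."""
    with zipfile.ZipFile(archive) as zf:
        return all(is_classic_method(info.compress_type) for info in zf.infolist())


# Small in-memory archive with one stored and one deflated member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.jpg", b"stored", compress_type=zipfile.ZIP_STORED)
    zf.writestr("b.txt", b"deflated " * 100, compress_type=zipfile.ZIP_DEFLATED)

print(is_classic_zip(buf))    # True
print(is_classic_method(12))  # False (12 is the method number for bzip2)
```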

archive formats compared

What are desirable properties of an image archive format? Complexity and advanced compression add little for an image set, so being open, simple, compatible, and as widely usable as possible are good criteria.

I do not think it is necessary to have one archive format for all archives in the world; different formats each have their purpose. For image sets, I think ZIP is the best.

How to easily make image zips if you use WinRAR: set the default archive format to ZIP. Now the right-click menu will produce ZIP archives. You can set it back to RAR if you need to make RAR archives again.
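For those who prefer a script over WinRAR, the following is a minimal sketch using Python's standard zipfile module. It uses the "store" method, on the assumption that the images are already compressed. The function name `make_image_zip` and the list of extensions are choices made for this example.

```python
import zipfile
from pathlib import Path

# Assumed set of image extensions; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def make_image_zip(folder: str, zip_path: str) -> list[str]:
    """Store all images from `folder` in a classic zip, sorted by file name.

    Returns the names that were added, in archive order.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_STORED) as zf:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                # arcname drops the folder part, so the zip has no subdirectories.
                zf.write(path, arcname=path.name)
                names.append(path.name)
    return names


# Example call with hypothetical paths:
# make_image_zip("my_photo_set", "my_photo_set.zip")
```

Sorting by name keeps the pages in order for viewers that show images in archive order.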

external links

"why RAR sucks"

an image viewer. supports ZIP, but not RAR

