OT: Compression at the bit level



serberus
01-20-2004, 09:31 AM
First, I'm sorry for posting this here. I don't go to any other coder-type forums because I don't code, and I couldn't find an off-topic/general forum on the SEQ forums, so I'm afraid this has ended up here.

I've been thinking about compression, and I've been wondering why nobody compresses things at the bit level. Seeing as every file essentially boils down to a very long string of 1s and 0s when you break it down, wouldn't that be a very easy thing to compress?

I imagine a run-length compression scheme would work, and it could be backed up by something else.

The problem that makes me doubt it is this: you're storing the 8 bits of data from a byte as 8 individual bytes (if you store the whole file as a binary string), which means you've just multiplied the size of your file by 8. So you'd need a compression factor of at least 8 just to keep the file the same size.

I then thought, well, you could use a byte to store 8 bits, but that's what happens anyway, so nothing changes.
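To make that concrete, here is a rough Python sketch of the idea, using "text" as sample data (the same example spelled out below) and itertools.groupby to count runs of equal bits; the exact numbers are just for illustration:

    from itertools import groupby

    # Rough sketch: expand bytes into a string of '0'/'1' characters,
    # then run-length encode the runs of equal bits.
    data = b"text"
    bits = "".join(f"{byte:08b}" for byte in data)   # 32 chars from 4 bytes
    runs = [(bit, len(list(g))) for bit, g in groupby(bits)]
    print(len(data), len(bits), len(runs))           # 4 32 17

Even if every run fit in a single byte, describing these 4 bytes of input takes 17 runs, so the run-length step never wins back the factor of 8 on ordinary data.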

I've just hit my point of confusion; I'm trying to follow my train of thought, but now I'm stuck, so I don't know what else to say.

I'll try to restate my point more clearly:

"Text" = "74h 65h 78h 74h" = "01110100011001010111100001110100"

So, because binary is so simple, is it easy to compress? Or, because binary is so simple, is it inherently large (written one character per bit, it's 8 times the size of the raw bytes and 4 times the size of hex), so it can't be compressed that well because it's so simple and limited?

Thank you to anyone who provides some information. I'm sure this has been discussed/attempted before, but I'm not sure where to start looking (a Google search turned up information about "unconditional bases and bit-level compression", which didn't help me much).

Thanks

Serberus

fester
01-20-2004, 11:02 AM
http://www.webopedia.com/TERM/H/Huffman_compression.html
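For a feel of what Huffman coding (the technique behind that link) does, here is a minimal sketch, not production code: frequent symbols get short bit codes and rare symbols get long ones, so the output is measured in bits even though the input is bytes.

    import heapq
    from collections import Counter

    def huffman_codes(data):
        # Heap entries are (frequency, tiebreak, tree); a tree is either a
        # symbol (leaf) or a (left, right) pair (internal node).
        heap = [(freq, i, sym) for i, (sym, freq) in enumerate(Counter(data).items())]
        heapq.heapify(heap)
        tiebreak = len(heap)
        while len(heap) > 1:
            f1, _, left = heapq.heappop(heap)
            f2, _, right = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, tiebreak, (left, right)))
            tiebreak += 1
        codes = {}
        def walk(tree, prefix):
            if isinstance(tree, tuple):            # internal node: recurse
                walk(tree[0], prefix + "0")
                walk(tree[1], prefix + "1")
            else:                                  # leaf: record the code
                codes[tree] = prefix or "0"        # lone-symbol edge case
        walk(heap[0][2], "")
        return codes

    print(huffman_codes("this is an example"))

In a sense this really is compression at the bit level: each code has a fractional-byte length, and the codes are simply packed into bytes for storage.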

ksmith
01-20-2004, 11:57 AM
Almost every general-purpose compressor you'll ever use is a variant of LZ77.

Original LZ77 paper (http://compression.graphicon.ru/download/articles/lz/ziv_lempel_1977_universal_algorithm.pdf) (not light reading)

What is LZ77? (http://foldoc.doc.ic.ac.uk/foldoc/foldoc.cgi?query=LZ77&action=Search) (brief overview)

Deflate (i.e. zlib, gzip, pkzip) is an LZ77 derivative that uses Huffman coding.

RFC1951 - DEFLATE Compression (http://www.faqs.org/rfcs/rfc1951.html)
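A quick way to see Deflate in action (Python's standard zlib module implements it; exact output sizes vary slightly by zlib version):

    import os
    import zlib

    # LZ77 + Huffman (Deflate) thrives on repeated patterns...
    repetitive = b"text" * 1000
    print(len(repetitive), "->", len(zlib.compress(repetitive)))   # 4000 -> ~30

    # ...but patternless data barely shrinks at all.
    noise = os.urandom(4000)
    print(len(noise), "->", len(zlib.compress(noise)))             # 4000 -> ~4011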

ksmith
01-20-2004, 12:51 PM
So, because binary is so simple, is it easy to compress? Or, because binary is so simple, is it inherently large (written one character per bit, it's 8 times the size of the raw bytes and 4 times the size of hex), so it can't be compressed that well because it's so simple and limited?

You're thinking about things the wrong way. The data is independent of how it is displayed. Things are done in bytes, words, and dwords because that is what processors are designed to handle efficiently. Compression (more specifically, lossless compression) is about finding patterns and replacing them with instructions on how to rebuild them.
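Here's a hypothetical micro-example of what "instructions on how to rebuild" means in the LZ77 family: a token is either a literal byte or a (distance, length) pair meaning "copy length bytes starting distance bytes back". The token format here is made up for illustration; Deflate's real format is in RFC 1951.

    def rebuild(tokens):
        # Decode a toy LZ77-style token stream.
        out = bytearray()
        for tok in tokens:
            if isinstance(tok, int):              # literal byte
                out.append(tok)
            else:                                 # (distance, length) copy
                distance, length = tok
                for _ in range(length):           # byte-at-a-time allows
                    out.append(out[-distance])    # overlapping copies
        return bytes(out)

    # "blah blah blah": five literals, then one back-reference does the rest.
    tokens = [ord("b"), ord("l"), ord("a"), ord("h"), ord(" "), (5, 9)]
    print(rebuild(tokens))                        # b'blah blah blah'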