I also recommend reading the "Data compression" article on Wikipedia, and following its links to the different algorithms, since you seem (from your comments) to be interested in that: en.wikipedia.org/wiki/Data_compression. For example, you can check the LZW algorithm (en.wikipedia.org/wiki/LZW) to better understand why compressed data has no long sequences of the same bit.
–
Gnoupi Feb 4 '10 at 14:19

4 Answers

You're asking about binary strings. In a general sense, there is no limit to the number of 0's or 1's that can appear in sequence. That is, an infinitely-long string of 0's is valid binary.

You're also asking about binary formats. Data in a computer aren't just random binary strings; they're formatted in a particular way so that specially-designed machines called computers can process them, either as information (like an MP3) or as instructions (like winamp.exe), or even as transmission encodings (like the encodings used in USB or Ethernet).

In a practical sense, you won't find arbitrary-length strings of 0's or 1's in executable code. Transmission encodings, if they aren't synchronized via another method, may insert extra bits after a certain number of data bits, so arbitrary-length strings won't be found there either. Data formats can be more flexible, and some will allow long strings of 0's or 1's, but formats like MP3 require regular markers (again, for synchronization), so even an MP3 of silence won't contain all 0's.
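The "insert extra bits" idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `stuff_bits` and `longest_run` are made up here, and the default limit of five consecutive 1s follows HDLC-style bit stuffing rather than the exact rules of any encoding named above.

```python
def stuff_bits(bits, limit=5):
    """Insert a '0' after every run of `limit` consecutive '1' bits."""
    out = []
    run = 0
    for b in bits:
        out.append(b)
        if b == "1":
            run += 1
            if run == limit:
                out.append("0")  # the stuffed bit breaks the run
                run = 0
        else:
            run = 0
    return "".join(out)


def longest_run(bits, symbol):
    """Length of the longest run of `symbol` in `bits`."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b == symbol else 0
        best = max(best, cur)
    return best


# Twelve 1s in a row never survive the encoder intact.
print(stuff_bits("1" * 12))
```

After stuffing, the longest run of 1s on the wire can never exceed the limit, which is why such encodings can't carry arbitrary-length runs.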

So: could a binary string contain a sequence of 9 zeroes? Sure, it's quite possible, and probably very common. Could a particular binary format contain that? Maybe. But it's impossible to tell without specifying what format.

... and when i say "arbitrary-length", i'm not talking about "pick a number between 2 and 40", i'm talking about "pick a number between 0 and infinity". in other words, read "arbitrary-length" as "really really long".
–
quack quixote Feb 4 '10 at 13:24

How could I get the binary code of a particular file? All I am interested in is compressing files, so would I be able to get the binary code of any given file? Or does a file not contain binary codes?
–
Val Feb 4 '10 at 13:30

well, you already have the binary code of the file; that's what's on the disk. to see it, you could use a hex editor like WinHex (and then convert hex to binary), or if on unix/linux/cygwin, use od to view as hex or octal (and then convert to binary).
–
quack quixote Feb 4 '10 at 13:36

Thanks, quack. I was looking to create a program that could look up a file's binary code. Let's say it's "100010001": I would compress it and show a key code on screen, "ABC", which = 100010001. Then, when decompressing, the user would enter ABC, which would create a file with the binary code 100010001. My understanding is that WinHex is a user interface program? Or can it be used by, let's say, Visual Basic to access the hex code or binary code, or to convert it to binary? I don't know if I'm making much sense or if I got this idea all wrong...
–
Val Feb 4 '10 at 13:46

ah. WinHex and od are tools that you would use to examine a file's contents for yourself. if you're trying to manipulate the file programmatically, you'd need to read the file and use tools available to you in whatever programming language you're using.
–
quack quixote Feb 4 '10 at 14:03

A bit can be repeated any number of times. Binary works almost the same way as decimal, only in base 2 instead of base 10.
If you have a binary number abcde, it just means that abcde = a*2^4 + b*2^3 + c*2^2 + d*2 + e.

So if you want to write the number 0 to a file, you'll have to write a byte of all 0s, and if you write many of them you'll have a long sequence of 0s.

Also, for example, 10000000 (binary) = 128 (decimal), and each time you multiply it by 2 you add one more 0 to the end of the number, so you can make the run of 0s as long as you like.
(I've mixed abstract numbers with the computer representation of numbers a little, but I think you can see the general idea.)
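As a quick check of the positional formula and the "multiply by 2 adds a 0" remark, here is a small Python sketch. The helper name `binary_to_int` is made up for illustration; Python's built-in `int(bits, 2)` does the same job.

```python
def binary_to_int(bits):
    """Evaluate a binary string positionally: double the total, add the next bit."""
    value = 0
    for b in bits:
        value = value * 2 + int(b)
    return value


n = binary_to_int("10000000")
print(n)                   # 128
print(format(n * 2, "b"))  # same digits with one more 0 appended
print(format(n << 3, "b")) # shifting left by 3 multiplies by 2 three times
```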

Edit: As a continuation to the questions in your comments:

Any programming language is able to open files for reading in binary mode. (Here "binary" is meant as binary versus text.)
You're most likely to find such strings in uncompressed image files, such as uncompressed BMP files saved by a paint program.
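For instance, a minimal Python sketch that opens a file in binary mode and shows its first bytes as bits could look like this. The function name `file_to_bits`, the `limit` parameter and the file name in the comment are only illustrative assumptions.

```python
from pathlib import Path


def file_to_bits(path, limit=16):
    """Return the first `limit` bytes of a file as space-separated 8-bit groups."""
    data = Path(path).read_bytes()[:limit]
    return " ".join(f"{byte:08b}" for byte in data)


# Hypothetical usage, with any file you have on disk:
# print(file_to_bits("picture.bmp"))
```

Note that `read_bytes()` loads the whole file before slicing, which is fine for a quick look but not for very large files.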

Your compression technique resembles one of the first used to compress images. I think it was called RLE (run-length encoding), but I really don't remember exactly. (RLE on Wikipedia)
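To give an idea of how run-length encoding treats long runs of the same bit, here is a minimal sketch in Python. The names `rle_encode` and `rle_decode` are made up, and real image formats pack the runs into bytes rather than keeping a Python list, so take it as an illustration only.

```python
from itertools import groupby


def rle_encode(bits):
    """Collapse each run of identical symbols into a (symbol, run_length) pair."""
    return [(symbol, len(list(group))) for symbol, group in groupby(bits)]


def rle_decode(runs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in runs)


encoded = rle_encode("0000011100")
print(encoded)
print(rle_decode(encoded))
```

The longer the runs, the fewer pairs are needed, which is why this works well on things like large single-color areas and badly on data with no repetition.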

If you want to read more about compression, you can look on Wikipedia, since it has a lot of information. Huffman coding is a widely used and not very complex method, and you can also take a look at Lempel-Ziv, which is used by zip.

So it is more than likely that an .mp3 file or a .doc file or any other file might have a 0 or 1 repeated x number of times? Also, is it possible to read the binary code of any given file? If so, how would I do that?
–
Val Feb 4 '10 at 13:21

Do a search for hex editors if you want raw access to a file. But don't edit the file and expect it to remain valid. Hex can be converted to binary with a simple 16-line lookup table, or with Windows Calculator in scientific mode.
–
Martin Feb 4 '10 at 13:27
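The lookup table mentioned in the comment above could be sketched in Python like this. The names `HEX_TO_BITS` and `hex_to_binary` are made up for illustration.

```python
# One entry per hex digit: 16 entries in total.
HEX_TO_BITS = {
    "0": "0000", "1": "0001", "2": "0010", "3": "0011",
    "4": "0100", "5": "0101", "6": "0110", "7": "0111",
    "8": "1000", "9": "1001", "A": "1010", "B": "1011",
    "C": "1100", "D": "1101", "E": "1110", "F": "1111",
}


def hex_to_binary(hex_text):
    """Translate a string of hex digits into bits, one table lookup per digit."""
    return "".join(HEX_TO_BITS[ch] for ch in hex_text.upper())


print(hex_to_binary("4d5a"))
```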

It's exactly equivalent to asking how many zeros, or ones, or twos etc you can have in a row in a decimal number. As many as you want, why would there be a limit?

There is, of course, a limit to the maximum size that any particular variable/file/disk can hold, but that is a practical matter which is secondary to the maths.

More specifically, if you are asking how likely a particular sequence of zeros is in an MP3 file: because that is a compressed file format, a run becomes less likely as its length increases. You'll find many pairs of zeros, but fewer runs of three, and even fewer runs of four, etc. Compression routines specifically look for patterns, remove them to reduce the file size, and replace them with a reference to the pattern (as an approximate explanation of file compression).
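If you want to check this on your own files, a rough Python sketch for counting how often each length of zero-run occurs might look like this. The function name `zero_run_histogram` is made up; feed it bytes read from any file.

```python
from collections import Counter
from itertools import groupby


def zero_run_histogram(data):
    """Map run length -> number of runs of consecutive 0 bits of that length."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return Counter(
        len(list(group)) for symbol, group in groupby(bits) if symbol == "0"
    )


# Four zero bytes form a single run of 32 zero bits.
print(zero_run_histogram(bytes(4)))
```

Comparing the histogram of a compressed file with that of an uncompressed one is a simple way to see the difference described above, though the exact numbers will depend on the file.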

Exactly, it is highly dependent on the fact that this is a compressed file. Another example would be a fully black BMP image. It would contain long streams of zeros. Any compression format applied to it would map the black "area" and save only the color and the coordinates of this "area", leading to a very small file with no long sequences of the same bit.
–
Gnoupi Feb 4 '10 at 13:44

A file that was all 1s or all 0s wouldn't be a very interesting file. If it were all zeros then it wouldn't contain any meaningful data. If it were all ones then there could be some data present, but it would depend on what format the file was supposed to be.

A file will only contain "interesting" data if it consists of patterns of 1s & 0s and, depending on the encoding of the file, these could be any length. Though long runs of one or the other will be unlikely.

This is a pure guess, but I would expect that any type of file will contain roughly the same number of 1s & 0s, and just looking at the binary data wouldn't tell you what type of file it was. You would have to interpret the stream as ASCII codes, numbers, etc. to extract meaning.

To answer the second question in your comment on @SurDin's answer: yes, it is possible to read any file as a binary stream, but how you do that will depend on the language used to write the program.
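The guess about the balance of 1s and 0s is easy to test once you have a file's bytes. Here is a minimal Python sketch; the function name `count_bits` is made up. Whether the counts come out balanced depends on the file: plain 7-bit ASCII text, for instance, always has a 0 as the top bit of every byte.

```python
def count_bits(data):
    """Return (zeros, ones) counted over every bit of `data`."""
    ones = sum(bin(byte).count("1") for byte in data)
    zeros = len(data) * 8 - ones
    return zeros, ones


# The letter "A" is 01000001: six 0s and two 1s.
print(count_bits(b"A"))
```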