Notes on searching binary files

Windows Grep can search binary files, and most of the time it does a pretty good job. However, it is not perfect, so the following explanation of how it works may be of interest.

Before a file is searched, it is examined to decide whether it is text (ASCII with CR/LF at the end of each line) or binary (everything else). A block of the file is read in, and if it contains any non-printing characters, such as bytes in the range 0x00 to 0x08, the file is treated as binary. The file is also treated as binary if the block contains no carriage returns or line feeds at all.
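The two rules above can be sketched in Python as follows. This is an illustration, not Windows Grep's actual code: the size of the block examined (SAMPLE_SIZE) and the exact set of bytes counted as "non-printing" (NON_PRINTING) are assumptions, since the notes only give 0x00 to 0x08 as an example.

```python
# Sketch of the text/binary test: a guess at the rules, not the real code.
# ASSUMPTIONS: SAMPLE_SIZE and NON_PRINTING are hypothetical choices.

SAMPLE_SIZE = 4096  # hypothetical: how much of the file is examined

# Control bytes 0x00-0x1F, excluding TAB, LF, VT, FF and CR (0x09-0x0D).
NON_PRINTING = frozenset(range(0x00, 0x09)) | frozenset(range(0x0E, 0x20))


def looks_binary(block: bytes) -> bool:
    """Return True if a block read from the start of a file looks binary."""
    # Rule 1: any non-printing byte marks the file as binary.
    if any(b in NON_PRINTING for b in block):
        return True
    # Rule 2: a block with no CR or LF at all is also treated as binary.
    if b"\r" not in block and b"\n" not in block:
        return True
    return False


def is_binary_file(path: str) -> bool:
    """Read the first SAMPLE_SIZE bytes of a file and classify them."""
    with open(path, "rb") as f:
        return looks_binary(f.read(SAMPLE_SIZE))
```

For example, `looks_binary(b"hello\r\nworld\r\n")` returns `False`, while `looks_binary(b"MZ\x90\x00\x03")` (contains 0x00) and `looks_binary(b"no line breaks here")` (no CR or LF) both return `True`.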

When a binary file is searched, it is read in 256 bytes at a time. Non-printing characters are then stripped out of each block, leaving a string of at most 256 bytes that contains only printable characters: letters, digits, punctuation, accented characters and so on. It is this string that is searched and displayed in the results window, which is why the lines shown for matches in binary files vary in length.
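A minimal Python sketch of this block-by-block search is shown below. It assumes that printable ASCII (0x20 to 0x7E) and the upper half of the byte range (0x80 to 0xFF, where a Windows code page such as cp1252 puts accented characters) are kept, and everything else is stripped; the notes do not say exactly which bytes survive. The names `strip_block` and `search_binary` are hypothetical.

```python
# Sketch of searching a binary file in 256-byte blocks.
# ASSUMPTION: the set of kept bytes (KEEP) and the cp1252 decoding are guesses.
from typing import BinaryIO, Iterator, Tuple

BLOCK_SIZE = 256

# Printable ASCII plus 0x80-0xFF (accented characters in cp1252).
KEEP = frozenset(range(0x20, 0x7F)) | frozenset(range(0x80, 0x100))


def strip_block(block: bytes) -> str:
    """Drop non-printing bytes and decode what remains as cp1252."""
    kept = bytes(b for b in block if b in KEEP)
    # A few bytes (e.g. 0x81) are undefined in cp1252; replace them.
    return kept.decode("cp1252", errors="replace")


def search_binary(f: BinaryIO, pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (block_offset, stripped_text) for each block that contains pattern."""
    offset = 0
    while True:
        block = f.read(BLOCK_SIZE)
        if not block:
            break
        text = strip_block(block)
        if pattern in text:
            # The stripped string is what would be shown as the "line".
            yield offset, text
        offset += len(block)
```

For example, searching `io.BytesIO(b"\x00\x01Hello\x02\x03world" + b"\x00" * 300)` for `"world"` yields a single result, `(0, "Helloworld")`. Note how stripping the control bytes joins "Hello" and "world" into one displayed string, shorter than the 256-byte block it came from.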

This method has one flaw. If the text you are searching for straddles a 256-byte boundary (that is, it starts in one block and ends in the next), Windows Grep will not find it. I hope to fix this one day, but for now be aware of it!
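The Python sketch below reproduces the boundary problem using the same assumed stripping rule as above. For comparison, it also shows one common workaround: carrying the last `len(pattern) - 1` characters of each stripped block forward into the next. That workaround is a generic technique shown for illustration only; it is not something Windows Grep is described as doing, and the function names are hypothetical.

```python
# Demonstration of the 256-byte boundary flaw, plus a generic overlap workaround.
# ASSUMPTION: KEEP is a guess at which bytes survive stripping.
BLOCK_SIZE = 256
KEEP = frozenset(range(0x20, 0x7F)) | frozenset(range(0x80, 0x100))


def strip_block(block: bytes) -> str:
    """Drop non-printing bytes and decode the rest as cp1252."""
    return bytes(b for b in block if b in KEEP).decode("cp1252", errors="replace")


def found_naive(data: bytes, pattern: str) -> bool:
    """Search each stripped 256-byte block on its own (misses straddling matches)."""
    return any(
        pattern in strip_block(data[i:i + BLOCK_SIZE])
        for i in range(0, len(data), BLOCK_SIZE)
    )


def found_with_overlap(data: bytes, pattern: str) -> bool:
    """Prefix each stripped block with the tail of the previous one."""
    keep = len(pattern) - 1  # a full match can never fit inside the carried tail
    carry = ""
    for i in range(0, len(data), BLOCK_SIZE):
        text = carry + strip_block(data[i:i + BLOCK_SIZE])
        if pattern in text:
            return True
        carry = text[-keep:] if keep > 0 else ""
    return False


# "NEEDLE" occupies bytes 252-257, so it straddles the boundary at byte 256.
data = b"\x00" * 252 + b"NEEDLE" + b"\x00" * 10
```

With this data, `found_naive(data, "NEEDLE")` returns `False`, because the first block strips to `"NEED"` and the second to `"LE"`. `found_with_overlap(data, "NEEDLE")` returns `True`. Carrying only `len(pattern) - 1` characters means a match can never lie entirely inside the carried tail, so the same occurrence is not reported twice.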