How to search for a pattern in a file

Question

I need to locate a pattern in a file (text or binary), much as the Pos
function does for strings. Preferably, the solution should work with a
TFileStream. The straightforward approach, stepping through the stream and
comparing the pattern at every position, seems rather expensive.

Answer 1:

You can do it that way, but it is much faster to load chunks of data into a
sizeable buffer and search within the buffer. Here is an example:

function ScanFile(const filename: String;
                  const forString: String;
                  caseSensitive: Boolean): LongInt;
{ returns position of string in file or -1, if not found }
const
  BufferSize = $8001; { 32K + 1 bytes }
var
  pBuf, pEnd, pScan, pPos: PChar;
  filesize: LongInt;
  bytesRemaining: LongInt;
  bytesToRead: Word;
  F: File;
  SearchFor: PChar;
  oldMode: Word;
begin
  Result := -1; { assume failure }
  if (Length(forString) = 0) or (Length(filename) = 0) then
    Exit;
  SearchFor := Nil;
  pBuf := Nil;
  { open file as binary, 1 byte recordsize }
  AssignFile(F, filename);
  oldMode := FileMode;
  FileMode := 0; { read-only access }
  Reset(F, 1);
  FileMode := oldMode;
  try
    { allocate memory for buffer and pchar search string }
    SearchFor := StrAlloc(Length(forString) + 1);
    StrPCopy(SearchFor, forString);
    if not caseSensitive then { convert to upper case }
      AnsiUpper(SearchFor);
    GetMem(pBuf, BufferSize);
    filesize := System.Filesize(F);
    bytesRemaining := filesize;
    pPos := Nil;
    while bytesRemaining > 0 do
    begin
      { calc how many bytes to read this round }
      if bytesRemaining >= BufferSize then
        bytesToRead := Pred(BufferSize)
      else
        bytesToRead := bytesRemaining;
      { read a buffer full and zero-terminate the buffer }
      BlockRead(F, pBuf^, bytesToRead, bytesToRead);
      pEnd := @pBuf[bytesToRead];
      pEnd^ := #0;
      { scan the buffer. Problem: buffer may contain #0 chars! So we
        treat it as a concatenation of zero-terminated strings. }
      pScan := pBuf;
      while pScan < pEnd do
      begin
        if not caseSensitive then { convert to upper case }
          AnsiUpper(pScan);
        pPos := StrPos(pScan, SearchFor); { search for substring }
        if pPos <> Nil then
        begin { Found it! }
          Result := FileSize - bytesRemaining + LongInt(pPos) - LongInt(pBuf);
          Break;
        end;
        pScan := StrEnd(pScan);
        Inc(pScan);
      end;
      if pPos <> Nil then
        Break;
      bytesRemaining := bytesRemaining - bytesToRead;
      if bytesRemaining > 0 then
      begin
        { no luck in this buffers load. We need to handle the case of
          the search string spanning two chunks of file now. We simply
          go back a bit in the file and read from there, thus
          inspecting some characters twice }
        Seek(F, FilePos(F) - Length(forString));
        bytesRemaining := bytesRemaining + Length(forString);
      end;
    end;
  finally
    CloseFile(F);
    if SearchFor <> Nil then
      StrDispose(SearchFor);
    if pBuf <> Nil then
      FreeMem(pBuf, BufferSize);
  end;
end;
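
The routine above reads through an untyped File and treats the search string as a zero-terminated PChar, so it cannot find a pattern that itself contains #0 bytes. Since the question asks for TFileStream, here is a minimal sketch of the same chunked-buffer idea on any TStream, comparing raw bytes instead. It is not part of the original answer. The name FindInStream, its parameters, and the 32K chunk size are assumptions made for illustration, and it assumes a Delphi version (or Free Pascal) that has Int64 and an Int64 TStream.Position. It searches case-sensitively only.

```pascal
uses
  Classes, SysUtils;

{ Hypothetical helper, not from the original answer.
  Returns the stream offset of the first occurrence of Pattern, searching
  from the current stream position, or -1 if it is not found. }
function FindInStream(Stream: TStream; const Pattern: array of Byte): Int64;
const
  ChunkSize = 32768; { assumed chunk size, same order as the answer's 32K }
var
  Buf: array of Byte;
  PatLen, Keep, Got, Avail, i: Integer;
  BufStart: Int64; { stream offset of Buf[0] }
begin
  Result := -1;
  PatLen := Length(Pattern);
  if PatLen = 0 then
    Exit;
  { room for one chunk plus the bytes carried over from the previous one }
  SetLength(Buf, ChunkSize + PatLen - 1);
  Keep := 0;
  BufStart := Stream.Position;
  repeat
    Got := Stream.Read(Buf[Keep], ChunkSize);
    Avail := Keep + Got;
    { compare at every offset where the whole pattern still fits }
    for i := 0 to Avail - PatLen do
      if CompareMem(@Buf[i], @Pattern[0], PatLen) then
      begin
        Result := BufStart + i;
        Exit;
      end;
    if Avail >= PatLen then
    begin
      { carry the last PatLen - 1 bytes over, so that a match spanning
        two chunks is still seen without seeking back in the stream }
      Keep := PatLen - 1;
      if Keep > 0 then
        Move(Buf[Avail - Keep], Buf[0], Keep);
      Inc(BufStart, Avail - Keep);
    end
    else
      Keep := Avail; { not enough data yet, keep everything read so far }
  until Got = 0;
end;

{ Usage sketch (FileName, FS and Offset are placeholders; the byte
  values are an arbitrary example pattern):

    FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
    try
      Offset := FindInStream(FS, [$DE, $AD, $00, $EF]);
    finally
      FS.Free;
    end; }
```

Carrying the tail of each chunk forward plays the same role as the Seek call in the routine above: bytes near a chunk boundary are examined twice, so a pattern that straddles two reads is not missed. A case-insensitive text search would still need the upper-casing step from the original routine.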