This approach slurps the entire master file into memory, so it should work fine with a 38 MB or even a 380 MB file, but it will not scale indefinitely to larger files.

The regex for matching references assumes the reference string is always bounded by a non-\w character. If this is not the case, adjust as needed.

The substitution replaces Ref00004-like strings anywhere and everywhere in the file. If you need this replacement done, e.g., only between certain tags, adjust the match regex as needed or perhaps use an XML parser.
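For reference, the slurp-and-substitute idea discussed above can be sketched roughly as follows. This is only an illustration under assumptions: the %url_for hash stands in for whatever was read from lookup.dat, $master stands in for the slurped master file, and the example URLs and the <a href> markup are made up for the demo.

```perl
use strict;
use warnings;

# Assumed lookup table; in practice this would be built from lookup.dat.
my %url_for = (
    Ref00004 => 'http://example.com/four',
    Ref00012 => 'http://example.com/twelve',
);

# Stand-in for the slurped master file (hypothetical content).
my $master = '<a href="Ref00004">x</a> <a href="Ref00012">y</a> <a href="Ref99999">z</a>';

# Replace every known placeholder, wherever it occurs.
# Unknown placeholders are left untouched ("//" needs Perl 5.10 or later).
(my $result = $master) =~ s{ \b (Ref\d{5}) \b }{ $url_for{$1} // $1 }xmsge;

print "$result\n";
```

Note that the \b anchors encode the same assumption as above: the placeholder must be bounded by non-\w characters, and the replacement happens everywhere in the text, not just inside particular tags.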

Update: No validation is done on the content of the lookup.dat file. It might be wise to consider this.
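One possible shape for such validation is sketched below. The record format (placeholder, whitespace, URL) is an assumption on my part, and parse_lookup is a hypothetical helper name, not something from the original code. It collects problems instead of silently accepting bad records.

```perl
use strict;
use warnings;

# Hypothetical helper: parse lookup records, reporting malformed and
# duplicate entries rather than quietly overwriting or ignoring them.
sub parse_lookup {
    my @lines = @_;
    my (%url_for, @problems);
    for my $i (0 .. $#lines) {
        my $n    = $i + 1;                      # 1-based line number for messages
        my $line = $lines[$i];
        next if $line =~ m{ \A \s* \z }xms;     # skip blank lines
        if ($line =~ m{ \A \s* (Ref\d{5}) \s+ (\S.*?) \s* \z }xms) {
            my ($ref, $url) = ($1, $2);
            if (exists $url_for{$ref}) {
                push @problems, "line $n: duplicate $ref";
                next;                           # keep the first definition
            }
            $url_for{$ref} = $url;
        }
        else {
            push @problems, "line $n: malformed record";
        }
    }
    return (\%url_for, \@problems);
}

# Made-up sample records standing in for the contents of lookup.dat.
my ($url_for, $problems) = parse_lookup(
    "Ref00001 http://example.com/a\n",
    "\n",
    "garbage\n",
    "Ref00001 http://example.com/b\n",
);
print "$_\n" for @$problems;
```

Whether a duplicate should be an error, a warning, or a "last one wins" is a policy decision; the sketch simply keeps the first definition and reports the rest.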

Update: I think the regex for extracting URLs from the lookup data file will support embedded whitespace in the URL, but I haven't tested this. Caveat Programmor.

Update: The regex for extracting reference placeholders and URLs from records in the lookup file is very naive. For instance, the reference placeholder is matched with a bare \S+. Personally, I would feel better with a more specific match, maybe something like qr{ (?<! [[:alpha:]]) Ref \d{5} (?! \d) }xms
Likewise, I'm sure there are canned regexes available for matching URLs.
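To see what the stricter placeholder regex accepts and rejects, here is a small check. The candidate strings are made up for illustration.

```perl
use strict;
use warnings;

# The more specific placeholder pattern suggested above:
# not preceded by a letter, exactly five digits, not followed by a digit.
my $ref_rx = qr{ (?<! [[:alpha:]]) Ref \d{5} (?! \d) }xms;

my %verdict;
for my $candidate ('Ref00004', 'see Ref00004.', 'XRef00004', 'Ref000045', 'Ref0004') {
    $verdict{$candidate} = $candidate =~ $ref_rx ? 'match' : 'no match';
    printf "%-15s %s\n", $candidate, $verdict{$candidate};
}
```

The first two candidates match. The other three do not, because they have a leading letter, a sixth digit, and too few digits respectively.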

The master file is XML and should therefore be manipulated with XML tools, not edited directly as though it were a text file. You can afford to read the entire file into memory using a tool like XML::LibXML and then manipulate the structure internally. Then write out the modified XML, preferably to a new file, so that the original input is not corrupted if you make a mistake.
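A minimal sketch of that idea with XML::LibXML might look like this. The element and attribute names (link, href), the lookup hash, and the inline sample document are assumptions for illustration, since the real master file's structure isn't shown here.

```perl
use strict;
use warnings;
use XML::LibXML;

# Assumed lookup table; in practice built from the lookup file.
my %url_for = ( Ref00004 => 'http://example.com/four' );

# Hypothetical stand-in for the master file.
my $xml = <<'XML';
<doc><link href="Ref00004">Ref00004 in text stays</link></doc>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Touch only href attributes whose whole value is a known placeholder;
# text content that happens to contain a placeholder is left alone.
for my $attr ($doc->findnodes('//@href')) {
    my $value = $attr->getValue;
    $attr->setValue($url_for{$value}) if exists $url_for{$value};
}

my $output = $doc->toString;
print $output;
```

For a real file you would presumably load with load_xml(location => ...) and write the result with $doc->toFile(...) to a new filename, leaving the original untouched.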