note
thezip
<p>I've done some rudimentary parsing of PDFs using [mod://CAM::PDF]'s getPageText() method, but I was only able to handle files in PDF v1.4 format; I couldn't parse v1.5 or v1.6 files.</p>
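<p>For what it's worth, here is a minimal sketch of that kind of extraction. The <c>pdf_text</c> helper name is mine (hypothetical), and it assumes the file is one CAM::PDF can actually parse:</p>

```perl
use strict;
use warnings;
use CAM::PDF;

# Hypothetical helper: return the text of every page of a PDF as one string.
sub pdf_text {
    my ($file) = @_;

    my $pdf = CAM::PDF->new($file)
        or die "Cannot open $file: $CAM::PDF::errstr\n";

    my @pages;
    # CAM::PDF page numbers start at 1.
    for my $n (1 .. $pdf->numPages()) {
        my $text = $pdf->getPageText($n);
        # Treat a page with no extractable text as empty.
        push @pages, defined $text ? $text : '';
    }
    return join "\n", @pages;
}

# Print the extracted text when a file name is given on the command line.
print pdf_text($ARGV[0]) if @ARGV;
```

<p>Save it under any name you like and pass the PDF path as the first argument.</p>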
<p>I haven't done anything similar with Word documents, but there must be something around that performs a similar extraction.</p>
<p>Once you've extracted the text from each file, you'd then need to write the comparator function.</p>
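<p>What the comparator should do depends on what counts as "the same" for you. As one possible starting point, this sketch assumes that only whitespace and layout differences should be ignored; <c>normalize</c> and <c>same_text</c> are hypothetical names, not from any module:</p>

```perl
use strict;
use warnings;

# Collapse runs of whitespace so that layout differences between
# the two extractions do not count as changes.
sub normalize {
    my ($text) = @_;
    $text =~ s/\s+/ /g;     # any whitespace run becomes one space
    $text =~ s/^ | $//g;    # trim a leading or trailing space
    return $text;
}

# Return 1 if the two extracted texts match after normalization, else 0.
sub same_text {
    my ($left, $right) = @_;
    return normalize($left) eq normalize($right) ? 1 : 0;
}
```

<p>You would call it as <c>same_text($pdf_text, $word_text)</c> once both extractions are done. If you need to see where the texts differ rather than just whether they do, a diff module would be a better fit than plain string equality.</p>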
<!-- Node text goes above. Div tags should contain sig only -->
<div class="pmsig"><div class="pmsig-212789">
<br>
<i>What can be asserted without proof can be dismissed without proof. - Christopher Hitchens, 1949-2011</i>
</div></div>
1028985
1028985