If I were on a Windows box that had Microsoft Word installed, I would use Win32::OLE to control Word to retrieve that information. However, that combo isn't available on Linux.

One alternative would be to use OpenOffice or LibreOffice. I have never done so myself, but I believe they offer an API that you could drive from Perl, much as OLE is used to control Microsoft Office on Windows.
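
As a rough, untested sketch of the LibreOffice route, one option that sidesteps the API is to let LibreOffice lay the document out as a PDF and then split the extracted text on page boundaries. This assumes the `soffice` and `pdftotext` (poppler-utils) commands are installed and on the PATH; the function names are made up for illustration. It is written in Python for brevity, but the same two commands can be run from Perl with `system` or backticks. Bear in mind that the page breaks will be LibreOffice's, which may not match the ones Word would produce.

```python
import subprocess
from pathlib import Path


def convert_to_pdf(doc_path, out_dir):
    """Ask headless LibreOffice to lay out the document and write a PDF."""
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "pdf",
         "--outdir", str(out_dir), str(doc_path)],
        check=True,
    )
    # LibreOffice names the output after the input file's stem.
    return Path(out_dir) / (Path(doc_path).stem + ".pdf")


def pdf_to_text(pdf_path):
    """Extract the text with pdftotext; '-' sends the output to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf_path), "-"],
        check=True, capture_output=True, encoding="utf-8",
    )
    return result.stdout


def split_pages(text):
    """pdftotext ends each page with a form feed, so split on that."""
    pages = text.split("\f")
    # Drop the empty chunk left after the final form feed.
    if pages and pages[-1].strip() == "":
        pages.pop()
    return pages


def paginate(doc_path):
    """Hypothetical helper: return the document's text as a list of pages."""
    doc = Path(doc_path)
    pdf = convert_to_pdf(doc, doc.parent)
    return split_pages(pdf_to_text(pdf))
```

Calling `paginate("manual.doc")` (a placeholder file name) would then give one string per page, as LibreOffice paginated it on that machine.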

It is worth mentioning that in Word, pages do not exist in the document file. Like a
professional typesetter, Word makes up its pages on the fly when it displays or
prints a document. Word uses measurements from the installed fonts and the
installed printer driver to do this. It is almost impossible to get two machines so
exactly similar that a document will paginate with exactly the same page breaks on
each. Sometimes people complain that when they open the document on a different
machine, some of the page numbers in the TOC or Index are wrong. They're not:
when the document is opened on the other machine, minute variations in set-up that
do not show over a ten-page memo will cause variations in the position of page
breaks in a 1,000-page manual. If you remember to update the TOC and Index before
you print, the problem corrects itself.

So page numbers do not exist in the document file, and therefore you cannot retrieve them to split the text into pages. Only Word can do that for you.