Suppose we have a large corpus of Greek text (or any
text), and we want to do a grammatical analysis of a part of it. This
corpus is homogeneous: that is, it was written by a single author in a given
period of his life, without radical departures from the main narrative
in either style or subject. Now the question: what is the minimum
percentage of such a corpus that we must analyse in order to extrapolate
the results of our analysis confidently to the whole corpus? I bet
statisticians have an (approximate) answer for that. Any bibliography? I also
understand that it is probably methodologically preferable to analyse
several portions of the same size from the text, instead of parsing only
one longer chunk of continuous text. Any help welcome.
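
To make the "several portions of the same size" idea concrete, here is a
minimal sketch in Python of what I have in mind. It assumes the corpus is
already tokenised into a list of words; the function name sample_chunks and
its parameters are my own invention for illustration, not part of any
existing tool. It cuts the text into consecutive blocks of equal length and
picks some of them at random, so the chosen portions never overlap. It says
nothing about how many portions are enough, which is the actual question.

```python
import random


def sample_chunks(tokens, n_chunks, chunk_size, seed=0):
    """Return n_chunks non-overlapping runs of chunk_size consecutive tokens.

    The corpus is divided into consecutive blocks of chunk_size tokens
    (a shorter leftover at the end is ignored), and n_chunks of those
    blocks are drawn at random without replacement.
    """
    n_blocks = len(tokens) // chunk_size
    if n_chunks > n_blocks:
        raise ValueError("corpus too short for the requested sample")
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    chosen = sorted(rng.sample(range(n_blocks), n_chunks))
    return [tokens[b * chunk_size:(b + 1) * chunk_size] for b in chosen]


# Hypothetical usage: a toy "corpus" of 1000 numbered tokens.
corpus = [f"w{i}" for i in range(1000)]
portions = sample_chunks(corpus, n_chunks=5, chunk_size=50)
```

Each element of portions is then a continuous 50-token passage that could be
parsed on its own, and the results compared across portions.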