This data comes as a single 3000-page PDF, or as three broken-out 1000-page PDFs. They are scanned and OCRed copies of the original 3-volume book set that the Government Printing Office produces each year. For now, we’re mirroring the original PDFs here, in case the House site goes down.

The very first step, though, is to split these large PDFs into sections, as denoted by the Table of Contents. We’ll update this post with links to each PDF as we make them available. This will include individual committees, but not (at first) individual members of Congress. Those will take a little more time: the Table of Contents doesn’t give page numbers for individual members, so splitting them out will require manual labor on our part.
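
For anyone who wants to try the split themselves, here is a rough sketch of one way it could be done. It is not our actual tooling. It assumes you have typed up the Table of Contents as a list of (title, starting page) pairs, that those page numbers match the PDF's own page order (printed page numbers in a scanned book are often offset, so check first), and that the third-party pypdf library is installed. The section titles, file names, and function names are made up for illustration.

```python
import os
from typing import Dict, List, Tuple


def section_ranges(toc: List[Tuple[str, int]], total_pages: int) -> Dict[str, Tuple[int, int]]:
    """Turn (title, 1-based start page) entries into inclusive page ranges.

    Each section is assumed to run until the page before the next section
    starts; the last section runs to the end of the PDF.
    """
    ordered = sorted(toc, key=lambda entry: entry[1])
    ranges = {}
    for i, (title, start) in enumerate(ordered):
        if i + 1 < len(ordered):
            end = ordered[i + 1][1] - 1
        else:
            end = total_pages
        ranges[title] = (start, end)
    return ranges


def split_pdf(src_path: str, ranges: Dict[str, Tuple[int, int]], out_dir: str) -> None:
    """Write one PDF per section. Requires pypdf (an assumption, not our tool)."""
    from pypdf import PdfReader, PdfWriter  # imported here so the helper above works without it

    reader = PdfReader(src_path)
    os.makedirs(out_dir, exist_ok=True)
    for title, (start, end) in ranges.items():
        writer = PdfWriter()
        # pypdf pages are 0-indexed, the TOC entries are 1-indexed
        for page_index in range(start - 1, end):
            writer.add_page(reader.pages[page_index])
        out_name = title.lower().replace(" ", "-") + ".pdf"
        with open(os.path.join(out_dir, out_name), "wb") as handle:
            writer.write(handle)


# Hypothetical TOC entries, not the real ones from the volumes
example_toc = [("Members", 1), ("Committees", 40), ("Officers", 100)]
example_ranges = section_ranges(example_toc, total_pages=120)
```

Something like `split_pdf("volume1.pdf", example_ranges, "sections")` would then write one file per section. This only works for sections whose starting page is known, which is why individual members can't be split out this way until someone records their page numbers by hand.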