Each row is variable length. The number of columns varies with the level of indentation. The good news is that the level of indentation is consistent. Very consistent. Year, Month, Topic, Details in this case.

[When an outline is super consistent, one wonders why a spreadsheet wasn't used.]

Each outline node in the export is prefaced with "- ".

It looks pretty when printed. But it doesn't parse cleanly, since the data moves around.

Further, it turns out that "notes" (blocks of text attached to an outline node, but not part of the outline hierarchy) show up in the last column along with the data items that properly belong in the last column.

Sigh.

The good news is that notes seem to appear on a line by themselves, where the data elements seem to be properly attached to outline nodes. It's still possible to have a "blank" outline node with data in the columns, but that's unlikely.

We have to do some cleanup

Answer 1A: Cleanup In Column 1

We want to transform indented data into proper first-normal form schema with a consistent number of fixed columns. Step 1 is to know the deepest indent. Step 2 is to then fill each row with enough empty columns to normalize the rows.

Each specific outline has a kind of schema that defines the layout of the export file. One of the tab-delimimted columns will be the "outline" column: it will have tabs and leading "-" to show the outline hierarchy. The other columns will be non-outline columns. There may be a notes column and there will be the interesting data columns which are non-notes and non-outline.

In our tab-delimited export, the outline ("Topic") is first. Followed by two data columns. The minimal row size, then will be three columns. As the topics are indented more and more, then the number of columns will appear to grow. To normalize, then, we need to pad, pushing the last two columns of data to the right.

That leads to a multi-part cleanup pipeline. First, figure out how many columns there are.

rows= list( rdr )
width_max= max( len(r) for r in rows )+1

This allows us the following two generator functions to fill each row and strip "-".

Advertisers

About Me

Steven F. Lott is a consultant, teacher, author and software developer with over 35 years of experience building software of every kind, from specialized control systems for military hardware to large data warehouses to web service API's.