Blogger importer inefficient handling of data

Description

If the import dataset is large, the Blogger importer can store huge amounts of data in the blogger_importer option. It then updates this data over and over throughout the import. If MySQL logging (binary or query) is enabled, this can result in a large amount of data being written to disk, potentially filling up the partition quickly. On WordPress.com, I have seen an import write 100MB of binary logs every 2 minutes. Andy's suggestion is that we split up the data from the import rather than store it all in one option. This would allow us to manipulate it at a finer granularity and prevent the huge updates from happening.
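To give a rough feel for why rewriting one large option hurts, here is a minimal Python sketch (not the importer's code). It assumes the option is re-serialized in full after every imported item, and it uses JSON as a stand-in for PHP serialization. The key format and the function names are made up for illustration.

```python
import json

def bytes_written_single_option(keys):
    """Total bytes written if the whole list is rewritten after every item."""
    stored = []
    total = 0
    for key in keys:
        stored.append(key)
        # Every update rewrites everything stored so far.
        total += len(json.dumps(stored).encode("utf-8"))
    return total

def bytes_written_per_item(keys):
    """Total bytes written if each item is stored as its own small row."""
    return sum(len(json.dumps(key).encode("utf-8")) for key in keys)

# Hypothetical keys, shaped loosely like Blogger feed paths.
keys = [f"/feeds/1/posts/default/{n:019d}" for n in range(1000)]

single = bytes_written_single_option(keys)
split = bytes_written_per_item(keys)
print(f"single option: {single} bytes, per-item rows: {split} bytes")
```

With the single option, the total written grows roughly with the square of the number of items. With one row per item it grows linearly, which is the effect the split is meant to achieve.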

I do not believe this to be an issue anymore, as the Blogger importer is only storing current status information (basically the list of blogs, the position of the import, etc.) in the blogger_importer option as of 0.5 (probably as of 0.4 too).

The data imported from the blog itself is not stored in the option, although the URL of each imported post *is* stored there temporarily. It would be possible to store this as post_meta instead (it probably already is, actually) and to reference it there in order to avoid duplicates, at the cost of extra SQL queries.

Can a large import be done to get an idea of how much damage the importer currently causes?

Yes, the importer is storing a key (a partial URL) for each post and comment.
The key is something like this: '/feeds/417730729915399755/posts/default/8397846992898424746'.

As Otto mentions, it should be possible to store this as meta data instead of in an array.

The posts already add a meta data entry, "blogger_permalink". A similar entry would also need to be added to the comments to support nesting.

Given that the comments are loaded sequentially by post rather than in random order, having them look up the post ID via the DB should not add significant overhead, and the performance gained by not storing the comments and posts arrays in the option may compensate for it.
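As a rough illustration of that argument, here is a Python sketch that uses an in-memory SQLite table as a stand-in for the post meta table. The table layout, the stored values and the function names are assumptions for the example, not the importer's actual code. Because comments arrive grouped by post, remembering the last lookup means one query per post rather than one per comment.

```python
import sqlite3

# Stand-in for the post meta table; the real schema differs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT)")
conn.execute("CREATE INDEX idx_meta ON postmeta (meta_key, meta_value)")
conn.executemany(
    "INSERT INTO postmeta VALUES (?, 'blogger_permalink', ?)",
    [(101, "/feeds/1/posts/default/11"), (102, "/feeds/1/posts/default/22")],
)

stats = {"lookups": 0}  # counts the queries actually issued

def find_post_id(permalink):
    """Look up the local post ID for a Blogger key via the meta table."""
    stats["lookups"] += 1
    row = conn.execute(
        "SELECT post_id FROM postmeta "
        "WHERE meta_key = 'blogger_permalink' AND meta_value = ?",
        (permalink,),
    ).fetchone()
    return row[0] if row else None

def attach_comments(comments):
    """comments: iterable of (post_key, text) pairs, grouped by post."""
    last_key, last_id = None, None
    attached = []
    for post_key, text in comments:
        if post_key != last_key:
            # Only query when the post changes.
            last_key, last_id = post_key, find_post_id(post_key)
        attached.append((last_id, text))
    return attached

attached = attach_comments([
    ("/feeds/1/posts/default/11", "first!"),
    ("/feeds/1/posts/default/11", "nice post"),
    ("/feeds/1/posts/default/22", "hello"),
])
print(attached, stats)
```

Three comments on two posts result in two lookups here. If the comments arrived in random order, the single-entry cache would miss far more often, which is why the sequential load matters.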

The CommentEntry class in comment-entry would need to be updated to include the meta data.

The import_comments and import_posts functions in blogger-importer.php would need to be updated to remove the use of the arrays.

The set authors form also needed changing, as it was referencing the posts array.