I'd like to convert this data into CSV format so that I can import it into SPSS (or Excel if need be), but I have no idea how to do that. In total, I have around 70,000-ish records, hence the need for an automated process. In the end, the SPSS data fields I would like to end up with are:

Author - defined as everything preceding the first colon on the first line of each record.
Content - defined as everything following the first colon on the first line of each record.
Date - defined as the XX/XX/XXXX content on the second line of each record.
Retweets - defined as the number following date but before the word "retweet" (NB: when 0, this number is absent from the record).
Links - defined as any and all URLs appearing in the "Content" field.
Link# - the number of URLs appearing in the "Content" field.

Can it be done? Can anybody help point me in the right direction? Any guidance would be much appreciated! If I can offer anything in exchange for your help, please let me know. Thanks!

Have you tried simply adding comma separators via a script and then opening the file in Excel? It might be a lot easier than you think since the format is so consistent.

With my skillset, the only way it could be easy for me is if each field was a fixed width so that I could sub in delimiters at specific intervals. But since both author and content are variable-width fields and since the retweets field occurs only occasionally, I would have to use some sort of programming logic to tell it where to look for the appropriate places to insert delimiters. And that is well beyond my capabilities, I'm afraid. I tried reading "Sed - An Introductory Tutorial" last night and was immediately overwhelmed by the urge to punch babies.

This can be done in Perl rather quickly and efficiently. Can you post an example of the result you want? (Actual data, so that anyone who helps out here can check that their output is correct.) I need to finish up some things for school, but I can probably mock up a quick script today or tomorrow if you're interested!
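
In the meantime, here is a rough sketch of the logic in Python (not Perl, and not a finished tool). It rests on several assumptions that should be checked against the real data: records are separated by blank lines, the first line of each record is "Author: Content", the second line holds the XX/XX/XXXX date optionally followed by a number and the word "retweet", and URLs start with http:// or https://. The names parse_record, convert and LinkCount are made up for this example.

```python
import csv
import re

# Patterns below are assumptions based on the field definitions in the question.
DATE_RE = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")
RETWEET_RE = re.compile(r"(\d+)\s+retweet", re.IGNORECASE)
URL_RE = re.compile(r"https?://\S+")

FIELDS = ["Author", "Content", "Date", "Retweets", "Links", "LinkCount"]


def parse_record(record):
    """Turn one multi-line record into a dict with the six requested fields."""
    lines = [ln.strip() for ln in record.strip().splitlines() if ln.strip()]
    first = lines[0]
    second = lines[1] if len(lines) > 1 else ""

    # Split on the FIRST colon only, so colons inside URLs stay in Content.
    author, _, content = first.partition(":")

    date = DATE_RE.search(second)
    # Look for the retweet count only after the date, so the year is never
    # mistaken for the count when the number is absent.
    rest = second[date.end():] if date else second
    retweets = RETWEET_RE.search(rest)

    links = URL_RE.findall(content)
    return {
        "Author": author.strip(),
        "Content": content.strip(),
        "Date": date.group(0) if date else "",
        "Retweets": int(retweets.group(1)) if retweets else 0,
        "Links": " ".join(links),
        "LinkCount": len(links),
    }


def convert(in_path, out_path):
    """Read blank-line-separated records from in_path and write a CSV file."""
    with open(in_path, encoding="utf-8") as src:
        records = [r for r in re.split(r"\n\s*\n", src.read()) if r.strip()]
    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=FIELDS)
        writer.writeheader()
        for record in records:
            writer.writerow(parse_record(record))
```

Calling convert("tweets.txt", "tweets.csv") (both file names are placeholders) would produce a file that Excel or SPSS can import. The csv module takes care of quoting any commas or quotation marks inside the Content field, which is the part that tends to go wrong when delimiters are inserted by hand.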