Yesterday somebody asked me, "We'd like to get all this text into Excel, is it possible?" Since then I've been having a look at a number of options: XTags, XData and TextTractor.

I've tried exporting the text as XTags and can see all the different styles, which is great. I was then thinking of using the styles as search criteria and placing the resultant text into the relevant cells/columns in Excel.

I've also had a quick look at XData, though I've not found too much on 'exporting' the relevant data so I can import it into Excel.

TexTractor was another contender that I've had a look at.

I've had a read through several articles here to do with XTags and XData and I'm not sure which way would be best to go. I was considering exporting the text as XTags and then using GREP to pull out the relevant styled text and populate an Excel spreadsheet.

Please can anyone suggest a suitable workflow as I can see a few alternatives but I'm not sure which to choose?

Aagh, don't talk to me about Excel. I'm cutting and pasting which is dismal.

XData doesn't have any export facilities, it's purely for importing. XCatalog would let you export, but if it's a one-off job it's probably as quick to cut and paste as to put in XCat links on every piece of text.

XTags is the one piece of Emsoftware I haven't really grappled with but it sounds like it could be a good idea.

If you know Filemaker at all, I often find it's easier to get text there, in case it needs manipulating, and then export it as tab separated text and put that into Excel at the last minute.

Not XData, though, it won't help you here.

There's a lot of this going on, isn't there? I've got a job that's been built in InDesign, I've got to extract all the data to Excel, they update and send it back, and then I'll use XData (most complex one I've ever done) to put it all together in Quark. Seems a lot of trouble to go to....

Thank you for your speedy reply and for the clarification, it's been a big help in making the decision. I'll stay away from XData. The reason I'd looked at that is because I'd read one of your other posts saying you'd gone from XData to FM. I wondered if I'd be able to tie it up with Excel instead of FM.

Looks like I'll probably go down the XTags route, I'm just playing with exporting the text without XTags.

A thought just occurred regarding XCatalog. If I used AS to look for certain styles within the text could I add the XCat links that way?

Regarding Excel, I've done a few projects now taking text from Excel to populate Quark Docs and have found it pretty straightforward, it wasn't as bad as I thought it might be.

Actually, if your only intention is to populate an Excel sheet with values depending on their applied Style Sheet, then you will not need XTags either. XPress Tags could do as good a job as XTags would.

Other than that, I would use grep (or any other means) to strip any "tag" characters and transform my text output into a tab-delimited file. Where things get complicated is if your request includes the need to keep bold, italic, font sizes, color, etc. for each individual "field".

Thanks for the suggestions regarding GREP, I was just in the process of scripting Text Wrangler to do a GREP search :-) At present I'm trying to get part of the search result into a variable to populate Excel. So in the search below the ([A-Z]*) part would be put in the variable. I'll have a look through the sample scripts to see if there are any clues.

(@AG Medium Drop Cap:)([A-Z]*)
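
For anyone following along outside Text Wrangler, here is a minimal sketch of the same capture in Python rather than AppleScript. It reuses the search pattern above unchanged; the sample text and the variable names are made up for illustration.

```python
import re

# Hypothetical sample of exported text; only the style name
# "AG Medium Drop Cap" comes from the search pattern above.
sample = "@AG Medium Drop Cap:HELLO world\n@Body Text:Some other paragraph\n"

# Same pattern as in the post: group 1 is the style tag,
# group 2 is the run of capital letters that follows it.
pattern = re.compile(r"(@AG Medium Drop Cap:)([A-Z]*)")

match = pattern.search(sample)

# Put the second group into a variable, ready to be written to a cell.
drop_cap_text = match.group(2) if match else ""
print(drop_cap_text)
```

Note that ([A-Z]*) stops at the first character that is not a capital letter, so spaces, digits and lower-case text are not captured.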

So far this is looking like the best way to process the information, I've played around with GREP quite a lot in BBEdit so hopefully I'll get something sorted.

I believe your best bet is to convert your text output into a tab-delimited text file and THEN open that in Excel.

I managed to do this for many HTML source files and Quark XPress Tags files and found it was the best way to approach this. I tend to look at this from the angle of REMOVING what I don't want to see instead of extracting what I do want. This is made much easier by the use of tags, where you can easily replace any text that starts with "<" and ends with ">" with nothing, as well as anything that starts with "@" and ends with ":".
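
As a rough illustration of the "remove what you don't want" idea, the two replacements could look like this in Python. This is only a sketch: the function name and sample line are invented, and the patterns are deliberately simple, so a literal "<" or "@" in the body text could be matched by mistake.

```python
import re

def strip_tags(text):
    """Remove anything shaped like <...> or @...: from the text."""
    # Anything that starts with "<" and ends with ">"
    text = re.sub(r"<[^>]*>", "", text)
    # Anything that starts with "@" and ends with ":" on the same line
    text = re.sub(r"@[^:\n]*:", "", text)
    return text

# Hypothetical line of tagged export
cleaned = strip_tags("@Body Text:Some <B>bold<B> text")
print(cleaned)
```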

So my normal procedure is as follows:

1- Replace all tabs by single spaces

2- Replace all tab tags (<\t>) by single spaces

3- Replace all double spaces by single spaces

4- Replace all tags that delimit my fields by tabs

5- Replace all tags by nothing

I would normally replace all paragraph returns with something like "[R]" if I know that individual fields can be made of multiple paragraphs and if there is a way of determining where each "record" starts. The thing with XPress Tags is that Quark does not tag each consecutive paragraph with its own Style Tag, so you sometimes need to be creative in that regard.
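
The five steps plus the "[R]" trick could be strung together roughly as below. This is a sketch under assumptions, not a finished tool: the style names (@Name:, @Price:, @Description:) are hypothetical, @Name: is assumed to mark the start of each record, and a paragraph return that sits in front of a field tag is swallowed together with the tag.

```python
import re

RECORD_START = "@Name:"                     # hypothetical: style that begins a record
FIELD_TAGS = ["@Price:", "@Description:"]   # hypothetical: styles that begin a field

def tags_to_tab_delimited(text):
    text = text.replace("\t", " ")          # 1- tabs -> single spaces
    text = text.replace("<\\t>", " ")       # 2- tab tags <\t> -> single spaces
    text = re.sub(r" {2,}", " ", text)      # 3- runs of spaces -> single space
    # Paragraph returns become "[R]", then each record goes back on its own line
    text = text.strip("\n").replace("\n", "[R]")
    text = text.replace("[R]" + RECORD_START, "\n" + RECORD_START)
    for tag in FIELD_TAGS:                  # 4- field-delimiting tags -> tabs
        text = text.replace("[R]" + tag, "\t")
    text = re.sub(r"<[^>]*>", "", text)     # 5- remaining <...> tags -> nothing
    text = re.sub(r"@[^:\n]*:", "", text)   # 5- remaining @...: tags -> nothing
    return text

# Made-up export: the second description paragraph carries no style tag
sample = (
    "@Name:Widget\n"
    "@Price:9.99\n"
    "@Description:A <B>small<B> widget.\n"
    "Second  paragraph.\n"
    "@Name:Gadget\n"
    "@Price:19.99\n"
    "@Description:A gadget<\\t>with a tab.\n"
)

result = tags_to_tab_delimited(sample)
rows = [line.split("\t") for line in result.split("\n")]
print(rows)
```

Saving result to a .txt file and opening it in Excel should give one row per record, with "[R]" left in place wherever a field ran over more than one paragraph.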