Hi,
I'm trying to automate conversion of Bible text files from SwordSearcher (SS) format to USFM (http://paratext.org/about/usfm; same in PDF: http://paratext.org/system/files/usfmReference2_35.pdf). I don't need to include all USFM markers, of course (that would be very involved, and SS uses almost none of those markups anyway). I want to do this because I can then output the USFM to .rtf files and have embedded footnotes and nice text I can easily manipulate in Nisus Writer Pro.

The basic format for SS has "$$", the book name abbreviation with chapter number, then a colon and the verse number. The "¶ " [pilcrow + space] indicates beginning of a paragraph. The data between curly brackets {} is footnote data. SS's format is very straightforward, but more information is available in the help file that comes with Forge (module builder software) for SwordSearcher, which can be downloaded here: http://www.swordsearcher.com/forge/index.html. Unfortunately, it's a Windows-only app.

$$ Ge 1:1
¶ In the beginning GOD created the heaven and the earth.
$$ Ge 1:2
And the earth was without form, and void; and darkness [was] upon the face of the deep. And the Spirit of GOD moved upon the face of the waters.
$$ Ge 1:3
And GOD said, Let there be light: and there was light.
$$ Ge 1:4
And GOD saw the light, that [it was] good: and GOD divided {the light from...: Heb. between the light and between the darkness}the light from the darkness.
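
For anyone following along outside Nisus Writer Pro, here is a rough Python sketch of how the SS layout above could be read into records. It is not a Nisus macro. The header pattern is an assumption based only on the sample (a book abbreviation without spaces, then chapter:verse), and `parse_ss` is a made-up name.

```python
import re

# Assumed header shape, taken from the sample: "$$ Ge 1:1".
# Abbreviations containing spaces would need a different pattern.
HEADER = re.compile(r"^\$\$ (\S+) (\d+):(\d+)\s*$")

def parse_ss(text):
    """Return a list of (abbr, chapter, verse, verse_text) tuples."""
    records = []
    current = None
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            # A new "$$" header starts a new verse record.
            current = [m.group(1), int(m.group(2)), int(m.group(3)), []]
            records.append(current)
        elif current is not None and line.strip():
            # Any non-blank line after a header belongs to that verse.
            current[3].append(line.strip())
    return [(a, c, v, " ".join(parts)) for a, c, v, parts in records]
```

Pilcrows and {footnotes} are left untouched in the verse text at this stage, so they can be handled in separate steps.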

The format for USFM is very different. Genesis 1:1-4 would look like the sample below. An opening marker ends with a space and a closing marker ends with an asterisk. (\nd …\nd* marks names of deity; in the SS sample above these are presented merely in all caps, in USFM with these tags.)

\id GEN
\c 1
\p
\v 1
In the beginning \nd God\nd* created the heaven and the earth.
\v 2
And the earth was without form, and void; and darkness [was] upon the face of the deep. And the Spirit of \nd God\nd* moved upon the face of the waters.
\v 3
And \nd God\nd* said, Let there be light: and there was light.
\v 4
And \nd God\nd* saw the light, that [it was] good: and \nd God\nd* divided \f + the light from...: Heb. between the light and between the darkness\f*the light from the darkness.

This 2nd USFM sample layout is better because it embeds in each footnote the chapter and verse (e.g., "1:4") to which the footnote refers (I don't know how to make a macro do this). It's marked up with the "\fr 1:4 \ft " tags and data.

\id GEN
\c 1
\p
\v 1
In the beginning \nd God\nd* created the heaven and the earth.
\v 2
And the earth was without form, and void; and darkness [was] upon the face of the deep. And the Spirit of \nd God\nd* moved upon the face of the waters.
\v 3
And \nd God\nd* said, Let there be light: and there was light.
\v 4
And \nd God\nd* saw the light, that [it was] good: and \nd God\nd* divided \f + \fr 1:4 \ft the light from...: Heb. between the light and between the darkness\f*the light from the darkness.

I am learning how to do macros, and I've successfully done a very basic one that can renumber the verses in one chapter. My problem is that I need a macro that can do an entire book of the Bible at a time and add the chapter numbers (\c and the #) before a new verse #1 starts in the subsequent chapter. I don't know how to make that kind of macro. Here is my regex (PowerFind Pro) macro. It changes the references but doesn't get paragraphs right, and it marks up footnotes but without the chapter and verse reference I would like in each footnote (see the 2nd USFM sample above).

Problems I'm having:
1. Getting the chapter number inserted properly for books with more than one chapter. I've attached a sample SS file for Romans 1:1-12:2 (unfortunately, this file was made before I started putting in pilcrows for beginnings of paragraphs, so if someone uses this, it would be good to insert some pilcrows randomly at the beginning of various verses to see if the \p marker is being converted correctly).
2. Getting the paragraph (prose) marker (\p ) on the line before the verse number in USFM (it follows a verse # in SS).
3. I would really like to get the reference of the verse into the footnote (as in 2nd USFM example above), so when one looks at a note at the bottom of the page (in NWP, after I export to .rtf), one can see immediately that footnote "a" is a comment on 1:4. To do this, "\fr 1:4 \ft " (with the right chapter and verse) must be added to the footnote text. I don't know how to do that dynamically so the chapter and verse numbers are right.
4. It's not essential, but it'd be nice if the macro could convert the SS book names (e.g., "Ge") to the proper USFM book names (e.g., "\id GEN") at the top of the text. The abbreviations are in a .csv file attached.

I realize this is quite a project (at least to me), and I'll be grateful for any help. Thanks in advance!

Hello NisusUser,
this is certainly doable. Let me make however one general comment. While regex is great, when things get this complicated you will definitely be better off if you take an approach which first reads in the info--in your case, the chapter/verse numbers, etc., and even the text--and then prints it out again (in a new file) in the desired format. If nothing else this will make it much easier for the person maintaining the code (i.e., you) to follow what you are doing. Note that this will also free you to call on the info you need, e.g., the footnote reference in your case.

So the basic structure is going to be:

Use a find all statement to read in the info. (This can also check the format for correctness).

1. About the pilcrow. Is this always placed at the beginning of a verse?
2. Checking your sample file I noticed that one footnote (in Ro 10:4) does not have a closing bracket. Is that an error? Can one assume that these footnotes always (should always) come in matching pairs?

Well, PowerFind is regex, but what I meant was that you need to write a "real" macro for this. I am appending a very bare-bones version here. It doesn't address the footnotes, the change of abbreviation names, or the pilcrows yet, but they can be added easily following the same format.

NisusUser wrote: 1. Pilcrows will not always be at the beginning of verses.

Well, the reason I ask is that in your example you place the pilcrow before the verse marker. So is that the normal rule in such cases? What happens to the ones that are not at the beginning of a verse?

This is a regular PowerFind Pro statement. The important point here is that it doesn't replace anything. It's just a Find All statement. But it is carefully made to match the format of the data. The reason I do this is so we can be sure that the data we are going to read in is in the correct format.

This is practically the same thing as the earlier find. There are two important differences: (1) this find is done not on the text of the document but on the text object consisting of a single verse, which is why it uses $verse.find instead of Find All, and (2) the options do not contain "a", so this is not a find all statement. It only does a single find. This is important because of the other option, "$". That is the magic option that allows us to capture parts of the data. To do that we need to match the information we want, so we change the above to:

This time I have added (…) around the data we want: the book abbreviation, the chapter, the verse, and the verse text. These captured bits can now be used in the following code using the 'names' $1, $2, $3, and $4. But to make the code clearer I use named captures. So instead of using (…), I add a name for each capture, e.g., (?<abbr>…) for the book abbreviation. Now I can refer to that using $abbr. So the whole find statement looks like this:

The rest of the code should hopefully be more or less self-explanatory. Basically the captured pieces are reassembled in the desired format, and then compiled into a big text object, with which we can make a new file.
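
The capture-and-reassemble idea carries over to other regex engines. As a rough illustration only (Python, not the Nisus macro language), named groups play the role of (?<abbr>…) and $abbr. The pattern and the function name `reassemble` are assumptions made up for this sketch, based on the sample SS layout.

```python
import re

# Assumed verse shape: "$$ <abbr> <chapter>:<verse>" on one line, then the text.
# Python spells a named capture (?P<name>...) rather than (?<name>...).
VERSE = re.compile(
    r"^\$\$ (?P<abbr>\S+) (?P<chapter>\d+):(?P<verse>\d+)\n(?P<text>.*)$",
    re.DOTALL,
)

def reassemble(chunk):
    """Match one SS verse and rebuild it as a USFM verse line."""
    m = VERSE.match(chunk)
    if m is None:
        # Failing loudly doubles as a check that the data is well formed.
        raise ValueError("not in the expected SS verse format: %r" % chunk)
    # The captured pieces are referred to by name, not by position.
    return "\\v %s %s" % (m.group("verse"), m.group("text").strip())
```

Calling `reassemble("$$ Ge 1:3\nAnd GOD said, Let there be light: and there was light.")` returns the line `\v 3 And GOD said, Let there be light: and there was light.`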

About pilcrows. They can be anywhere in the SS verses. In USFM the \p can be mid-verse. Is that what you meant? For USFM, the \p really just says that prose begins here. Technically it has other markers for, say, poetry (\q), etc. For our purposes here, we'll just use \p.

In my examples (SS), the pilcrows are at the beginning of the verse text, but after the verse marker (e.g., $$ Ge 1:4).

Ok, so now here is an extended version of the previous macro. This adds a few things:

It adds the change in abbreviations. In the macro this is done with a hash. This currently only covers the case of Ro -> ROM, but can easily be expanded. Ideally this expansion would also be done with a macro. Ask if you have any questions.

It adds the footnote conversion, including adding the chapter:verse as you requested. Once you see how this is done, you should easily be able to add other changes, such as the deity names.
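
To make the footnote step concrete, here is a minimal Python sketch (again, not the Nisus macro itself). It assumes footnotes are {…} pairs that never nest, and it takes the chapter and verse from whatever was read in for the current verse. The name `convert_footnotes` is hypothetical.

```python
import re

# A footnote is assumed to be "{...}" with no nested braces.
FOOTNOTE = re.compile(r"\{([^{}]*)\}")

def convert_footnotes(text, chapter, verse):
    """Turn {note} into \\f + \\fr C:V \\ft note\\f* for the given verse."""
    def repl(m):
        # A function is used so the backslashes are not treated as escapes.
        return "\\f + \\fr %d:%d \\ft %s\\f*" % (chapter, verse, m.group(1))

    out = FOOTNOTE.sub(repl, text)
    # A leftover brace means an unmatched bracket, like the one in Ro 10:4.
    if "{" in out or "}" in out:
        raise ValueError("unbalanced footnote bracket in %d:%d" % (chapter, verse))
    return out
```

Raising an error on a stray bracket is one possible choice. Logging the reference and carrying on would work just as well if you prefer to fix the source file afterwards.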

I have tried to comment everything, so you should be able to adjust things as necessary. If it's unclear, let me know.
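
For the abbreviation hash, the same idea in Python is a plain dictionary. This is only a sketch: the two entries come from this thread (Ge -> GEN, Ro -> ROM), the two-column CSV layout is an assumption about the attached file, and `load_book_ids` and `usfm_id` are made-up names.

```python
import csv
import io

# Only the two mappings mentioned in this thread; extend as needed.
BOOK_IDS = {"Ge": "GEN", "Ro": "ROM"}

def load_book_ids(csv_text):
    """Build the table from rows of 'SS abbreviation,USFM id' (assumed layout)."""
    rows = csv.reader(io.StringIO(csv_text))
    return {row[0].strip(): row[1].strip() for row in rows if len(row) >= 2}

def usfm_id(abbr, table=BOOK_IDS):
    """Look up the USFM book id, failing loudly for unknown abbreviations."""
    try:
        return table[abbr]
    except KeyError:
        raise KeyError("no USFM id known for SS abbreviation %r" % abbr) from None
```

An unknown abbreviation raises an error instead of passing through unchanged, so a missing table entry is noticed before a wrong \id line ends up in the output.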

\id Ge
\c 1
\v 1
In the beginning God created the heaven and the earth.
\id Ge
\c 1
\v 2
And the earth was without form, and void; and darkness [was] upon the face of the deep. And the Spirit of God moved upon the face of the waters.
\id Ge
\c 1
\v 3
And God said, Let there be light: and there was light.
\id Ge
\c 1
\v 4
And God saw the light, that [it was] good: and God divided {the light from...: Heb. between the light and between the darkness}the light from the darkness.

Based on the intent of changing only references, it actually should be this:

\id Ge
\c 1
\v 1 In the beginning God created the heaven and the earth.
\v 2 And the earth was without form, and void; and darkness [was] upon the face of the deep. And the Spirit of God moved upon the face of the waters.
\v 3 And God said, Let there be light: and there was light.
\v 4 And God saw the light, that [it was] good: and God divided {the light from...: Heb. between the light and between the darkness}the light from the darkness.

In other words, I wasn't clear in my initial explanations:
1. the \id marker only appears once per book, i.e. once per file.
2. the \c marker only occurs when a new chapter is starting.
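
Those two rules can be sketched in Python as follows (an illustration, not the Nisus macro). It assumes the verses have already been read into (chapter, verse, text) tuples in reading order. `to_usfm` is a hypothetical name.

```python
def to_usfm(verses, book_id):
    """Write \\id once, and \\c only when the chapter number changes."""
    lines = ["\\id " + book_id]          # rule 1: once per book/file
    last_chapter = None
    for chapter, verse, text in verses:
        if chapter != last_chapter:      # rule 2: only at a new chapter
            lines.append("\\c %d" % chapter)
            last_chapter = chapter
        lines.append("\\v %d %s" % (verse, text))
    return "\n".join(lines)
```

Remembering the previous chapter number is all it takes, and the same function then works for a one-chapter book and for a long one.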

Oh, I see you've put up a newer version now. Haven't tested it yet. I was still working through how you made the 1st one.

\id Ge
\c 1
\v 1 In the beginning God created the heaven and the earth.
\v 2 And the earth was without form, and void; and darkness [was] upon the face of the deep. And the Spirit of God moved upon the face of the waters.

I was so caught up in my "didactic moment" I totally overlooked this "detail". Sorry!
But this is an easy problem to fix. Here is version 3 which should fix this and also do pilcrows and work for other books. (Knock on wood.)

\id GEN
\c 1
\p
\v 1 In the beginning \nd God\nd* created the heaven and the earth.

In other words, when you have a ¶ at the beginning of a verse, you seem to want the \p before the verse marker.
One way to handle this would be to fix it after the fact with Find and Replace. Handling it in the macro might be possible. But one thing I still don't understand is what happens when the ¶ is in the middle of a verse. Do you break the verse into two lines? Does the \p still go on its own line, or does it sit in the middle of the verse line? One would have to know this to fix the problem.

That is correct: the \c marker must come before the \p one. Otherwise the formatting assigned to \c gets applied to the prose.

The marker \p can be mid-verse – not necessarily at the beginning of the verse or at the beginning of the line. If it's mid-verse, it does not start on a new line. Then there is no new \v marker until the next verse.

Example (assuming the first sentence's 2nd phrase was supposed to start with a new paragraph):

\id GEN
\c 1
\p
\v 1 In the beginning God created the heaven and the earth.
\v 2 And the earth was without form, and void; \p and darkness [was] upon the face of the deep. And the Spirit of God moved upon the face of the waters.
\v 3 And God said, Let there be light: and there was light.
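
Put as a rough Python sketch (not Nisus macro code), the pilcrow rule described here could look like this. It assumes every pilcrow in SS is followed by a space, as in the samples. `verse_lines` is a made-up name.

```python
def verse_lines(verse, text):
    """Return the USFM lines for one verse, placing \\p as described above."""
    lines = []
    if text.startswith("¶ "):
        # Leading pilcrow: \p goes on its own line before the \v marker.
        lines.append("\\p")
        text = text[2:]
    # Mid-verse pilcrow: \p stays inline, with no new line and no new \v.
    text = text.replace("¶ ", "\\p ")
    lines.append("\\v %d %s" % (verse, text))
    return lines
```

Any \c line for a new chapter would be written by the caller before these lines, which keeps \c ahead of \p.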

Re: \nd (names of deity). This is going to take some tweaking. That formatting should only be applied (best I can tell) if it is one isolated word in all caps. That's not foolproof, though, so I'll have to think this through. That's because in SS there is other text in all caps also – citations from the OT presented in the NT.
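
A first pass at the "one isolated word in all caps" rule might look like the Python sketch below. It is a heuristic under stated assumptions, not a finished solution: a word counts as a candidate only if it has at least two letters, is all caps, and neither neighbouring word is all caps. `mark_deity_names` is a hypothetical name.

```python
import re

WORD = re.compile(r"[A-Za-z]+")

def _is_caps(m):
    # Single letters such as "I" or "A" are ignored on purpose.
    return len(m.group()) > 1 and m.group().isupper()

def mark_deity_names(text):
    """Wrap isolated all-caps words in \\nd ...\\nd*, e.g. GOD -> \\nd God\\nd*."""
    words = list(WORD.finditer(text))
    out, pos = [], 0
    for i, m in enumerate(words):
        prev_caps = i > 0 and _is_caps(words[i - 1])
        next_caps = i + 1 < len(words) and _is_caps(words[i + 1])
        if _is_caps(m) and not prev_caps and not next_caps:
            out.append(text[pos:m.start()])
            out.append("\\nd " + m.group().capitalize() + "\\nd*")
            pos = m.end()
    out.append(text[pos:])
    return "".join(out)
```

A run of capitals such as an OT citation is left alone, but so is a two-word name in capitals, and a one-word citation would be tagged wrongly, so the output still needs a manual check.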