Realview Digital


Article Extraction Troubleshooting - Selecting the text from an article, based on layout

Modified on: Fri, 28 Mar, 2014 at 3:02 PM

Selecting the text from an article, based on layout.

When picking up the text for an article, we have to be mindful of how we select it, especially in relation to the layout of the article itself.

Below are some examples of how to work with the layout of an article and what should be selected, depending on that layout.

Example 1

What we have above is a general sports article. The text starts to the left of the image and works its way down to under the image. There is also a quote between the first and second paragraphs of text, and no sub-title.

We can see that the Sport category has been selected, since the top of the page clearly states Sport (highlighted in Blue).

The Title (Orange) is "Tough team is ready to roll". The Orange box around the title shows the general size and coordinates of the title box. Always keep the title box to a reasonable size.

Since there is no sub-title, we don't need to use the sub-title box.

The contributor, highlighted in Green, is "By JULIAN RAETHEL".

Next we have the text of the article. The red boxes represent the selections you would need to make when creating the article text. Instead of selecting the whole text of the article in one go, select it column by column and paragraph by paragraph. This keeps the text in a much neater and tidier format. Quotes should be skipped at first and only added at the end of the text.
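The ordering rule above can be pictured with a minimal Python sketch. It is only an illustration and not part of the extraction tool: the `Selection` class, the `kind` labels and the `assemble_article_text` function are all hypothetical names, and the sample strings are made up.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    kind: str  # hypothetical labels: "text" for body copy, "quote" for pull quotes
    text: str


def assemble_article_text(selections):
    """Join selections in reading order, moving anything that is not body text to the end."""
    body = [s.text.strip() for s in selections if s.kind == "text"]
    tail = [s.text.strip() for s in selections if s.kind != "text"]
    return "\n\n".join(body + tail)


# Selections listed in the order they appear on the page.
selections = [
    Selection("text", "First paragraph, left column."),
    Selection("quote", "A pull quote between paragraphs."),
    Selection("text", "Second paragraph, under the image."),
]
article_text = assemble_article_text(selections)
```

Here the quote sits between two paragraphs on the page, but it ends up after both of them in the assembled text, which is the result you want in the article textbox.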

Next we have the image caption, which we select separately from the image byline (the caption is highlighted in yellow and the byline in orange).

Finally, we have the image, which is highlighted in Purple. Select as much of the image as possible, and make sure no white space is picked up along its sides.

Example 2

The next example is a front-page news article, with the title and sub-title at the top, text from the bottom left to the center, and an image off to the right.

The Category, Title, Sub-title and Contributor are straightforward; again, make sure to select them within a reasonable size. You can tell the title and sub-title apart by their different font weights and sizes.

The text is straightforward as well. Here you would select much larger columns, again making sure to keep the extra content, titled "Want to go", at the end of the textbox.

The Caption and byline can be selected as one, but they then need to be separated by cutting the byline out and pasting it into the correct textbox.

Finally, the image. Because the image is awkwardly positioned, we usually select as much of it as is practical and use that for the article. Here we have selected the woman's head, as it falls within a reasonable area to select.

Example 3

Here we have a third example, with a title, an image, and an image caption and byline.

For this kind of article, we want to make sure we select the title first (Red).

Then we select the image (Green), and finally the image caption and byline (Yellow). These can be selected together, with the byline then cut and pasted into the image byline box.

Since there is no sub-title, text or contributor, we can leave those fields alone. And since the article is on the front cover, it falls under news.

Example 4

This fourth example shows another image-based article, which is smaller but has some text and a caption.

The layout for this one is as follows:

The title at the top (Dark Green), followed by the text (Red), the image (Light Green) and finally the caption (Blue).