Question

myahia72 on Tue, 27 Feb 2018 20:21:11

The following code splits each line into words and stores the first word of each line in one array list, the second word in another array list, and so on. Then it selects the most frequent word from each list as the correct word.

So this is my code. How can I make it work with lines that have different numbers of words? I mean that each line here has 7 words and the For loop works with this length (length - 1). Suppose that line 3 contains 5 words.


Replies

Acamar on Tue, 27 Feb 2018 20:53:01

So I want to split all text line words into arrays and then apply voting method on these words.

You haven't indicated what the problem is. How far into the process have you got, and what is the difficulty you have run into? Post the code you have so far.

Simple Samples on Tue, 27 Feb 2018 23:00:02

Yeah, what is the question?

Well, maybe you are asking how to "Split text lines into words", but then the "select the correct ones" part is very vague. Note that you should put the entire question in the body of the post; don't expect to ask the entire question in the subject (title). There is absolutely no question in the body.

As for splitting lines, you need to decide how complex you want to make it. For example, if you get "it.For", is that the end of one sentence and the beginning of another, with the space mistakenly omitted? What if you get ".Net"? Is that another mistake? Some people (marketing types especially) exist to violate rules and like to do things however they want, so you might have a period in the middle of a name. How complicated do you need (or want) it to be? You need to decide that first.

Mr. Monkeyboy on Wed, 28 Feb 2018 01:56:23

What's the question? How to split lines of text? How to get a percentage of possible correct words for an index of the 5 string arrays? Provide you all the code for a graduate project idea you came up with? What do you want?

La vida loca

Cherry Bu on Wed, 28 Feb 2018 05:20:17

Hi myahia72,

If you want to split text lines into words, you can use the String.Split method to do this:

Dim str As String = "Canda has more than ones official language"
' " "c is a Char literal, so this also compiles with Option Strict On
Dim words() As String = str.Split(" "c)

You said that array1 contains the first word from each line and that you select the correct one from it. Can you provide your existing code here? It would help us understand what you want to do.

Best regards,

Cherry

myahia72 on Wed, 28 Feb 2018 18:01:30

I have posted the code and also I have updated the question

Reed Kimble on Wed, 28 Feb 2018 20:42:15

Instead of accessing wordsOfLineX(i) directly, create a lambda or helper method to safely get a string from the array, returning an empty string if the index is invalid. For example:
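A minimal sketch of such a helper, assuming the words of each line are held in a String array. The name GetWordOrEmpty is hypothetical and does not come from the original post:

```vbnet
Imports System

Module SafeAccess
    ' Hypothetical helper: returns the word at the given index,
    ' or an empty string when the index is outside the array.
    Function GetWordOrEmpty(words As String(), index As Integer) As String
        If words IsNot Nothing AndAlso index >= 0 AndAlso index < words.Length Then
            Return words(index)
        End If
        Return String.Empty
    End Function

    Sub Main()
        ' A short line with only 5 words
        Dim wordsOfLine3 As String() = "Canada has nore than one".Split(" "c)
        Console.WriteLine(GetWordOrEmpty(wordsOfLine3, 1))        ' in range
        Console.WriteLine(GetWordOrEmpty(wordsOfLine3, 6).Length) ' out of range, empty string
    End Sub
End Module
```

Calling GetWordOrEmpty(wordsOfLineX, i) in place of wordsOfLineX(i) avoids the index out of range exception on short lines.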

Just keep in mind that now an empty string could be a predominant result depending on how many short strings there are.

Acamar on Wed, 28 Feb 2018 20:46:27

I have posted the code and also I have updated the question

You have also made all the previous responses look like nonsense. If you are provided with code for the project it should be posted as an additional post, not by rewriting your question.

The code works for lines of any length because the For loop runs from 0 to wordsOfLine?.Length - 1, not from 0 to 6. You should work through that code line by line to ensure that you understand exactly what each statement does, because it is likely you will need to make changes to do what you describe.

myahia72 on Wed, 28 Feb 2018 20:53:27

Thanks very much

Reed Kimble on Wed, 28 Feb 2018 20:55:16

The code works for lines of any length because the For loop runs from 0 to wordsOfLine?.Length - 1, not from 0 to 6. You should work through that code line by line to ensure that you understand exactly what each statement does, because it is likely you will need to make changes to do what you describe.

But there's only one loop over the first line, so it becomes the maximum length line. The following lines are all accessed by that same iteration variable, so if they parsed shorter, there would be an index out of range exception when building the List(Of String).

-EDIT-

Though I agree that the post appeared to begin with a question about how to organize the words and now is more about dealing with one of the problems (varying length strings) that one might encounter with this kind of thing.

I mentioned it in a reply to Acamar above but it is worth reiterating - the first line is deciding the maximum number of words to test. It might be better to get the longest string and use that length:
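One way this could be sketched, assuming the lines are available as a String array and using LINQ. The names MaxWordCount and LongestLine are hypothetical:

```vbnet
Imports System
Imports System.Linq

Module LongestLine
    ' Hypothetical helper: splits every line on spaces and
    ' returns the largest word count found.
    Function MaxWordCount(lines As String()) As Integer
        Return lines.Max(Function(l) l.Split(" "c).Length)
    End Function

    Sub Main()
        Dim lines As String() = {
            "Canada has more than one official language",
            "Canada has nore than one language"
        }
        ' Use this value as the upper bound of the For loop
        ' instead of the length of the first line.
        Console.WriteLine(MaxWordCount(lines))
    End Sub
End Module
```

The loop would then run from 0 to MaxWordCount(lines) - 1, and every access to a shorter line still needs a bounds check.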

But there's only one loop over the first line, so it becomes the maximum length line. The following lines are all accessed by that same iteration variable, so if they parsed shorter, there would be an index out of range exception when building the List(Of String).

Then don't do it like that. If the lines do not have an equal number of words then OP has much bigger problems than the range exception - the whole voting concept becomes irrelevant. If lines with unequal numbers of words are allowed then there are several options. OP could ignore lines that don't match in number of words, or do some sort of similarity ranking to work out which column each word goes into (that is, where to insert a blank dummy word). Whatever the choice, just extending the lines so they match is going to corrupt the voting.

Reed Kimble on Wed, 28 Feb 2018 21:18:36

If the lines do not have an equal number of words then OP has much bigger problems than the range exception - the whole voting concept becomes irrelevant.

I completely agree that short lines may skew the results, but we don't really know what the expected results are supposed to be or what the input will actually look like.

Simple Samples on Wed, 28 Feb 2018 21:35:27

The following is a possibility. It will adapt to the number of words in each line. This does not do everything but it does most of it and the rest should be easy.

Class classWordsOfLine
    Public line As String
    Public Words() As String
    Public Sub New(line As String)
        Me.line = line
        Words = line.Split(" ")
    End Sub
End Class

Module Module1
    Sub Main()
        Dim correctLine As String = ""
        Dim WordsOfLine(4) As classWordsOfLine
        Dim maxwords As Integer = 0
        '
        WordsOfLine(0) = New classWordsOfLine("Canda has more than ones official language")
        maxwords = Math.Max(maxwords, WordsOfLine(0).Words.Length)
        WordsOfLine(1) = New classWordsOfLine("Canada has more than one oficial languages")
        maxwords = Math.Max(maxwords, WordsOfLine(1).Words.Length)
        WordsOfLine(2) = New classWordsOfLine("Canada has nore than one official lnguage")
        maxwords = Math.Max(maxwords, WordsOfLine(2).Words.Length)
        WordsOfLine(3) = New classWordsOfLine("Canada has nore than one offical language")
        maxwords = Math.Max(maxwords, WordsOfLine(3).Words.Length)
        WordsOfLine(4) = New classWordsOfLine("Canada has nore than one language")
        maxwords = Math.Max(maxwords, WordsOfLine(4).Words.Length)
        '
        For fromx As Integer = 0 To maxwords - 1
            Dim words(4) As String
            Dim tox As Integer = 0
            For linex As Integer = 0 To WordsOfLine.Length - 1
                ' if the number of words are less than the current index then don't try it
                If WordsOfLine(linex).Words.Length - 1 >= fromx Then
                    words(tox) = WordsOfLine(linex).Words(fromx)
                    tox = tox + 1
                End If
            Next
            ReDim Preserve words(tox - 1)
            ' words now has the words and just the right number of them
            Console.WriteLine(String.Join(" | ", words))
        Next
    End Sub
End Module
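
The remaining voting step could be sketched as follows. This is only one possible approach using LINQ, and the name MostFrequent is hypothetical. It assumes the array holds at least one word:

```vbnet
Imports System
Imports System.Linq

Module Voting
    ' Hypothetical helper: returns the word that occurs most often.
    ' GroupBy keeps groups in order of first appearance and
    ' OrderByDescending is a stable sort, so ties go to the word seen first.
    ' Throws InvalidOperationException if the array is empty.
    Function MostFrequent(words As String()) As String
        Return words.GroupBy(Function(w) w).
                     OrderByDescending(Function(g) g.Count()).
                     First().Key
    End Function

    Sub Main()
        ' First column of the five sample lines
        Dim column As String() = {"Canda", "Canada", "Canada", "Canada", "Canada"}
        Console.WriteLine(MostFrequent(column))
    End Sub
End Module
```

Calling MostFrequent(words) inside the outer loop, after the ReDim Preserve, would give the winning word for each column.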

myahia72 on Thu, 01 Mar 2018 11:35:52

I think the suggestion to ignore lines that are missing words may be a good one, since I have about 70 lines from one run and I have 5 runs, so there will be five sets of 70 lines. The probability of having lines with missing words is low, and ignoring these lines will not affect the results.
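
A minimal sketch of ignoring such lines, assuming the expected number of words per line is known in advance. The name KeepLinesWithWordCount is hypothetical:

```vbnet
Imports System
Imports System.Linq

Module FilterLines
    ' Hypothetical helper: splits each line on spaces and keeps only
    ' the lines that have exactly the expected number of words.
    Function KeepLinesWithWordCount(lines As String(), expectedWords As Integer) As String()()
        Return lines.
            Select(Function(l) l.Split(" "c)).
            Where(Function(w) w.Length = expectedWords).
            ToArray()
    End Function

    Sub Main()
        Dim lines As String() = {
            "Canda has more than ones official language",
            "Canada has more than one oficial languages",
            "Canada has nore than one language"
        }
        ' Assumption: 7 words per line is the expected length
        Dim kept As String()() = KeepLinesWithWordCount(lines, 7)
        Console.WriteLine(kept.Length)
    End Sub
End Module
```

Only the lines that survive the filter would then take part in the vote for each column.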

myahia72 on Thu, 01 Mar 2018 12:04:30

Actually, the program here will not ignore the lines with missing words. Instead, it will add a word from the next line to the words array, as follows.

Yes the problem of what to do when there is a mismatch in the number of words is a design problem. The solution needs to be defined in the requirements.

This is obviously a theoretical exercise intended to show a specific methodology not disclosed here. I agree that if the requirements were clarified then the implementation can be improved correspondingly.

A more realistic implementation would likely include some kind of spell check. A dictionary would help for recognition of words in it. A sophisticated solution could use a Natural Language form of recognition of words that could help match words to columns when there are fewer words. This application could be much more complex so I certainly understand there are fundamental imperfections.