Basically, I want to put this into a HashMap so I can search the records. My problem is that a lexical unit sometimes consists of more than one word (for example, "obstetrical delivery"), so a line isn't always just ID, word, type.

It depends on how the records are laid out. One way to solve the problem is to read everything after the ID up to the end of the line. Then use a StringTokenizer to split that text on spaces, take the last token as the type, and join the remaining tokens, in order, as the word.
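
Here is a rough sketch of that idea. It assumes each line looks like "ID word [more words] type", separated by whitespace, with the type always being a single token. The class name LexiconParser, the Entry holder, the addLine method, and the choice to key the map by ID are all made up for illustration, so adjust them to your actual file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

class LexiconParser {
    /** Hypothetical holder for one record: the (possibly multi-word) unit and its type. */
    static final class Entry {
        final String word;
        final String type;

        Entry(String word, String type) {
            this.word = word;
            this.type = type;
        }
    }

    /**
     * Parses one line of the assumed form "ID word [more words] type"
     * and stores it in the map under its ID. Lines with fewer than
     * three tokens are skipped as malformed.
     */
    static void addLine(String line, Map<String, Entry> index) {
        StringTokenizer st = new StringTokenizer(line); // splits on whitespace
        if (st.countTokens() < 3) {
            return;
        }
        String id = st.nextToken();

        // Keep the most recent token in "last". Whenever another token
        // follows, the previous one belonged to the word, so append it.
        StringBuilder word = new StringBuilder();
        String last = st.nextToken();
        while (st.hasMoreTokens()) {
            if (word.length() > 0) {
                word.append(' ');
            }
            word.append(last);
            last = st.nextToken();
        }

        // After the loop, "last" is the final token on the line: the type.
        index.put(id, new Entry(word.toString(), last));
    }
}
```

You would call addLine once per line as you read the file, then look records up with index.get(id). If you need to search by the word instead, use the joined word as the map key.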