Ted

The anagram question is one of the most basic and popular interview coding questions around for entry-level CS/data science positions. Below is a quick two-line implementation of it for both Python and R.
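
For reference, one common way to answer the question in Python is to compare the sorted letters of the two strings. This is only a minimal sketch: the function name is_anagram is illustrative, and it assumes that case should be ignored while spaces and punctuation still count as characters.

```python
def is_anagram(first, second):
    # Two strings are anagrams when their letters, once sorted, are identical.
    return sorted(first.lower()) == sorted(second.lower())


print(is_anagram("Listen", "Silent"))  # True
print(is_anagram("Ted", "Steve"))      # False
```

Sorting keeps the body to a single line; counting letters with collections.Counter would be an equally reasonable choice for very long strings.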

I couldn’t find a definitive guide that listed all the steps for installing Red Hat on Oracle’s VirtualBox, so below is a summary of the steps I took to get things up and running. I also installed Ubuntu in VirtualBox, but everything about that installation was easier, from finding the single iso file on their website to simply using the default virtual machine configuration.

I don’t know much about the different flavors, but I chose Red Hat Enterprise Linux AS (v. 3 for x86).

There are 4 binary disks that each need to be downloaded. Some flavors have fewer disks, but make sure you download all of them.

Open up VirtualBox

Click New

Name your machine and choose Red Hat as the version

Allocate memory size (might be better to give it more than the 512MB default)

Create a virtual hard drive, which will be moved later on. Keep the default type as VDI

I chose dynamically allocated because the fixed-size disk was taking a long time to create, but fixed size will yield faster performance during actual use

I gave myself more disk space than the default: 16GB

I launched my machine at this point, which prompted for a start-up disk. I gave the location of the iso file and the machine booted.

But I ran into a problem: the installer could not find a hard drive. If this is ignored during setup, a further error, “No Devices Found”, appears and shuts down the machine.

To solve this error, go back to the VirtualBox manager and click on Settings -> Storage. Notice the .vdi hard disk under “Controller: SATA” in the Storage Tree panel

Remove the .vdi disk from the SATA controller

Move up to the “Controller: IDE” entry and click on the add hard disk icon.

Create a new VDI disk

I don’t think this matters, but you can also add one of the start-up disks (and only one) by clicking the add disk icon while still here in the Storage section of Settings.

Add disk 1 from the 4 iso images

From here I just followed the default settings until I was prompted for disk 2

What’s important to know is that once you click inside the virtual machine window, you cannot move your mouse outside of it. The bottom right-hand corner of the window shows which key to press to return control of the mouse to your host computer. I believe it defaults to the right Control key.

Press the right Control key and go to the Devices menu -> CD/DVD Devices -> and choose the next iso disk that is needed.

Repeat this for all the other disks

You will have to skip over the registration parts and then it should be ready to use.

While attempting to discover latent topics for an assignment at work, I ran into the field of information extraction. A simple data model for information extraction is RDF (Resource Description Framework). RDF relates entities in a subject-predicate-object format, where the subject and object are related to one another by the predicate. The triple is a minimal representation for information.

Here are some examples of simple relations in subject-predicate-object format:

Houston – is located in – Texas

Ted – is the son of – Steve

Elvis – is buried in – Graceland
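
To make the format concrete, here is a minimal sketch of how the three examples above could be stored and queried in Python. The Triple type and the objects_for helper are hypothetical names used only for illustration; they are not part of RDF or of any particular library.

```python
from collections import namedtuple

# A triple is just three fields: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

triples = [
    Triple("Houston", "is located in", "Texas"),
    Triple("Ted", "is the son of", "Steve"),
    Triple("Elvis", "is buried in", "Graceland"),
]


def objects_for(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [t.object for t in triples
            if t.subject == subject and t.predicate == predicate]


print(objects_for(triples, "Houston", "is located in"))  # ['Texas']
```

Because every fact has the same three-field shape, a question like "where is Houston located?" becomes a simple filter over the list.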

This triple format can be used to pull information from any sentence. To aid with this extraction, I found a paper that explained the algorithm for extracting the triplet in great detail. To begin the process of triplet extraction, it was necessary to download the Stanford Parser and then use Python’s great NLTK package to parse the sentence into an NLTK-readable format. Once the sentence was parsed, the algorithms from the paper were implemented.