Use of R core scripting to eliminate ‘NA’ and other common issues

Tuesday, Jun 10, 2014, 6:00 PM

GotoMeeting Webinar online, Toronto, ON

9 Researching Traders Went

Meetup Webinar Tues Jun 10 at 6 PM EST: Use of R core scripting to eliminate ‘NA’ (“and other common recycled value problems”?)

Body of presentation:

I. Use of rm() inside of source code (this portion is still under construction, as I haven’t yet gotten as much feedback from the R core team as would be helpful)

II. Manually coding a ‘divisor proc…

Use of R core scripting to eliminate ‘NA’ and other common issues

The usual argument is that a <- a[!is.na(a)] suffices to clean this up, but what are the costs if, say, a is the resulting vector from a sorting algorithm that recursively shortens the vector?

The reality is that removing individual elements by referring to their index can be hard on data integrity, because once elements are dropped the remaining indices shift and any saved index references go stale. Perhaps this depends on the cluster or R environment you are loading from; either way, NAs are a commonly recurring problem in R.
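A minimal sketch of the two idioms (the vector a is illustrative). Note in particular that writing a[-is.na(a)] does not do what it appears to: the unary minus coerces the logical mask to numbers, not to the indices you meant to drop.

```r
a <- c(3, NA, 7, NA, 1)

# Correct idiom: a logical mask keeps positions aligned with no index bookkeeping
clean <- a[!is.na(a)]   # c(3, 7, 1)

# Pitfall: -is.na(a) arithmetically negates the logical vector, giving
# c(0, -1, 0, -1, 0); zeros are ignored and -1 simply drops element 1,
# so the NAs survive:
a[-is.na(a)]            # c(NA, 7, NA, 1) -- not what was intended
```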

Since there are many precompiled functions in R, it seems logical to make use of them. What isn’t so obvious is their usage for non-vector arguments. For example, rm() is typically used to clear objects from the workspace before reading in, or after writing out, a file (strictly, rm() removes R objects from an environment, not files on disk). The suggestion here is that removing and rebuilding a whole object with rm() can serve the same purpose as in-place deletions like a <- a[-i], bypassing the need for a subsequent a <- a[!is.na(a)] call and the attendant risk to data integrity.
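A small sketch of rm() used mid-script, under the reading above: rm() removes a binding from the environment (file.remove() is what deletes files), so it is suited to dropping a consumed intermediate object wholesale rather than editing elements inside it. The variable names here are illustrative.

```r
# Build a large intermediate, consume it, then discard the whole binding
raw <- rnorm(1e6)                  # hypothetical intermediate data
result <- mean(raw, na.rm = TRUE)  # consume it

rm(raw)                            # the name "raw" no longer exists
exists("raw")                      # FALSE
invisible(gc())                    # optionally prompt R to reclaim the memory
```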

More along the lines of data integrity is the loss of precision in arithmetic operations as you approach your machine precision. What happens then depends, again, on your own system and which version of R you are using. Apparently 3.0.0 is now set up with the idea of letting data silently drop digits as precision is maxed out. To quote the current developers’ blog:

The following function is due for release:

digitloss = c("allow", "warn", "forbid")
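Whatever the status of that option, the precision ceiling itself is easy to observe in base R: .Machine$double.eps is the smallest x for which 1 + x is distinguishable from 1 in double precision, and anything below it is silently dropped.

```r
.Machine$double.eps               # ~2.220446e-16 on IEEE-754 doubles

(1 + .Machine$double.eps)   == 1  # FALSE -- the extra digit survives
(1 + .Machine$double.eps/2) == 1  # TRUE  -- the extra digit is dropped
```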

C developers can deal with this by implementing their own arithmetic procedures, keeping the underlying algorithm of each in mind. E.g., division can be viewed as the inverse operation of multiplication, which in turn can be viewed as a “convolution” of the two operands’ digit sequences.

So what does this mean? Perhaps, for the purposes of speeding up your system and avoiding the abovementioned data loss: convert your division problem into multiplication by the reciprocal of your divisor, and then, to bring the result back into base-10 decimal formatting, either call strtoll() or incorporate your own division algorithm.
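The reciprocal trick can be sketched directly in R: one division up front, then only multiplications. With a divisor like 8, whose reciprocal is exact in binary, the two routes agree exactly; with an inexact reciprocal they can differ by about one ulp, which is the trade-off being discussed.

```r
x <- c(10, 25, 40)
d <- 8

inv <- 1 / d        # single division, done once
x * inv             # c(1.250, 3.125, 5.000)
x / d               # same values here; 1/8 is exactly representable

# For a divisor like 3 the reciprocal is inexact, so compare with tolerance
all.equal(x * (1/3), x / 3)   # TRUE within numerical tolerance
</imports>
```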

At this point you would be ready to perform the “convolution” portion of your multiplication formula. Warning: convolve() in R (as in C’s Numerical Recipes) goes through the Fourier transform, adding a full O(N log N) FFT to your computation; for short vectors that overhead can outweigh the gain, so it may be best to code up your own if you think time is of importance.
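A sketch of what "multiplication as convolution" means concretely, using the direct O(N²) scheme the text suggests hand-coding for small operands: convolve the digit vectors (least-significant digit first), then propagate carries into base 10. The function name digit_mult is my own illustration, not from the talk.

```r
digit_mult <- function(a, b) {
  n <- length(a) + length(b) - 1
  prod <- numeric(n)
  for (i in seq_along(a))            # direct convolution of digit vectors,
    for (j in seq_along(b))          # no FFT involved
      prod[i + j - 1] <- prod[i + j - 1] + a[i] * b[j]
  carry <- 0                         # carry pass: fold digits back into base 10
  for (k in seq_len(n)) {
    s <- prod[k] + carry
    prod[k] <- s %% 10
    carry   <- s %/% 10
  }
  while (carry > 0) {                # append any remaining high-order digits
    prod  <- c(prod, carry %% 10)
    carry <- carry %/% 10
  }
  prod
}

# 123 * 45 = 5535; digits are given least-significant first
digit_mult(c(3, 2, 1), c(5, 4))      # c(5, 3, 5, 5)
```

The raw (pre-carry) digit products are what base R’s FFT-based routine returns via the documented polynomial-multiplication idiom convolve(a, rev(b), type = "open").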

Examples of code demonstrating the above topics are available upon request. Thanks for your attendance.

The reason these are posted is either to extend Matlab’s capabilities into third-party solutions such as trading platforms, or for pure market forecasting:

Coder

Compiler

Curve Fit

Dot Net Builder

Econometrics

Finance

Identification

Math (core Matlab)

Optimization

PDE

Stats

Do note there are others I would recommend, like Market Data Capture, Database, etc. These are just the ones I have focused on, but the manual PDFs for these toolboxes are where you can find some potentially explosive, little-known gold nuggets.