Now one might say, "you can just make wrappers of match_between where the value parameter is set appropriately." That is true, but it requires you to duplicate code manually, potentially introduces errors, and is harder to maintain as the code evolves. Functional programming, as I've experienced it, gives better control over both the code and its semantics (if you design them well), but also over the state of variables, which to me is the coolest benefit of closures: controlling a variable within a function's environment.

One other thing I like is that with the closure definition, the formal arguments of regex_matches and regex_values are the same, which I think should be expected since they do the same thing; what changes is the return object. The complexity is increased by having more functions to remember, but that is balanced by being better able to manage the parameter space and the semantics of those functions.
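To make the closure point concrete, here is a minimal sketch. The real regex_matches/regex_values aren't shown in the thread, so regex_factory and the internals here are assumptions, not the original code:

```r
# Hypothetical function factory: both closures share the same formal
# arguments; only the return object differs.
regex_factory <- function(extract) {
  function(x, pattern) {
    m <- regexpr(pattern, x)            # first match position per element
    if (extract) regmatches(x, m)       # the matched substrings
    else m > 0                          # a logical match indicator
  }
}

regex_matches <- regex_factory(extract = FALSE)
regex_values  <- regex_factory(extract = TRUE)

regex_matches(c("a1", "b2", "cc"), "[0-9]")  # TRUE TRUE FALSE
regex_values(c("a1", "b2", "cc"), "[0-9]")   # "1" "2"
```

Note that `formals(regex_matches)` and `formals(regex_values)` are identical; the `extract` flag lives only in each closure's enclosing environment.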

ggplot2orBust

Probably A Mammal

Well, I used what I learned from Hadley's functional programming material to take two lists, one of measures (sum, mean, load factor, ...) and one of time aggregations (year, quarter, week, ...), and generate all permutations of them as Level_Measure functions. Thus, a person can read.meter(...meter number...) to read in a meter. Then they can call Quarter_LoadFactor and generate a table (data.table) of that measure at that level of aggregation, which can be fed to ggplot quite easily. I'm thinking of building classes for certain plots, so maybe I can do something like "Trend + Week_Mean(x)" and "Profile + Work_Schedule(x)" to apply the appropriate ggplot to the appropriate table transformation. That may be a much later thing, however; I have no idea how much work it will take until I dig into the ggplot code and see how Hadley did what he did. It would be a ton easier for an end user than having to write Trend(Week_LoadFactor(x)) or something.
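As a hedged sketch of that generator idea (the names levels, measures, and the aggregation scheme here are assumptions, not the original code):

```r
library(data.table)

# Hypothetical generator: for each (level, measure) pair, build a closure
# named Level_Measure that aggregates kW by that time level.
levels   <- list(Year = data.table::year, Quarter = data.table::quarter)
measures <- list(Mean = mean, Max = max)

for (lvl in names(levels)) {
  for (msr in names(measures)) {
    local({
      level_fun   <- levels[[lvl]]    # captured in this closure's environment
      measure_fun <- measures[[msr]]
      assign(paste(lvl, msr, sep = "_"),
             function(x) x[, .(value = measure_fun(kW)),
                           by = .(level = level_fun(timestamp))],
             envir = .GlobalEnv)
    })
  }
}

# Now e.g. Quarter_Mean(x) returns mean kW by quarter for a data.table x
# with (meter, timestamp, kW) columns.
```

The `local()` call matters: it gives each generated function its own environment holding that pair's level and measure, rather than all of them sharing the loop variables.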

Now that I think about it, this may be easier than I thought. "Trend" here would simply be a ggplot object containing all the stuff I need it to contain, and what I'm adding is the data. What I'm wrapping just needs to validate that the data is of the right form, hence maybe my own custom class.

Anyway, read Hadley's stuff and work with it. I'm using this everywhere these days, putting things together much faster than I used to, and the results read a lot cleaner (semantically).

It's not exactly as I would like. It would be a bit of work to write my own "+" method to supersede all this, but then I'd create my own class returned by these aggregate measure functions, so it would read like "aggregate measure plus trend [plot]", where Trend is a predefined ggplot object. Then the end user could do Day_Max(x) + Trend or Year_Mean(x) + Trend. It would be easy to use and semantically clear.

Here my data model is a three-column data.table of (meter, timestamp, kW) tuples, with meter an identifier, timestamp a 15-minute reading time, and kW the value read at that timestamp.
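A toy instance of that data model, for anyone who wants to follow along (the values are made up):

```r
library(data.table)

# Toy version of the data model described: 15-minute kW readings per meter,
# one day's worth (96 intervals) for two meters.
x <- data.table(
  meter     = rep(c("M001", "M002"), each = 96),
  timestamp = rep(seq(as.POSIXct("2013-01-01"), by = "15 min",
                      length.out = 96), 2),
  kW        = runif(192, 0, 50)
)
setkey(x, meter, timestamp)
```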

It truly was easy to do. It requires a bit of work to make "good" in a development sense (input validation, among other things), but as a proof of concept I got it to work as expected.

Now I just need to figure out how to build a package off stuff like this. For instance, I can run package.skeleton on this stuff to generate R code files, but they don't contain the environments the functions enclose (so it just hard-codes Measure and Level, not their values for each function). I could wing it: take those files, compile them into one file, document them as a group, and have the package regenerate the functions when it loads or something. But that seems shoddy.

It was useful in my case for parsing timestamp strings, but it is really a general wrapper for strsplit that should (hopefully) be quick at grabbing the vector index you want from the split. There are reasons strsplit returns a list, so some validation should probably be included to handle the subset of data this method is intended to work on, but it wasn't meant to be entirely sophisticated as-is.
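The wrapper itself isn't shown in the thread; a minimal sketch of what such a strsplit wrapper might look like (split_nth is a made-up name):

```r
# Hypothetical strsplit wrapper: split each string and grab the n-th piece,
# returning a plain character vector instead of strsplit's list.
split_nth <- function(x, split, n = 1L) {
  vapply(strsplit(x, split, fixed = TRUE), `[`, character(1), n)
}

split_nth(c("2013-01-01 00:15", "2013-01-01 00:30"), " ", 2L)
# "00:15" "00:30"
```

Using `vapply` with `character(1)` gives a cheap built-in validation: it errors if any element of the split doesn't yield a single string at index n.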

ggplot2orBust

This is a bit above my pay grade, but I know you can export environments as data sets (not sure if this helps). Also, I have been working on a package-skeleton function (it will never be released to CRAN and is for my own personal use) that may be of use to you. It generates its own roxygen2-templated .R files, a bit fancier than package.skeleton.

Probably A Mammal

Yeah, but the data sets would have to be linked to the respective function environments to which they belong (are used in). Unless there's a built-in way to handle that, I'm just not sure. Maybe I'll shoot Hadley a message about this; maybe his code will give some insight into how he handled it. Something to study later. Since I have a workaround that avoids needing a package to get done what I need to get done, I'll focus on continuing to build this suite of tools. Package development will come later.

ggplot2orBust

I would strongly recommend against this. I'd include it in a package on GitHub and add roxygen2 documentation as you go. This saves time recalling documentation later, is less boring in small chunks, lets you test as you go and find breaking points, allows others to see your package and test it, and provides a means of storing the package in a safe place.

Probably A Mammal

I always document as I go, but the point is that I CAN'T document something that doesn't exist (statically). I can document the generating function (level_measure), but not its offspring, which I generate given a list of different levels (time cut breaks) and measures (vector functions with one output). Nobody else is going to be working on this, though; I'm the only R programmer here, and it's all closed since it's work-specific. I was thinking of using a private Bitbucket repository for versioning, though.

ggplot2orBust

Phineas Packard

A while back Dason had code on his blog showing how to get the point estimate and the uncertainty around a turning point in polynomial regression. I stole the idea for latent growth curve models. The lavaan code is:
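The lavaan code itself isn't shown above, but the underlying idea translates to any quadratic fit: the turning point of y = b0 + b1*x + b2*x^2 sits at x* = -b1/(2*b2), and a delta-method standard error follows from the coefficient covariance matrix. A hedged base-R sketch (not Dason's code or the lavaan version):

```r
# Delta-method sketch: point estimate and SE for the turning point of a
# quadratic regression. Data here are simulated for illustration.
set.seed(1)
x <- runif(200, 0, 10)
y <- 2 + 3 * x - 0.4 * x^2 + rnorm(200)
fit <- lm(y ~ x + I(x^2))

b <- coef(fit)
V <- vcov(fit)

tp <- unname(-b[2] / (2 * b[3]))          # turning point: -b1 / (2 * b2)
g  <- c(0,                                 # gradient of tp w.r.t. (b0, b1, b2)
        -1 / (2 * b[3]),
        b[2] / (2 * b[3]^2))
se <- sqrt(drop(t(g) %*% V %*% g))         # delta-method standard error

c(estimate = tp, se = se)
```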

New Member

Here are some useful tips and tricks in R. Please share the tips and tricks you know in this thread:

Ctrl + L - clears the console
Home and Ctrl + K - go to the beginning of the line of code and delete the line
ls("package:base") - list the functions in base R
rm(list = setdiff(ls(), "x")) - remove all objects except "x"
read.csv("data.csv", nrows = 10) - import only the first 10 rows of data
read.csv("data.csv", skip = 100) - skip the first 100 rows of data and import the rest

To change your working directory to a specific folder permanently, right-click "Notepad++", choose "Run as administrator", add setwd("your working directory folder") to the "Rprofile.site" file in your R directory, and save.

Attached is a cool 3D interactive graphic from the code below. I just love the "rgl" package:
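The attached code didn't survive, so as a stand-in, here is a classic rgl surface that produces the same kind of interactive 3D graphic (you can rotate and zoom it with the mouse):

```r
library(rgl)

# Stand-in example (not the original poster's code): an interactive
# 3D surface of 10 * sin(r) / r.
x <- seq(-10, 10, length.out = 50)
y <- x
f <- function(x, y) {
  r <- sqrt(x^2 + y^2)
  z <- 10 * sin(r) / r
  z[is.na(z)] <- 10   # limit of 10 * sin(r) / r as r -> 0
  z
}
z <- outer(x, y, f)

persp3d(x, y, z, col = "lightblue")
```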

Ambassador to the humans

To change your working directory to a specific folder permanently, right-click "Notepad++", choose "Run as administrator", add setwd("your working directory folder") to the "Rprofile.site" file in your R directory, and save.

There are better ways to do this that don't require administrator privileges: just modify the .Rprofile in your home directory (if you don't have a .Rprofile file, go ahead and create one). That only changes things for you as a user; if you modify Rprofile.site, you're making changes for *everybody* running R on that computer.
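A user-level ~/.Rprofile along those lines might look like this (the folder path is a placeholder; replace it with your own):

```r
# Example ~/.Rprofile: runs at the start of every session for this user
# only, with no administrator rights needed.
.First <- function() {
  setwd("~/projects/r-work")   # placeholder path: set your own folder here
  message("Working directory set to ", getwd())
}
```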

New Member

Using loops can take much more processing time than "apply"-family functions, especially if the data set is big. For example, we can use proc.time() to compare the processing times of assigning bonuses to 20,000 employees with a for loop versus sapply:
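The original benchmark isn't shown above; here is a hedged reconstruction of the comparison (column names and bonus rules are made up):

```r
# Sketch of the comparison described: a 10% bonus after 10 years of
# service, 5% otherwise, for 20,000 employees.
emp <- data.frame(salary  = runif(20000, 30000, 90000),
                  service = sample(1:30, 20000, replace = TRUE))

t_loop <- system.time({
  bonus <- numeric(nrow(emp))            # preallocate the result
  for (i in seq_len(nrow(emp))) {
    bonus[i] <- if (emp$service[i] > 10) emp$salary[i] * 0.10
                else                     emp$salary[i] * 0.05
  }
})

t_sapply <- system.time({
  bonus2 <- sapply(seq_len(nrow(emp)), function(i)
    if (emp$service[i] > 10) emp$salary[i] * 0.10
    else                     emp$salary[i] * 0.05)
})

t_loop; t_sapply   # compare the elapsed times
```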

ggplot2orBust

Using loops can take much more processing time than "apply"-family functions, especially if the data set is big. For example, we can use proc.time() to compare the processing times of assigning bonuses to 20,000 employees with a for loop versus sapply

Direct assignment is much better in this case, but being tricky and doing only one assignment instead of two is the best way. The cut implementation would probably be faster if I used findInterval; I included it because it's fairly easy to generalize. One thing to keep in mind is that accessing a data.frame is a *slow* operation, so if you're building a loop that accesses a single element (emp$service) on every iteration, you can easily get improvements by working around that. You also always want to avoid building up your result vector as you go; preallocating helps if you can't avoid a loop. Clearly, though, for something simple like this you really should just do things directly.

Vectorization is king in R: if you can vectorize something instead of using a loop (or an apply-family function), that will most likely net you the best results.
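For a bonus calculation like the one discussed (10% after 10 years of service, 5% otherwise; column names are made up), the vectorized version replaces 20,000 iterations with one or two whole-vector operations:

```r
emp <- data.frame(salary  = runif(20000, 30000, 90000),
                  service = sample(1:30, 20000, replace = TRUE))

# Two assignments via logical indexing:
long  <- emp$service > 10
bonus <- emp$salary * 0.05
bonus[long] <- emp$salary[long] * 0.10

# One assignment, using findInterval to pick the rate in a single pass:
bonus2 <- emp$salary * c(0.05, 0.10)[findInterval(emp$service, c(-Inf, 11))]

all.equal(bonus, bonus2)  # TRUE
```

Here findInterval maps each service value to a rate index (1 for up to 10 years, 2 for 11 and above), so the whole calculation is one multiplication over the vector.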

I'm sure if we used data.table we would see some decent results too - but it's hard to beat the vectorized approach for something simple like this.