In this talk, Ian Cook will discuss how to apply the tenets of R's dplyr package (immutability, chaining, consistency, parsimony) when working with Python's pandas library.

In the R community, dplyr is the most widely used data manipulation package. dplyr provides a small, consistent set of "verbs" (functions) that you can use to perform most common operations on R data frames. You can chain these verbs together to perform a series of operations on a data frame. dplyr treats data frames as immutable objects, returning manipulated copies instead of mutating them in place.

In the Python community, pandas is the most widely used data manipulation library. pandas does not prescribe one right way to manipulate DataFrames; it enables several different styles. Ian will show how to apply the dplyr style when you're working with pandas, and we'll discuss the benefits, challenges, and alternatives.

Ian Cook is a data scientist at Cloudera. Ian is a long-time R user and an author of and contributor to several R packages; he is newer to Python. Ian lives in Carrboro and is a cofounder of the local Research Triangle Analysts group. Ian has degrees in Statistics from Lehigh University and Stony Brook University.

Extemporaneous "lightning talks" of 5 to 10 minutes are also welcome and don't need to be pre-announced. Lightning talks are a chance to "show and tell" something you've learned about Python recently, no matter how small. We all use Python, so we are always learning something new about it that we can share with others.

Plenty of free parking is available in the RENCI parking deck. The meeting will be followed by our usual after-meeting at a nearby tavern for food and beverages. Come join us for a fun and informative evening.