data.table speed is slower when assigning a column

For some reason, this operation seems to show data.table assigning a new column about half as fast as base R. Is there a reason for this? require(microbenchmark) require(data.table) DT = data.table(a = runif(1000000), b = rnorm(1000000)) DF = data.f...

2017-04-19 23:04 (3) Answers

Use `callNextMethod()` with dotsMethods

I would like to define some S4 generics dispatching on the ... argument such that the more specialized methods call the inherited method through callNextMethod(). However, as illustrated by the MWE, this fails with the following error. # sample func...

2017-04-19 16:04 (0) Answers

Passing data.frame as an argument in function

Context As a followup to R: Pass data.frame by reference to a function and How to add a column in the data frame within a function I am attempting the following, seemingly easy, function: naToZero <- function(df) { df$Vol[is.na(df$Vol)] <-...

2017-04-19 16:04 (2) Answers
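
On the data.frame question above: R passes arguments with copy-on-modify semantics, so a change made inside a function is lost unless the modified object is returned and reassigned by the caller. A minimal sketch, assuming the truncated function body replaces missing `Vol` values with 0 (suggested only by the name `naToZero`); the `trades` data frame is hypothetical:

```r
# Hypothetical data; only the column name `Vol` comes from the excerpt.
trades <- data.frame(Vol = c(10, NA, 30))

naToZero <- function(df) {
  # Modifies the function's local copy only (assumed replacement value: 0).
  df$Vol[is.na(df$Vol)] <- 0
  df  # return the modified copy so the caller can keep it
}

naToZero(trades)            # result discarded; `trades` is unchanged
anyNA(trades$Vol)           # TRUE

trades <- naToZero(trades)  # reassign to keep the change
anyNA(trades$Vol)           # FALSE
```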

Why does as.integer in R decrement the value?

I am doing a simple operation of multiplying a decimal number and converting it to an integer, but the result seems to be different than expected. Apologies if this is discussed elsewhere; I am not able to find any straightforward answers to this >...

2017-04-19 15:04 (1) Answers
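
A likely cause of the behaviour described above is that `as.integer()` truncates toward zero, while many decimal products are stored as a double slightly below the expected whole number. A minimal sketch; the value `0.29 * 100` is an illustrative assumption, not the number from the question:

```r
x <- 0.29 * 100        # illustrative value, not taken from the question

x == 29                # FALSE: the stored double is slightly below 29
as.integer(x)          # 28, because the fractional part is dropped
as.integer(round(x))   # 29, rounding first gives the expected result
```

Rounding before converting is one common workaround; whether it is appropriate depends on how much error the inputs can carry.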

data.table rolling join for each group

How can I join two data tables with a rolling join for each group? library(data.table) alldates = as.Date(c('2000-01-01','2005-01-01','2010-01-01','2015-01-01','2020-01-01')) gdp = data.table(date = alldates[c(1,3,5,1,3,5)], country = c('A','A','A','B'...

2017-04-19 15:04 (1) Answers

Fast way of calculating distance between rows

I am trying to calculate the weighted Euclidean distance between rows from two dataframes. For example, df.test contains 100 rows while df.train has 1000 rows, and I would like to find the rows with minimum distance in df.train for every row in df.test. First: I...

2017-04-19 10:04 (0) Answers

Create "Week_Start" variable in R

I have a dataframe that looks like the one below. bus_date <- as.Date(c('2017-04-03', '2017-04-04', '2017-04-06', '2017-04-11', '2017-04-13', '2017-04-17')) sales <- c(100, 110, 120, 200, 300, 100) daily_sales <- data.frame(bus_date, sales...

2017-04-18 21:04 (2) Answers
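
One possible approach to the "Week_Start" question above, sketched in base R. It assumes weeks start on Monday and that the truncated `data.frame()` call has no further columns; neither is confirmed by the excerpt:

```r
bus_date <- as.Date(c('2017-04-03', '2017-04-04', '2017-04-06',
                      '2017-04-11', '2017-04-13', '2017-04-17'))
sales <- c(100, 110, 120, 200, 300, 100)
daily_sales <- data.frame(bus_date, sales)  # assumed: no other columns

# cut() labels each date with the first day of its week
# (Monday by default); as.Date() turns the factor labels back into dates.
daily_sales$Week_Start <- as.Date(cut(daily_sales$bus_date, breaks = "week"))

daily_sales
```

For Sunday-based weeks, `cut()` accepts `start.on.monday = FALSE`.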

Categorize based on date ranges in R

How do I categorize each row in a large R dataframe (>2 million rows) based on date range definitions in a separate, much smaller R dataframe (12 rows)? My large dataframe, captures, looks similar to this when called via head(captures) : id...

2017-04-18 21:04 (3) Answers

dplyr for rowwise quantiles

I have a df of strata, each of which has 1000 samples from a posterior distribution of the estimates from that stratum. mydf <- as.data.frame(lapply(seq(1, 1000), rnorm, n=100)) colnames(mydf) <- paste('s', seq(1, ncol(mydf)), sep='') I wan...

2017-04-18 21:04 (3) Answers

R -- Grouping observations based on a code list

I have a dataset where each observation has an integer "code" variable, which I would like to convert to a character "class" variable. Here is a simple example to illustrate what I am trying to do. code.list <- data.frame(code = 1:10, ...

2017-04-18 19:04 (1) Answers

How to sort list in R?

I have a list of lists similar to this: a <- list( list(day = 5, text = "foo"), list(text = "bar", day = 1), list(text = "baz", day = 3), list(day = 2, text = "quux") ) with an unknown number of fields, and the fields may be out of order. how...

2017-04-18 18:04 (2) Answers
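
A minimal sketch for the list question above, assuming the goal is to order the elements by their `day` field (the excerpt is cut off before it says so). Fields are looked up by name, so their position inside each element does not matter:

```r
a <- list(
  list(day = 5, text = "foo"),
  list(text = "bar", day = 1),
  list(text = "baz", day = 3),
  list(day = 2, text = "quux")
)

# Extract the sort key by name from every element.
days <- vapply(a, function(item) item$day, numeric(1))

# Reorder the outer list by that key.
sorted <- a[order(days)]

vapply(sorted, function(item) item$text, character(1))
# "bar" "quux" "baz" "foo"
```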

Create a colour blind test with ggplot

I would like to create a colour blind test, similar to that below, using ggplot. The basic idea is to use geom_hex (or perhaps a voronoi diagram, or possibly even circles as in the figure above) as the starting point, and define a dataframe that,...

2017-04-18 17:04 (0) Answers

Aggregate character/string variables

I've got a data frame which has a list of Person References, each with a Trade; each person reference can have multiple trades. I want to aggregate so that it shows the data as below: Record Person Ref Trade Code 1 512 elec, plumbing...

2017-04-18 16:04 (1) Answers
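
The aggregation described above can be sketched with base R's `aggregate()`. The data frame, its column names (`PersonRef`, `Trade`), and every value except the `512` / `elec, plumbing` row are hypothetical, since the excerpt does not show the input:

```r
# Hypothetical input: one row per person/trade pair.
trades <- data.frame(
  PersonRef = c(512, 512, 513),
  Trade     = c("elec", "plumbing", "joinery"),
  stringsAsFactors = FALSE
)

# Collapse all trades of a person into one comma-separated string.
combined <- aggregate(
  Trade ~ PersonRef,
  data = trades,
  FUN  = function(x) paste(x, collapse = ", ")
)

combined
```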

dplyr summarise with condition

I would like to apply the dplyr function summarise_all to calculate the mean in each column. However, I want to omit the 0 values, and thus need to build in a conditional statement. df <- data.frame(x = c(1,0,4,6,0,9), y = c(12,42,8,0,11,2)) df %&...

2017-04-18 11:04 (1) Answers
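
The condition in the question above amounts to filtering each column before averaging. A minimal sketch in base R, using the `df` from the excerpt; the helper name `mean_nonzero` is made up for illustration:

```r
df <- data.frame(x = c(1, 0, 4, 6, 0, 9), y = c(12, 42, 8, 0, 11, 2))

# Mean of the non-zero entries of one column (hypothetical helper).
mean_nonzero <- function(col) mean(col[col != 0])

# Apply it to every column.
sapply(df, mean_nonzero)
# x = 5, y = 15
```

Because `summarise_all()` accepts a plain function, the same helper should also work as `summarise_all(df, mean_nonzero)` when dplyr is loaded.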

Error while loading RTextTools library on Mac

I get an error while loading the RTextTools library on a Mac; however, this works on a Windows PC. Here is the error: library(RTextTools) # For removeSparseTerms() Error in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]) : there is no packag...

2017-04-18 05:04 (0) Answers

How to plot a function family in ggplot2

I need to plot a family of functions varying according to a set of parameters, say, a family of normal distribution curves that depend on the mean and standard deviation. I found here a code snippet that almost does the task: p9 <- ggplot(data.fr...

2017-04-18 04:04 (2) Answers

Milliseconds separated by comma

I have the following data.frame called "data" (it is much larger, but I just give the first lines as an example): Timestamp Weight Degrees 1 30-09-2016 11:45:00,000 38.19 40.00 2 01-10-2016 06:19:57,860 39.12 40.00 3 01-10-2016 06:20:46,393 42.11...

2017-04-17 23:04 (1) Answers
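
One way to read timestamps like those above is to swap the decimal comma for a point and parse with `%OS`, which accepts fractional seconds on input. A minimal sketch, assuming the order is day-month-year (implied by `30-09-2016`) and a UTC time zone (an assumption, the excerpt does not state one):

```r
stamps <- c("30-09-2016 11:45:00,000", "01-10-2016 06:19:57,860")

# Replace the decimal comma, then parse day-month-year with fractional seconds.
parsed <- as.POSIXct(
  sub(",", ".", stamps, fixed = TRUE),
  format = "%d-%m-%Y %H:%M:%OS",
  tz = "UTC"  # assumed time zone
)

# Show milliseconds when printing.
format(parsed, "%Y-%m-%d %H:%M:%OS3")
```

Printed milliseconds can differ in the last digit from the input because of floating-point representation.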

R searching for information within column

I have two tables, formatted as shown below. One is table A: students|Test Score A | 100 B | 81 C | 92 D | 88 The other, table B, looks like this: Class | Students 1 | {A,D}...

2017-04-17 23:04 (5) Answers

Custom sorting and calling for columns in R

I have 100 columns with currency data not sorted in a specific order. However, there is one column of spot rates and one for forward rates for each currency. They can all be identified by their names, which are in the format USDXXX (XXX=ticker for currenc...

2017-04-17 22:04 (0) Answers