SQLite removes ties in group by

I am expecting the following to yield more than 3 rows, since there are ties of min(a.[Sepal.Width]) within each of a.[Species]; however, only 3 rows are returned: sqldf(' select a.[Species], min( a.[Petal.Width]) from iris a group by a.[Species] ')...

2017-02-22 21:02 (2) Answers

Set parts of listcolumn to NULL in R

library(tidyverse) data(mtcars) mtcars <- rownames_to_column(mtcars,var = "car") mtcars$id <- c(1:length(mtcars$car)) mtcars$make <- map_chr(mtcars$car,~strsplit(.x," ")[[1]][1]) mt2 <- mtcars %>% select(1:4,id,make) %>% nest(-make...

2017-02-22 18:02 (1) Answers

Replace certain dates with NA

I'm trying to replace certain dates with NA. I tried the following but it did not work. df <- data.frame(dates = seq.Date(as.Date("1910-01-01"), as.Date("1999-01-01"), "days")) %>% mutate_if(dates < as.Date("1990-01-05"),NA) Does ...

2017-02-22 17:02 (2) Answers

Lapply vs for loop - Performance R

It is often said that one should prefer lapply over for loops. There are some exceptions, as Hadley Wickham points out in his Advanced R book (http://adv-r.had.co.nz/Functionals.html): modifying in place, recursion, etc. The following is o...

2017-02-22 15:02 (1) Answers

build package with overloaded operator

I'm new to building packages with R and I am starting with making a package that combines some functions that I wrote and often load independently. Among those functions there is an overloaded + operator for concatenating strings. It's simply: `+` =...

2017-02-22 14:02 (1) Answers

order by factor in sqldf

As far as I can see, it is not possible to order by a factor in sqldf: levels( iris$Species ) <- c("virginica", "versicolor", "setosa") levels(iris$Species) > sqldf(' select distinct iris.[Species] from iris order by iris.[Species] ' ) ...

2017-02-22 13:02 (1) Answers

Date formatting MMM-YYYY

I have a dataset with dates in the following format: Initial: Jan-2015 Apr-2013 Jun-2014 Jan-2015 Jan-2016 Jan-2015 Jan-2016 Jan-2015 Apr-2012 Nov-2012 Jun-2013 Sep-2013 Final: Feb-2014 Jan-2013 Sep-2014 Apr-2013 Sep-2014 Mar-2013 Aug-2012 Apr-2012 ...

2017-02-22 11:02 (2) Answers

Color gradients in R in PDF and bitmap output

I am struggling to get a visually acceptable color gradient in R (see here for a detailed description of my particular case). The problem, in short, is that while output in the R window looks OK, PDFs show thin, white lines between segments used to g...

2017-02-22 10:02 (2) Answers

Dummify character column and find unique values

I have a dataframe with the following structure test <- data.frame(col = c('a; ff; cc; rr;', 'rr; a; cc; e;')) Now I want to create a dataframe from this which contains a named column for each of the unique values in the test dataframe. A uniqu...

2017-02-22 10:02 (6) Answers

Map for nested `data_frame`s

When using map on a nested data_frame, I do not understand why the latter two versions give an error: library(tidyverse) # dummy data df <- tibble(id = rep(1:10, each = 10), val = runif(100)) df <- nest(df, -id) # works as ex...

2017-02-22 08:02 (1) Answers

Retrieving R object attributes in JavaScript

I have a bivariate dataset with 100 observations. I used hexagon binning and ended up with 26 hexagon bins. In order to save the rows of the 100 observations that are in each of the 26 hexagon bins, I used the base::attr function in R. In the code be...

2017-02-22 01:02 (1) Answers

bit64 integers with fst

I have data in a csv containing long integers. I am exchanging this data between csvs and fst files. For example, library(bit64) library(data.table) library(fst) library(magrittr) # Prepare example csvs DT64_orig <- data.table(x = (c(234561234...

2017-02-22 00:02 (2) Answers

Reading files in For-loop fashion in R

I have some files with a YYYYMMDD date code in their names, for example my20150112.csv. How do you write a for loop in R so that it automatically processes the next date after it finishes processing the previous one? Here are the scripts below: R_script -&...

2017-02-21 23:02 (3) Answers

Dynamically creating functions and expressions

I am currently dealing with a problem. I am working on a package for some specific distributions where, among other things, I would like to create a function that will fit a mixture to some data. For this I would like to use, for example, the fitdistr f...

2017-02-21 23:02 (1) Answers

Determine if 24 hour datetime is within interval

Hope you can help. Have a dataframe with date times in it. I'd like to determine if the time result occurs after hours (> 16:00). Is there an easy way to do this? Was planning on converting the time to seconds and then doing like this but suppose t...

2017-02-21 22:02 (2) Answers

Is there a basepath data option in SparkR?

I have an explicitly pruned schema structure in S3, causing the following error when I read.parquet(): Caused by: java.lang.AssertionError: assertion failed: Conflicting directory structures detected. Suspicious paths s3a://leftout/for/security/...

2017-02-21 22:02 (2) Answers

How to use data.table within functions and loops?

While assessing the utility of data.table (vs. dplyr), a critical factor is the ability to use it within functions and loops. For this, I've modified the code snippet used in this post: data.table vs dplyr: can one do something well the other can'...

2017-02-21 20:02 (1) Answers

How to sum by grouped columns in R?

This is my input. A dataframe with n columns, and an auxiliary dataframe that assigns each id to a group. df <- data.frame( a1 = c(1,2,3), a2 = c(2,3,4), b1 = c(4,5,6), b2 = c(5,6,7) ) aux <- data.frame( id = c("a1", "a2", "b1", "...

2017-02-21 19:02 (3) Answers