Questions tagged [data.table]

The R data.table package is an extension of data.frame built for fast in-memory data analysis. Use the dt tag for the DataTables package with Shiny (DT).

1 vote, 2 answers, 38 views

regexp R - extract string between commas

Because my csv file is broken, I'm reading it into R using: dataDT <- data.table::fread(".../test.csv", sep = NULL) And it gives a dataset something like: dataDT <- data.table("ColA,ColB,ColC,...
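
For the question above: once `fread(..., sep = NULL)` has read each line into one character column, a field between two commas can be pulled out with a regular expression. A minimal sketch, assuming the wanted value is the second comma-separated field; the table contents and the column name `V1` are hypothetical stand-ins for the truncated example.

```r
library(data.table)

# Hypothetical lines standing in for the single column produced by fread(sep = NULL)
dataDT <- data.table(V1 = c("a1,b1,c1", "a2,b2,c2,d2", "no_commas"))

# Capture the text between the first and second comma; lines without
# two commas do not match, so sub() returns them unchanged
dataDT[, second := sub("^[^,]*,([^,]*),.*$", "\\1", V1)]

# Mark non-matching lines explicitly as NA instead of keeping the raw text
dataDT[!grepl("^[^,]*,[^,]*,", V1), second := NA_character_]
print(dataDT)
```

data.table's `tstrsplit(V1, ",", fixed = TRUE, keep = 2L)` is a non-regex alternative worth comparing when the lines share a common layout.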

3 votes, 1 answer, 41 views

data.table assign value to unique observation

Some example data: library(data.table) mydat <- data.table(id1=rep(c("A","B","C"),each=3), id2=c("D","E","G", "D","E","F","G","E","D"), val=c(1,2,4,1,2,3, ...

4 votes, 4 answers, 58 views

Change data types using a list of data type names

What is an elegant way to change the data types of data frame columns from a list of data type names? Here's an example (change_to_data_types function is what I'm looking for): my_df <- iris ...
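
One way to approach the question above in data.table is to pair each column with a type name and look up the matching `as.<type>()` converter. A minimal sketch, assuming one type name per column in column order; the `types` vector is an illustrative choice, not the asker's, and `as.integer()` truncates decimals.

```r
library(data.table)

# Work on a data.table copy of iris so the original data frame is untouched
dt <- as.data.table(iris)

# One target type per column, in column order (illustrative choice)
types <- c("integer", "numeric", "character", "numeric", "character")

# For each column, look up the matching as.<type>() converter and apply it;
# := with a parenthesised character vector replaces all columns by reference
dt[, (names(dt)) := Map(function(col, type) match.fun(paste0("as.", type))(col),
                        .SD, types)]

str(dt)
```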

1 vote, 2 answers, 61 views

Split and concatenate strings in data.table

Let's say I have the following data: kat = c("a.b.c.d.e.f", "a.c.e.d.f.s", "a.v") Desired output in base R: > splitted = strsplit(kat, "[.]") > kat2 = sapply(splitted, function(x) paste(x[1:...

1 vote, 1 answer, 34 views

POSIXct objects and time zones

I have a date-time data table imported from Excel, with the date/time column in a number format (e.g., 43596.22). I used the following code to convert the number to a date-time format with the UTC time zone: ...
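
Since the asker's code is truncated, here is a minimal sketch of the usual conversion, assuming the workbook uses Excel's default (Windows, 1900) date system, whose serial numbers count days from 1899-12-30. The object names are hypothetical. It also shows that changing the `tzone` attribute alters only the display, not the underlying instant.

```r
# Excel (1900 date system) stores date-times as days since 1899-12-30
excel_num <- 43596.22

# Convert days to whole seconds and anchor at Excel's origin in UTC
ts_utc <- as.POSIXct(round(excel_num * 86400), origin = "1899-12-30", tz = "UTC")
format(ts_utc, "%Y-%m-%d %H:%M:%S %Z")   # "2019-05-11 05:16:48 UTC"

# Changing the "tzone" attribute changes only how the same instant is printed
ts_ny <- ts_utc
attr(ts_ny, "tzone") <- "America/New_York"
format(ts_ny, "%Y-%m-%d %H:%M:%S %Z")
```

Workbooks saved with the older Mac 1904 date system use a different origin, so check that setting before reusing the constant.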

1 vote, 1 answer, 37 views

sample from data.table

I have some data.table from which I want to select a random subset, but only for some operations. Suppose the data is dat <- data.table(id=1:100, group=sample(1:20,100), a=runif(100), b=rnorm(...
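
A minimal sketch for the question above, assuming the subset only needs to be used for some operations while `dat` stays intact. Note that `sample(1:20, 100)` as written in the question would fail without `replace = TRUE`, so the hypothetical data below adds it.

```r
library(data.table)
set.seed(1)

# Hypothetical data shaped like the question's example
dat <- data.table(id = 1:100, group = sample(1:20, 100, replace = TRUE),
                  a = runif(100), b = rnorm(100))

# .N is available in i, so this draws 10 distinct rows at random
sub10 <- dat[sample(.N, 10)]

# Use the random subset for one operation only; dat itself is unchanged
sub10[, .(mean_a = mean(a), mean_b = mean(b))]

# Per-group variant: up to 2 random rows from each group
per_group <- dat[, .SD[sample(.N, min(.N, 2))], by = group]
```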

0 votes, 3 answers, 44 views

Is there a version of `setorder` that behaves like `setcolorder`

I want to reorder the rows of a data.table according to a given sequence of indices, which is what setcolorder does for columns. Is there a function for this?
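
Two ways to answer the question above, sketched under the assumption that `idx` is a full permutation of the row numbers (new row k is old row `idx[k]`); `dt`, `idx` and the helper column `tmp_pos` are hypothetical. Plain subsetting returns a reordered copy; to reorder by reference, attach each row's target position and let `setorder()` sort on it.

```r
library(data.table)

dt  <- data.table(id = c("a", "b", "c", "d"), val = 1:4)
idx <- c(3L, 1L, 4L, 2L)   # new row k should be old row idx[k]

# Option 1: subsetting with the index vector returns a reordered copy
dt_copy <- dt[idx]

# Option 2: reorder in place. order(idx) is the inverse permutation, i.e. the
# target position of each current row; setorder() sorts by it by reference
dt[, tmp_pos := order(idx)]
setorder(dt, tmp_pos)
dt[, tmp_pos := NULL]
```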

1 vote, 2 answers, 42 views

counting events by event history with R

I have a data table structured like this, in which I have kept track of processes. If an event occurred on a given day, I marked a 1 next to it; otherwise 0. I have shown the first few events ...

0 votes, 0 answers, 35 views

Left_join by reference for data.table for duplicate keys

Data: library(data.table) A <- data.table(id = letters[1:10], amount = rnorm(10)^2) B2 <- data.table( id = c("c", "d", "e", "e"), ord = 1:4, comment = c("big", "slow", "nice", "nooice") ...

1 vote, 1 answer, 31 views

Unable to reproduce expected result of tableB[ tableA]

I am unable to produce the expected results with tableB[ tableA] on my data, although the same code works fine on simple example data. What am I doing wrong? > tableA <- data.table(col1 = c( 1.0, ...

0 votes, 2 answers, 40 views

join two tables such that we get all rows from tableA and all columns of matching rows from tableB [duplicate]

Join two tables such that we get all rows from tableA and all columns of matching rows from tableB. I want to use data.table and not data.frame. Please suggest the fastest method. tableA <- data....
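
In data.table, `X[Y, on = ...]` keeps every row of `Y` and adds the columns of `X`, which is exactly the "all rows from tableA" join asked for above. A minimal sketch; the key column `col1` is taken from the question, while the other columns and values are hypothetical.

```r
library(data.table)

# Hypothetical tables; col1 is the shared key column
tableA <- data.table(col1 = c(1, 2, 3, 4), a_val = c("w", "x", "y", "z"))
tableB <- data.table(col1 = c(2, 4, 5), b_val = c(20, 40, 50))

# Every row of tableA is kept, in tableA's order; tableB's columns are added
# and rows without a match in tableB get NA in b_val
res <- tableB[tableA, on = "col1"]
print(res)
```

`merge(tableA, tableB, by = "col1", all.x = TRUE)` gives the same rows, but sorted by the key.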

0 votes, 1 answer, 29 views

Pull Matching Data from 2 Data Frames Using dplyr or Purrr [duplicate]

I have 2 data frames. The first data frame has 2 columns (Ticker, Date), the second data frame has 3 columns (Ticker, Date, Price). The first data frame only has 1 row per Ticker while the 2nd data ...

-2 votes, 2 answers, 43 views

Colon-Equals operator proper usage

I used := in R to perform some manipulations on my data set, but the way I am using it throws an error. I tried using other functions like c() for creating subsets but I need ...

12 votes, 2 answers, 152 views

Fast way to group variables based on direct and indirect similarities in multiple columns

I have a relatively large data set (1,750,000 lines, 5 columns) which contains records with unique ID values (first column), described by four criteria (4 other columns). A small example would be: # ...

1 vote, 1 answer, 56 views

Group by in R with like on multiple constraints

Currently I am trying to learn R, but I am stuck on the following. I have this table material V1 1: Silber 450.7886 2: Kupfer-Nickel 0.0000 ...

0 votes, 1 answer, 50 views

Adding new column to list of data.tables [on hold]

I have a list of data.tables in R: "genelists" df1 <- data.table(x = 1:3, y=letters[1:3]) df2 <- data.table(x = 4:6, y=letters[4:6]) genelists <- list(df1,df2) I now want to add a new empty ...
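
For the question above, one option is to run `:=` on each element with `lapply()`. A minimal sketch; the new column name `new_col` is hypothetical. Because `:=` works by reference and the list elements are the same objects as `df1` and `df2`, those originals may be modified too; wrap `d` in `copy()` if they must stay untouched.

```r
library(data.table)

df1 <- data.table(x = 1:3, y = letters[1:3])
df2 <- data.table(x = 4:6, y = letters[4:6])
genelists <- list(df1, df2)

# Add an empty (all-NA) character column to every table in the list;
# the trailing [] makes each call return the modified table
genelists <- lapply(genelists, function(d) d[, new_col := NA_character_][])
```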

2 votes, 1 answer, 53 views

adding in data.table missing values for each combination of several categories

I am given a data.table dt with some demographic statistics for certain ages and years. Moreover, I have a differentiation into several categories Cat_1, Cat_2 and Cat_3 set.seed(1) Cat_1<-c("A","...
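
A common data.table pattern for the question above is to build the full grid of observed levels with `CJ()` and join the data onto it, so missing combinations appear with `NA`. A minimal sketch using the question's `Cat_1`, `Cat_2` and `Cat_3` names; the rows and the `value` column are hypothetical, and age or year columns could be added as further `CJ()` arguments in the same way.

```r
library(data.table)

# Hypothetical data: the combination Cat_1 = "B", Cat_2 = "y" is missing
dt <- data.table(Cat_1 = c("A", "A", "B"),
                 Cat_2 = c("x", "y", "x"),
                 Cat_3 = c("u", "u", "u"),
                 value = c(1, 2, 3))

# CJ() builds every combination of the observed levels (evaluated using dt's
# columns); joining dt onto it returns NA for combinations absent from dt
full <- dt[CJ(Cat_1 = Cat_1, Cat_2 = Cat_2, Cat_3 = Cat_3, unique = TRUE),
           on = .(Cat_1, Cat_2, Cat_3)]
print(full)
```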

3 votes, 5 answers, 90 views

Filtering observations based on specific date condition using data.table

I have a set of observations, which are recorded every time a user has taken an action. I want to filter only those observations from a user which are six or more months apart. So, if a user has ...

-1 votes, 3 answers, 59 views

How can I create a rolling mean for each user based on previous 7 days of activity?

I've been looking at past posts and can't seem to find something that matches my needs. Goal: For each user, I want a mean of their previous 7 days of activities (not counting the current observation)...

3 votes, 3 answers, 45 views

Selection with a filter on row number and value

I have the following simple data.table "test". I would like to select, among rows 3 to 8, all rows with X equal to "A": library(data.table) set.seed(1) test <- data.table(X=c(rep("A",5),rep("B",5)),Y=...
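
Two equivalent ways to combine a row-number filter with a value filter, sketched for the question above. The `Y` column is a hypothetical stand-in because the original definition is truncated.

```r
library(data.table)

# Hypothetical stand-in for the question's "test" table (its Y column is truncated)
test <- data.table(X = c(rep("A", 5), rep("B", 5)), Y = 1:10)

# Restrict to rows 3-8 first, then filter on X within that slice
res1 <- test[3:8][X == "A"]

# Single subset: row numbers that are both in 3:8 and have X == "A"
res2 <- test[intersect(3:8, which(X == "A"))]
```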

0 votes, 0 answers, 52 views

How can I evaluate code depending on a boolean column in a data table in R?

I have a new issue with my code, relating to the first version in https://sid.justtry.fun/posts/56680932/edit. I've realized that my output was wrong, because I always want to save in the column "...

4 votes, 3 answers, 39 views

Enumerate groups within groups in a data.table [duplicate]

This is related to multiple duplicates (1, 2, 3), but it is a slightly different problem that I'm stuck with. So far, I have only seen a pandas solution. In this data table: dt = data.table(gr = rep(letters[1:...

1 vote, 0 answers, 21 views

Conditional merge/replacement in R: Do it programmatically [duplicate]

My question pertains to this post: Conditional merge/replacement in R I am using the data.table solution referenced there. The only issue is that I need to do this as part of a loop. In essence, I ...

2 votes, 3 answers, 55 views

Left joining in R between two timestamps

My goal is to perform a left join on intervals where the bike_id matches and the created_at timestamp in records is BETWEEN start and end in the intervals table > class(records) [1] "data.table" "...
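
For the question above, a non-equi update join keeps every row of `records` and copies information from the matching interval by reference. A minimal sketch, using the question's `bike_id`, `created_at`, `start` and `end` names; the rows and the `interval_id` label are hypothetical. If a record falls inside several intervals, only one of them ends up in the result, so overlapping intervals need separate handling.

```r
library(data.table)

# Hypothetical data using the column names from the question
records <- data.table(
  bike_id    = c(1L, 1L, 2L),
  created_at = as.POSIXct(c("2019-01-01 10:00", "2019-01-01 12:00",
                            "2019-01-01 10:30"), tz = "UTC"))
intervals <- data.table(
  bike_id     = c(1L, 2L),
  interval_id = c("a", "b"),   # hypothetical label to carry over
  start = as.POSIXct(c("2019-01-01 09:00", "2019-01-01 10:00"), tz = "UTC"),
  end   = as.POSIXct(c("2019-01-01 11:00", "2019-01-01 11:00"), tz = "UTC"))

# For each record whose bike_id matches and whose created_at lies in
# [start, end], copy interval_id into records by reference; records with
# no matching interval keep NA, so all rows of records survive
records[intervals,
        on = .(bike_id, created_at >= start, created_at <= end),
        interval_id := i.interval_id]
print(records)
```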

0 votes, 2 answers, 55 views

Subset a data.table with two joined conditions [closed]

I would like to use data.table to collect information from the following dataset. set.seed(1) TDT <- data.table(nr= c(1:100),Group = c(rep("A",10),rep("B",10),rep("C",10),rep("D",10),rep("E",10),...

1 vote, 3 answers, 50 views

dynamic column names seem to work when := is used but not when = is used in data.table

Using this dummy dataset setDT(mtcars_copy<-copy(mtcars)) new_col<- "sum_carb" # for dynamic column referencing Why does Case 1 work but not Case 2? # Case 1 - Works fine mtcars_copy[,eval(...
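
The Case 1 and Case 2 code in the question is truncated, so this sketch does not reproduce it; it only shows the two patterns that do accept a name stored in a variable. On the left of `:=`, a parenthesised variable is evaluated to a column name. Inside `.()` or `list()`, the name before `=` is always literal, so one option is to build the list and name it with `setNames()`. The object `summary_dt` is hypothetical.

```r
library(data.table)

mtcars_copy <- copy(mtcars)
setDT(mtcars_copy)
new_col <- "sum_carb"

# With :=, (new_col) is evaluated, so the column is called "sum_carb"
mtcars_copy[, (new_col) := sum(carb)]

# In a summary, name the returned list from the variable instead of using =
summary_dt <- mtcars_copy[, setNames(list(sum(carb)), new_col)]
print(summary_dt)
```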

1 vote, 1 answer, 95 views

How can I create a string using toString?

I have a data table with 2 columns: category and priority. I'm classifying the data in the following way using a for loop: I check if the priority of the current value is smaller than the previous one. ...

1 vote, 1 answer, 32 views

Multiply columns in a DT by DT[i,j]

Question 1: line 1 throws an error. Why and how to multiply all columns by DT[i,j]? Question 2: line 2 works but are there better ways to multiply all other columns by one column? df=data.table(...
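
For Question 2 above, a common pattern is to apply `*` to every other column through `.SD`, with the multiplier column passed as an extra argument. A minimal sketch; the table, the multiplier column `w` and the chosen cell are hypothetical, since the question's data is truncated. It also shows that `df[2, w]`, with a bare column name in j, returns a plain numeric value that can be used as a scalar multiplier.

```r
library(data.table)

# Hypothetical table: w is the column to multiply by
df <- data.table(w = c(1, 2, 3), a = c(10, 20, 30), b = c(1, 1, 1))
cols <- setdiff(names(df), "w")

# Multiply every other column by w row-wise, by reference
df1 <- copy(df)
df1[, (cols) := lapply(.SD, `*`, w), .SDcols = cols]

# Multiply them by a single cell instead; df[2, w] is a plain numeric (2)
k <- df[2, w]
df2 <- copy(df)
df2[, (cols) := lapply(.SD, `*`, k), .SDcols = cols]
```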

1 vote, 3 answers, 42 views

Subsetting character string and returning string

I was wondering if there was a clean solution using data.table to the following problem possibly using other packages such as stringr. Suppose I have the following data table DT <- data.table(...

3 votes, 4 answers, 39 views

Value mapping by condition in R

I have a raw data frame that looks like this: test id class time 1 1 start 2019-06-20 00:00:00 2 1 end 2019-06-20 00:05:00 3 1 start 2019-06-20 00:10:00 4 1 end 2019-06-...

3 votes, 2 answers, 40 views

Getting correlations only for selected variables using a for loop

I have a dataset as follows: set.seed(1) TDT <- data.table(Group = c(rep("A",40),rep("B",60)), Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)), ...

-1 votes, 1 answer, 51 views

The proper way to vectorise an if-else tower in R

I came across the following post: Vectorized IF statement in R?, which deals with vectorising a single if-else construct in R. However, I do not want to build nested `ifelse` functions in R, is ...
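
One vectorised replacement for an if / else-if tower, sketched for the question above, is data.table's `fcase()` (available in data.table 1.13.0 and later). Conditions are checked in order for every element at once, and `default` plays the role of the final `else`. The vector `x` and the labels are hypothetical.

```r
library(data.table)  # fcase() needs data.table >= 1.13.0

x <- c(-5, 0, 3, 12)

# Each element takes the value paired with the first TRUE condition
label <- fcase(
  x < 0,  "negative",
  x == 0, "zero",
  x < 10, "small",
  default = "large"
)
print(label)
```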

2 votes, 1 answer, 45 views

How to cbind a data.table and a vector

I want to cbind a data.table and a vector in such a way that the vector contents becomes new columns for the data.table with zero values. DT <- data.table(x=c("A", "B", "C", "D", "E", "F"), y = ...
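
Instead of `cbind()`, the vector of names can go straight on the left of `:=`, which adds all of those columns by reference with the single value 0 recycled into each. A minimal sketch for the question above; `new_cols` is a hypothetical stand-in for the truncated vector, and `y` is a hypothetical stand-in for the truncated second column.

```r
library(data.table)

DT <- data.table(x = c("A", "B", "C", "D", "E", "F"), y = 1:6)
new_cols <- c("p", "q", "r")   # hypothetical vector of new column names

# A parenthesised character vector on the left of := adds every named column;
# the value 0 is recycled into each of them
DT[, (new_cols) := 0]
print(DT)
```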

1 vote, 2 answers, 35 views

Getting the column names by index does not work correctly [duplicate]

I have a dataset which looks as follows: set.seed(1) TDT <- data.table(Group = c(rep("A",40),rep("B",60)), Id = c(rep(1,20),rep(2,20),rep(3,20),rep(4,20),rep(5,20)), ...

0 votes, 5 answers, 56 views

Create a combination of unique column names from two datasets without looping

I have two vectors: a <- c(1,2,3) b <- c(11,12,13) I want to create a combination of column names (3*3 = 9) such that they use values from both: paper1grid11 paper1grid12 paper1grid13 ...
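
For the question above, `CJ()` builds the cross join of `a` and `b` (sorted, with the first column varying slowest), and `paste0()` then vectorises over all nine rows, so no explicit loop is needed. A minimal sketch; the object name `combos` is hypothetical.

```r
library(data.table)

a <- c(1, 2, 3)
b <- c(11, 12, 13)

# 3 x 3 cross join, then one vectorised paste0() over its rows
combos <- CJ(a = a, b = b)[, paste0("paper", a, "grid", b)]
print(combos)
```

The base R equivalent is `paste0("paper", rep(a, each = length(b)), "grid", b)`.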

15 votes, 1 answer, 258 views

Why is := faster than `:=`()?

Usually, I use the functional form `:=`() to compute multiple columns in a data.table, thinking that this is the most efficient method. But I've recently discovered that it's slower than ...

1 vote, 0 answers, 40 views

Is it possible to iterate on a data.table by group?

I want to classify the overlapping categories depending on their priority. When I have overlapping categories I set the column overlap to 1 and then I made different groups to evaluate the categories ...

0 votes, 1 answer, 35 views

colnames() behaviour with data.table in R

Using colnames() function with a data.table seems to convert the resulting variable to a "passed by reference" one. I'm using R 3.6.0 and data.table 1.12.2 library(data.table) DT = data.table( ID = ...
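
A sketch of the usual workaround for the behaviour described above: `colnames()` / `names()` on a data.table can return data.table's own over-allocated names vector, so a column added later with `:=` may also show up in the saved variable, depending on the data.table version. Taking an explicit `copy()` gives an independent snapshot. The table and column names are hypothetical.

```r
library(data.table)

DT <- data.table(ID = 1:3, value = c(10, 20, 30))

# copy() makes the saved names independent of later by-reference changes to DT
cols_before <- copy(colnames(DT))
DT[, new := value * 2]

print(cols_before)
print(names(DT))
```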

0 votes, 1 answer, 55 views

How to deal with irregular seconds in a data frame datetime column to get minutely data?

I have to make minutely data from raw data whose seconds are irregular. I can't use second() from the data.table package to make it minutely from the irregular seconds, with the first half of the minute being rounded down ...

1 vote, 5 answers, 50 views

Efficient way to compare all columns in data table R

I have two data tables in R which have the same columns (number, name and order) and an ID as follows: library(data.table) dt1 <- data.table(ids = c(1, 2, 5), col1 = c("A", "B", "F"), col2 = c("B",...

3 votes, 1 answer, 74 views

Benchmarking datatables in R using lapply, is it slower?

I have numerous datasets that will eventually be compared against one another. I've read that data.table and using lapply was the fastest way to analyze the data and saw some benchmark comparisons in ...

0 votes, 2 answers, 49 views

Adding suffix to values meeting a condition(s) in R

I am trying to add a suffix letter based on a character value of another variable. Whenever I see an "e" in the category variable, then the id should have three rows like i_C, i_E, and i_O. This means ...

1 vote, 1 answer, 33 views

How to convert an R data.table expression into a function for looping

I am trying to convert an R data.table code snippet into an appropriate function, but I am not having success. I would like to summarize a variable using this code: library(data.table) mtcars_dt &...

1 vote, 2 answers, 39 views

Reshaping rows to lists by variable in R [duplicate]

I have a data frame that looks like this: class id 1 foo 1 2 bar 1 3 baz 1 4 baz 2 5 bar 2 6 foo 2 7 foo 3 8 foo 3 9 foo 3 My goal is to reshape it into a data frame ...
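
The target shape in the question above is truncated; assuming it is one row per `id` holding that id's `class` values, a grouped `list()` inside `.()` stores each group's values as a list column. A minimal sketch using the data shown in the question; the result name `res` is hypothetical.

```r
library(data.table)

# Data from the question
df <- data.table(class = c("foo", "bar", "baz", "baz", "bar", "foo", "foo", "foo", "foo"),
                 id    = c(1, 1, 1, 2, 2, 2, 3, 3, 3))

# One row per id; list(class) wraps each group's vector into a list cell
res <- df[, .(class = list(class)), by = id]
print(res)
```

If a single string per id is wanted instead, `paste(class, collapse = ", ")` in place of `list(class)` gives a character column.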

1 vote, 3 answers, 48 views

return ratio of rows when aggregating data

I have a large data set in R, which I'm wrangling with data.table. I would like to aggregate some data and return the ratio of the row values to the total for each row. I have managed to ...

1 vote, 2 answers, 39 views

Check if some particular months fall between two date columns of data.table

I have a vector of month values, months = 5:10 (for May - October) and I have a data.table with two date columns. I want to remove all rows where the date range specified by these two columns does not ...

0 votes, 0 answers, 28 views

Creating a variable using the sd() function and the lag() function together [duplicate]

For the following problem, I want to create a variable called VOL_ROA, which is the standard deviation of the variable ROA from period t-2 to t. I should create this for each "TICKER" variable, ...
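
For the question above, a right-aligned rolling window of 3 rows (t-2, t-1, t) per ticker can be computed with data.table's `frollapply()` (available in recent data.table versions). A minimal sketch, assuming rows are consecutive periods with no gaps, since the window counts rows rather than dates; the `period` column and all values are hypothetical.

```r
library(data.table)

# Hypothetical panel: two tickers with four consecutive periods each
DT <- data.table(TICKER = rep(c("AAA", "BBB"), each = 4),
                 period = rep(1:4, times = 2),
                 ROA    = c(1, 2, 3, 4, 2, 2, 5, 8))

setorder(DT, TICKER, period)

# sd() over the current row and the two before it, within each TICKER;
# the first two periods of each ticker have an incomplete window and get NA
DT[, VOL_ROA := frollapply(ROA, 3, sd), by = TICKER]
print(DT)
```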

1 vote, 1 answer, 47 views

Classification of data in R

I want to evaluate all the values when I have a flag set to one. In the following example I want to do 2 evaluations: Which "input" is lower from line 1 to line 3, evaluating 1st line, 2nd line and ...

1 vote, 2 answers, 47 views

Merge dataframes when timestamp of one is between another one's datetime intervals

I have two data frames with time data in POSIXct format and a corresponding location which I need to match. One dataset has time in a series of 30 minute bins, along with location data. location ...

0 votes, 1 answer, 34 views

Using data.table::fwrite() to write .txt files — is.list(x) is not TRUE

I was trying to replace the base R function write.table() with data.table::fwrite() to speed up writing, but the function complains that is.list(x) is not TRUE. What is problem with the input I'm ...
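
The question's input is truncated, so this is only a sketch of one common cause: `fwrite()` expects a list-like object (a data.table, data.frame or list of columns), and `is.list()` is `FALSE` for a bare atomic vector. Assuming that is the case here, wrapping the vector in a one-column table fixes it. The vector `x` and the temporary file are hypothetical.

```r
library(data.table)

x <- c("alpha", "beta", "gamma")   # hypothetical atomic vector
out <- tempfile(fileext = ".txt")

# Wrap the bare vector so fwrite() receives a list-like object
fwrite(data.table(x), out, col.names = FALSE)

readLines(out)
```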