Skip to content

[.data.table is very slow with a single integer #5636

@MLopez-Ibanez

Description

@MLopez-Ibanez

In this code:

library(data.table)
parameters <- list(types = c(p1 = "r", p2 = "r", p3 = "r", dummy = "c"),
                   digits = 4)
n <- 10000
newConfigurations <- data.table(p1 = runif(n), p2 = runif(n), p3 = runif(n),
                                dummy = sample(c("d1", "d2"), n, replace=TRUE))

repair_sum2one <- function(configuration, parameters)
{
  isreal <- names(which(parameters$types[colnames(configuration)] == "r"))
  digits <- parameters$digits[isreal]
  c_real <- unlist(configuration[isreal])
  c_real <- c_real / sum(c_real)
  c_real[-1] <- round(c_real[-1], digits[-1])
  c_real[1] <- 1 - sum(c_real[-1])
  configuration[isreal] <- c_real
  return(configuration)
}
j <- colnames(newConfigurations)
for (i in seq_len(nrow(newConfigurations)))
      set(newConfigurations, i, j = j, value = repair_sum2one(as.data.frame(newConfigurations[i]), parameters))

More than half the time is spent in [.data.table. Even the function repair_sum2one is faster.

Originally posted by @MLopez-Ibanez in #3735 (comment)

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions