對改變矩陣列的for回圈進行矢量化-有解無憂

假設我有一個包含 100 棵樹的向量。然后我將這些樹齡分別為 5 年、10 年、15 年和 20 年，以創建今年的樹齡矩陣和未來的四個 5 年規劃期。

但是后來，我決定砍掉其中一些樹（每個計劃期間只有 10 棵），記錄在 T/F 值矩陣中，其中 T 被收獲而 F 不是（樹木不能被收獲兩次）。

age.vec <- sample(x = 1:150, size = 100, replace = T) # create our trees
age.mat <- cbind(age.vec, age.vec 5, age.vec   10, age.vec   15, age.vec   20) # grow them up
x.mat <- matrix(data = F, nrow = 100, ncol = 5) # create the empty harvest matrix
x.mat[cbind(sample(1:100, size = 50), rep(1:5, each = 10))] <-  T # 10 trees/year harvested

那么，當年采伐的樹齡為零： age.mat[x.mat] <- 0

然后，我想在接下來的時間里再次老化收獲的樹木。例如，如果在第一個計劃期收獲一棵樹，在第二個計劃期（5 年后），我希望樹的年齡為 5，然后在第三個計劃期（10 年后），我希望樹齡樹的長度為 10。我已在以下 for 回圈中成功實作了這一點：

for (i in 2:5){ # we don't need to calculate over the first year
  age.mat[,i]<-age.mat[,i-1] 5L # add 5 to previous year
  age.mat[x.mat[,i],i] <- 0L # reset age of harvested trees to zero
}

這可行，但是，它笨重且緩慢。有沒有辦法更快地實作這個（即沒有for回圈）？它也是在一個函式中實作的，這意味著使用“apply”實際上會減慢速度，因此需要直接對其進行矢量化。這是我要迭代數千次的東西，所以速度至關重要！

謝謝！

uj5u.com熱心網友回復：

根據rbenchmark.

這是一種依賴于這樣一個事實的方法，即采伐一棵樹不會停止時間的流逝，它只是重置時鐘。因此，我們可以將收獲視為從所有未來年齡中減去收獲年齡。

x.die <- x.mat * age.mat 
x.dif <- t(apply(x.die, 1, cumsum))
age.mat2 <- age.mat - x.dif

x.die，通過將收成乘以年齡，我們得到每次收成的年齡。下一行計算每一行的這些累積和，最后我們從原始年齡中減去這些。

我假設您的“樹木不能收獲兩次”意味著我們永遠不會在一行中看到兩個 TRUE x.mat？如果每個樹的位置不止一次收獲，我的代碼將無法正常作業。

uj5u.com熱心網友回復：

@Jon Spring 的答案的替代方案t(apply是matrixStats::rowCumsums.

library(matrixStats)

n <- 1e4L
n10 <- n/10L
age.mat <- outer(sample(150, n, TRUE), seq(0, 20, 5), " ")
x.mat <- matrix(FALSE, n, 5) # create the empty harvest matrix
# sample harvests so that no tree is harvested twice
x.mat[matrix(c(sample(n, n/2L), sample(n10:(6L*n10 - 1L)) %/% n10), n/2L)] <- TRUE

f1 <- function(age, x) {
  age[x[,1],] <- 0
  for (i in 2:5){ # we don't need to calculate over the first year
    age[,i] <- age[,i - 1]   5L # add 5 to previous year
    age[x[,i], i] <- 0L # reset age of harvested trees to zero
  }
  age
}

f2 <- function(age, x) {
  age - rowCumsums(x*age)
}

microbenchmark::microbenchmark(f1 = f1(age.mat, x.mat),
                               f2 = f2(age.mat, x.mat),
                               check = "equal")
#> Unit: microseconds
#>  expr   min    lq     mean median     uq     max neval
#>    f1 294.4 530.2 1023.450  566.6 629.35 33222.8   100
#>    f2 135.2 263.6  334.622  284.2 307.15  4343.6   100

uj5u.com熱心網友回復：

您可以使用 apply 逐行處理每個向量，然后在函式中使用一些邏輯來調整值。

應該快 4 倍左右

  age.mat |> 
  apply(1, \(x) { 
    
    if(any(x == 0 & (which(x == 0) != length(x)))) { 
      
   x[which(x == 0):length(x)] <- (0:(length(x) - which(x == 0))) * 5
   
   x
    
      } else x
  }) |> t()

     [,1] [,2] [,3] [,4] [,5]
  [1,]  101    0    5   10   15
  [2,]   55   60   65   70   75
  [3,]   23   28   33    0    5
  [4,]    0    5   10   15   20
  [5,]   23   28   33    0    5
  [6,]   84    0    5   10   15
  [7,]   52   57   62    0    5
  [8,]   26   31   36   41    0
  [9,]  114  119  124  129    0
 [10,]   33   38   43   48   53
 [11,]  144  149  154  159  164
 [12,]   19   24   29   34   39
 [13,]   43   48   53   58   63
 [14,]   69   74   79   84   89
 [15,]   98  103  108  113  118
 [16,]  110  115  120  125  130
 [17,]    8   13   18   23   28
 [18,]   16   21   26   31   36
 [19,]    1    6   11   16   21
 [20,]   60   65    0    5   10

uj5u.com熱心網友回復：

我找到了一種方法！我實作了從@john-spring 倒退的想法，在那里我創建了一個矩陣，其中填寫了收獲年份和隨后所有年份的展臺年齡，然后從我預先制作的老化中減去矩陣。我構建了一個類似于 tidyr 中的“fill”或 zoo 中的“na.locf”所做的功能（因為它們太慢了）。

首先，我使用 arrayInd 來確定更改的樹矩陣中的位置。然后我用它來制作另一個矩陣，該矩陣組合每個索引行的重復次數等于周期數減去收獲樹的周期加一，以及一個與周期相同長度的序列向量索引號的周期數。

x.ind <- arrayInd(which(x.mat), dim(x.mat)) # gets index of row that was changed
x.new.ind <- cbind(rep(x.ind[,1], times = nper-x.ind[,2] 1), sequence(nvec = nper-x.ind[,2] 1, from = x.ind[,2]))

例如，如果在位置 [4, 2] 收獲了一棵樹，這意味著第四棵樹是在第二個時期收獲的，并且我們總共有 5 個時期，它將創建一個矩陣：

    [,1] [,2] 
[1,]  4    2
[2,]  4    3
[3,]  4    4
[4,]  4    5

然后我制作了一個向量，其中包含在正確位置收獲的樹木的年齡，其余位置為零（例如，對于我們的示例，如果收獲的樹有 100 年歷史，我們將有一個向量0 0 0 100 0（如果我們有 5 棵樹））。

ages.vec <- vector(mode = "integer", length = nrow(age.mat))
ages.vec[x.ind[,1]]<- age.mat[x.ind]

然后我將這個向量乘以一個邏輯矩陣，上面矩陣中的行、列位置為“T”。

繼續上面的例子，我們得到：

    [,1] [,2] [,3] [,4] [,5]
[1,]  0    0    0    0    0
[2,]  0    0    0    0    0
[3,]  0    0    0    0    0
[4,]  0  100  100  100  100
[5,]  0    0    0    0    0

然后我從我們當前（已經老化）的年齡矩陣中減去它。所以樹四曾經是95 100 105 110 115，現在是95 0 5 10 15。

new.ages.mat<- age.mat - replace(x.mat, x.new.ind, TRUE)*ages.vec

雖然這可能不是最優雅的解決方案，但使用 microbenchmark，它比我們的 for 回圈快 90 倍，比 John 創建的可愛的 apply 函式快 3 倍。我會放入微基準呼叫和結果，但是這篇文章已經足夠長了！我知道有更好的方法來創建ages.vec 并將其合并，我將繼續努力，并將用我的結果更新這個答案！

uj5u.com熱心網友回復：

這種方法建立在which使用與 arr.ind=TRUE 一起創建一個兩列矩陣的基礎上，該矩陣對新植樹的起始位置（在第一列中）和時間（在第二列中）進行編碼。它確實違反了函式式編程范式，因為它使用<<-“就地”為age.mat`分配新值。

fiveseq <- seq(0,20, by=5) # this way one only needs to call `seq` once
apply(which(x.mat, arr.ind=TRUE) , 1, 
          function(r) {age.mat[ r[1], r[2]:5] <<- fiveseq[  1:(6-r[2])] } )

總之，它找到新的位置和間隔，并用序列中正確數量的專案替換該行的其余部分{0, 5, 10, 15, 20}

（我很想看看這與您已經建立的基準測驗框架相比如何。）

轉載請註明出處，本文鏈接：https://www.uj5u.com/yidong/491643.html

標籤：r 表现循环 for循环矩阵

上一篇：使用函式庫，成員函式呼叫性能變慢

下一篇：為快速加載檔案格式注冊寬度和決議