How do I calculate the sum of each column multiplied by the first column in a data set?

My data frame looks like this.

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33), 
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

> df
       stat patient1 patient2 patient3
gene1  3.38    -0.44    0.400    0.350
gene2 -3.40    -0.22    0.045    0.210
gene3  4.45     0.80   -0.140   -0.230
gene4 -4.21    -0.21   -0.078   -0.019
gene5  3.33    -0.22   -0.160   -0.210

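I have been struggling to work out how to write a script or a loop that computes the sum of the products of the "stat" column and each patient column. My patient data set has 141 columns and 142 rows, so doing this by hand is not practical.

So I would like a new row named "Signature Score" whose values are calculated as follows: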

df[nrow(df) + 1, ] <- NA    # append an empty row to hold the score
row.names(df)[nrow(df)] <- "Signature Score"

sum_multi_1 <- sum(df[c(1:nrow(df)-1),2]*df[c(1:nrow(df)-1),1])
sum_multi_2 <- sum(df[c(1:nrow(df)-1),3]*df[c(1:nrow(df)-1),1])
sum_multi_3 <- sum(df[c(1:nrow(df)-1),4]*df[c(1:nrow(df)-1),1])

df[nrow(df),2] <- sum_multi_1
df[nrow(df),3] <- sum_multi_2
df[nrow(df),4] <- sum_multi_3

This gives...

> df
                 stat patient1 patient2 patient3
gene1            3.38  -0.4400  0.40000  0.35000
gene2           -3.40  -0.2200  0.04500  0.21000
gene3            4.45   0.8000 -0.14000 -0.23000
gene4           -4.21  -0.2100 -0.07800 -0.01900
gene5            3.33  -0.2200 -0.16000 -0.21000
Signature Score    NA   2.9723  0.37158 -1.17381

I tried to write a for loop like this...

for (i in 1:nrow(df)){
  df[nrow(df),i+1] <- sum(df[c(1:nrow(df)-1,i+1)]*df[c(1:nrow(df)-1),1])
}

But it does not do the job. Can anyone tell me what I am missing or what I need to write?

All the best, Tj

A uj5u.com user replied:

You can use mutate and across to compute the required products, and then adorn_totals() from the janitor package to add a totals row.

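library(dplyr)

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5")) %>%
  # rownames_to_column() comes from tibble, which library(dplyr) does not attach
  tibble::rownames_to_column(var = "genes") %>%
  # replace each patient value with its product with stat
  mutate(across(patient1:patient3, ~ .x * stat)) %>%
  # add a row of column totals, labelled "Signature Score"
  janitor::adorn_totals(name = "Signature Score")

# the total of the stat column itself is not wanted, so blank it out
df[nrow(df), "stat"] <- NA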

Output:

           genes  stat patient1 patient2 patient3
           gene1  3.38  -1.4872  1.35200  1.18300
           gene2 -3.40   0.7480 -0.15300 -0.71400
           gene3  4.45   3.5600 -0.62300 -1.02350
           gene4 -4.21   0.8841  0.32838  0.07999
           gene5  3.33  -0.7326 -0.53280 -0.69930
 Signature Score    NA   2.9723  0.37158 -1.17381

A uj5u.com user replied:

Another possible solution, in base R:

rbind(df, signa = c(NA,colSums(df[,1] * df[-1])))

#>        stat patient1 patient2 patient3
#> gene1  3.38  -0.4400  0.40000  0.35000
#> gene2 -3.40  -0.2200  0.04500  0.21000
#> gene3  4.45   0.8000 -0.14000 -0.23000
#> gene4 -4.21  -0.2100 -0.07800 -0.01900
#> gene5  3.33  -0.2200 -0.16000 -0.21000
#> signa    NA   2.9723  0.37158 -1.17381
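
As a side note on why the one-liner above works: multiplying a vector by a data frame recycles the vector down each column, so every patient column is multiplied element-wise by stat before colSums() adds it up. The same score can also be written as a matrix product, which may be convenient with 141 columns. This is only a sketch on the toy data from the question; the names by_recycling, by_matrix and patients are made up for illustration.

```r
# Same toy data as in the question
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# Recycling: the stat vector is multiplied down each patient column in turn
by_recycling <- colSums(df[, 1] * df[-1])

# Matrix form: t(patients) %*% stat, i.e. one dot product per patient
patients  <- as.matrix(df[-1])
by_matrix <- drop(crossprod(patients, df$stat))

# Both routes should agree up to floating-point tolerance
all.equal(by_recycling, by_matrix)
```

Either vector can then be passed to rbind() exactly as in the answer above.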

A uj5u.com user replied:

I noticed that you subtract 1 so that the index starts from 0. However, unlike Python, indexing in R starts at 1. So you probably want this:

colSums(df[-1]*df$stat)
# patient1 patient2 patient3 
#  2.97230  0.37158 -1.17381
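
To expand on the indexing point: in R the `:` operator binds tighter than `-`, so `1:nrow(df)-1` evaluates to `0:(nrow(df)-1)`, and a zero index is silently dropped when subsetting. If a loop is still preferred, one possible repair of the loop from the question is sketched below. It assumes the score row has to be appended first and that the loop should run over the patient columns rather than over the rows; the names n, genes and j are illustrative.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

n <- nrow(df)
1:n - 1      # parsed as (1:n) - 1, so it starts at 0
1:(n - 1)    # parentheses are needed to get 1 up to n - 1

# Append an empty row that will hold the score
df[n + 1, ] <- NA
row.names(df)[n + 1] <- "Signature Score"

genes <- seq_len(n)              # the rows that hold gene values
for (j in 2:ncol(df)) {          # loop over patient columns, not rows
  df[n + 1, j] <- sum(df[genes, j] * df[genes, "stat"])
}
df
```

The stat entry of the new row is left as NA, matching the layout asked for in the question.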

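A uj5u.com user replied:

You are overcomplicating this.
To make the code clearer, define an auxiliary function fun that multiplies two columns and sums the result. Then apply that function to the data.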

df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33), 
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# auxiliary function
fun <- function(x, y) sum(x * y)

apply(df[-1], 2, fun, y = df[[1]])
#> patient1 patient2 patient3 
#>  2.97230  0.37158 -1.17381

sigscore <- apply(df[-1], 2, fun, y = df[[1]])
rbind(df, `Signature Score` = c(NA, sigscore))
#>                  stat patient1 patient2 patient3
#> gene1            3.38  -0.4400  0.40000  0.35000
#> gene2           -3.40  -0.2200  0.04500  0.21000
#> gene3            4.45   0.8000 -0.14000 -0.23000
#> gene4           -4.21  -0.2100 -0.07800 -0.01900
#> gene5            3.33  -0.2200 -0.16000 -0.21000
#> Signature Score    NA   2.9723  0.37158 -1.17381
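
A possible variation on the same idea, in case it is useful: apply() first converts the data frame to a matrix, whereas vapply() walks over the columns directly and checks that each result is a single number. This is a sketch under the assumption that all patient columns are numeric, not a claim that it is better for the real data.

```r
df <- data.frame(stat = c(3.38, -3.40, 4.45, -4.21, 3.33),
                 patient1 = c(-0.44, -0.22, 0.80, -0.21, -0.22),
                 patient2 = c(0.40, 0.045, -0.14, -0.078, -0.16),
                 patient3 = c(0.35, 0.21, -0.23, -0.019, -0.21),
                 row.names = c("gene1","gene2","gene3","gene4","gene5"))

# One score per patient column; numeric(1) declares the expected result type
sigscore <- vapply(df[-1], function(x) sum(x * df$stat), numeric(1))

# Bind the scores back on as a new row, with NA in the stat column
result <- rbind(df, `Signature Score` = c(NA, sigscore))
result
```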

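Created on 2022-05-05 by the reprex package (v2.0.1)

A uj5u.com user replied:

Here is another tidyverse option, where I apply the function inside summarise to get the column totals, then set the row name, and finally bind the result back onto the original data frame.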

library(tidyverse)

df %>%
  summarise(across(-stat, ~ sum(.x * stat, na.rm = TRUE))) %>%
  `row.names<-`("Signature Score") %>%
  bind_rows(df, .)

Output

                 stat patient1 patient2 patient3
gene1            3.38  -0.4400  0.40000  0.35000
gene2           -3.40  -0.2200  0.04500  0.21000
gene3            4.45   0.8000 -0.14000 -0.23000
gene4           -4.21  -0.2100 -0.07800 -0.01900
gene5            3.33  -0.2200 -0.16000 -0.21000
Signature Score    NA   2.9723  0.37158 -1.17381
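
One detail of the snippet above that may matter on the full data set is the na.rm argument. The small base R sketch below uses a made-up missing value (the NA is hypothetical, not part of the question's data) to show the difference it makes.

```r
stat    <- c(3.38, -3.40, 4.45)
patient <- c(-0.44, NA, 0.80)   # hypothetical missing measurement for the second gene

# Without na.rm, a single missing product makes the whole sum NA
score_with_na <- sum(patient * stat)

# With na.rm = TRUE, the sum runs over the observed genes only
score_observed <- sum(patient * stat, na.rm = TRUE)

score_with_na
score_observed
```

Whether silently skipping missing genes is appropriate depends on the analysis, so it may be worth checking for NAs before relying on it.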

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/470697.html

Tags: r, data frame, loop, for loop
