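The simple operation I want to do has turned out to be not so simple. I have a time series dataset and want to normalize it row by row, so that each observation becomes (x - mean(row)) / stdev(row).
Here is my attempt, which did not work. I have already replaced the NA values with 0, so that does not seem to be the problem.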
norm <- for (i in 1:nrow(clusterdatairaq2)){
for(j in 2:ncol(clusterdatairaq2)) {
clusterdatairaq2[i,j] <- (clusterdatairaq2[i,j] - mean(clusterdatairaq2[i,]))/ sd(clusterdatairaq2[i,])
}
}
Thanks in advance for your help!
uj5u.com user reply:
set.seed(42)
mtx <- matrix(sample(99, size=6*5, replace=TRUE), nrow=6, dimnames=list(NULL, LETTERS[1:5]))  # name the columns A..E
df <- cbind(data.frame(id = letters[1:6]), mtx)
df
# id A B C D E
# 1 a 49 47 26 95 58
# 2 b 65 24 3 5 97
# 3 c 25 71 41 84 42
# 4 d 74 89 89 34 24
# 5 e 18 37 27 92 30
# 6 f 49 20 36 3 43
out <- t(apply(df[,-1], 1, function(X) (X-mean(X)) / sd(X)))
colnames(out) <- paste0(colnames(df[,-1]), "_norm")
df <- cbind(df, out)
df
# id A B C D E A_norm B_norm C_norm D_norm E_norm
# 1 a 49 47 26 95 58 -0.2376354 -0.3168472 -1.1485711 1.5842361 0.1188177
# 2 b 65 24 3 5 97 0.6393668 -0.3611690 -0.8736386 -0.8248320 1.4202728
# 3 c 25 71 41 84 42 -1.1427812 0.7618541 -0.4802994 1.3001207 -0.4388942
# 4 d 74 89 89 34 24 0.3878036 0.8725581 0.8725581 -0.9048751 -1.2280448
# 5 e 18 37 27 92 30 -0.7749098 -0.1291516 -0.4690243 1.7401483 -0.3670625
# 6 f 49 20 36 3 43 1.0067737 -0.5462283 0.3106004 -1.4566088 0.6854630
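As for why the loop in the question probably fails: if `clusterdatairaq2` is a data frame, `mean()` on a one-row slice such as `clusterdatairaq2[i,]` returns NA with a warning. The slice also still contains the first column that the inner loop skips, each cell is overwritten while the rest of the same row is still waiting to be computed, and a `for` loop returns NULL, so `norm` ends up empty. Below is a minimal sketch of a loop that avoids these issues. It assumes the first column is an identifier and the rest are numeric; `dat` and its values are made up for illustration.

```r
# Toy stand-in for clusterdatairaq2 (hypothetical values):
# column 1 is an identifier, the remaining columns are numeric.
dat <- data.frame(id = c("a", "b", "c"),
                  t1 = c(1, 10, 5),
                  t2 = c(2, 20, 5.5),
                  t3 = c(3, 60, 9))

# Work on a numeric matrix so that mean() and sd() see plain vectors.
m <- as.matrix(dat[, -1])

for (i in seq_len(nrow(m))) {
  row_vals <- m[i, ]                                  # original values of row i
  m[i, ] <- (row_vals - mean(row_vals)) / sd(row_vals)
}

# Put the normalized values back next to the identifier column.
normalized <- dat
normalized[, -1] <- m
normalized
```

The `apply()` one-liner above does the same thing without an explicit loop; the loop is shown only to make the fix to the original code visible.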
uj5u.com user reply:
Suppose we have a data frame like this:
library(dplyr)
df = tibble(
Destination = c("Belgium", "Bulgaria", "Czechia"),
`Jan 2008` = sample(1:1000, size=3),
`Feb 2008` = sample(1:1000, size=3),
`Mar 2008` = sample(1:1000, size=3)
)
df
# A tibble: 3 × 4
Destination `Jan 2008` `Feb 2008` `Mar 2008`
<chr> <int> <int> <int>
1 Belgium 811 299 31
2 Bulgaria 454 922 421
3 Czechia 638 709 940
The tidyverse way to do this (which I think works better than base R here):
library(dplyr)
library(tidyr)
scaled = df %>%
pivot_longer(`Jan 2008`:`Mar 2008`) %>%
group_by(Destination) %>%
mutate(value = as.numeric(scale(value))) %>%
ungroup()
scaled
Destination name value
<chr> <chr> <dbl>
1 Belgium Jan 2008 1.09
2 Belgium Feb 2008 -0.205
3 Belgium Mar 2008 -0.881
4 Bulgaria Jan 2008 -0.517
5 Bulgaria Feb 2008 1.15
6 Bulgaria Mar 2008 -0.635
7 Czechia Jan 2008 -0.787
8 Czechia Feb 2008 -0.338
9 Czechia Mar 2008 1.13
You can now pivot it back to the original wide form, but there is not much point, because analysis is much easier in long form:
scaled %>% pivot_wider(names_from=name, values_from=value)
# A tibble: 3 × 4
Destination `Jan 2008` `Feb 2008` `Mar 2008`
<chr> <dbl> <dbl> <dbl>
1 Belgium 1.09 -0.205 -0.881
2 Bulgaria -0.517 1.15 -0.635
3 Czechia -0.787 -0.338 1.13
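The `as.numeric(scale(value))` call in the pipeline above relies on `scale()` centering by the mean and dividing by the sample standard deviation, which is the formula from the question. `scale()` returns a one-column matrix, hence the `as.numeric()`. A small base R sketch to confirm the equivalence, using Belgium's three values from the example output (the variable names are only for illustration):

```r
x <- c(811, 299, 31)               # Belgium: Jan, Feb, Mar 2008

manual <- (x - mean(x)) / sd(x)    # formula from the question
scaled <- scale(x)                 # 3 x 1 matrix with centering/scaling attributes

all.equal(as.numeric(scaled), manual)  # TRUE
```

Rounded, `manual` matches the 1.09, -0.205 and -0.881 shown for Belgium.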
uj5u.com user reply:
I'll use the mtcars dataset as an example:
library(tidyverse)
mtcars %>%                                       # the dataset
  select(disp) %>%                               # disp is the column we want to normalize, just as an example
  mutate(disp2 = (disp - mean(disp)) / sd(disp)) # disp2 is the normalized version of that column
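Note that this standardizes one column across all observations, while the question asks for each row to be standardized across its columns. A minimal base R sketch of the difference, using a small made-up matrix `m`:

```r
# Hypothetical 3 x 3 matrix: rows are observations, columns are time points.
m <- matrix(c(1,  2,   3,
              10, 20,  60,
              5,  5.5, 9), nrow = 3, byrow = TRUE)

col_scaled <- scale(m)        # each column gets mean 0 and sd 1
row_scaled <- t(scale(t(m)))  # each row gets mean 0 and sd 1

round(rowMeans(row_scaled), 10)   # all zero
apply(row_scaled, 1, sd)          # all one
```

Since `scale()` always works column by column, transposing before and after is one way to reuse it for rows.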
uj5u.com user reply:
A dplyr solution, reusing @Migwell's toy example (please provide a reproducible example in your question):
library(dplyr)
library(data.table)  # data.table() below comes from this package
df = data.table(
Destination = c("Belgium", "Bulgaria", "Czechia"),
`Jan 2008` = sample(1:1000, size=3),
`Feb 2008` = sample(1:1000, size=3),
`Mar 2008` = sample(1:1000, size=3))
> df
Destination Jan 2008 Feb 2008 Mar 2008
1: Belgium 443 114 628
2: Bulgaria 755 801 493
3: Czechia 123 512 517
You can use:
df2 <- df %>%
  select(`Jan 2008`:`Mar 2008`) %>%
  mutate(normJan2008 = (`Jan 2008` - rowMeans(., na.rm = TRUE)) / apply(., 1, sd))
> df2
Jan 2008 Feb 2008 Mar 2008 normJan2008
1: 443 114 628 0.1843742
2: 755 801 493 0.4333577
3: 123 512 517 -1.1546299
Then do the same for each variable that needs to be standardized.
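If repeating that for every column gets tedious, the same arithmetic can be applied to all columns at once. Here is a minimal base R sketch, assuming the first column is the only non-numeric one. The data frame copies the values printed above; the names `vals`, `normed` and `df_all` are only for illustration.

```r
df <- data.frame(Destination = c("Belgium", "Bulgaria", "Czechia"),
                 `Jan 2008` = c(443, 755, 123),
                 `Feb 2008` = c(114, 801, 512),
                 `Mar 2008` = c(628, 493, 517),
                 check.names = FALSE)   # keep the spaces in the column names

vals <- as.matrix(df[, -1])             # numeric part only

# A length-nrow vector is recycled down the columns, so every cell in
# row i is shifted by row i's mean and divided by row i's sd.
normed <- (vals - rowMeans(vals, na.rm = TRUE)) / apply(vals, 1, sd, na.rm = TRUE)
colnames(normed) <- paste0("norm", gsub(" ", "", colnames(vals)))

df_all <- cbind(df, normed)
df_all
```

The `normJan2008` column produced this way should agree with the values shown above (0.1843742, 0.4333577, -1.1546299).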