R: How to find the second-ranked value in a summarized data frame

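Various R functions make it easy to extract values from grouped variables using group_by and summarize. In the resulting data frame I can use group_by and summarize to create, for example, a new column containing the maximum or minimum value of a variable within each group. That is, given this data: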

name, value
foo, 100
foo, 200
foo, 300
bar, 400
bar, 500
bar, 600

I can easily get the maximum or minimum value for each name:

df %>% group_by(name) %>% summarize(maxValue = max(value))

But suppose I want the second value for each name? That is, suppose I want my result to be:

name maxValue midValue
foo 300 200
bar 600 500

In other words, how do I fill in this blank:

df %>% group_by(name) %>% 
summarize(maxValue = max(value), 
  secondValue = _________)

Thanks, any help for an R newbie is appreciated!

A uj5u.com user replied:

library(dplyr)

df %>% 
  group_by(name) %>% 
  # Sort descending so that value[2] is the second-largest value in each group
  arrange(desc(value)) %>% 
  summarise(maxValue = max(value), 
            midValue = value[2])

Result:

# A tibble: 2 × 3
  name  maxValue midValue
  <chr>    <int>    <int>
1 Bar        600      500
2 Foo        300      200
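The sort direction matters once a group has more than three values. The following is a minimal sketch, assuming a hypothetical four-value group (the data frame df4 is not from the question): indexing with [2] after an ascending sort returns the second-smallest value, while a descending sort returns the second-largest.

```r
library(dplyr)

# Hypothetical data (not from the question): one group with four values,
# so "second smallest" and "second largest" are different numbers.
df4 <- data.frame(name = rep("Foo", 4), value = c(100, 200, 300, 400))

res <- df4 %>%
  group_by(name) %>%
  summarise(secondSmallest = sort(value)[2],
            secondLargest  = sort(value, decreasing = TRUE)[2])

print(res)
```

In the question's data every group has exactly three values, so both directions happen to return the same middle value.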

A uj5u.com user replied:

This should do it:

df %>% group_by(name) %>% arrange(desc(value)) %>% slice(2)

Code:

a = 'name value
Foo 100
Foo 200
Foo 300
Bar 400
Bar 500
Bar 600'


library(dplyr)

df <- read.table(text = a, header = TRUE)
df %>% group_by(name) %>% arrange(desc(value)) %>% slice(2)

Output:

# A tibble: 2 × 2
# Groups:   name [2]
  name  value
  <fct> <int>
1 Bar     500
2 Foo     200

A uj5u.com user replied:

library(dplyr)

df <- data.frame(
  name = c("Foo", "Foo", "Foo", "Bar", "Bar", "Bar"),
  value = c(100, 200, 300, 400, 500, 600)
)

df %>% 
  group_by(name) %>% 
  summarize(secondValue = sort(value, decreasing = TRUE)[2])
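As a possible alternative, dplyr's nth() accepts an order_by argument, so the blank in the question can be filled without sorting the whole data frame first. This is a sketch using the same example data, not something taken from the answers here; the column names maxValue and secondValue follow the question.

```r
library(dplyr)

df <- data.frame(
  name = c("Foo", "Foo", "Foo", "Bar", "Bar", "Bar"),
  value = c(100, 200, 300, 400, 500, 600)
)

res <- df %>%
  group_by(name) %>%
  summarise(maxValue = max(value),
            # 2nd element of value once ordered from largest to smallest
            secondValue = nth(value, 2, order_by = desc(value)))

print(res)
```

Like value[2], this picks the second position after ordering, so tied maxima are not collapsed: a group with two rows of 600 would report 600 as its second value.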

A uj5u.com user replied:

We can take the two largest values in each group and select the second largest.

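Note that the code below handles ties (for example, if two rows in group bar both have the value 600, it will look for the next distinct value*). This does not matter in your example, but it may matter for other data.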

df %>% 
  group_by(name) %>% 
  summarise(maxvalue = sortN(value, 1),
            midvalue = sortN(value, 2)[2])

# A tibble: 2 × 3
  name  maxvalue midvalue
  <chr>    <int>    <int>
1 Bar        600      500
2 Foo        300      200

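The function sortN(x, n, type = "max") is defined below. It extracts the n largest or smallest values of x. It is largely based on the post in my comment. There is no need to define a whole function for this problem (as the other answers show), but I have found this function useful for a range of problems, so it is nice to have.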

sortN <- function(x, n, type = "max") {

  # GR 1 - If type is not "max" or "min", error
  if (! type %in% c("max", "min")) {
    stop("type must be max or min.")
  }

  # GR 2 - If n >= length(unique(x)), return all unique values, sorted
  # in the same direction as the regular result
  if (n >= length(unique(x))){
    return(sort(unique(x), decreasing = (type == "max")))
  }

  # Change based on whether the user wants min or max
  type <- switch(type, min = FALSE, max = TRUE)

  if (type) {
    x <- unique(x)
    # Position of the n-th largest value when x is in ascending order
    partial <- length(x) - n + 1
    out <- x[x >= sort(x, partial = partial)[partial]]
    sort(out, decreasing = TRUE)
  } else {
    out <- -sortN(x = -x, n = n, type = "max")
    sort(out, decreasing = FALSE)
  }
}

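*That said, the code will fail if a group does not contain at least two distinct values. Whether that matters is up to you. In any case, it can easily be handled with a small ifelse statement around the existing code.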
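One way such a guard might look is sketched below. The helper second_distinct and the tied example data are hypothetical (they are not part of the answer), and the sketch uses a plain if rather than ifelse: it returns the second-largest distinct value, or NA when a group has fewer than two distinct values.

```r
library(dplyr)

# Hypothetical helper (not from the answer): second-largest DISTINCT value,
# or NA when x has fewer than two distinct values.
second_distinct <- function(x) {
  u <- sort(unique(x), decreasing = TRUE)
  if (length(u) < 2) {
    return(u[NA_integer_])  # NA of the same type as x
  }
  u[2]
}

# Hypothetical data with ties: Bar has two rows at 600, Foo has only 300s.
df_ties <- data.frame(
  name  = c("Bar", "Bar", "Bar", "Foo", "Foo"),
  value = c(600, 600, 500, 300, 300)
)

res <- df_ties %>%
  group_by(name) %>%
  summarise(maxValue = max(value),
            secondValue = second_distinct(value))

print(res)
```

Here Bar gets 500 as its second value despite the tied 600s, and Foo gets NA because it has only one distinct value.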

