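I want to group animals by the consecutive months in which they were found in the same burrow, and split those groups wherever the months are not consecutive.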
#Input Data
burrow.data <- read.csv
Animal Burrow Date
1 027 B0961 2022-03-01
2 027 B0961 2022-04-26
3 033 1920 2021-11-02
4 033 1955 2022-03-29
5 033 1955 2022-04-26
6 063 B0540 2021-04-21
7 063 B0540 2022-01-04
8 063 B0540 2022-03-01
9 101 B0021 2020-11-23
10 101 B0021 2020-12-23
11 101 B0021 2021-11-04
12 101 B0021 2022-01-06
13 101 B0021 2022-02-04
14 101 B0021 2022-03-03
#Expected Output
Animal Burrow grp Date.Start Date.End
1 033 1920 1 2021-11-02 2021-11-02
2 033 1955 1 2022-03-29 2022-04-26
3 101 B0021 1 2020-11-23 2020-12-23
4 101 B0021 2 2022-01-06 2022-03-03
5 063 B0540 1 2021-04-21 2022-03-01
6 027 B0961 1 2022-03-01 2022-04-26
I used the code from another post, "Grouping consecutive dates in R", and wrote:
burrow.input <- burrow.data[order(burrow.data$Date),]
burrow.input$grp <- ave(as.integer(burrow.input$Date), burrow.input[-4], FUN = function(z) cumsum(c(TRUE, diff(z)>1)))
burrow.input
out <- aggregate(Date ~ Animal + Burrow + grp, data = burrow.input, FUN = function(z) setNames(range(z), c("Start", "End")))
out <- do.call(data.frame,out)
out[,4:5] <- lapply(out[,4:5], as.Date, origin = "1970-01-01")
out
This code puts animal 101 into a single group instead of two groups split at the date gap (see below). How can I fix this?
Animal Burrow grp Date.Start Date.End
1 033 1920 1 2021-11-02 2021-11-02
2 033 1955 1 2022-03-29 2022-04-26
3 101 B0021 1 2020-11-23 2022-03-03
4 063 B0540 1 2021-04-21 2022-03-01
5 027 B0961 1 2022-03-01 2022-04-26
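Answer from a uj5u.com user:
Group the data by Animal, Burrow and a grouping variable that changes every time the date jumps by more than 1 month. Here as.yearmon converts the dates to a yearmon object, which is stored internally as the year plus 0 for January, 1/12 for February, ..., 11/12 for December. Multiplying it by 12 therefore gives a month count, and we check whether its difference from the previous value is greater than 1. The cumulative sum of that check produces the grouping variable. Finally, summarize, sort and remove the added grouping variable.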
library(dplyr)
library(zoo)

burrow.data %>%
  group_by(Animal, Burrow,
           diff = cumsum( c(1, diff(12 * as.yearmon(Date)) > 1) ) ) %>%
  summarize(Date.start = first(Date), Date.end = last(Date), .groups = "drop") %>%
  arrange(Burrow) %>%
  select(-diff)
giving:
# A tibble: 7 × 4
Animal Burrow Date.start Date.end
<int> <chr> <chr> <chr>
1 33 1920 2021-11-02 2021-11-02
2 33 1955 2022-03-29 2022-04-26
3 101 B0021 2020-11-23 2021-11-04
4 101 B0021 2022-01-06 2022-03-03
5 63 B0540 2021-04-21 2022-01-04
6 63 B0540 2022-03-01 2022-03-01
7 27 B0961 2022-03-01 2022-04-26
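To see the "month jump" rule on its own, here is a minimal base R sketch. It is not the code from the answer: it swaps as.yearmon for a whole-number month index built with as.POSIXlt, and the dates, as well as the names month_index, run_id and runs, are made up for illustration and are not taken from the question.

```r
# Hypothetical visit dates for one animal in one burrow (not from the question)
dates <- as.Date(c("2021-11-15", "2021-12-30", "2022-01-02",
                   "2022-04-20", "2022-05-01"))

# Whole-number month index: calendar year * 12 + zero-based month (0 = January)
lt <- as.POSIXlt(dates)
month_index <- (lt$year + 1900L) * 12L + lt$mon

# A new run starts at the first date and whenever the index jumps by more than 1
run_id <- cumsum(c(TRUE, diff(month_index) > 1L))

# Collapse each run to its first and last date
runs <- do.call(rbind, lapply(split(dates, run_id), function(d) {
  data.frame(Date.start = min(d), Date.end = max(d))
}))

print(run_id)
print(runs)
```

With these dates, December to January counts as a step of one month because the index keeps increasing across the year boundary, so the first three dates stay in one run; the gap from January to April starts a second run.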
Note
The input data in reproducible form is:
burrow.data <-
structure(list(Animal = c(27L, 27L, 33L, 33L, 33L, 63L, 63L,
63L, 101L, 101L, 101L, 101L, 101L, 101L), Burrow = c("B0961",
"B0961", "1920", "1955", "1955", "B0540", "B0540", "B0540", "B0021",
"B0021", "B0021", "B0021", "B0021", "B0021"), Date = c("2022-03-01",
"2022-04-26", "2021-11-02", "2022-03-29", "2022-04-26", "2021-04-21",
"2022-01-04", "2022-03-01", "2020-11-23", "2020-12-23", "2021-11-04",
"2022-01-06", "2022-02-04", "2022-03-03")), class = "data.frame",
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10",
"11", "12", "13", "14"))
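The data above has exactly three columns, which suggests one possible reason the code in the question returned a single group per animal and burrow. This is a guess, assuming burrow.input really had only Animal, Burrow and Date: in that case burrow.input[-4] drops nothing, so ave() groups by Date as well. The small frame df below is a hypothetical stand-in built from three of the rows for animal 101, with Date converted to class Date.

```r
# Three-column frame shaped like burrow.data (three rows for animal 101)
df <- data.frame(
  Animal = c(101L, 101L, 101L),
  Burrow = c("B0021", "B0021", "B0021"),
  Date   = as.Date(c("2020-11-23", "2020-12-23", "2021-11-04"))
)

# A negative index past the last column removes nothing, so Date is kept
kept <- names(df[-4])

# Grouping by all three columns puts every row in its own group, so each
# group holds a single date and the counter never advances past 1
grp_all <- ave(as.integer(df$Date), df[-4],
               FUN = function(z) cumsum(c(TRUE, diff(z) > 1)))

print(kept)
print(grp_all)
```

Under that assumption every row gets grp equal to 1, and the later aggregate() call can only return one range per animal and burrow. Note also that diff(z) > 1 compares days, not months, so even with the right grouping columns it would split visits that are a month apart.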