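I am working with medical claims data to build inpatient episodes. The dataset has no "episode identifier" column. My goal is to create a unique identifier for each episode so that claims can be linked to one another; I can build that identifier once I am able to identify the correct admission and discharge dates for each episode. For simplicity, here is a table of fake data that reflects the situation I am struggling with:
Patient ID | Admitted Date | Discharge Date |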
---|---|---|
810 | 2020-12-15 | 2020-12-16 |
810 | 2021-06-17 | 2021-06-19 |
810 | 2021-06-19 | 2021-06-27 |
810 | 2021-06-27 | 2021-07-03 |
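In this example data frame, the first row shows a simple inpatient episode. In rows two through four, the admission and discharge dates are chained together: each row's admission date equals the previous row's discharge date. This happens because the patient switched hospital departments, which started a new REV code.
The ifelse statement I originally used failed on its own. I wrote it without considering cases like this one, where more than two rows need to be combined into a single episode.
Does anyone have suggestions for packages/resources to use so that rows 2-4 become a single row, like this:
Patient ID | Admitted Date | Discharge Date |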
---|---|---|
810 | 2020-12-15 | 2020-12-16 |
810 | 2021-06-17 | 2021-07-03 |
Thanks! Let me know if any further explanation is needed.
Answer from a uj5u.com user:
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
tribble(
  ~patient, ~admitted, ~discharge,
  810, "2020-12-15", "2020-12-16",
  810, "2021-06-17", "2021-06-19",
  810, "2021-06-19", "2021-06-27",
  810, "2021-06-27", "2021-07-03"
) |>
  mutate(across(-patient, ymd),
         group = if_else(discharge == lead(admitted) | admitted == lag(discharge), 1, 0)) |>
  group_by(patient, group) |>
  summarise(admitted = first(admitted), discharge = last(discharge)) |>
  arrange(admitted)
#> `summarise()` has grouped output by 'patient'. You can override using the
#> `.groups` argument.
#> # A tibble: 2 × 4
#> # Groups: patient [1]
#> patient group admitted discharge
#> <dbl> <dbl> <date> <date>
#> 1 810 NA 2020-12-15 2020-12-16
#> 2 810 1 2021-06-17 2021-07-03
Created on 2022-05-10 by the reprex package (v2.0.1)
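One caveat with the flag above: every row that touches a neighbouring row gets the same value (1), so two separate multi-row episodes for the same patient would end up in one group. A minimal sketch of a variant that numbers episodes with a running count is shown below. The two September rows are hypothetical data added only to illustrate a second chained episode, and the sketch assumes that rows belong together only when an admission date exactly equals the previous discharge date.

```r
library(dplyr)

# Sample data from the question plus two hypothetical rows (the September stay)
claims <- tribble(
  ~patient, ~admitted,    ~discharge,
  810,      "2020-12-15", "2020-12-16",
  810,      "2021-06-17", "2021-06-19",
  810,      "2021-06-19", "2021-06-27",
  810,      "2021-06-27", "2021-07-03",
  810,      "2021-09-01", "2021-09-04",
  810,      "2021-09-04", "2021-09-10"
) |>
  mutate(across(-patient, as.Date))

episodes <- claims |>
  arrange(patient, admitted) |>
  group_by(patient) |>
  mutate(
    # A row opens a new episode unless it is admitted on the very day
    # the previous row (same patient) was discharged
    new_episode = is.na(lag(discharge)) | admitted != lag(discharge),
    # Running count of episode starts = episode number within the patient
    episode = cumsum(new_episode)
  ) |>
  group_by(patient, episode) |>
  summarise(admitted = min(admitted), discharge = max(discharge), .groups = "drop")

episodes
```

The `patient` and `episode` columns together can then serve as the episode identifier. If stays may overlap or be separated by a short gap, the comparison inside `new_episode` would need to be loosened accordingly.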
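Answer from a uj5u.com user:
Here is an approach using the igraph package. It builds a directed graph from the date columns, finds the graph's connected components, and uses those components to split the dates into groups. It then keeps the first admission date and the last discharge date of each group.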
df1 <- read.table(text = "
'Patient ID' 'Admitted Date' 'Discharge Date'
810 2020-12-15 2020-12-16
810 2021-06-17 2021-06-19
810 2021-06-19 2021-06-27
810 2021-06-27 2021-07-03
", header = TRUE, check.names = FALSE)
df1[-1] <- lapply(df1[-1], as.Date)
suppressPackageStartupMessages(library(igraph))
g <- graph_from_data_frame(df1[-1])
cl <- components(g)
sp <- split(names(cl$membership), cl$membership)
new <- apply(df1[-1], 1, \(x) {
  which(sapply(sp, \(y) all(x %in% y)))
})
result <- by(df1, new, \(x) {
  data.frame('Patient ID' = x[1, 1, drop = TRUE],
             'Admitted Date' = x[1, 2, drop = TRUE],
             'Discharge date' = x[nrow(x), 3, drop = TRUE])
})
result <- do.call(rbind, result)
result
#> Patient.ID Admitted.Date Discharge.date
#> 1 810 2020-12-15 2020-12-16
#> 2 810 2021-06-17 2021-07-03
rm(sp, new) # final clean-up
Created on 2022-05-10 by the reprex package (v2.0.1)
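To make the connected-components idea more concrete, here is a small sketch that treats each stay as an edge from its admission date to its discharge date and prints which dates end up together. The `edges` data frame and its column names `from` and `to` are made up for illustration. Note that the dates themselves are the vertex names, so with several patients it would probably be safer to build one graph per patient; otherwise two patients who share a date would be linked.

```r
library(igraph)

# Each stay is one edge: admission date -> discharge date (sample data from the question)
edges <- data.frame(
  from = c("2020-12-15", "2021-06-17", "2021-06-19", "2021-06-27"),
  to   = c("2020-12-16", "2021-06-19", "2021-06-27", "2021-07-03")
)

g  <- graph_from_data_frame(edges)
cl <- components(g)  # weakly connected components by default

# Dates grouped by the component they belong to
groups <- split(names(cl$membership), cl$membership)
groups
```

Stays that share a date fall into the same component, which is why the three chained rows collapse into one episode.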
Answer from a uj5u.com user:
There may be some built-in dplyr function that helps here, but I think I have found this "ugly" way to solve the problem:
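# Assumes `df` is a data frame sorted by patient and admission date, with
# columns "Patient ID", "Admitted Date" and "Discharge Date".
check_if_tied_to_next_row <- function(df, row_number) {
  # The last row has no successor (rows disappear as they are merged)
  if (row_number >= nrow(df)) {
    return(FALSE)
  }
  # Tied: same patient, and this discharge date equals the next admission date
  df[row_number, "Discharge Date"] == df[row_number + 1, "Admitted Date"] &&
    df[row_number, "Patient ID"] == df[row_number + 1, "Patient ID"]
}
merge_consecutive_rows <- function(df, row_number) {
  # Extend the current row to the next row's discharge date, then drop the next row
  df[row_number, "Discharge Date"] <- df[row_number + 1, "Discharge Date"]
  df <- remove_row(df, row_number + 1)
  return(df)
}
remove_row <- function(df, row_number) {
  return(df[-row_number, ])
}
n <- nrow(df) - 1 # Nothing to merge into the last row!
for (i in seq_len(n)) {
  # Keep absorbing the following row while it continues the same episode
  while (check_if_tied_to_next_row(df, i)) {
    df <- merge_consecutive_rows(df, i)
  }
}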
Answer from a uj5u.com user:
Here is another option, using dplyr and lubridate.
# Quoted headers keep the two-word column names together; read.table()
# then turns them into Patient.ID, Admitted.Date and Discharge.Date.
df <- read.table(
  text = "
'Patient ID' 'Admitted Date' 'Discharge Date'
810 2020-12-15 2020-12-16
810 2021-06-17 2021-06-19
810 2021-06-19 2021-06-27
810 2021-06-27 2021-07-03
",
  header = TRUE)
library(dplyr)
library(lubridate)
df %>%
  # Gap between this admission and the previous discharge (NA for the first row)
  mutate(date_difference = ymd(Admitted.Date) - lag(ymd(Discharge.Date))) %>%
  mutate(
    is_start = ifelse(date_difference > 0 | is.na(date_difference), TRUE, FALSE),
    is_end = ifelse(lead(date_difference) > 0 | is.na(lead(date_difference)), TRUE, FALSE)
  ) %>%
  filter(is_start | is_end) %>%
  mutate(
    is_admitted = Admitted.Date,
    is_discharged = case_when(
      is_end ~ Discharge.Date,
      !is_end ~ lead(Discharge.Date)
    )
  ) %>%
  # Keep one row per episode, the one that starts it. (na.omit() would drop
  # the first row, whose date_difference is NA, and keep the end-only rows.)
  filter(is_start) %>%
  select(Patient.ID, Admitted.Date = is_admitted, Discharge.Date = is_discharged)
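Answer from a uj5u.com user:
This is exactly what I created the ivs package for. It lets you work with interval vectors like the ones you have here. You can solve this with iv_groups(), which computes the non-overlapping "groups" of hospital stay dates for each patient.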
library(dplyr)
library(ivs)
df <- tribble(
  ~Patient.ID, ~Admitted.Date, ~Discharge.Date,
  810L, "2020-12-15", "2020-12-16",
  810L, "2021-06-17", "2021-06-19",
  810L, "2021-06-19", "2021-06-27",
  810L, "2021-06-27", "2021-07-03"
)
df <- mutate(df, Admitted.Date = as.Date(Admitted.Date))
df <- mutate(df, Discharge.Date = as.Date(Discharge.Date))
df
#> # A tibble: 4 × 3
#> Patient.ID Admitted.Date Discharge.Date
#> <int> <date> <date>
#> 1 810 2020-12-15 2020-12-16
#> 2 810 2021-06-17 2021-06-19
#> 3 810 2021-06-19 2021-06-27
#> 4 810 2021-06-27 2021-07-03
# Create an interval vector combining the hospital stay as:
# [Admitted.Date, Discharge.Date)
df <- df %>%
  mutate(Stay = iv(Admitted.Date, Discharge.Date), .keep = "unused")
df
#> # A tibble: 4 × 2
#> Patient.ID Stay
#> <int> <iv<date>>
#> 1 810 [2020-12-15, 2020-12-16)
#> 2 810 [2021-06-17, 2021-06-19)
#> 3 810 [2021-06-19, 2021-06-27)
#> 4 810 [2021-06-27, 2021-07-03)
# Assuming you have multiple patients, we will group by `Patient.ID`.
# Then compute the non-overlapping interval "groups" per patient with `iv_groups()`
df %>%
  group_by(Patient.ID) %>%
  summarise(Stay = iv_groups(Stay), .groups = "drop")
#> # A tibble: 2 × 2
#> Patient.ID Stay
#> <int> <iv<date>>
#> 1 810 [2020-12-15, 2020-12-16)
#> 2 810 [2021-06-17, 2021-07-03)
# You can also see which "group" each stay fell in by using `iv_identify_group()`
df %>%
  group_by(Patient.ID) %>%
  mutate(Group = iv_identify_group(Stay))
#> # A tibble: 4 × 3
#> # Groups: Patient.ID [1]
#> Patient.ID Stay Group
#> <int> <iv<date>> <iv<date>>
#> 1 810 [2020-12-15, 2020-12-16) [2020-12-15, 2020-12-16)
#> 2 810 [2021-06-17, 2021-06-19) [2021-06-17, 2021-07-03)
#> 3 810 [2021-06-19, 2021-06-27) [2021-06-17, 2021-07-03)
#> 4 810 [2021-06-27, 2021-07-03) [2021-06-17, 2021-07-03)
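Since the question ultimately asks for a unique identifier per episode, the group found by `iv_identify_group()` could be turned into one. The following is only a sketch: the column names `Episode.Start`, `Episode.End` and `Episode.ID` are invented for illustration, and the identifier is simply dplyr's `cur_group_id()` over patient and episode start.

```r
library(dplyr)
library(ivs)

claims <- tribble(
  ~Patient.ID, ~Admitted.Date, ~Discharge.Date,
  810L, "2020-12-15", "2020-12-16",
  810L, "2021-06-17", "2021-06-19",
  810L, "2021-06-19", "2021-06-27",
  810L, "2021-06-27", "2021-07-03"
) %>%
  mutate(across(c(Admitted.Date, Discharge.Date), as.Date))

claims_with_id <- claims %>%
  mutate(Stay = iv(Admitted.Date, Discharge.Date)) %>%
  group_by(Patient.ID) %>%
  # Merged interval (episode) that each individual stay falls in
  mutate(Group = iv_identify_group(Stay)) %>%
  # Pull the episode bounds out as plain Date columns
  mutate(Episode.Start = iv_start(Group), Episode.End = iv_end(Group)) %>%
  # One id per (patient, episode start) combination
  group_by(Patient.ID, Episode.Start) %>%
  mutate(Episode.ID = cur_group_id()) %>%
  ungroup() %>%
  select(-Stay, -Group)

claims_with_id
```

Every original claim row is kept, so `Episode.ID` can be used to link claims that belong to the same stay.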
Please credit the source when reposting. Link to this article: https://www.uj5u.com/yidong/472590.html