I have a question: I am using the pivot_longer function to split columns into rows, but the output is wrong. I tried this code:
library(tidyr)
library(dplyr)
AT_ID <- c(1,2,3,4)
DEPARTURE_AIRPORT <- c("ZRH","ZRH","ZRH", "ZRH")
STOPOVER_1 <- c(NA, "BEL", "DUB", NA)
STOPOVER_2 <- c(NA, "RUO", NA, "IAE")
ARRIVAL_AIRPORT <- c("IAD", "LAX","BUD", "NOZ")
test_df <- data.frame(AT_ID, DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2, ARRIVAL_AIRPORT)
print(test_df)
test_df$intinerary_id <- FALSE
split_rows <- function(file) {
  # Split rows for intinerary_id
  file %>%
    pivot_longer(
      cols = c(DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2),
      names_to = "name",
      values_to = "DEPARTURE_AIRPORT"
    ) %>%
    filter(!is.na(DEPARTURE_AIRPORT)) %>%
    mutate(
      intinerary_id = AT_ID,
      AT_ID = row_number()
    ) %>%
    relocate(DEPARTURE_AIRPORT, .before = ARRIVAL_AIRPORT)
}
test_df_preprocessed <- split_rows(test_df)
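The problem is that, for AT_ID 2 for example, the departure and arrival airports should be ZRH to BEL, then BEL to RUO, then RUO to LAX. Instead, the arrival airport is always the original arrival airport rather than the correct stopover, so instead of A - B, B - C, C - D, the code creates A - D, B - D and C - D. I hope my explanation makes sense. Thanks for your help!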
Answer from a uj5u.com user:
Using lag, lead and coalesce, you could do it like this:
library(tidyr)
library(dplyr)
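split_rows <- function(file) {
  # Use the function argument rather than the global test_df
  file %>%
    pivot_longer(
      cols = c(DEPARTURE_AIRPORT, STOPOVER_1, STOPOVER_2),
      names_to = "name",
      values_to = "DEPARTURE_AIRPORT"
    ) %>%
    # Drop the stopovers that were not used
    filter(!is.na(DEPARTURE_AIRPORT)) %>%
    group_by(AT_ID) %>%
    mutate(
      # No NAs remain after the filter, so this coalesce leaves the column unchanged
      DEPARTURE_AIRPORT = coalesce(DEPARTURE_AIRPORT, lag(DEPARTURE_AIRPORT)),
      # Each leg arrives at the next departure point; the last leg keeps the original arrival
      ARRIVAL_AIRPORT = coalesce(lead(DEPARTURE_AIRPORT), ARRIVAL_AIRPORT)
    ) %>%
    ungroup() %>%
    mutate(
      intinerary_id = AT_ID,
      AT_ID = row_number()
    ) %>%
    relocate(DEPARTURE_AIRPORT, .before = ARRIVAL_AIRPORT)
}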
split_rows(test_df)
#> # A tibble: 8 × 5
#>   AT_ID DEPARTURE_AIRPORT ARRIVAL_AIRPORT intinerary_id name
#>   <int> <chr>             <chr>                   <dbl> <chr>
#> 1     1 ZRH               IAD                         1 DEPARTURE_AIRPORT
#> 2     2 ZRH               BEL                         2 DEPARTURE_AIRPORT
#> 3     3 BEL               RUO                         2 STOPOVER_1
#> 4     4 RUO               LAX                         2 STOPOVER_2
#> 5     5 ZRH               DUB                         3 DEPARTURE_AIRPORT
#> 6     6 DUB               BUD                         3 STOPOVER_1
#> 7     7 ZRH               IAE                         4 DEPARTURE_AIRPORT
#> 8     8 IAE               NOZ                         4 STOPOVER_2
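To see why the lead and coalesce step gives the right arrival airport, it may help to look at a single itinerary in isolation. The following is only a minimal sketch: the vector `stops` and the value `final_arrival` are hypothetical names made up for illustration, holding the airports of AT_ID 2 after the unused stopovers have been filtered out.

```r
library(dplyr)

# Hypothetical example data: departure points of one itinerary, in travel order
stops <- c("ZRH", "BEL", "RUO")
# Hypothetical example data: the original ARRIVAL_AIRPORT of that itinerary
final_arrival <- "LAX"

# lead() shifts the vector by one position, so each element sees its successor;
# the last element has no successor and becomes NA
next_stop <- lead(stops)

# coalesce() replaces that trailing NA with the original arrival airport
arrival <- coalesce(next_stop, final_arrival)

legs <- data.frame(DEPARTURE_AIRPORT = stops, ARRIVAL_AIRPORT = arrival)
print(legs)
```

In the answer's function the same two calls run inside group_by(AT_ID), so the shift never crosses from one itinerary into the next.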