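I have a table of journal publications, and I want to extract the first, second-to-last, and last author of each one.
Unfortunately, the number of authors varies a lot: some publications have only one, others have as many as 35.
If a publication has one author, I want only a first author. If it has two, I want a first author and a last author. If it has three, I want the first, second-to-last, and last authors, and so on.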
This is the original dataset:
pub1 <- structure(list(publication = c("pub1", "pub2", "pub3", "pub4",
"pub5", "pub6"), authors = c("author1", "author1, author2", "author1, author2, author3",
"author1, author2, author3, author4", "author1, author2, author3, author4, author5",
"author1, author2, author3, author4, author5, author6")),
class = "data.frame", row.names = c(NA, -6L))
This is the expected output:
pub2 <- structure(list(publication = c("pub1", "pub2", "pub3", "pub4",
"pub5", "pub6"), authors = c("author1", "author1, author2", "author1, author2, author3",
"author1, author2, author3, author4", "author1, author2, author3, author4, author5",
"author1, author2, author3, author4, author5, author6"),
author_first = c("author1", "author1", "author1", "author1", "author1", "author1"),
author_second_last = c("", ""," author2", " author3", " author4", " author5"),
author_last = c("", " author2", " author3", " author4", " author5", " author6")),
class = "data.frame", row.names = c(NA, -6L))
I don't know how to do this.
Answer:
Here is one way to do it using dplyr and stringr:
library(dplyr)
library(stringr)
author_position <- function(str, p, position) {
  stopifnot(is.numeric(position))
  # Split the string into a vector of pieces using a pattern (in this case `,`)
  # and trim the surrounding white space
  s <- str_trim(str_split(str, p, simplify = TRUE))
  len <- length(s)
  # Return NA if the absolute position is greater than or equal to the number of authors
  # Caveat: if the position is 1, always return the first author
  if (abs(position) >= len) {
    if (position == 1) {
      first(s)
    } else {
      NA_character_
    }
  # Otherwise return the value at the selected position
  } else {
    nth(s, position)
  }
}
pub1 %>%
  rowwise() %>% # treat each row as its own group
  mutate(author_first = author_position(authors, ",", 1),
         author_second_last = author_position(authors, ",", -2),
         author_last = author_position(authors, ",", -1))
# # A tibble: 6 × 5
# # Rowwise:
#   publication authors                                              author_first author_second_last author_last
#   <chr>       <chr>                                                <chr>        <chr>              <chr>
# 1 pub1        author1                                              author1      NA                 NA
# 2 pub2        author1, author2                                     author1      NA                 author2
# 3 pub3        author1, author2, author3                            author1      author2            author3
# 4 pub4        author1, author2, author3, author4                   author1      author3            author4
# 5 pub5        author1, author2, author3, author4, author5          author1      author4            author5
# 6 pub6        author1, author2, author3, author4, author5, author6 author1      author5            author6
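For comparison, the same rule can probably be written without dplyr or rowwise(). The following is only a minimal base R sketch, not part of the answer above: `pick_author` and the toy data frame `pubs` are hypothetical names introduced here, and the sketch assumes the separator is a literal comma.

```r
# Toy data (hypothetical), shaped like the first three rows of pub1
pubs <- data.frame(
  publication = c("pub1", "pub2", "pub3"),
  authors = c("author1", "author1, author2", "author1, author2, author3")
)

# Hypothetical helper: vectorised over `authors`, same guard as author_position()
pick_author <- function(authors, position, sep = ",") {
  stopifnot(is.numeric(position), length(position) == 1)
  # Split every string on the separator and trim white space from each piece
  pieces <- lapply(strsplit(authors, sep, fixed = TRUE), trimws)
  vapply(pieces, function(s) {
    len <- length(s)
    if (abs(position) >= len) {
      # Position 1 always yields the first author; anything else is NA
      if (position == 1) s[1] else NA_character_
    } else if (position > 0) {
      s[position]
    } else {
      # Negative positions count from the end (-1 is the last author)
      s[len + position + 1]
    }
  }, character(1))
}

pubs$author_first <- pick_author(pubs$authors, 1)
pubs$author_second_last <- pick_author(pubs$authors, -2)
pubs$author_last <- pick_author(pubs$authors, -1)
```

Because the helper takes the whole column at once, it can be assigned directly to new columns or used inside a plain mutate() call.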
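Edit: added the ability to return any author position, and added comments.
The only limitation here is that the first and last authors are fixed. So if you ask for the third-to-last author and the publication has only 3 authors, it returns NA, because technically that author counts as the first author. The same goes for asking for the 3rd author: with only 3 authors, that one counts as the last author.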
pub1 %>%
  rowwise() %>% # treat each row as its own group
  mutate(author_third = author_position(authors, ",", 3),
         author_third_last = author_position(authors, ",", -3))
# # A tibble: 6 × 4
# # Rowwise:
#   publication authors                                              author_third author_third_last
#   <chr>       <chr>                                                <chr>        <chr>
# 1 pub1        author1                                              NA           NA
# 2 pub2        author1, author2                                     NA           NA
# 3 pub3        author1, author2, author3                            NA           NA
# 4 pub4        author1, author2, author3, author4                   author3      author2
# 5 pub5        author1, author2, author3, author4, author5          author3      author3
# 6 pub6        author1, author2, author3, author4, author5, author6 author3      author4
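If that restriction is not what you want, a variant that only returns NA when the publication really has too few authors might look like the sketch below. This is an assumption about what a relaxed version could be, not the answer's own code: `author_at` is a hypothetical name, it uses base R only, and it treats the separator as a literal comma.

```r
# Hypothetical helper: return the author at `position` (negative values count
# from the end), or NA only when there are fewer than abs(position) authors
author_at <- function(str, position, sep = ",") {
  stopifnot(is.numeric(position), length(position) == 1)
  # Split one string on the separator and trim white space from each piece
  s <- trimws(strsplit(str, sep, fixed = TRUE)[[1]])
  len <- length(s)
  if (position == 0 || abs(position) > len) {
    return(NA_character_)
  }
  if (position > 0) s[position] else s[len + position + 1]
}

author_at("author1, author2, author3", 3)   # "author3"
author_at("author1, author2, author3", -3)  # "author1"
author_at("author1, author2", 3)            # NA
```

The trade-off is that the columns can then overlap: with a single author, positions 1 and -1 both return that same author.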