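I have over 1000 XML files that probably share the same structure, and I want to build a database from the data in all of them. Until yesterday I had never seen what an XML file looks like. With Google's help I tried loading a single XML file in RStudio using R packages, but converting it to a data frame produced an error.
This is what the files look like. File A: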
<?xml-stylesheet type='text/xsl' href='anzctrTransform.xsl'?>
<ANZCTR_Trial requestNumber="42">
<stage>Registered</stage>
<submitdate>19/07/2005</submitdate>
<approvaldate>19/07/2005</approvaldate>
<dateLastUpdated>14/12/2010</dateLastUpdated>
<actrnumber>ACTRN12605000026628</actrnumber>
<trial_identification>
<studytitle>Phase II study of fixed dose rate Gemcitabine-Oxaliplatin Integrated with concomitant 5FU and 3-D Conformal Radiotherapy for the treatment of localised pancreatic cancer: GOFURTGO</studytitle>
<scientifictitle>Phase II study of fixed dose rate Gemcitabine-Oxaliplatin Integrated with concomitant 5FU and 3-D Conformal Radiotherapy for the treatment of localised pancreatic cancer: GOFURTGO</scientifictitle>
<utrn />
<trialacronym>GOFURTGO</trialacronym>
<secondaryid>GOFURTGO</secondaryid>
</trial_identification>
<conditions>
<healthcondition>Locally advanced or locally recurrent inoperable pancreatic cancer not previously treated with chemotherapy or radiotherapy.</healthcondition>
<conditioncode>
<conditioncode1>Cancer</conditioncode1>
<conditioncode2>Pancreatic</conditioncode2>
</conditioncode>
</conditions>
<interventions>
<interventions>All patients enrolled in the study will receive the same treatment consisting of all of the following:
a) 1 cycle of chemotherapy: the cycle is 28 days (gemcitabine on days 1 and 15 and oxaliplatin on days 2 and 16, followed by:
b)radiotherpay plus continuous 5FU infusion: 5FU is given continuously (7 days a week for 6 weeks), radiotherpay is given 5 days a week (Mon-Fri) for 6 weeks followed by:
c) 3 cycles of chemotherapy: each cycle is 28 days (gemcitabine on days 1 and 15 and oxaliplatin on days 2 and 16</interventions>
<comparator>This is a single group trial</comparator>
<control>Uncontrolled</control>
<interventioncode>Treatment: Other</interventioncode>
</interventions>
<outcomes>
<primaryOutcome>
<outcome>The primary objective is to determine the proportions of patients starting and finishing greater than or equal to 80% of the planned dose on time for each component of the treatment.</outcome>
<timepoint>The outcome will be measured once all patients have enrolled and have completeed the study treatment.</timepoint>
</primaryOutcome>
<secondaryOutcome>
<outcome>Adverse events</outcome>
<timepoint>Assessed at the end of ecah treatment cycle, and at end of treatment.</timepoint>
</secondaryOutcome>
<secondaryOutcome>
<outcome>Objective tumour response rates</outcome>
<timepoint>Before and after radiotherapy, at the end of treatment, and then as clinically indicated.</timepoint>
</secondaryOutcome>
<secondaryOutcome>
<outcome>Time to progression</outcome>
<timepoint>Before and after radiotherapy, at the end of treatment, and then as clinically indicated.</timepoint>
</secondaryOutcome>
<secondaryOutcome>
<outcome>CA 19-9 response rates</outcome>
<timepoint>Before and after radiotherapy, at the end of treatment, and then 2 monthly during follow up.</timepoint>
</secondaryOutcome>
<secondaryOutcome>
<outcome>Health-related quality of life.</outcome>
<timepoint>Before and after radiotherapy, at the end of treatment, and then 2 monthly until progression/disease recurrence.</timepoint>
</secondaryOutcome>
</outcomes>
<eligibility>
<inclusivecriteria>Patient must have histologically/cytologically proven adenocarcinoma of the pancreas located in the head or the body of the pancreas (primary) or in the pancreatic bed (locally recurrent).Locoregional disease must be confirmed by dual phase CT (arterial and portal phases) without distant metastases (confirmed by CT of the chest, abdomen and pelvis).Patients must be assessed by a surgeon and considered inoperable.Performance status must be ECOG grade 0, 1 or 2.</inclusivecriteria>
<inclusiveminage>0</inclusiveminage>
<inclusiveminagetype>Not stated</inclusiveminagetype>
<inclusivemaxage>0</inclusivemaxage>
<inclusivemaxagetype>Not stated</inclusivemaxagetype>
<inclusivegender>Both males and females</inclusivegender>
<healthyvolunteer>No</healthyvolunteer>
<exclusivecriteria>1.Histological types other than pancreatic ductal adenocarcinoma
2. Metastatic disease.
3. Tumours of the tail of pancreas
4. Major co-morbid illnesses that, in the opinion of the investigator, would jeopardise the likely completion of the treatment program
5. Patients with peripheral sensory neuropathy with functional impairment.
6. Derangement of LFTs consistent with hepatic cellular dysfunction (ALT and/or AST >3 times upper limit of normal), or a bilirubin >3 times upper limit of normal. Patients with LFTs consistent with hepatic obstruction that is relieved (eg. by stenting, bypass) are eligible, provided the bilirubin has fallen to <3 times upper limit of normal.
7. Patients with significant loss of bodyweight, who, at the investigator’s discretion, is deemed not suitable for this study (eg.>15% weight loss since surgery or diagnosis)
8. Treatment with a drug within the last 30 days that has not received regulatory approval at the time of study entry.
9. Treatment with any previous cytotoxic chemotherapy for this malignancy. Previous hormonal manipulation (including HRT) is allowed.
10. Previous abdominal radiotherapy
11. A previous history of malignancy other than non-melanomatous skin cancers, in –situ carcinoma, or patients who are disease–free from non-pancreatic tumours treated definitively more than 5 years ago.
12. Pregnant or lactating women, or women of childbearing potential not using adequate contraception.</exclusivecriteria>
</eligibility>
<trial_design>
<studytype>Interventional</studytype>
<purpose>Treatment</purpose>
<allocation>Non-randomised trial</allocation>
<concealment>Paper enrolment through the AGITG Coordinating Centre, NHMRC Clinical Trials Centre</concealment>
<sequence>n/a</sequence>
<masking>Open (masking not used)</masking>
<assignment>Single group</assignment>
<designfeatures />
<endpoint>Safety</endpoint>
<statisticalmethods />
<masking1 />
<masking2 />
<masking3 />
<masking4 />
<patientregistry />
<followup />
<followuptype />
<purposeobs />
<duration />
<selection />
<timing />
</trial_design>
<recruitment>
<phase>Phase 2</phase>
<anticipatedstartdate>13/04/2005</anticipatedstartdate>
<actualstartdate />
<anticipatedenddate />
<actualenddate />
<samplesize>45</samplesize>
<actualsamplesize />
<currentsamplesize />
<recruitmentstatus>Completed</recruitmentstatus>
<anticipatedlastvisitdate />
<actuallastvisitdate />
<dataanalysis />
<withdrawnreason />
<withdrawnreasonother />
<recruitmentcountry>Australia</recruitmentcountry>
<recruitmentstate />
</recruitment>
<sponsorship>
<primarysponsortype>Other Collaborative groups</primarysponsortype>
<primarysponsorname>AGITG</primarysponsorname>
<primarysponsoraddress>92-94 Parramatta Rd, Camperdown NSW 2050</primarysponsoraddress>
<primarysponsorcountry>Australia</primarysponsorcountry>
<fundingsource>
<fundingtype>Commercial sector/Industry</fundingtype>
<fundingname>Sanofi-Aventis</fundingname>
<fundingaddress>Sanofi-Aventis Group
Talavera Corporate Centre
Building D
12-24 Talavera Road
Macquarie Park NSW 2113</fundingaddress>
<fundingcountry>Australia</fundingcountry>
</fundingsource>
<fundingsource>
<fundingtype>Other Collaborative groups</fundingtype>
<fundingname>AGITG</fundingname>
<fundingaddress>NHMRC Clinical Trials Centre
University of Sydney
Locked Bag 77
CAMPERDOWN NSW 1450</fundingaddress>
<fundingcountry>Australia</fundingcountry>
</fundingsource>
<fundingsource>
<fundingtype>University</fundingtype>
<fundingname>CTC</fundingname>
<fundingaddress>NHMRC Clinical Trials Centre
University of Sydney
Locked Bag 77
CAMPERDOWN NSW 1450</fundingaddress>
<fundingcountry>Australia</fundingcountry>
</fundingsource>
<secondarysponsor>
<sponsortype>Other Collaborative groups</sponsortype>
<sponsorname>AGITG</sponsorname>
<sponsoraddress>NHMRC Clinical Trials Centre
University of Sydney
Locked Bag 77
CAMPERDOWN NSW 1450</sponsoraddress>
<sponsorcountry>Australia</sponsorcountry>
</secondarysponsor>
</sponsorship>
<ethicsAndSummary>
<summary />
<trialwebsite />
<publication />
<ethicsreview>Approved</ethicsreview>
<publicnotes />
<ethicscommitee>
<ethicname>University of Sydney</ethicname>
<ethicaddress>Human Research Ethics Committee
Main Quad
University of Sydney NSW 2006</ethicaddress>
<ethicapprovaldate />
<hrec>11-2004/5/7779</hrec>
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
<ethicscommitee>
<ethicname>Prince of Wales Hospital</ethicname>
<ethicaddress />
<ethicapprovaldate />
<hrec />
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
<ethicscommitee>
<ethicname>Border Medical Oncology</ethicname>
<ethicaddress />
<ethicapprovaldate />
<hrec />
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
<ethicscommitee>
<ethicname>St. George Hospital</ethicname>
<ethicaddress />
<ethicapprovaldate />
<hrec />
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
<ethicscommitee>
<ethicname>Newcastle Mater</ethicname>
<ethicaddress />
<ethicapprovaldate />
<hrec />
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
<ethicscommitee>
<ethicname>Alfred Hospital</ethicname>
<ethicaddress />
<ethicapprovaldate />
<hrec />
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
<ethicscommitee>
<ethicname>Nepean Hospital</ethicname>
<ethicaddress />
<ethicapprovaldate />
<hrec />
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
<ethicscommitee>
<ethicname>Royal Adelaide Hospital</ethicname>
<ethicaddress />
<ethicapprovaldate />
<hrec />
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
</ethicsAndSummary>
<attachment />
<contacts>
<contact>
<title />
<name>Dr David Goldstein</name>
<address>Department of Medical Oncology
Prince of Wales Hospital
High Street
Randwick NSW 2031</address>
<phone> 61 2 93822577</phone>
<fax> 61 2 93822578</fax>
<email>[email protected]</email>
<country>Australia</country>
<type>Scientific Queries</type>
</contact>
<contact>
<title />
<name>Dr David Goldstein</name>
<address>Department of Medical Oncology
Prince of Wales Hospital
High Street
Randwick NSW 2031</address>
<phone> 61 2 93822577</phone>
<fax> 61 2 93822578</fax>
<email>[email protected]</email>
<country>Australia</country>
<type>Public Queries</type>
</contact>
<contact>
<title />
<name />
<address />
<phone />
<fax />
<email />
<country />
<type>Principal Investigator</type>
</contact>
</contacts>
</ANZCTR_Trial>
File B:
<?xml-stylesheet type='text/xsl' href='anzctrTransform.xsl'?>
<ANZCTR_Trial requestNumber="6">
<stage>Registered</stage>
<submitdate>08/07/2005</submitdate>
<approvaldate>08/07/2005</approvaldate>
<dateLastUpdated>24/06/2010</dateLastUpdated>
<actrnumber>ACTRN12605000003673</actrnumber>
<trial_identification>
<studytitle>Bisphosphonate and Anastrozole trial - Bone Maintenance Algorithm Assessment</studytitle>
<scientifictitle>Maintaining skeletal health in postmenopausal women with surgically resected Stage I-IIIa hormone-receptor positive breast cancer who are receiving anastrozole, through the use of alendronate as determined by the Osteoporosis Australia Bone Maintenance Algorithm</scientifictitle>
<utrn />
<trialacronym>BATMAN</trialacronym>
<secondaryid>Andrew Love Cancer Centre: ALCC 04.02</secondaryid>
</trial_identification>
<conditions>
<healthcondition>Breast Cancer</healthcondition>
<conditioncode>
<conditioncode1>Cancer</conditioncode1>
<conditioncode2>Breast</conditioncode2>
</conditioncode>
</conditions>
<interventions>
<interventions>This trial aims to assess the utility, through DEXA scans and biochemical markers of bone turnover, of a strategy of monitoring and intervention with oral alendronate in postmenopausal women with hormone-receptor positive breast cancer receiving five years of adjuvant anastrozole. It specifically addressed the issues of osteopaenic and osteoporotic women in this setting and will test three years versus five years of alendronate use.</interventions>
<comparator>Five years of treatment with 70mg oral alendronate once weekly</comparator>
<control>Active</control>
<interventioncode>Treatment: Drugs</interventioncode>
</interventions>
<outcomes>
<primaryOutcome>
<outcome>Changes in lumbar vertebra and femoral neck BMD T-score after 5 years of anastrozole treatment</outcome>
<timepoint>After 5 years of anastrozole treatment</timepoint>
</primaryOutcome>
<secondaryOutcome>
<outcome>Percent change in the lumbar vertebrae</outcome>
<timepoint>Annually for 5 years</timepoint>
</secondaryOutcome>
<secondaryOutcome>
<outcome>Biochemical markers</outcome>
<timepoint>6 months after commencing alendronate</timepoint>
</secondaryOutcome>
<secondaryOutcome>
<outcome>Evaluate the Osteoporosis Australia strategy for bone protection for this patient group.</outcome>
<timepoint>At 5 years</timepoint>
</secondaryOutcome>
</outcomes>
<eligibility>
<inclusivecriteria>Postmenopausal women- Adequately diagnosed and treated Stage I-IIIa early breast cancer- Oestrogen receptor and/or progesterone receptor positive breast cancer- Anastrozole is clinically indicated to be the best adjuvant strategy</inclusivecriteria>
<inclusiveminage>18</inclusiveminage>
<inclusiveminagetype>Years</inclusiveminagetype>
<inclusivemaxage>0</inclusivemaxage>
<inclusivemaxagetype>Not stated</inclusivemaxagetype>
<inclusivegender>Females</inclusivegender>
<healthyvolunteer>No</healthyvolunteer>
<exclusivecriteria>Clinical or radiological evidence of distant spread- prior treatment with bisphosphonates within the past 12 months</exclusivecriteria>
</eligibility>
<trial_design>
<studytype>Interventional</studytype>
<purpose>Prevention</purpose>
<allocation>Randomised controlled trial</allocation>
<concealment>central randomisation via fax and phone</concealment>
<sequence>Computer generated stratified blocks</sequence>
<masking>Open (masking not used)</masking>
<assignment>Parallel</assignment>
<designfeatures />
<endpoint>Efficacy</endpoint>
<statisticalmethods />
<masking1 />
<masking2 />
<masking3 />
<masking4 />
<patientregistry />
<followup />
<followuptype />
<purposeobs />
<duration />
<selection />
<timing />
</trial_design>
<recruitment>
<phase>Phase 3</phase>
<anticipatedstartdate>05/07/2005</anticipatedstartdate>
<actualstartdate />
<anticipatedenddate />
<actualenddate />
<samplesize>300</samplesize>
<actualsamplesize />
<currentsamplesize />
<recruitmentstatus>Active, not recruiting</recruitmentstatus>
<anticipatedlastvisitdate />
<actuallastvisitdate />
<dataanalysis />
<withdrawnreason />
<withdrawnreasonother />
<recruitmentcountry>Australia</recruitmentcountry>
<recruitmentstate />
</recruitment>
<sponsorship>
<primarysponsortype>Hospital</primarysponsortype>
<primarysponsorname>Barwon Health</primarysponsorname>
<primarysponsoraddress>272-322 Ryrie Street, Geelong, Victoria 3220</primarysponsoraddress>
<primarysponsorcountry>Australia</primarysponsorcountry>
<fundingsource>
<fundingtype>Commercial sector/Industry</fundingtype>
<fundingname>Astra Zeneca</fundingname>
<fundingaddress>P.O Box 131, North Ryde PBC NSW 1670</fundingaddress>
<fundingcountry>Australia</fundingcountry>
</fundingsource>
<secondarysponsor>
<sponsortype>None</sponsortype>
<sponsorname>Nil</sponsorname>
<sponsoraddress>Nil</sponsoraddress>
<sponsorcountry />
</secondarysponsor>
</sponsorship>
<ethicsAndSummary>
<summary />
<trialwebsite />
<publication />
<ethicsreview>Approved</ethicsreview>
<publicnotes />
<ethicscommitee>
<ethicname>Barwon Health</ethicname>
<ethicaddress />
<ethicapprovaldate />
<hrec />
<ethicsubmitdate />
<ethiccountry>Australia</ethiccountry>
</ethicscommitee>
</ethicsAndSummary>
<attachment />
<contacts>
<contact>
<title />
<name>Associate Professor Richard Bell</name>
<address>Andrew Love Cancer Centre
The Geelong Hospital
70 Swanston Street
Geelong VIC 3220</address>
<phone> 61 3 52267855</phone>
<fax> 61 3 52465168</fax>
<email>[email protected]</email>
<country>Australia</country>
<type>Scientific Queries</type>
</contact>
<contact>
<title />
<name>Ms Elaine Yeow</name>
<address>Andrew Love Cancer Centre
The Geelong Hospital
70 Swanston Street
Geelong VIC 3220</address>
<phone> 61 3 52267858</phone>
<fax> 61 3 52465168</fax>
<email>[email protected]</email>
<country>Australia</country>
<type>Public Queries</type>
</contact>
<contact>
<title />
<name />
<address />
<phone />
<fax />
<email />
<country />
<type>Principal Investigator</type>
</contact>
</contacts>
</ANZCTR_Trial>
Here is my code:
library(XML)
library(xml2)
x = read_xml("ACTRN12605000026628.xml")
print(x)
Attempt 1:
x_df = as.data.frame(x)
Error in as.data.frame.default(x) :
cannot coerce class ‘c("xml_document", "xml_node")’ to a data.frame
Attempt 2:
xmlToList(x)
Error in UseMethod("xmlSApply") :
no applicable method for 'xmlSApply' applied to an object of class "c('xml_document', 'xml_node')"
Attempt 3:
xmlToDataFrame(x)
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘xmlToDataFrame’ for signature ‘"xml_document", "missing", "missing", "missing", "missing"’
I need help understanding why these errors occur and how to convert the data from multiple files into a data frame or table in R.
Answer from a uj5u.com user:
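You cannot convert an XML file directly into a data frame. You first need to extract the tags and the data inside them, and then build the data frame from those.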
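As for why the three attempts fail: read_xml() comes from the xml2 package and returns an xml_document, whereas xmlToList() and xmlToDataFrame() come from the XML package and expect objects parsed by that package (for example with xmlParse()). The sketch below stays within xml2. The inline XML is a shortened stand-in for one trial file, assumed here only for illustration.

```r
library(xml2)

# Shortened stand-in for one trial file (illustrative, not the full record)
xml_txt <- '<ANZCTR_Trial requestNumber="42">
  <stage>Registered</stage>
  <actrnumber>ACTRN12605000026628</actrnumber>
  <recruitment>
    <phase>Phase 2</phase>
    <samplesize>45</samplesize>
  </recruitment>
</ANZCTR_Trial>'

doc <- read_xml(xml_txt)
doc_class <- class(doc)  # xml2 classes, which the XML package has no methods for

# Option 1: pull single values with XPath
actrn <- xml_text(xml_find_first(doc, "/ANZCTR_Trial/actrnumber"))

# Option 2: xml2's own converter to a nested R list
trial <- as_list(xml_find_first(doc, "/ANZCTR_Trial"))
stage <- trial$stage[[1]]
phase <- trial$recruitment$phase[[1]]
```

If you would rather use xmlToList(), parsing the file with XML::xmlParse() instead of read_xml() should give it an object it understands.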
Here is code that should solve the problem:
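library(xml2)
# Parse the file; keep the parsed document separate from the resulting data frame
doc <- read_xml("1.xml")
# Select the root element of the trial record
records <- xml_find_all(doc, "//ANZCTR_Trial")
records
# Names and trimmed text of the root's direct children
nodenames <- xml_name(xml_children(records))
nodevalues <- trimws(xml_text(xml_children(records)))
# One row, with one column per child tag
df <- as.data.frame(t(nodevalues))
colnames(df) <- nodenames
write.csv(x = df, file = 'trialData.csv')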
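records contains all the tags and data inside the parent tag. In your case that is ANZCTR_Trial, in both of the files you shared in the question.
nodenames holds the names of the tags directly inside the parent tag, and nodevalues holds their data.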
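To cover the original goal of combining many files, the same steps can be wrapped in a function and applied to every file in a folder. This is a minimal sketch, assuming all files have the same top-level tags. The helper name read_trial, the temporary folder, and the two tiny stand-in files are hypothetical and only there to make the example runnable.

```r
library(xml2)

# Flatten the top-level children of one trial file into a one-row data frame.
# Nested tags (e.g. <contacts>) end up as one concatenated text column.
read_trial <- function(path) {
  root <- xml_find_first(read_xml(path), "/ANZCTR_Trial")
  kids <- xml_children(root)
  values <- trimws(xml_text(kids))
  row <- as.data.frame(t(values), stringsAsFactors = FALSE)
  colnames(row) <- xml_name(kids)
  row
}

# Two tiny stand-in files written to a temporary folder (hypothetical data)
xml_dir <- file.path(tempdir(), "trials")
dir.create(xml_dir, showWarnings = FALSE)
writeLines(
  "<ANZCTR_Trial><stage>Registered</stage><actrnumber>A1</actrnumber></ANZCTR_Trial>",
  file.path(xml_dir, "a.xml")
)
writeLines(
  "<ANZCTR_Trial><stage>Registered</stage><actrnumber>B2</actrnumber></ANZCTR_Trial>",
  file.path(xml_dir, "b.xml")
)

files <- list.files(xml_dir, pattern = "\\.xml$", full.names = TRUE)
rows <- lapply(files, read_trial)

# rbind() needs identical column names in every row, so this step fails
# if some files contain extra or missing top-level tags
all_trials <- do.call(rbind, rows)
```

For the real data, point xml_dir at the folder holding the 1000+ files. If the structure turns out to differ between files, the rbind() step is where it would surface.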
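To get data from grandchildren tags, that is, tags nested inside other tags (for example phone and fax inside contact), you need to adjust the code further, like this: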
# Query the parsed XML document, not the data frame built from it
doc <- read_xml("1.xml")
records <- xml_find_all(doc, "//contacts")  # change the XPath according to your needs
records
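Everything else stays the same. Note that with //contacts the children are the contact elements, so each value is the concatenated text of one contact.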
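If you want one row per contact with a separate column per field, one possible approach is to select the contact nodes and look up each field relative to them. This is a sketch under assumptions: the inline XML, its values, and the helper name field are made up for illustration.

```r
library(xml2)

# Shortened stand-in for the <contacts> part of a trial file (made-up values)
xml_txt <- '<ANZCTR_Trial>
  <contacts>
    <contact><name>Dr A</name><phone>61 2 0000</phone><type>Scientific Queries</type></contact>
    <contact><name /><phone /><type>Principal Investigator</type></contact>
  </contacts>
</ANZCTR_Trial>'
doc <- read_xml(xml_txt)

# One node per <contact> element
contacts <- xml_find_all(doc, "//contacts/contact")

# Text of one named child per contact; empty tags give ""
field <- function(nodes, tag) trimws(xml_text(xml_find_first(nodes, tag)))

contact_df <- data.frame(
  name  = field(contacts, "name"),
  phone = field(contacts, "phone"),
  type  = field(contacts, "type"),
  stringsAsFactors = FALSE
)
```

The same pattern should carry over to other repeated blocks such as secondaryOutcome, fundingsource, or ethicscommitee. Repeated blocks usually fit better in their own table keyed by actrnumber than in the one-row-per-trial table.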