Easy Test Analysis
The easy test covers basic XML parsing with the xml2 package: extracting specific information from a simple XML document. The code below loads xml2, parses a short XML document describing two movies, and extracts the director's name for the second movie.
Setting Up the Environment
Section 1: Loading Libraries and XML String
library(xml2)
library(stringr)
xml_string <- c( '<?xml version="1.0" encoding="UTF-8"?>',
'<movie mins="126" lang="eng">',
'<title>Good Will Hunting</title>',
'<last_name>Van Sant</last_name>',
'<movie mins="106" lang="spa">',
'<title>Y tu mama tambien</title>',
The xml2 library is loaded to handle XML data in R.
The stringr library is loaded for string manipulation, though it’s not used in this snippet.
The XML for a list of movies is defined as a character vector, one line of markup per element, including details like title, director, year, and genre.
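The listing above is abridged, so here is a minimal sketch of one complete, runnable version of the setup. The root element `<movies>` and every value for the second movie are taken from the printed output later in this section; the first movie's first name, year, and genre (`Gus`, `1997`, `drama`) do not appear in the original and are assumptions filled in for illustration.

```r
library(xml2)

# Values marked "assumed" do not appear in the original listing.
xml_string <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<movies>',
  '<movie mins="126" lang="eng">',
  '<title>Good Will Hunting</title>',
  '<director>',
  '<first_name>Gus</first_name>',   # assumed
  '<last_name>Van Sant</last_name>',
  '</director>',
  '<year>1997</year>',              # assumed
  '<genre>drama</genre>',           # assumed
  '</movie>',
  '<movie mins="106" lang="spa">',
  '<title>Y tu mama tambien</title>',
  '<director>',
  '<first_name>Alfonso</first_name>',
  '<last_name>Cuaron</last_name>',
  '</director>',
  '<year>2001</year>',
  '<genre>drama</genre>',
  '</movie>',
  '</movies>'
)

# One character vector element per line of markup
length(xml_string)
```

Keeping one tag per element makes the vector easy to read and edit, at the cost of needing to collapse it into a single string before parsing.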
Section 2: Parsing the XML Document
doc <- read_xml(paste(xml_string, collapse = ''))
doc
## {xml_document}
## <movies>
## [1] <movie mins="126" lang="eng">\n <title>Good Will Hunting</title>\n <dir ...
## [2] <movie mins="106" lang="spa">\n <title>Y tu mama tambien</title>\n <dir ...
The read_xml function from the xml2 package is used to parse the XML string into an XML document object.
The paste function with collapse = '' concatenates the elements of the character vector into a single string before parsing.
The parsed XML document is stored in the variable doc.
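As a quick sanity check after parsing, the root element's name and its number of children can be inspected. The following is a minimal sketch that uses a shortened, hypothetical two-movie document (the `xml_lines` vector is made up for illustration) rather than the full string above; `xml_name` and `xml_length` are part of xml2.

```r
library(xml2)

# Shortened stand-in for the movie list (hypothetical, for illustration only)
xml_lines <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<movies>',
  '<movie mins="126" lang="eng"><title>Good Will Hunting</title></movie>',
  '<movie mins="106" lang="spa"><title>Y tu mama tambien</title></movie>',
  '</movies>'
)

# Collapse the vector into one string, then parse it
doc <- read_xml(paste(xml_lines, collapse = ''))

xml_name(doc)    # name of the root element
xml_length(doc)  # number of child elements under the root
```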
Section 3: Navigating the XML Document
tu_mama <- xml_child(doc, search = 2)
tu_mama
## {xml_node}
## <movie mins="106" lang="spa">
## [1] <title>Y tu mama tambien</title>
## [2] <director>\n <first_name>Alfonso</first_name>\n <last_name>Cuaron</last ...
## [3] <year>2001</year>
## [4] <genre>drama</genre>
xml_children(tu_mama)
## {xml_nodeset (4)}
## [1] <title>Y tu mama tambien</title>
## [2] <director>\n <first_name>Alfonso</first_name>\n <last_name>Cuaron</last ...
## [3] <year>2001</year>
## [4] <genre>drama</genre>
The xml_child function selects a specific child node by its index, in this case the second movie.
The xml_children function then lists all child nodes of the selected movie node.
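Positional selection can also be written as an XPath expression. The sketch below compares `xml_child` with `xml_find_first` on a shortened, hypothetical version of the movie list; the XPath `/movies/movie[2]` assumes the root element is named `movies`, as the printed output suggests.

```r
library(xml2)

# Shortened stand-in for the movie list (hypothetical, for illustration only)
doc <- read_xml(paste0(
  '<movies>',
  '<movie mins="126" lang="eng"><title>Good Will Hunting</title></movie>',
  '<movie mins="106" lang="spa"><title>Y tu mama tambien</title></movie>',
  '</movies>'
))

# Select the second movie by position
second <- xml_child(doc, search = 2)

# Same selection expressed as an XPath positional predicate
second_xpath <- xml_find_first(doc, "/movies/movie[2]")

# Both nodes expose the same attributes and children
xml_attr(second, "lang")
xml_text(xml_child(second_xpath, "title"))
```

XPath tends to be the more flexible choice once selection depends on attributes or nesting rather than position alone.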
Section 4: Extracting Director Information
director <- xml_child(tu_mama, "director")
director
## {xml_node}
## <director>
## [1] <first_name>Alfonso</first_name>
## [2] <last_name>Cuaron</last_name>
xml_contents(director)
## {xml_nodeset (2)}
## [1] <first_name>Alfonso</first_name>
## [2] <last_name>Cuaron</last_name>
xml_text(director)
## [1] "AlfonsoCuaron"
The xml_child function is used again to select the “director” child node of the selected movie.
The xml_contents function lists all nodes within the “director” node.
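The xml_text function extracts the text content of the “director” node. Because it concatenates the text of all descendant nodes without a separator, the first and last names are returned as the single string "AlfonsoCuaron".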
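If a readable full name is wanted, one option is to extract the text of each child element separately and join the pieces with a space. This is a minimal sketch on a standalone `<director>` fragment; the variable names `name_parts` and `full_name` are illustrative and not from the original.

```r
library(xml2)

# Standalone copy of the director node for illustration
director <- read_xml(
  '<director><first_name>Alfonso</first_name><last_name>Cuaron</last_name></director>'
)

# xml_text() on the parent concatenates all descendant text with no separator
xml_text(director)

# Extract each child element's text, then join the parts with a space
name_parts <- xml_text(xml_children(director))
full_name <- paste(name_parts, collapse = " ")
full_name
```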