Easy Test Analysis

Introduction

The easy test covers basic XML parsing with the xml2 package. The task is to parse a small XML document describing two movies and extract the director's name for the second movie. The code below loads xml2, builds the document from a character vector, and navigates to the director node step by step.

Setting Up the Environment

Section 1: Loading Libraries and XML String

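library(xml2)     # read_xml(), xml_child(), xml_children(), xml_text()
library(stringr)  # loaded here, but not used in the snippets shown below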

xml_string <- c( '<?xml version="1.0" encoding="UTF-8"?>',
  '<movies>',
  '<movie mins="126" lang="eng">',
  '<title>Good Will Hunting</title>',
  '<director>',
  '<first_name>Gus</first_name>',
  '<last_name>Van Sant</last_name>',
  '</director>',
  '<year>1998</year>',
  '<genre>drama</genre>',
  '</movie>',
  '<movie mins="106" lang="spa">',
  '<title>Y tu mama tambien</title>',
  '<director>',
  '<first_name>Alfonso</first_name>',
  '<last_name>Cuaron</last_name>',
  '</director>',
  '<year>2001</year>',
  '<genre>drama</genre>',
  '</movie>',
  '</movies>')

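Explanation: library(xml2) loads the parsing functions used in the following sections. xml_string is a character vector holding one piece of the XML document per element: a <movies> root with two <movie> elements, each carrying mins and lang attributes and title, director, year, and genre children.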

Section 2: Parsing the XML Document

doc <- read_xml(paste(xml_string, collapse = ''))
doc
## {xml_document}
## <movies>
## [1] <movie mins="126" lang="eng">\n  <title>Good Will Hunting</title>\n  <dir ...
## [2] <movie mins="106" lang="spa">\n  <title>Y tu mama tambien</title>\n  <dir ...

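Explanation: paste(xml_string, collapse = '') joins the character vector into a single string, and read_xml() parses that string into an xml_document. Printing doc shows the root element <movies> and its two <movie> children.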
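A quick way to confirm that the parse worked is to inspect the root node. The sketch below rebuilds a trimmed copy of the document (an assumption made to keep the example self-contained: the director, year, and genre elements are left out) and checks the root name, the number of children, and the lang attributes.

```r
library(xml2)

# Trimmed copy of the movies document; only the titles are kept for brevity.
xml_string <- c(
  '<movies>',
  '<movie mins="126" lang="eng"><title>Good Will Hunting</title></movie>',
  '<movie mins="106" lang="spa"><title>Y tu mama tambien</title></movie>',
  '</movies>'
)

# Join the pieces into one string and parse it.
doc <- read_xml(paste(xml_string, collapse = ""))

root_name <- xml_name(doc)                        # name of the root element
n_movies  <- xml_length(doc)                      # number of child elements
languages <- xml_attr(xml_children(doc), "lang")  # one value per <movie>

print(root_name)
print(n_movies)
print(languages)
```

Note that xml_attr() returns character values, so an attribute such as mins would need as.numeric() before any arithmetic.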

Section 3: Navigating the XML Document

tu_mama <- xml_child(doc, search = 2)
tu_mama
## {xml_node}
## <movie mins="106" lang="spa">
## [1] <title>Y tu mama tambien</title>
## [2] <director>\n  <first_name>Alfonso</first_name>\n  <last_name>Cuaron</last ...
## [3] <year>2001</year>
## [4] <genre>drama</genre>
xml_children(tu_mama)
## {xml_nodeset (4)}
## [1] <title>Y tu mama tambien</title>
## [2] <director>\n  <first_name>Alfonso</first_name>\n  <last_name>Cuaron</last ...
## [3] <year>2001</year>
## [4] <genre>drama</genre>

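Explanation: xml_child(doc, search = 2) returns the second child of the root, which is the <movie> node for Y tu mama tambien. xml_children(tu_mama) then lists that node's four child elements as a node set: title, director, year, and genre.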
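The same node can also be reached with an XPath expression instead of a positional xml_child() call. The following is a minimal sketch, not the method used in this analysis; it assumes a compact copy of the document, and the variable names (second_movie, child_names, and so on) are illustrative only.

```r
library(xml2)

# Compact copy of the movies document from Section 1.
doc <- read_xml(paste0(
  '<movies>',
  '<movie mins="126" lang="eng"><title>Good Will Hunting</title>',
  '<director><first_name>Gus</first_name><last_name>Van Sant</last_name></director>',
  '<year>1998</year><genre>drama</genre></movie>',
  '<movie mins="106" lang="spa"><title>Y tu mama tambien</title>',
  '<director><first_name>Alfonso</first_name><last_name>Cuaron</last_name></director>',
  '<year>2001</year><genre>drama</genre></movie>',
  '</movies>'
))

# XPath positions are 1-based, so movie[2] is the second <movie> element.
second_movie <- xml_find_first(doc, "/movies/movie[2]")

title       <- xml_text(xml_find_first(second_movie, "./title"))
lang        <- xml_attr(second_movie, "lang")
child_names <- xml_name(xml_children(second_movie))

print(title)
print(lang)
print(child_names)
```

XPath selects by element name and position together, so it keeps working if other kinds of elements are later added under the root, whereas xml_child(doc, search = 2) simply takes the second child whatever it is.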

Section 4: Extracting Director Information

director <- xml_child(tu_mama,"director")
director
## {xml_node}
## <director>
## [1] <first_name>Alfonso</first_name>
## [2] <last_name>Cuaron</last_name>
xml_contents(director)
## {xml_nodeset (2)}
## [1] <first_name>Alfonso</first_name>
## [2] <last_name>Cuaron</last_name>
xml_text(director)
## [1] "AlfonsoCuaron"

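Explanation: xml_child(tu_mama, "director") selects the child by name rather than by position. xml_contents(director) returns its child nodes, and xml_text(director) returns the text of all descendants concatenated with no separator, which is why the first and last names run together as "AlfonsoCuaron".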
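Because xml_text() on the director node concatenates the names without a space, a readable full name needs the child nodes to be read separately. The sketch below is one possible way to do that, assuming a copy of the second movie's director element; the variable names name_parts and full_name are illustrative only.

```r
library(xml2)

# Copy of the <director> element for the second movie.
director <- read_xml(paste0(
  '<director>',
  '<first_name>Alfonso</first_name>',
  '<last_name>Cuaron</last_name>',
  '</director>'
))

# xml_text() on a node set returns one string per node.
name_parts <- xml_text(xml_children(director))

# Join the parts with a space to get a readable full name.
full_name <- paste(name_parts, collapse = " ")

print(name_parts)
print(full_name)
```

Selecting ./first_name and ./last_name explicitly with xml_find_first() would be more robust if the director element could contain other children or list them in a different order.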