XML: Tools for Parsing and Generating XML Within R and S-Plus
A package for working with XML documents.
> library(XML)
> library(magrittr) # provides the %>% pipe used in the examples below
> doc <- xmlTreeParse(system.file("exampleData", "mtcars.xml", package = "XML"))
Version: 3.98.1.4
Function | Summary |
---|---|
Doctype | Constructor for DTD reference |
Doctype-class | Class to describe a reference to an XML DTD |
ExternalReference-class | Classes for working with XML Schema |
SAXState-class | A virtual base class defining methods for SAX parsing |
XMLAttributes-class | Class '"XMLAttributes"' |
XMLCodeFile-class | Simple classes for identifying an XML document containing R code |
XMLInternalDocument-class | Class to represent reference to C-level data structure for an XML document |
XMLNode-class | Classes to describe an XML node object. |
[.XMLNode | Convenience accessors for the children of XMLNode objects. |
[<-.XMLNode | Assign sub-nodes to an XML node |
addChildren | Add child nodes to an XML node |
addNode | Add a node to a tree |
append.xmlNode | Add children to an XML node |
asXMLNode | Converts non-XML node objects to XMLTextNode objects |
asXMLTreeNode | Convert a regular XML node to one for use in a "flat" tree |
catalogLoad | Manipulate XML catalog contents |
catalogResolve | Look up an element via the XML catalog mechanism |
compareXMLDocs | Indicate differences between two XML documents |
docName | Accessors for name of XML document |
dtdElement | Gets the definition of an element or entity from a DTD. |
dtdElementValidEntry | Determines whether an XML element allows a particular type of sub-element. |
dtdIsAttribute | Query if a name is a valid attribute of a DTD element. |
dtdValidElement | Determines whether an XML tag is valid within another. |
ensureNamespace | Ensure that the node has a definition for particular XML namespaces |
findXInclude | Find the XInclude node associated with an XML node |
free | Release the specified object and clean up its memory usage |
genericSAXHandlers | SAX generic callback handler list |
getChildrenStrings | Get the individual |
getEncoding | Determines the encoding for an XML document or node |
getHTMLLinks | Get links or names of external files in HTML document |
getLineNumber | Determine the location - file & line number of an (internal) XML node |
getNodeSet | Find matching nodes in an internal XML tree/DOM |
getRelativeURL | Compute name of URL relative to a base URL |
getSibling | Manipulate sibling XML nodes |
getXIncludes | Find the documents that are XInclude'd in an XML document |
getXMLErrors | Get XML/HTML document parse errors |
isXMLString | Facilities for working with XML strings |
length.XMLNode | Determine the number of children in an XMLNode object. |
libxmlVersion | Query the version and available features of the libxml library. |
makeClassTemplate | Create S4 class definition based on XML node(s) |
names.XMLNode | Get the names of an XML node's children. |
newXMLDoc | Create internal XML node or document object |
newXMLNamespace | Add a namespace definition to an XML node |
parseDTD | Read a Document Type Definition (DTD) |
parseURI | Parse a URI string into its elements |
parseXMLAndAdd | Parse XML content and add it to a node |
print.XMLAttributeDef | Methods for displaying XML objects |
processXInclude | Perform the XInclude substitutions |
readHTMLList | Read data in an HTML list or all lists in a document |
readHTMLTable | Read data from one or more HTML tables |
readKeyValueDB | Read an XML property-list style document |
readSolrDoc | Read the data from a Solr document |
removeXMLNamespaces | Remove namespace definitions from an XML node or document |
saveXML | Output internal XML Tree |
setXMLNamespace | Set the name space on a node |
startElement.SAX | Generic Methods for SAX callbacks |
supportsExpat | Determines which native XML parsers are being used. |
toHTML | Create an HTML representation of the given R object, using internal C-level nodes |
toString.XMLNode | Creates string representation of XML node |
xmlApply | Applies a function to each of the children of an XMLNode |
xmlAttributeType | The type of an XML attribute for element from the DTD |
xmlAttrs | Get the list of attributes of an XML node. |
xmlChildren | Gets the sub-nodes within an XMLNode object. |
xmlCleanNamespaces | Remove redundant namespaces on an XML document |
xmlClone | Create a copy of an internal XML document or node |
xmlContainsEntity | Checks if an entity is defined within a DTD. |
xmlDOMApply | Apply function to nodes in an XML tree/DOM. |
xmlElementSummary | Frequency table of names of elements and attributes in XML content |
xmlElementsByTagName | Retrieve the children of an XML node with a specific tag name |
xmlEventHandler | Default handlers for the SAX-style event XML parser |
xmlEventParse | XML Event/Callback element-wise Parser |
xmlFlatListTree | Constructors for trees stored as flat list of nodes with information about parents and children. |
xmlGetAttr | Get the value of an attribute in an XML node |
xmlHandler | Example XML Event Parser Handler Functions |
xmlName | Extracts the tag name of an XMLNode object. |
xmlNamespace | Retrieve the namespace value of an XML node. |
xmlNamespaceDefinitions | Get definitions of any namespaces defined in this XML node |
xmlNode | Create an XML node |
xmlOutputBuffer | XML output streams |
xmlParent | Get parent node of XMLInternalNode or ancestor nodes |
xmlParseDoc | Parse an XML document with options controlling the parser. |
xmlParserContextFunction | Identifies function as expecting an xmlParserContext argument |
xmlRoot | Get the top-level XML node. |
xmlSchemaValidate | Validate an XML document relative to an XML schema |
xmlSearchNs | Find a namespace definition object by searching ancestor nodes |
xmlSerializeHook | Functions that help serialize and deserialize XML internal objects |
xmlSize | The number of sub-elements within an XML node. |
xmlSource | Source the R code, examples, etc. from an XML document |
xmlStopParser | Terminate an XML parser |
xmlStructuredStop | Condition/error handler functions for XML parsing |
xmlToDataFrame | Extract data from a simple XML document |
xmlToList | Convert an XML node/document to a more R-like list |
xmlToS4 | General mechanism for mapping an XML node to an S4 object |
xmlTree | An internal, updatable DOM object for building XML trees |
xmlTreeParse | XML Parser |
xmlValue | Extract or set the contents of a leaf XML node |
getNodeSet / xpathApply / xpathSApply / matchNamespaces
> doc <- xmlParse(system.file("exampleData", "tagnames.xml", package = "XML"))
> getNodeSet(doc, "/doc//a[@status]")
[[1]]
<a status="xyz"/>
[[2]]
<a status="1"/>
attr(,"class")
[1] "XMLNodeSet"
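The transcript above only exercises getNodeSet(). As a minimal sketch of xpathSApply(), the following applies a function to each matching node and simplifies the result. The inline document is hypothetical and is parsed from a string, so it does not depend on the example files shipped with the package.

```r
library(XML)

# Hypothetical inline document, parsed from a string with asText = TRUE
xml_text <- '<doc><a status="xyz"/><b><a status="1"/><a/></b></doc>'
doc <- xmlParse(xml_text, asText = TRUE)

# xpathSApply() runs the XPath query, calls xmlGetAttr(node, "status")
# on every matching node, and simplifies the results to a vector
status <- xpathSApply(doc, "/doc//a[@status]", xmlGetAttr, "status")
print(status)

# The same result in two steps: getNodeSet() followed by sapply()
nodes <- getNodeSet(doc, "/doc//a[@status]")
status2 <- sapply(nodes, xmlGetAttr, "status")
```

The one-step form is convenient when only the extracted values are needed; getNodeSet() is the better fit when the nodes themselves are reused afterwards.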
names.XMLNode
> xmlRoot(doc) %>% names()
comment a
"comment" "a"
> xmlRoot(doc) %>% .[names(.) == "variables"]
named list()
attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
newXMLDoc
Create an XML node or document.
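A minimal sketch of building a small document with newXMLDoc() and newXMLNode(). The element and attribute names are hypothetical, chosen only to resemble the mtcars example used elsewhere on this page.

```r
library(XML)

# Create an empty internal document, then attach a root node to it
doc <- newXMLDoc()
root <- newXMLNode("dataset", attrs = c(name = "example"), doc = doc)

# Child nodes are attached through the parent argument;
# the second argument becomes the text content of the new node
newXMLNode("variable", "mpg", parent = root)
newXMLNode("variable", "cyl", parent = root)

# Inspect the result
print(doc)
n_children <- xmlSize(xmlRoot(doc))
```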
readHTMLList
Retrieve the list elements of an HTML document.
Arguments
- doc
- trim
- elFun
- which
- ...
> readHTMLList("http://suryu.me/rpkg_showcase/dataset/index.html", which = 14)
[1] "13.1.\n \n agricolae"
[2] "13.2.\n \n biotools"
[3] "13.3.\n \n changepoint"
[4] "13.4.\n \n describer"
[5] "13.5.\n \n DescTools"
[6] "13.6.\n \n gam"
[7] "13.7.\n \n Kendall"
[8] "13.8.\n \n lmtest"
[9] "13.9.\n \n mgcv"
[10] "13.10.\n \n MuMIn"
[11] "13.11.\n \n outliers"
[12] "13.12.\n \n smatr"
[13] "13.13.\n \n statcheck"
[14] "13.14.\n \n stats"
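The example above depends on a remote page. As an offline sketch, the same call can be tried on an HTML string; the markup below is hypothetical. It assumes the default trim = TRUE, which strips leading and trailing whitespace from each item.

```r
library(XML)

# Hypothetical inline HTML standing in for a downloaded page
html_text <- "<html><body><ul><li> alpha </li><li>beta</li></ul></body></html>"
doc <- htmlParse(html_text, asText = TRUE)

# which = 1 selects the first list in the document
items <- readHTMLList(doc, which = 1)
print(items)
```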
readHTMLTable
Read the table elements of an HTML document.
Arguments
- doc
- header
- colClasses
- skip.rows
- trim
- elFun
- as.data.frame
- which
- ...
> tables <- "https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population" %>%
+ RCurl::getURL() %>%
+ readHTMLTable()
> str(tables)
List of 4
$ NULL: NULL
$ NULL: NULL
$ NULL:'data.frame': 244 obs. of 6 variables:
..$ Rank : Factor w/ 196 levels "–","1","10","100",..: 2 109 120 131 142 153 164 175 186 3 ...
..$ Country (or dependent territory): Factor w/ 244 levels "Abkhazia[Note 19]",..: 44 95 233 96 30 160 152 18 175 134 ...
..$ Population : Factor w/ 244 levels "1,132,657","1,167,242",..: 8 6 129 102 92 76 72 61 53 48 ...
..$ Date : Factor w/ 59 levels "April 1, 2016",..: 54 54 54 26 54 54 26 54 4 26 ...
..$ % of world
population : Factor w/ 181 levels "0.00000075%",..: 175 174 181 180 179 178 177 176 173 172 ...
..$ Source : Factor w/ 28 levels "2008 census result",..: 13 28 13 14 13 13 26 13 12 14 ...
$ NULL:'data.frame': 24 obs. of 2 variables:
..$ V1: Factor w/ 13 levels "","Cities","Continental",..: 1 13 1 3 1 12 1 2 1 10 ...
..$ V2: Factor w/ 11 levels "Age at first marriage\nDivorce rate\nDomestic citizens\nEthnic and cultural diversity level\nForeign-born population\nImmigrant"| __truncated__,..: NA 7 NA 4 NA 2 NA 9 NA 10 ...
> tables[[1]]
NULL
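In the output above the first list elements are NULL, so it is often simpler to request one table directly with the which argument. A minimal sketch on a hypothetical inline table follows; stringsAsFactors is not a named argument of readHTMLTable() and is assumed here to be forwarded through ... to as.data.frame().

```r
library(XML)

# Hypothetical inline HTML standing in for a downloaded page
html_text <- paste0(
  "<html><body><table>",
  "<tr><th>name</th><th>n</th></tr>",
  "<tr><td>a</td><td>1</td></tr>",
  "<tr><td>b</td><td>2</td></tr>",
  "</table></body></html>"
)
doc <- htmlParse(html_text, asText = TRUE)

# which = 1 returns only the first table as a data frame;
# stringsAsFactors = FALSE keeps the cells as character strings
tbl <- readHTMLTable(doc, which = 1, stringsAsFactors = FALSE)
str(tbl)
```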
saveXML
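saveXML() serializes an internal tree either to a character string or to a file. A minimal sketch, assuming a small document built with newXMLDoc() and newXMLNode(); the node name and the temporary file are hypothetical.

```r
library(XML)

# Hypothetical document built only for this illustration
doc <- newXMLDoc()
root <- newXMLNode("greeting", "hello", attrs = c(lang = "en"), doc = doc)

# Without a file argument, saveXML() returns the serialized XML as a string
xml_string <- saveXML(doc)
cat(xml_string)

# With a file argument, the document is written to disk instead
out_file <- tempfile(fileext = ".xml")
saveXML(doc, file = out_file)
```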
xmlChildren
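Get the sub-nodes within an XMLNode object.
> # doc$doc only exists on the R-level tree returned by xmlTreeParse();
> # the xmlParse() result above is an external pointer, so re-parse first
> doc <- xmlTreeParse(system.file("exampleData", "mtcars.xml", package = "XML"))
> xmlChildren(doc$doc$children[["dataset"]]) %>% names()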
xmlGetAttr
> doc <- xmlParse(system.file("exampleData", "tagnames.xml", package = "XML"))
> els <- getNodeSet(doc, "/doc//a[@status]")
> sapply(els, function(el) xmlGetAttr(el, "status"))
[1] "xyz" "1"
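xmlGetAttr() also accepts a default value for missing attributes and a converter function applied to the attribute value. A minimal sketch on a hypothetical node created with xmlNode():

```r
library(XML)

# Hypothetical node with a single attribute
node <- xmlNode("a", attrs = c(status = "1"))

# converter is applied to the attribute value before it is returned
status_int <- xmlGetAttr(node, "status", converter = as.integer)

# default is returned when the attribute does not exist
missing_id <- xmlGetAttr(node, "id", default = NA_character_)
```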
xmlRoot
> xmlTreeParse(system.file("exampleData", "mtcars.xml", package="XML")) %>% xmlRoot()
<dataset name="mtcars" numRecords="32" source="R Project">
<variables count="11">
<variable unit="Miles/gallon">mpg</variable>
<variable>cyl</variable>
<variable>disp</variable>
<variable>hp</variable>
<variable>drat</variable>
<variable>wt</variable>
<variable>qsec</variable>
<variable>vs</variable>
<variable type="FactorVariable" levels="automatic,manual">am</variable>
<variable>gear</variable>
<variable>carb</variable>
</variables>
<record id="Mazda RX4">21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4</record>
<record id="Mazda RX4 Wag">21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4</record>
<record id="Datsun 710">22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1</record>
<record id="Hornet 4 Drive">21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1</record>
<record id="Hornet Sportabout">18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2</record>
<record id="Valiant">18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1</record>
<record id="Duster 360">14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4</record>
<record id="Merc 240D">24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2</record>
<record id="Merc 230">22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2</record>
<record id="Merc 280">19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4</record>
<record id="Merc 280C">17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4</record>
<record id="Merc 450SE">16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3</record>
<record id="Merc 450SL">17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3</record>
<record id="Merc 450SLC">15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3</record>
<record id="Cadillac Fleetwood">10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4</record>
<record id="Lincoln Continental">10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4</record>
<record id="Chrysler Imperial">14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4</record>
<record id="Fiat 128">32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1</record>
<record id="Honda Civic">30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2</record>
<record id="Toyota Corolla">33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1</record>
<record id="Toyota Corona">21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1</record>
<record id="Dodge Challenger">15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2</record>
<record id="AMC Javelin">15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2</record>
<record id="Camaro Z28">13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4</record>
<record id="Pontiac Firebird">19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2</record>
<record id="Fiat X1-9">27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1</record>
<record id="Porsche 914-2">26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2</record>
<record id="Lotus Europa">30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2</record>
<record id="Ford Pantera L">15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4</record>
<record id="Ferrari Dino">19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6</record>
<record id="Maserati Bora">15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8</record>
<record id="Volvo 142E">21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2</record>
</dataset>
xmlToDataFrame
Create a data frame from an XML document.
Arguments
- doc
- colClasses
- homogeneous
- collectNames
- nodes
- stringsAsFactors
> path <- system.file("exampleData", "size.xml", package = "XML")
> xmlToDataFrame(doc = path, c("integer", "integer", "numeric")) %>% {
+ class(.) %>% print() # the class is data.frame
+ print(.)
+ }
[1] "data.frame"
age sex number
1 0 0 500
2 0 1 300
3 1 0 200
4 1 1 400
5 10 0 NA
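The nodes argument allows a data frame to be built from a subset of nodes instead of the whole document. A minimal sketch, assuming a hypothetical inline document in which each row element holds one record:

```r
library(XML)

# Hypothetical inline document: one <row> per record, one child per column
xml_text <- paste0(
  "<rows>",
  "<row><age>0</age><sex>1</sex></row>",
  "<row><age>10</age><sex>0</sex></row>",
  "</rows>"
)
doc <- xmlParse(xml_text, asText = TRUE)

# Whole document, converting both columns to integer
df <- xmlToDataFrame(doc, colClasses = c("integer", "integer"))

# Only the nodes selected by an XPath query
df_nodes <- xmlToDataFrame(nodes = getNodeSet(doc, "//row"),
                           stringsAsFactors = FALSE)
print(df)
```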
xmlTreeParse / xmlInternalTreeParse / xmlNativeTreeParse / htmlTreeParse / htmlParse
XML parsers. xmlParse() and htmlParse() behave the same as xmlTreeParse() and htmlTreeParse(), respectively, with the useInternalNodes argument set to TRUE.
Arguments
- file
- ignoreBlanks
- handlers
- replaceEntities
- asText... TRUE when the input is given as a string
- trim
- validate
- getDTD
- isURL
- asTree
- addAttributeNamespaces
- useInternalNodes
- isSchema
- fullNamespaceInfo
- encoding
- useDotNames
- xinclude
- addFinalizer
- error
- isHTML
- options
- parentFirst
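A minimal sketch of how asText and useInternalNodes interact, using a hypothetical inline string rather than a file. It compares the R-level tree from xmlTreeParse() with the internal document returned when useInternalNodes = TRUE and by xmlParse().

```r
library(XML)

# Hypothetical inline XML passed as a string, hence asText = TRUE
xml_text <- '<foo x="1"><element attrib1="my value"/></foo>'

# Default: a tree of R-level XMLNode objects
r_tree <- xmlTreeParse(xml_text, asText = TRUE)

# useInternalNodes = TRUE: a reference to the C-level document
internal <- xmlTreeParse(xml_text, asText = TRUE, useInternalNodes = TRUE)

# xmlParse() yields the same kind of object as the call above
via_xmlParse <- xmlParse(xml_text, asText = TRUE)

print(class(r_tree))
print(class(internal))
print(class(via_xmlParse))
```

XPath functions such as getNodeSet() operate on the internal form, which is why the XPath examples on this page start from xmlParse().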
> system.file("exampleData", "test.xml", package = "XML") %>%
+ xmlTreeParse() %>%
+ xmlRoot()
<foo x="1">
<element attrib1="my value"/>
test entity bar <?R sum(rnorm(100))?>
<a>
<!--A comment-->
<b>%extEnt;</b>
</a>
<![CDATA[
This is escaped data
containing < and &. ]]>
Note that this caused a segmentation fault if replaceEntities was
not TRUE.
That is,
<code>xmlTreeParse("test.xml", replaceEntities = TRUE)</code>
works, but
<code>xmlTreeParse("test.xml")</code>
does not if this is called before the one above.
This is now fixed and was caused by
treating an xmlNodePtr in the C code
that had type XML_ELEMENT_DECL
and so was in fact an xmlElementPtr.
Aaah, C and casting!
</foo>