# pdftools: Extract Text and Data from PDF Documents

- CRAN: http://cran.r-project.org/web/packages/pdftools/index.html
- GitHub: https://github.com/jeroenooms/pdftools

```
> library(pdftools)
```

Version: 0.1

Function | Summary |
---|---|
`pdf_info` | PDF utilities |
`pdf_render_page` | Render PDF to bitmap |

## pdf_info / pdf_text / pdf_fonts / pdf_attachments / pdf_toc

Utilities for retrieving information from a PDF (documents containing Japanese text work too).

```
> path <- system.file("doc/zoo.pdf", package = "zoo")
>
> pdf_info(pdf = path)
```

```
$version
[1] "1.5"
$pages
[1] 29
$encrypted
[1] FALSE
$linearized
[1] FALSE
$keys
$keys$Creator
[1] "David M. Jones"
$keys$Producer
[1] "GPL Ghostscript 9.06"
$keys$Title
[1] "CMB10"
$created
[1] "2015-03-16 14:27:01 JST"
$modified
[1] "2015-03-16 14:27:01 JST"
$metadata
[1] "<?xpacket begin='ï»¿' id='W5M0MpCehiHzreSzNTczkc9d'?>\n<?adobe-xap-filters esc=\"CRLF\"?>\n<x:xmpmeta xmlns:x='adobe:ns:meta/' x:xmptk='XMP toolkit 2.9.1-13, framework 1.6'>\n<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' xmlns:iX='http://ns.adobe.com/iX/1.0/'>\n<rdf:Description rdf:about='uuid:ad65eba4-03fc-11f0-0000-c48715a6ff54' xmlns:pdf='http://ns.adobe.com/pdf/1.3/' pdf:Producer='GPL Ghostscript 9.06'/>\n<rdf:Description rdf:about='uuid:ad65eba4-03fc-11f0-0000-c48715a6ff54' xmlns:xmp='http://ns.adobe.com/xap/1.0/'><xmp:ModifyDate>2015-03-16T14:27:01+01:00</xmp:ModifyDate>\n<xmp:CreateDate>2015-03-16T14:27:01+01:00</xmp:CreateDate>\n<xmp:CreatorTool>David M. Jones</xmp:CreatorTool></rdf:Description>\n<rdf:Description rdf:about='uuid:ad65eba4-03fc-11f0-0000-c48715a6ff54' xmlns:xapMM='http://ns.adobe.com/xap/1.0/mm/' xapMM:DocumentID='uuid:ad65eba4-03fc-11f0-0000-c48715a6ff54'/>\n<rdf:Description rdf:about='uuid:ad65eba4-03fc-11f0-0000-c48715a6ff54' xmlns:dc='http://purl.org/dc/elements/1.1/' dc:format='application/pdf'><dc:title><rdf:Alt><rdf:li xml:lang='x-default'>CMB10</rdf:li></rdf:Alt></dc:title></rdf:Description>\n</rdf:RDF>\n</x:xmpmeta>\n \n \n<?xpacket end='w'?>"
$locked
[1] FALSE
$attachments
[1] FALSE
$layout
[1] "no_layout"
```
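
The result of `pdf_info` is a nested list, so individual fields can be picked out with `$`. The sketch below assumes the same zoo vignette as above and that both zoo and pdftools are installed. The variable names (`info`, `n_pages`, `title`, `producer`) are only illustrative. The other helpers named in the heading (`pdf_fonts`, `pdf_toc`, `pdf_attachments`) take the same file argument; their results are inspected with `str()` here rather than assumed.

```r
library(pdftools)

# Same file as above: the vignette shipped with the zoo package
path <- system.file("doc/zoo.pdf", package = "zoo")

info <- pdf_info(path)

# pdf_info() returns a nested list, so fields are reached with `$`
n_pages  <- info$pages          # number of pages
title    <- info$keys$Title     # title stored in the document keys (may be NULL)
producer <- info$keys$Producer  # software that produced the PDF (may be NULL)

cat("pages:", n_pages, "\n")
cat("title:", title, "\n")
cat("producer:", producer, "\n")

# The remaining helpers accept the same path; str() shows what came back
str(pdf_fonts(path))
str(pdf_toc(path), max.level = 2)
str(pdf_attachments(path))
```

Storing the result once and reading fields from it avoids parsing the file again for every lookup.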

```
> pdf_text(path)
```
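
`pdf_text` returns a character vector with one element per page, which is why the output is indexed `[1]`, `[2]`, and so on, and lines within a page are separated by `\n`. A minimal sketch of splitting such a result into lines with base R follows. The `pages` vector is a made-up stand-in so that the snippet runs without a PDF; in practice it would be `pdf_text(path)`.

```r
# Stand-in for the result of pdf_text(path): one string per page,
# with the lines inside a page separated by "\n". (Made-up sample text.)
pages <- c(
  "Title line\nFirst page, second line\n",
  "2 Running header\nSecond page body\n"
)

# Split every page into its individual lines
lines_by_page <- strsplit(pages, "\n", fixed = TRUE)

n_pages        <- length(pages)          # number of pages
lines_per_page <- lengths(lines_by_page) # number of lines on each page
first_line     <- lines_by_page[[1]][1]  # first line of the first page

print(n_pages)
print(lines_per_page)
print(first_line)
```

Keeping the result as a list with one entry per page preserves the page boundaries, which is useful when running headers or page numbers need to be dropped later.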

```
[1] " zoo: An S3 Class and Methods for Indexed Totally\n Ordered Observations\n Achim Zeileis Gabor Grothendieck\n Universität Innsbruck GKX Associates Inc.\n Abstract\n A previous version to this introduction to the R package zoo has been published as\n Zeileis and Grothendieck (2005) in the Journal of Statistical Software.\n zoo is an R package providing an S3 class with methods for indexed totally ordered\n observations, such as discrete irregular time series. Its key design goals are independence\n of a particular index/time/date class and consistency with base R and the \"ts\" class for\n regular time series. This paper describes how these are achieved within zoo and provides\n several illustrations of the available methods for \"zoo\" objects which include plotting,\n merging and binding, several mathematical operations, extracting and replacing data and\n index, coercion and NA handling. A subclass \"zooreg\" embeds regular time series into\n the \"zoo\" framework and thus bridges the gap between regular and irregular time series\n classes in R.\nKeywords: totally ordered observations, irregular time series, regular time series, S3, R.\n 1. Introduction\nThe R system for statistical computing (R Development Core Team 2008, http://www.\nR-project.org/) ships with a class for regularly spaced time series, \"ts\" in package stats,\nbut has no native class for irregularly spaced time series. With the increased interest in com-\nputational finance with R over the last years several implementations of classes for irregular\ntime series emerged which are aimed particularly at finance applications. These include the\nS4 classes \"timeSeries\" in package timeSeries (previously fSeries) from the Rmetrics suite\n(Wuertz 2010), \"its\" in package its (Heywood 2009) and the S3 class \"irts\" in package\ntseries (Trapletti 2009). With these packages available, why would anybody want yet another\npackage providing infrastructure for irregular time series? 
The above mentioned implemen-\ntations have in common that they are restricted to a particular class for the time scale: the\nformer implementation comes with its own time class \"timeDate\" from package timeDate\n(previously fCalendar) built on top of the \"POSIXct\" class available in base R whereas the\nlatter two use \"POSIXct\" directly. And this was the starting point for the zoo project: the\nfirst author of the present paper needed more general support for ordered observations, inde-\npendent of a particular index class, for the package strucchange (Zeileis, Leisch, Hornik, and\nKleiber 2002). Hence, the package was called zoo which stands for Z’s ordered observations.\nSince the first release, a major part of the additions to zoo were provided by the second author\nof this paper, so that the name of the package does not really reflect the authorship anymore.\nNevertheless, independence of a particular index class remained the most important design\n"
[2] "2 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\ngoal. While the package evolved to its current status, a second key design goal became more\nand more clear: to provide methods to standard generic functions for the \"zoo\" class that\nare similar to those for the \"ts\" class (and base R in general) such that the usage of zoo is\nvery intuitive because few additional commands have to be learned. This paper describes how\nthese design goals are implemented in zoo. The resulting package provides the \"zoo\" class\nwhich offers an extensive (and still growing) set of standard and new methods for working\nwith indexed observations and ‘talks’ to the classes \"ts\", \"its\", \"irts\" and \"timeSeries\".\n(In addition to these independent approaches, the class \"xts\" built upon \"zoo\" was recently\nintroduced by Ryan and Ulrich 2010, .). zoo also bridges the gap between regular and irreg-\nular time series by providing coercion with (virtually) no loss of information between \"ts\"\nand \"zoo\". With these tools zoo provides the basic infrastructure for working with indexed\ntotally ordered observations and the package can be either employed by users directly or can\nbe a basic ingredient on top of which other more specialized applications can be built.\nThe remainder of the paper is organized as follows: Section 2 explains how \"zoo\" objects\nare created and illustrates how the corresponding methods for plotting, merging and binding,\nseveral mathematical operations, extracting and replacing data and index, coercion and NA\nhandling can be used. Section 3 outlines how other packages can build on this basic infras-\ntructure. Section 4 gives a few summarizing remarks and an outlook on future developments.\nFinally, an appendix provides a reference card that gives an overview of the functionality\ncontained in zoo.\n 2. 
The class \"zoo\" and its methods\nThis section describes how \"zoo\" series can be created and subsequently manipulated, visual-\nized, combined or coerced to other classes. In Section 2.1, the general class \"zoo\" for totally\nordered series is described. Subsequently, in Section 2.2, the subclass \"zooreg\" for regular\n\"zoo\" series, i.e., series which have an index with a specified frequency, is discussed. The\nmethods illustrated in the remainder of the section are mostly the same for both \"zoo\" and\n\"zooreg\" objects and hence do not have to be discussed separately. The few differences in\nmerging and binding are briefly highlighted in Section 2.4.\n2.1. Creation of \"zoo\" objects\nThe simple idea for the creation of \"zoo\" objects is to have some vector or matrix of obser-\nvations x which are totally ordered by some index vector. In time series applications, this\nindex is a measure of time but every other numeric, character or even more abstract vector\nthat provides a total ordering of the observations is also suitable. Objects of class \"zoo\" are\ncreated by the function\nzoo(x, order.by)\nwhere x is the vector or matrix of observations1 and order.by is the index by which the ob-\nservations should be ordered. It has to be of the same length as NROW(x), i.e., either the same\n 1\n In principle, more general objects can be indexed, but currently zoo does not support this. Development\nplans are that zoo should eventually support indexed factors, data frames and lists.\n"
[3] " Achim Zeileis, Gabor Grothendieck 3\nlength as x for vectors or the same number of rows for matrices.2 The \"zoo\" object created\nis essentially the vector/matrix as before but has an additional \"index\" attribute in which\nthe index is stored.3 Both the observations in the vector/matrix x and the index order.by\ncan, in principle, be of arbitrary classes. However, most of the following methods (plotting,\naggregating, mathematical operations) for \"zoo\" objects are typically only useful for numeric\nobservations x. Special effort in the design was put into independence from a particular class\nfor the index vector. In zoo, it is assumed that combination c(), querying the length(),\nvalue matching MATCH(), subsetting [, and, of course, ordering ORDER() work when applied\nto the index. In addition, an as.character() method might improve printed output4 and\nas.numeric() could be used for computing distances between indexes, e.g., in interpolation.\nBoth methods are not necessary for working with \"zoo\" objects but could be used if avail-\nable. All these methods are available, e.g., for standard numeric and character vectors and\nfor vectors of classes \"Date\", \"POSIXct\" or \"times\" from package chron and \"timeDate\"\nin timeDate. Because not all required methods used to be available for \"timeDate\" in older\nversions of fCalendar, Section 3.3 has a rather outdated example how to provide such methods\nso that \"zoo\" objects work with \"timeDate\" indexes. To achieve this independence of the\nindex class, new generic functions for ordering (ORDER()) and value matching (MATCH()) are\nintroduced as the corresponding base functions order() and match() are non-generic. The\ndefault methods simply call the corresponding base functions, i.e., no new method needs to be\nintroduced for a particular index class if the non-generic functions order() and match() work\nfor this class. 
R now also provides a new generic xtfrm() which was not available when the\nnew generic ORDER() was introduced. If there is a xtfrm() for a class, the default ORDER()\nmethod typically works.\nTo illustrate the usage of zoo(), we first load the package and set the random seed to make\nthe examples in this paper exactly reproducible.\nR> library(\"zoo\")\nR> set.seed(1071)\nThen, we create two vectors z1 and z2 with \"POSIXct\" indexes, one with random observations\nR> z1.index <- ISOdatetime(2004, rep(1:2,5), sample(28,10), 0, 0, 0)\nR> z1.data <- rnorm(10)\nR> z1 <- zoo(z1.data, z1.index)\nand one with a sine wave\nR> z2.index <- as.POSIXct(paste(2004, rep(1:2, 5), sample(1:28, 10),\n+ sep = \"-\"))\nR> z2.data <- sin(2*1:10/pi)\nR> z2 <- zoo(z2.data, z2.index)\n 2\n The only case where this restriction is not imposed is for zero-length vectors, i.e., vectors that only have\nan index but no data.\n 3\n There is some limited support for indexed factors available in which case the \"zoo\" object also has an\nattribute \"oclass\" with the original class of x. This feature is still under development and might change in\nfuture versions.\n 4\n If an as.character() method is already defined, but gives not the desired output for printing, then an\nindex2char() method can be defined. This is a generic convenience function used for creating character\nrepresentations of the index vector and it defaults to using as.character().\n"
[4] "4 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\nFurthermore, we create a matrix Z with random observations and a \"Date\" index\nR> Z.index <- as.Date(sample(12450:12500, 10))\nR> Z.data <- matrix(rnorm(30), ncol = 3)\nR> colnames(Z.data) <- c(\"Aa\", \"Bb\", \"Cc\")\nR> Z <- zoo(Z.data, Z.index)\nIn the examples above, the generation of indexes looks a bit awkward due to the fact the\nindexes need to be randomly generated (and there are no special functions for random indexes\nbecause these are rarely needed in practice). In “real world” applications, the indexes are\ntypically part of the raw data set read into R so the code would be even simpler. See Section 3\nfor such examples.5\nMethods to several standard generic functions are available for \"zoo\" objects, such as print,\nsummary, str, head, tail and [ (subsetting), a few of which are illustrated in the following.\nThere are three printing code styles for \"zoo\" objects: vectors are by default printed in\n\"horizontal\" style\nR> z1\n 2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07\n 0.74675994 0.02107873 -0.29823529 0.68625772 1.94078850 1.27384445\n 2004-02-12 2004-02-16 2004-02-20 2004-02-24\n 0.22170438 -2.07607585 -1.78439244 -0.19533304\nR> z1[3:7]\n2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12\n-0.2982353 0.6862577 1.9407885 1.2738445 0.2217044\nand matrices in \"vertical\" style\nR> Z\n Aa Bb Cc\n2004-02-02 1.2554339 0.6815732 -0.63292049\n2004-02-08 -1.4945833 1.3234122 -1.49442269\n2004-02-09 -1.8746225 -0.8732929 0.62733971\n2004-02-21 -0.1453861 0.4523490 -0.14597401\n2004-02-22 0.2254242 0.5383894 0.23136133\n2004-02-29 1.2069552 0.3181422 -0.01129202\n2004-03-05 -1.2086102 1.4237978 -0.81614483\n2004-03-10 -0.1103956 1.3477425 0.95522468\n2004-03-14 0.8420238 -2.7384202 0.23150695\n2004-03-20 -0.1901910 0.1230887 -1.51862157\n 5\n Note, that in the code above a new as.Date method, provided in zoo, is used to convert days 
since\n1970-01-01 to class \"Date\". See the respective help page for more details.\n"
[5] " Achim Zeileis, Gabor Grothendieck 5\nR> Z[1:3, 2:3]\n Bb Cc\n2004-02-02 0.6815732 -0.6329205\n2004-02-08 1.3234122 -1.4944227\n2004-02-09 -0.8732929 0.6273397\nAdditionally, there is a \"plain\" style which simply first prints the data and then the index.\nAbove, we have illustrated that \"zoo\" series can be indexed like vectors or matrices respec-\ntively, i.e., with integers correponding to their observation number (and column number).\nBut for indexed observations, one would obviously also like to be able to index with the index\nclass. This is also available in [ which only uses vector/matrix-type subsetting if its first\nargument is of class \"numeric\", \"integer\" or \"logical\".\nR> z1[ISOdatetime(2004, 1, c(14, 25), 0, 0, 0)]\n2004-01-14 2004-01-25\n0.02107873 0.68625772\nIf the index class happens to be \"numeric\", the index has to be either insulated in I() like\nz[I(i)] or the window() method can be used (see Section 2.6).\nSummaries and most other methods for \"zoo\" objects are carried out column wise, reflecting\nthe rectangular structure. In addition, a summary of the index is provided.\nR> summary(z1)\n Index z1\n Min. :2004-01-05 00:00:00 Min. :-2.07608\n 1st Qu.:2004-01-20 12:00:00 1st Qu.:-0.27251\n Median :2004-02-01 12:00:00 Median : 0.12139\n Mean :2004-02-01 09:36:00 Mean : 0.05364\n 3rd Qu.:2004-02-15 00:00:00 3rd Qu.: 0.73163\n Max. :2004-02-24 00:00:00 Max. : 1.94079\nR> summary(Z)\n Index Aa Bb Cc\n Min. :2004-02-02 Min. :-1.8746 Min. :-2.7384 Min. :-1.51862\n 1st Qu.:2004-02-12 1st Qu.:-0.9540 1st Qu.: 0.1719 1st Qu.:-0.77034\n Median :2004-02-25 Median :-0.1279 Median : 0.4954 Median :-0.07863\n Mean :2004-02-25 Mean :-0.1494 Mean : 0.2597 Mean :-0.25739\n 3rd Qu.:2004-03-08 3rd Qu.: 0.6879 3rd Qu.: 1.1630 3rd Qu.: 0.23147\n Max. :2004-03-20 Max. : 1.2554 Max. : 1.4238 Max. : 0.95522\n"
[6] "6 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\n2.2. Creation of \"zooreg\" objects\nStrictly regular series are such series observations where the distance between the indexes\nof every two adjacent observations is the same. Such series can also be described by their\nfrequency, i.e., the reciprocal value of the distance between two observations. As \"zoo\" can be\nused to store series with arbitrary type of index, it can, of course, also be used to store series\nwith regular indexes. So why should this case be given special attention, in particular as there\nis already the \"ts\" class devoted entirely to regular series? There are two reasons: First, to\nbe able to convert back and forth between \"ts\" and \"zoo\", the frequency of a certain series\nneeds to be stored on the \"zoo\" side. Second, \"ts\" is limited to strictly regular series and\nthe regularity is lost if some internal observations are omitted. Series that can be created by\nomitting some internal observations from strictly regular series will in the following be refered\nto as being (weakly) regular. Therefore, a class that bridges the gap between irregular and\nstrictly regular series is needed and \"zooreg\" fills this gap. Objects of class \"zooreg\" inherit\nfrom class \"zoo\" but have an additional attribute \"frequency\" in which the frequency of\nthe series is stored. Therefore, they can be employed to represent both strictly and weakly\nregular series.\nTo create a \"zooreg\" object, either the command zoo() can be used or the command\nzooreg().\nzoo(x, order.by, frequency)\nzooreg(data, start, end, frequency, deltat, ts.eps, order.by)\nIf zoo() is called as in the previous section but with an additional frequency argument,\nit is checked whether frequency complies with the index order.by: if it does an object of\nclass \"zooreg\" inheriting from \"zoo\" is returned. 
The command zooreg() takes mostly the\nsame arguments as ts().6 In both cases, the index class is more restricted than in the plain\n\"zoo\" case. The index must be of a class which can be coerced to \"numeric\" (for checking\nits regularity) and when converted to numeric the index must be expressable as multiples of\n1/frequency. Furthermore, adding/substracting a numeric to/from an observation of the index\nclass, should return the correct value of the index class again, i.e., group generic functions\nOps should be defined.7\nThe following calls yield equivalent series\nR> zr1 <- zooreg(sin(1:9), start = 2000, frequency = 4)\nR> zr2 <- zoo(sin(1:9), seq(2000, 2002, by = 1/4), 4)\nR> zr1\n 2000(1) 2000(2) 2000(3) 2000(4) 2001(1) 2001(2) 2001(3)\n 0.8414710 0.9092974 0.1411200 -0.7568025 -0.9589243 -0.2794155 0.6569866\n 2001(4) 2002(1)\n 0.9893582 0.4121185\nR> zr2\n 6\n Only if order.by is specified in the zooreg() call, then zoo(x, order.by, frequency) is called.\n 7\n An application of non-numeric indexes for regular series are the classes \"yearmon\" and \"yearqtr\" which\nare designed for monthly and quarterly series respectively and are discussed in Section 3.4.\n"
[7] " Achim Zeileis, Gabor Grothendieck 7\n 2000(1) 2000(2) 2000(3) 2000(4) 2001(1) 2001(2) 2001(3)\n 0.8414710 0.9092974 0.1411200 -0.7568025 -0.9589243 -0.2794155 0.6569866\n 2001(4) 2002(1)\n 0.9893582 0.4121185\nto which methods to standard generic functions for regular series can be applied, such as\nfrequency, deltat, cycle.\nAs stated above, the advantage of \"zooreg\" series is that they remain regular even if an\ninternal observation is dropped:\nR> zr1 <- zr1[-c(3, 5)]\nR> zr1\n 2000(1) 2000(2) 2000(4) 2001(2) 2001(3) 2001(4) 2002(1)\n 0.8414710 0.9092974 -0.7568025 -0.2794155 0.6569866 0.9893582 0.4121185\nR> class(zr1)\n[1] \"zooreg\" \"zoo\"\nR> frequency(zr1)\n[1] 4\nThis facilitates NA handling significantly compared to \"ts\" and makes \"zooreg\" a much more\nattractive data type, e.g., for time series regression.\nzooreg() can also deal with non-numeric indexes provided that adding \"numeric\" observa-\ntions to the index class preserves the class and does not coerce to \"numeric\".\nR> zooreg(1:5, start = as.Date(\"2005-01-01\"))\n2005-01-01 2005-01-02 2005-01-03 2005-01-04 2005-01-05\n 1 2 3 4 5\nTo check whether a certain series is (strictly) regular, the new generic function is.regular(x,\nstrict = FALSE) can be used:\nR> is.regular(zr1)\n[1] TRUE\nR> is.regular(zr1, strict = TRUE)\n[1] FALSE\n"
[8] "8 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\nThis function (and also the frequency, deltat and cycle) also work for \"zoo\" objects if the\nregularity can still be inferred from the data:\nR> zr1 <- as.zoo(zr1)\nR> zr1\n 2000 2000.25 2000.75 2001.25 2001.5 2001.75 2002\n 0.8414710 0.9092974 -0.7568025 -0.2794155 0.6569866 0.9893582 0.4121185\nR> class(zr1)\n[1] \"zoo\"\nR> is.regular(zr1)\n[1] TRUE\nR> frequency(zr1)\n[1] 4\nOf course, inferring the underlying regularity is not always reliable and it is safer to store a\nregular series as a \"zooreg\" object if it is intended to be a regular series.\nIf a weakly regular series is coerced to \"ts\" the missing observations are filled with NAs (see\nalso Section 2.8). For strictly regular series with numeric index, the class can be switched\nbetween \"zoo\" and \"ts\" without loss of information.\nR> as.ts(zr1)\n Qtr1 Qtr2 Qtr3 Qtr4\n2000 0.8414710 0.9092974 NA -0.7568025\n2001 NA -0.2794155 0.6569866 0.9893582\n2002 0.4121185\nR> identical(zr2, as.zoo(as.ts(zr2)))\n[1] TRUE\nThis enables direct use of functions such as acf, arima, stl etc. on \"zooreg\" objects as these\nmethods coerce to \"ts\" first. The result only has to be coerced back to \"zoo\", if appropriate.\n2.3. Plotting\nThe plot method for \"zoo\" objects, in particular for multivariate \"zoo\" series, is based on\nthe corresponding method for (multivariate) regular time series. It relies on plot and lines\nmethods being available for the index class which can plot the index against the observations.\nBy default the plot method creates a panel for each series\n"
[9] " Achim Zeileis, Gabor Grothendieck 9\nR> plot(Z)\nbut can also display all series in a single panel\nR> plot(Z, plot.type = \"single\", col = 2:4)\nIn both cases additional graphical parameters like color col, plotting character pch and line\ntype lty can be expanded to the number of series. But the plot method for \"zoo\" objects\noffers some more flexibility in specification of graphical parameters as in\nR> plot(Z, type = \"b\", lty = 1:3, pch = list(Aa = 1:5, Bb = 2, Cc = 4),\n+ col = list(Bb = 2, 4))\nThe argument lty behaves as before and sets every series in another line type. The pch\nargument is a named list that assigns to each series a different vector of plotting characters\neach of which is expanded to the number of observations. Such a list does not necessarily\nhave to include the names of all series, but can also specify a subset. For the remaining series\nthe default parameter is then used which can again be changed: e.g., in the above example\nthe col argument is set to display the series \"Bb\" in red and all remaining series in blue.\nThe results of the multiple panel plots are depicted in Figure 2 and the single panel plot in\nFigure 1.\n2.4. Merging and binding\nAs for many rectangular data formats in R, there are both methods for combining the rows\nand columns of \"zoo\" objects respectively. For the rbind method the number of columns of\nthe combined objects has to be identical and the indexes may not overlap.\n 1\n 0\n Z\n −1\n −2\n Feb 02 Feb 09 Feb 16 Feb 23 Mar 01 Mar 15\n Index\n Figure 1: Example of a single panel plot\n"
[10] "10 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\n Z\n 0.5\n Aa −0.5\n −1.5\n 1\n 0\n Bb −1\n −2\n 0.5\n Cc\n −0.5\n −1.5\n Feb 02 Feb 09 Feb 16 Feb 23 Mar 01 Mar 08 Mar 15\n Index\n Z\n ● ●\n 0.5\n Aa −0.5\n −1.5\n 1\n 0\n Bb −1\n −2\n 0.5\n Cc\n −0.5\n −1.5\n Feb 02 Feb 09 Feb 16 Feb 23 Mar 01 Mar 08 Mar 15\n Index\n Figure 2: Examples of multiple panel plots\n"
[11] " Achim Zeileis, Gabor Grothendieck 11\nR> rbind(z1[5:10], z1[2:3])\n 2004-01-14 2004-01-19 2004-01-27 2004-02-07 2004-02-12 2004-02-16\n 0.02107873 -0.29823529 1.94078850 1.27384445 0.22170438 -2.07607585\n 2004-02-20 2004-02-24\n-1.78439244 -0.19533304\nThe c method simply calls rbind and hence behaves in the same way.\nThe cbind method by default combines the columns by the union of the indexes and fills the\ncreated gaps by NAs.\nR> cbind(z1, z2)\n z1 z2\n2004-01-03 NA 0.94306673\n2004-01-05 0.74675994 -0.04149429\n2004-01-14 0.02107873 NA\n2004-01-17 NA 0.59448077\n2004-01-19 -0.29823529 -0.52575918\n2004-01-24 NA -0.96739776\n2004-01-25 0.68625772 NA\n2004-01-27 1.94078850 NA\n2004-02-07 1.27384445 NA\n2004-02-08 NA 0.95605566\n2004-02-12 0.22170438 -0.62733473\n2004-02-13 NA -0.92845336\n2004-02-16 -2.07607585 NA\n2004-02-20 -1.78439244 NA\n2004-02-24 -0.19533304 NA\n2004-02-25 NA 0.56060280\n2004-02-26 NA 0.08291711\nIn fact, the cbind method is synonymous with the merge method8 except that the latter\nprovides additional arguments which allow for combining the columns by the intersection of\nthe indexes using the argument all = FALSE\nR> merge(z1, z2, all = FALSE)\n z1 z2\n2004-01-05 0.7467599 -0.04149429\n2004-01-19 -0.2982353 -0.52575918\n2004-02-12 0.2217044 -0.62733473\n 8\n Note, that in some situations the column naming in the resulting object is somewhat problematic in the\ncbind method and the merge method might provide better formatting of the column names.\n"
[12] "12 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\nAdditionally, the filling pattern can be changed in merge, the naming of the columns can be\nmodified and the return class of the result can be specified. In the case of merging of objects\nwith different index classes, R gives a warning and tries to coerce the indexes. Merging\nobjects with different index classes is generally discouraged—if it is used nevertheless, it is\nthe responsibility of the user to ensure that the result is as intended. If at least one of the\nmerged/binded objects was a \"zooreg\" object, then merge tries to return a \"zooreg\" object.\nThis is done by assessing whether there is a common maximal frequency and by checking\nwhether the resulting index is still (weakly) regular.\nIf non-\"zoo\" objects are included in merging, then merge gives plain vectors/factors/matrices\nthe index of the first argument (if it is of the same length). Scalars are always added for the\nfull index without missing values.\nR> merge(z1, pi, 1:10)\n z1 pi 1:10\n2004-01-05 0.74675994 3.141593 1\n2004-01-14 0.02107873 3.141593 2\n2004-01-19 -0.29823529 3.141593 3\n2004-01-25 0.68625772 3.141593 4\n2004-01-27 1.94078850 3.141593 5\n2004-02-07 1.27384445 3.141593 6\n2004-02-12 0.22170438 3.141593 7\n2004-02-16 -2.07607585 3.141593 8\n2004-02-20 -1.78439244 3.141593 9\n2004-02-24 -0.19533304 3.141593 10\nAnother function which performs operations along a subset of indexes is aggregate, which is\ndiscussed in this section although it does not combine several objects. Using the aggregate\nmethod, \"zoo\" objects are split into subsets along a coarser index grid, summary statistics\nare computed for each and then the reduced object is returned. In the following example,\nfirst a function is set up which returns for a given \"Date\" value the corresponding first of the\nmonth. 
This function is then used to compute the coarser grid for the aggregate call: in\nthe first example, the grouping is computed explicitely by firstofmonth(index(Z)) and the\nmean of the observations in the month is returned—in the second example, only the function\nthat computes the grouping (when applied to index(Z)) is supplied and the first observation\nis used for aggregation.\nR> firstofmonth <- function(x) as.Date(sub(\"..$\", \"01\", format(x)))\nR> aggregate(Z, firstofmonth(index(Z)), mean)\n Aa Bb Cc\n2004-02-01 -0.1377964 0.40676219 -0.2376514\n2004-03-01 -0.1667933 0.03905223 -0.2870087\nR> aggregate(Z, firstofmonth, head, 1)\n"
[13] " Achim Zeileis, Gabor Grothendieck 13\n Aa Bb Cc\n2004-02-01 1.255434 0.6815732 -0.6329205\n2004-03-01 -1.208610 1.4237978 -0.8161448\nThe opposite of aggregation is disaggregation. For example, the Nile dataset is an annual\n\"ts\" class series. To disaggregate it into a quarterly series, convert it to a \"zoo class series,\ninsert intermediate quarterly points containing NA values and then fill the NA values using\nna.approx, na.locf or na.spline. (More details on NA handling in general can be found in\nSection 2.8.)\nR> Nile.na <- merge(as.zoo(Nile),\n+ zoo(, seq(start(Nile)[1], end(Nile)[1], 1/4)))\nR> head(as.zoo(Nile))\n1871 1872 1873 1874 1875 1876\n1120 1160 963 1210 1160 1160\nR> head(na.approx(Nile.na))\n1871(1) 1871(2) 1871(3) 1871(4) 1872(1) 1872(2)\n1120.00 1130.00 1140.00 1150.00 1160.00 1110.75\nR> head(na.locf(Nile.na))\n1871(1) 1871(2) 1871(3) 1871(4) 1872(1) 1872(2)\n 1120 1120 1120 1120 1160 1160\nR> head(na.spline(Nile.na))\n 1871(1) 1871(2) 1871(3) 1871(4) 1872(1) 1872(2)\n1120.000 1199.059 1224.985 1208.419 1160.000 1091.970\n2.5. Mathematical operations\nTo allow for standard mathematical operations among \"zoo\" objects, zoo extends group\ngeneric functions Ops. These perform the operations only for the intersection of the indexes\nof the objects. As an example, the summation and logical comparison with < of z1 and z2\nyield\nR> z1 + z2\n2004-01-05 2004-01-19 2004-02-12\n 0.7052657 -0.8239945 -0.4056304\nR> z1 < z2\n"
[14] "14 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\n2004-01-05 2004-01-19 2004-02-12\n FALSE FALSE FALSE\nAdditionally, methods for transposing t of \"zoo\" objects—which coerces to a matrix before—\nand computing cumulative quantities such as cumsum, cumprod, cummin, cummax which are all\napplied column wise.\nR> cumsum(Z)\n Aa Bb Cc\n2004-02-02 1.2554339 0.6815732 -0.6329205\n2004-02-08 -0.2391494 2.0049854 -2.1273432\n2004-02-09 -2.1137718 1.1316925 -1.5000035\n2004-02-21 -2.2591579 1.5840415 -1.6459775\n2004-02-22 -2.0337337 2.1224309 -1.4146162\n2004-02-29 -0.8267785 2.4405731 -1.4259082\n2004-03-05 -2.0353888 3.8643710 -2.2420530\n2004-03-10 -2.1457844 5.2121135 -1.2868283\n2004-03-14 -1.3037606 2.4736933 -1.0553214\n2004-03-20 -1.4939516 2.5967820 -2.5739429\n2.6. Extracting and replacing the data and the index\nzoo provides several generic functions and methods to work on the data contained in a \"zoo\"\nobject, the index (or time) attribute associated to it, and on both data and index.\nThe data stored in \"zoo\" objects can be extracted by coredata which strips off all \"zoo\"-\nspecific attributes and it can be replaced using coredata<-. Both are new generic functions9\nwith methods for \"zoo\" objects as illustrated in the following example.\nR> coredata(z1)\n [1] 0.74675994 0.02107873 -0.29823529 0.68625772 1.94078850 1.27384445\n [7] 0.22170438 -2.07607585 -1.78439244 -0.19533304\nR> coredata(z1) <- 1:10\nR> z1\n2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12\n 1 2 3 4 5 6 7\n2004-02-16 2004-02-20 2004-02-24\n 8 9 10\n 9\n The coredata functionality is similar in spirit to the core function in its and value in tseries. However, the\nfocus of those functions is somewhat narrower and we try to provide more general purpose generic functions.\nSee the respective manual page for more details.\n"
[15] " Achim Zeileis, Gabor Grothendieck 15\nThe index associated with a \"zoo\" object can be extracted by index and modified by index<-.\nAs the interpretation of the index as “time” in time series applications is natural, there are\nalso synonymous methods time and time<-. Hence, the commands index(z2) and time(z2)\nreturn equivalent results.\nR> index(z2)\n [1] \"2004-01-03 GMT\" \"2004-01-05 GMT\" \"2004-01-17 GMT\" \"2004-01-19 GMT\"\n [5] \"2004-01-24 GMT\" \"2004-02-08 GMT\" \"2004-02-12 GMT\" \"2004-02-13 GMT\"\n [9] \"2004-02-25 GMT\" \"2004-02-26 GMT\"\nThe index scale of z2 can be changed to that of z1 by\nR> index(z2) <- index(z1)\nR> z2\n 2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07\n 0.94306673 -0.04149429 0.59448077 -0.52575918 -0.96739776 0.95605566\n 2004-02-12 2004-02-16 2004-02-20 2004-02-24\n-0.62733473 -0.92845336 0.56060280 0.08291711\nThe start and the end of the index/time vector can be queried by start and end:\nR> start(z1)\n[1] \"2004-01-05 GMT\"\nR> end(z1)\n[1] \"2004-02-24 GMT\"\nTo work on both data and index/time, zoo provides window and window<- methods for \"zoo\"\nobjects. In both cases the window is specified by\nwindow(x, index, start, end)\nwhere x is the \"zoo\" object, index is a set of indexes to be selected (by default the full index\nof x) and start and end can be used to restrict the index set.\nR> window(Z, start = as.Date(\"2004-03-01\"))\n Aa Bb Cc\n2004-03-05 -1.2086102 1.4237978 -0.8161448\n2004-03-10 -0.1103956 1.3477425 0.9552247\n2004-03-14 0.8420238 -2.7384202 0.2315069\n2004-03-20 -0.1901910 0.1230887 -1.5186216\n"
[16] "16 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\nR> window(Z, index = index(Z)[5:8], end = as.Date(\"2004-03-01\"))\n Aa Bb Cc\n2004-02-22 0.2254242 0.5383894 0.23136133\n2004-02-29 1.2069552 0.3181422 -0.01129202\nThe first example selects all observations starting from 2004-03-01 whereas the second selects\nfrom the from the 5th to 8th observation those up to 2004-03-01.\nThe same syntax can be used for the corresponding replacement function.\nR> window(z1, end = as.POSIXct(\"2004-02-01\")) <- 9:5\nR> z1\n2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12\n 9 8 7 6 5 6 7\n2004-02-16 2004-02-20 2004-02-24\n 8 9 10\nTwo methods that are standard in time series applications are lag and diff. These are\navailable with the same arguments as the \"ts\" methods.10\nR> lag(z1, k = -1)\n2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12 2004-02-16\n 9 8 7 6 5 6 7\n2004-02-20 2004-02-24\n 8 9\nR> merge(z1, lag(z1, k = 1))\n z1 lag(z1, k = 1)\n2004-01-05 9 8\n2004-01-14 8 7\n2004-01-19 7 6\n2004-01-25 6 5\n2004-01-27 5 6\n2004-02-07 6 7\n2004-02-12 7 8\n2004-02-16 8 9\n2004-02-20 9 10\n2004-02-24 10 NA\nR> diff(z1)\n 10\n diff also has an additional argument that also allows for geometric and not only allows arithmetic dif-\nferences. Furthermore, note the sign of the lag in lag which behaves like the \"ts\" method, i.e., by default it\nis positive and shifts the observations forward, to obtain the more standard backward shift the lag has to be\nnegative.\n"
[17] " Achim Zeileis, Gabor Grothendieck 17\n2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12 2004-02-16\n -1 -1 -1 -1 1 1 1\n2004-02-20 2004-02-24\n 1 1\n2.7. Coercion to and from \"zoo\"\nCoercion to and from \"zoo\" objects is available for objects of various classes, in particular\n\"ts\", \"irts\" and \"its\" objects can be coerced to \"zoo\" and back if the index is of the\nappropriate class.11\nCoercion between \"zooreg\" and \"zoo\" is also available and is essentially dropping the\n\"frequency\" attribute or trying to add one, respectively.\nFurthermore, \"zoo\" objects can be coerced to vectors, matrices, lists and data frames (the\nlatter dropping the index/time attribute). A simple example is\nR> as.data.frame(Z)\n Aa Bb Cc\n2004-02-02 1.2554339 0.6815732 -0.63292049\n2004-02-08 -1.4945833 1.3234122 -1.49442269\n2004-02-09 -1.8746225 -0.8732929 0.62733971\n2004-02-21 -0.1453861 0.4523490 -0.14597401\n2004-02-22 0.2254242 0.5383894 0.23136133\n2004-02-29 1.2069552 0.3181422 -0.01129202\n2004-03-05 -1.2086102 1.4237978 -0.81614483\n2004-03-10 -0.1103956 1.3477425 0.95522468\n2004-03-14 0.8420238 -2.7384202 0.23150695\n2004-03-20 -0.1901910 0.1230887 -1.51862157\n2.8. NA handling\nA wide range of methods for dealing with NAs (missing observations) in the observations\nare applicable to \"zoo\" objects including na.omit, na.contiguous, na.approx, na.spline,\nand na.locf among others. na.omit—or its default method to be more precise—returns\na \"zoo\" object with incomplete observations removed. na.contiguous extracts the longest\nconsecutive stretch of non-missing values. Furthermore, new generic functions na.approx,\nna.spline, and na.locf and corresponding default methods are introduced in zoo. The\nformer two replace NAs by interpolation (using the function approx and spline, respectively)\nand the name of the latter stands for last observation carried forward. It replaces missing\nobservations by the most recent non-NA prior to it. 
Leading NAs, which cannot be replaced by\nprevious observations, are removed in both functions by default.\nR> z1[sample(1:10, 3)] <- NA\nR> z1\n 11\n Coercion from \"zoo\" to \"irts\" is contained in the tseries package.\n"
[18] "18 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\n2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12\n 9 NA 7 6 5 6 NA\n2004-02-16 2004-02-20 2004-02-24\n 8 9 NA\nR> na.omit(z1)\n2004-01-05 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-16 2004-02-20\n 9 7 6 5 6 8 9\nR> na.contiguous(z1)\n2004-01-19 2004-01-25 2004-01-27 2004-02-07\n 7 6 5 6\nR> na.approx(z1)\n2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12\n 9.000000 7.714286 7.000000 6.000000 5.000000 6.000000 7.111111\n2004-02-16 2004-02-20\n 8.000000 9.000000\nR> na.approx(z1, 1:NROW(z1))\n2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12\n 9 8 7 6 5 6 7\n2004-02-16 2004-02-20\n 8 9\nR> na.spline(z1)\n2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12\n 9.000000 6.766410 7.000000 6.000000 5.000000 6.000000 7.167209\n2004-02-16 2004-02-20 2004-02-24\n 8.000000 9.000000 10.157026\nR> na.locf(z1)\n2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07 2004-02-12\n 9 9 7 6 5 6 6\n2004-02-16 2004-02-20 2004-02-24\n 8 9 9\nAs the above example illustrates, na.approx (and also na.spline) use by default the under-\nlying time scale for interpolation. This can be changed, e.g., to an equidistant spacing, by\n"
[19] " Achim Zeileis, Gabor Grothendieck 19\nsetting the second argument of na.approx. Furthermore, a different output time index can\nbe supplied as well.\nIn addition to the methods discussed above, there are also other methods for dealing with\nmissing values in zoo such as na.aggregate, na.fill, na.trim, and na.StructTS.\n2.9. Rolling functions\nA typical task to be performed on ordered observations is to evaluate some function, e.g., com-\nputing the mean, in a window of observations that is moved over the full sample period. The\nresulting statistics are usually synonymously referred to as rolling/running/moving statistics.\nIn zoo, the generic function rollapply12 is provided along with a \"zoo\" and a \"ts\" method.\nThe most important arguments are\nrollapply(data, width, FUN)\nwhere the function FUN is applied to a rolling window of size width of the observations data.\nThe function rollapply by default only evaluates the function for windows of full size width\nand then the result has width - 1 fewer observations than the original series and is aligned at\nthe center of the rolling window. Setting further arguments such as partial, align, or fill\nalso allows for rolling computations on partial windows with arbitrary aligning and flexible\nfilling. 
For example, without partial evaluation the ‘lost’ observations could be filled with NAs\nand aligned at the left of the sample.\nR> rollapply(Z, 5, sd)\n Aa Bb Cc\n2004-02-09 1.2814876 0.8018950 0.8218959\n2004-02-21 1.2658555 0.7891358 0.8025043\n2004-02-22 1.2102011 0.8206819 0.5319727\n2004-02-29 0.8662296 0.5266261 0.6411751\n2004-03-05 0.9363400 1.7011273 0.6356144\n2004-03-10 0.9508642 1.6892246 0.9578196\nR> rollapply(Z, 5, sd, fill = NA, align = \"left\")\n Aa Bb Cc\n2004-02-02 1.2814876 0.8018950 0.8218959\n2004-02-08 1.2658555 0.7891358 0.8025043\n2004-02-09 1.2102011 0.8206819 0.5319727\n2004-02-21 0.8662296 0.5266261 0.6411751\n2004-02-22 0.9363400 1.7011273 0.6356144\n2004-02-29 0.9508642 1.6892246 0.9578196\n2004-03-05 NA NA NA\n 12\n In previous versions of zoo, this function was called rapply. It was renamed because from R 2.4.0 on, base\nR provides a different function rapply for recursive (and not rolling) application of functions. The function\nzoo::rapply is still provided for backward compatibility, however it dispatches now to rollapply methods.\n"
[20] "20 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\n2004-03-10 NA NA NA\n2004-03-14 NA NA NA\n2004-03-20 NA NA NA\nTo improve the performance of rollapply(x, k, foo) for some frequently used functions foo,\nmore efficient implementations rollfoo(x, k) are available (and also called by rollapply).\nCurrently, these are the generic functions rollmean, rollmedian and rollmax which have\nmethods for \"zoo\" and \"ts\" series and a default method for plain vectors.\nR> rollmean(z2, 5, fill = NA)\n 2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27\n NA NA 0.0005792538 0.0031770388 -0.1139910497\n 2004-02-07 2004-02-12 2004-02-16 2004-02-20 2004-02-24\n-0.4185778750 -0.2013054791 0.0087574946 NA NA\n 3. Combining zoo with other packages\nThe main purpose of the package zoo is to provide basic infrastructure for working with\nindexed totally ordered observations that can be either employed by users directly or can be\na basic ingredient on top of which other packages can build. The latter is illustrated with\na few brief examples involving the packages strucchange, tseries and timeDate/fCalendar in\nthis section. Finally, the classes \"yearmon\" and \"yearqtr\" (provided in zoo) are used for\nillustrating how zoo can be extended by creating a new index class.\n3.1. strucchange: Empirical fluctuation processes\nThe package strucchange provides a collection of methods for testing, monitoring and dating\nstructural changes, in particular in linear regression models. Tests for structural change assess\nwhether the parameters of a model remain constant over an ordering with respect to a specified\nvariable, usually time. To adequately store and visualize empirical fluctuation processes\nwhich capture instabilities over this ordering, a data type for indexed ordered observations is\nrequired. 
This was the motivation for starting the zoo project.\nA simple example for the need of \"zoo\" objects in strucchange which can not be (easily)\nimplemented by other irregular time series classes available in R is described in the following.\nWe assess the constancy of the electrical resistance over the apparent juice content of kiwi\nfruits.13 The data set fruitohms is contained in the DAAG package (Maindonald and Braun\n2009). The fitted ocus object contains the OLS-based CUSUM process for the mean of the\nelectrical resistance (variable ohms) indexed by the juice content (variable juice).\nR> library(\"strucchange\")\nR> library(\"DAAG\")\nR> data(\"fruitohms\")\nR> ocus <- gefp(ohms ~ 1, order.by = ~ juice, data = fruitohms)\n 13\n A different approach would be to test whether the slope of a regression of electrical resistance on juice\ncontent changes with increasing juice content, i.e., to test for instabilities in ohms ~ juice instead of ohms ~\n1. Both lead to similar results.\n"
[21] " Achim Zeileis, Gabor Grothendieck 21\nR> plot(ocus)\n M−fluctuation test\n Empirical fluctuation process\n 4\n 3\n 2\n 1\n 0\n 10 20 30 40 50 60\n juice\n Figure 3: Empirical M-fluctuation process for fruitohms data\nThis OLS-based CUSUM process can be visualized using the plot method for \"gefp\" objects\nwhich builds on the \"zoo\" method and yields in this case the plot in Figure 3 showing the\nprocess which crosses its 5% critical value and thus signals a significant decrease in the mean\nelectrical resistance over the juice content. For more information on the package strucchange\nand the function gefp see Zeileis et al. (2002) and Zeileis (2006).\n3.2. tseries: Historical financial data\nThis section was written when tseries did not yet support \"zoo\" series directly. For historical\nreasons and completeness, the example is still included but for practical purposes it is not\nrelevant anymore because, from version 0.9-30 on, get.hist.quote returns a \"zoo\" series\nby default.\nA typical application for irregular time series which became increasingly important over the\nlast years in computational statistics and finance is daily (or higher frequency) financial data.\nThe package tseries provides the function get.hist.quote for obtaining historical financial\ndata by querying Yahoo! Finance at http://finance.yahoo.com/, an online portal quoting\ndata provided by Reuters. The following code queries the quotes of Microsoft Corp. starting\nfrom 2001-01-01 until 2004-09-30:\nR> library(\"tseries\")\nR> MSFT <- get.hist.quote(instrument = \"MSFT\", start = \"2001-01-01\",\n+ end = \"2004-09-30\", origin = \"1970-01-01\", retclass = \"ts\")\nIn the returned MSFT object the irregular data is stored by extending it in a regular grid and\nfilling the gaps with NAs. The time is stored in days starting from an origin, in this case\n"
[22] "22 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\nspecified to be 1970-01-01, the origin used by the \"Date\" class. This series can be transformed\neasily into a \"zoo\" series using a \"Date\" index.\nR> MSFT <- as.zoo(MSFT)\nR> index(MSFT) <- as.Date(index(MSFT))\nR> MSFT <- na.omit(MSFT)\nBecause this is daily data, the series has a natural underlying regularity. Thus, as.zoo()\nreturns a \"zooreg\" object by default. To treat it as an irregular series as.zoo() can be\napplied a second time, yielding a \"zoo\" series. The corresponding log-difference returns are\ndepicted in Figure 4.\nR> MSFT <- as.zoo(MSFT)\n3.3. timeDate/fCalendar: Indexes of class \"timeDate\"\nThe original version of this section was written when fCalendar (now: timeDate) and\nzoo did not yet include enough methods to attach \"timeDate\" indexes to \"zoo\" series. For\nhistorical reasons and completeness, we still briefly comment on the communcation between\nthe packages and their classes.\nAlthough the methods in zoo work out of the box for many index classes, it might be necessary\nfor some index classes to provide c(), length(), [, ORDER() and MATCH() methods such\nthat the methods in zoo work properly. Previously, this was the case \"timeDate\" from the\nfCalendar package which is why it was used as an example in this vigntte. 
Meanwhile however,\nboth zoo and fCalendar/timeDate have been enhanced: The latter contains the methods for\nc(), length(), and [, while zoo has methods for ORDER() and MATCH() for class \"timeDate\".\nThe last two functions essentially work by coercing to the underlying \"POSIXct\" and then\nusing the associated methods.\nThe following example illustrates how z2 can be transformed to use the \"timeDate\" class.\nR> library(\"timeDate\")\nR> z2td <- zoo(coredata(z2), timeDate(index(z2), FinCenter = \"GMT\"))\nR> z2td\n 2004-01-05 2004-01-14 2004-01-19 2004-01-25 2004-01-27 2004-02-07\n 0.94306673 -0.04149429 0.59448077 -0.52575918 -0.96739776 0.95605566\n 2004-02-12 2004-02-16 2004-02-20 2004-02-24\n-0.62733473 -0.92845336 0.56060280 0.08291711\n3.4. The classes \"yearmon\" and \"yearqtr\": Roll your own index\nOne of the strengths of the zoo package is its independence of the index class, such that the\nindex can be easily customized. The previous section already explained how an existing class\n(\"timeDate\") can be used as the index if the necessary methods are created. This section\nhas a similar but slightly different focus: it describes how new index classes can be created\n"
[23] " Achim Zeileis, Gabor Grothendieck 23\nR> plot(diff(log(MSFT)))\n diff(log(MSFT))\n 0.0\n −0.2\n Open\n −0.4\n −0.6\n 0.0\n −0.2\n High\n −0.4\n −0.6\n 0.0\n −0.2\n Low\n −0.4\n −0.6\n 0.0\n −0.2\n Close\n −0.4\n −0.6\n 2001 2002 2003 2004\n Index\n Figure 4: Log-difference returns for Microsoft Corp.\naddressing a certain type of indexes. These classes are \"yearmon\" and \"yearqtr\" (already\ncontained in zoo) which provide indexes for monthly and quarterly data respectively. As the\ncode is virtually identical for both classes—except that one has the frequency 12 and the\nother 4—we will only discuss \"yearmon\" explicitly.\nOf course, monthly data can simply be stored using a numeric index just as the class \"ts\"\ndoes. The problem is that this does not have the meta-information attached that this is really\n"
[24] "24 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\nspecifying monthly data which is in \"yearmon\" simply added by a class attribute. Hence, the\nclass creator is simply defined as\nyearmon <- function(x) structure(floor(12*x + .0001)/12, class = \"yearmon\")\nwhich is very similar to the as.yearmon coercion functions provided.\nAs \"yearmon\" data is now explicitly declared to describe monthly data, this can be exploited\nfor coercion to other time classes: either to coarser time scales such as \"yearqtr\" or to finer\ntime scales such as \"Date\", \"POSIXct\" or \"POSIXlt\" which by default associate the first day\nwithin a month with a \"yearmon\" observation. Adding a format and as.character method\nproduces human readable character representations of \"yearmon\" data and Ops and MATCH\nmethods complete the methods needed for conveniently working with monthly data in zoo.\nNote, that all of these methods are very simple and rather obvious (as can be seen in the zoo\nsources), but prove very helpful in the following examples.\nFirst, we create a regular series zr3 with \"yearmon\" index which leads to improved printing\ncompared to the regular series zr1 and zr2 from Section 2.2.\nR> zr3 <- zooreg(rnorm(9), start = as.yearmon(2000), frequency = 12)\nR> zr3\n Jan 2000 Feb 2000 Mar 2000 Apr 2000 May 2000 Jun 2000\n-0.30969096 0.08699142 -0.64837101 -0.62786277 -0.61932674 -0.95506154\n Jul 2000 Aug 2000 Sep 2000\n-1.91736406 0.38108885 1.51405511\nThis could be aggregated to quarterly data via\nR> aggregate(zr3, as.yearqtr, mean)\n 2000 Q1 2000 Q2 2000 Q3\n-0.2903569 -0.7340837 -0.0074067\nThe index can easily be transformed to \"Date\", the default being the first day of the month\nbut which can also be changed to the last day of the month.\nR> as.Date(index(zr3))\n[1] \"2000-01-01\" \"2000-02-01\" \"2000-03-01\" \"2000-04-01\" \"2000-05-01\"\n[6] \"2000-06-01\" \"2000-07-01\" \"2000-08-01\" \"2000-09-01\"\nR> as.Date(index(zr3), frac 
= 1)\n[1] \"2000-01-31\" \"2000-02-29\" \"2000-03-31\" \"2000-04-30\" \"2000-05-31\"\n[6] \"2000-06-30\" \"2000-07-31\" \"2000-08-31\" \"2000-09-30\"\n"
[25] " Achim Zeileis, Gabor Grothendieck 25\nFurthermore, \"yearmon\" indexes can easily be coerced to \"POSIXct\" such that the series\ncould be exported as a \"its\" or \"irts\" series.\nR> index(zr3) <- as.POSIXct(index(zr3))\nR> as.irts(zr3)\n2000-01-01 00:00:00 GMT -0.3097\n2000-02-01 00:00:00 GMT 0.08699\n2000-03-01 00:00:00 GMT -0.6484\n2000-04-01 00:00:00 GMT -0.6279\n2000-05-01 00:00:00 GMT -0.6193\n2000-06-01 00:00:00 GMT -0.9551\n2000-07-01 00:00:00 GMT -1.917\n2000-08-01 00:00:00 GMT 0.3811\n2000-09-01 00:00:00 GMT 1.514\nAgain, this functionality makes switching between different time scales or index representa-\ntions particularly easy and zoo provides the user with the flexibility to adjust a certain index\nto his/her problem of interest.\n 4. Summary and outlook\nThe package zoo provides an S3 class and methods for indexed totally ordered observations,\nsuch as both regular and irregular time series. Its key design goals are independence of a\nparticular index class and compatibility with standard generics similar to the behaviour of\nthe corresponding \"ts\" methods. This paper describes how these are implemented in zoo and\nillustrates the usage of the methods for plotting, merging and binding, several mathematical\noperations, extracting and replacing data and index, coercion and NA handling.\nAn indexed object of class \"zoo\" can be thought of as data plus index where the data are\nessentially vectors or matrices and the index can be a vector of (in principle) arbitrary class.\nFor (weakly) regular \"zooreg\" series, a \"frequency\" attribute is stored in addition. There-\nfore, objects of classes \"ts\", \"its\", \"irts\" and \"timeSeries\" can easily be transformed into\n\"zoo\" objects—the reverse transformation is also possible provided that the index fulfills the\nrestrictions of the respective class. 
Hence, the \"zoo\" class can also be used as the basis for\nother classes of indexed observations and more specific functionality can be built on top of it.\nFurthermore, it bridges the gap between irregular and regular series, facilitating operations\nsuch as NA handling compared to \"ts\".\nWhereas a lot of effort was put into achieving independence of a particular index class, the\ntypes of data that can be indexed with \"zoo\" are currently limited to vectors and matrices,\ntypically containing numeric values. Although, there is some limited support available for\nindexed factors, one important direction for future development of zoo is to add better support\nfor other objects that can also naturally be indexed including specifically factors, data frames\nand lists.\n"
[26] "26 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\n Computational details\nThe results in this paper were obtained using R 3.1.2 with the packages zoo 1.7–12, strucchange\n1.5–0, timeDate 3010.98, tseries 0.10–34 and DAAG 1.20. R itself and all packages used are\navailable from CRAN at http://CRAN.R-project.org/.\n References\nHeywood G (2009). its: Irregular Time Series. Portfolio & Risk Advisory Group and\n Commerzbank Securities. R package version 1.1.8, URL http://CRAN.R-project.org/\n package=its.\nMaindonald J, Braun WJ (2009). DAAG: Data Analysis and Graphics. R package version\n 1.01, URL http://CRAN.R-project.org/package=DAAG.\nR Development Core Team (2008). R: A Language and Environment for Statistical Computing.\n R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http:\n //www.R-project.org/.\nRyan JA, Ulrich JM (2010). xts: Extensible Time Series. R package version 0.7-1, URL\n http://CRAN.R-project.org/package=xts.\nTrapletti A (2009). tseries: Time Series Analysis and Computational Finance. R package\n version 0.10-22, URL http://CRAN.R-project.org/package=tseries.\nWuertz D (2010). Rmetrics: An Environment and Software Collection for Teaching Finan-\n cial Engineering and Computational Finance. R packages fArma, fAsianOptions, fAssets,\n fBasics, fCalendar, fCopulae, fEcofin, fExoticOptions, fExtremes, fGarch, fImport, fMulti-\n var, fNonlinear, fOptions, fPortfolio, fRegression, fSeries, fTrading, fUnitRoots, fUtilities,\n URL http://www.Rmetrics.org/.\nZeileis A (2006). “Implementing a Class of Structural Change Tests: An Econometric\n Computing Approach.” Computational Statistics & Data Analysis, 50, 2987–3008. doi:\n 10.1016/j.csda.2005.07.001.\nZeileis A, Grothendieck G (2005). “zoo: S3 Infrastructure for Regular and Irregular Time\n Series.” Journal of Statistical Software, 14(6), 1–27. URL http://www.jstatsoft.org/\n v14/i06/.\nZeileis A, Leisch F, Hornik K, Kleiber C (2002). 
“strucchange: An R Package for Testing\n for Structural Change in Linear Regression Models.” Journal of Statistical Software, 7(2),\n 1–38. URL http://www.jstatsoft.org/v07/i02/.\n"
[27] "Achim Zeileis, Gabor Grothendieck 27"
[28] "28 zoo: An S3 Class and Methods for Indexed Totally Ordered Observations\n A. Reference card\n Creation\n zoo(x, order.by) creation of a \"zoo\" object from the observations x (a\n vector or a matrix) and an index order.by by which\n the observations are ordered.\n For computations on arbitrary index classes, methods\n to the following generic functions are assumed to work:\n combining c(), querying length length(), subsetting\n [, ordering ORDER() and value matching MATCH(). For\n pretty printing an as.character and/or index2char\n method might be helpful.\n Creation of regular series\n zoo(x, order.by, freq) works as above but creates a \"zooreg\" object which\n inherits from \"zoo\" if the frequency freq complies\n with the index order.by. An as.numeric method\n has to be available for the index class.\n zooreg(x, start, end, freq) creates a \"zooreg\" series with a numeric index as\n above and has (almost) the same interface as ts().\n Standard methods\n plot plotting\n lines adding a \"zoo\" series to a plot\n print printing\n summary summarizing (column-wise)\n str displaying structure of \"zoo\" objects\n head, tail head and tail of \"zoo\" objects\n Coercion\n as.zoo coercion to \"zoo\" is available for objects of class \"ts\",\n \"its\", \"irts\" (plus a default method).\n as.class.zoo coercion from \"zoo\" to other classes. 
Currently avail-\n able for class in \"matrix\", \"vector\", \"data.frame\",\n \"list\", \"irts\", \"its\" and \"ts\".\n is.zoo querying wether an object is of class \"zoo\"\n Merging and binding\n merge union, intersection, left join, right join along indexes\n cbind column binding along the intersection of the index\n c, rbind combining/row binding (indexes may not overlap)\n aggregate compute summary statistics along a coarser grid of\n indexes\n Mathematical operations\n Ops group generic functions performed along the intersec-\n tion of indexes\n t transposing (coerces to \"matrix\" before)\n cumsum compute (columnwise) cumulative quantities: sums\n cumsum(), products cumprod(), maximum cummax(),\n minimum cummin().\n"
[29] " Achim Zeileis, Gabor Grothendieck 29\n Extracting and replacing data and index\n index, time extract the index of a series\n index<-, time<- replace the index of a series\n coredata, coredata<- extract and replace the data associated with a \"zoo\"\n object\n lag lagged observations\n diff arithmetic and geometric differences\n start, end querying start and end of a series\n window, window<- subsetting of \"zoo\" objects using their index\n NA handling\n na.omit omit NAs\n na.contiguous compute longest sequence of non-NA observations\n na.locf impute NAs by carrying forward the last observation\n na.approx impute NAs by interpolation\n na.trim remove leading and/or trailing NAs\n Rolling functions\n rollapply apply a function to rolling margin of an array\n rollmean more efficient functions for computing the rolling\n mean, median and maximum are rollmean(),\n rollmedian() and rollmax(), respectively\n Methods for regular series\n is.regular checks whether a series is weakly (or strictly if strict\n = TRUE) regular\n frequency, deltat extracts the frequency or its reciprocal value respec-\n tively from a series, for \"zoo\" series the functions try\n to determine the regularity and frequency in a data-\n driven way\n cycle gives the position in the cycle of a regular series\nAffiliation:\nAchim Zeileis\nUniversität Innsbruck\nE-mail: Achim.Zeileis@R-project.org\nGabor Grothendieck\nGKX Associates Inc.\nE-mail: ggrothendieck@gmail.com\n"
```
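
The output above is a character vector with one string per page, and the lines within a page are separated by `\n`. A minimal sketch of post-processing such a vector in base R follows; `pages` is a shortened, hand-written stand-in for the real result of `pdf_text(path)`, and the names `page_lines` and `r_input` are made up for this illustration.

```r
# Stand-in for pdf_text(path): one string per page, lines separated by "\n".
# The strings are shortened excerpts, not the full pages.
pages <- c(
  "14 zoo: An S3 Class and Methods\nR> cumsum(Z)\n2.6. Extracting and replacing the data and the index\n",
  " Achim Zeileis, Gabor Grothendieck 15\nR> index(z2)\nR> start(z1)\n"
)

# Split every page into its individual lines.
page_lines <- strsplit(pages, "\n", fixed = TRUE)

# Keep only the lines that look like R input in the vignette ("R> ...").
r_input <- lapply(page_lines, function(x) x[startsWith(x, "R> ")])

lengths(page_lines)  # number of lines on each page
unlist(r_input)      # all extracted commands
```

With the real output, the same two steps should apply unchanged by replacing `pages` with `pdf_text(path)`.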

```
> pdf_fonts(path)
```

```
   name            type   embedded file
1  GOLQXE+CMB10    type1c TRUE
2  OGKCHM+CMBX12   type1c TRUE
3  MAFAWY+CMSSBX10 type1c TRUE
4  AXINBO+CMR10    type1c TRUE
5  MQVSZN+CMBX10   type1c TRUE
6  GDRMFP+CMSS10   type1c TRUE
7  FDHKEF+CMTI10   type1c TRUE
8  LLMLMG+CMTT10   type1c TRUE
9  ECIDJB+CMBXSL10 type1c TRUE
10 XBTYLL+CMSL10   type1c TRUE
11 MUVTSF+CMSSI10  type1c TRUE
12 HJMIPE+CMTT12   type1c TRUE
13 ZCUJTH+CMSLTT10 type1c TRUE
14 ZVTVKO+CMR8     type1c TRUE
15 ZVTVKO+CMR6     type1c TRUE
16 TZIHOI+CMR9     type1c TRUE
17 XFSHQU+CMTT9    type1c TRUE
18 Helvetica       type1  FALSE    /Library/Fonts/Microsoft/Arial.ttf
19 Helvetica-Bold  type1  FALSE    /Library/Fonts/Microsoft/Arial Bold.ttf
20 ZapfDingbats    type1  FALSE    /System/Library/Fonts/ZapfDingbats.ttf
21 QAYMJR+CMMI10   type1c TRUE
22 ZDZXBN+CMTI9    type1c TRUE
23 QKDOVZ+CMSS9    type1c TRUE
24 HWQZWS+CMBXTI10 type1c TRUE
```
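
The font table is an ordinary data frame, so it can be filtered with base R. The sketch below lists the fonts that are not embedded and strips the subset prefix (six uppercase letters plus `+`) that PDF producers put in front of embedded subset font names. The `fonts` data frame is a three-row stand-in typed by hand from the output above; treating the empty `file` entries as `""` is an assumption.

```r
# Hand-made stand-in for pdf_fonts(path), using three rows from the output.
# Assumption: the blank "file" entries are empty strings.
fonts <- data.frame(
  name     = c("GOLQXE+CMB10", "Helvetica", "ZapfDingbats"),
  type     = c("type1c", "type1", "type1"),
  embedded = c(TRUE, FALSE, FALSE),
  file     = c("",
               "/Library/Fonts/Microsoft/Arial.ttf",
               "/System/Library/Fonts/ZapfDingbats.ttf"),
  stringsAsFactors = FALSE
)

# Fonts that are not embedded, with the file reported for each of them.
not_embedded <- fonts[!fonts$embedded, c("name", "file")]

# Drop the subset tag ("ABCDEF+") to recover the base font name.
base_names <- sub("^[A-Z]{6}\\+", "", fonts$name)

not_embedded
base_names
```

Non-embedded fonts are the ones most likely to render differently from machine to machine, which is why they are worth checking first.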

```
> pdf_attachments(path)
```

```
list()
```

```
> pdf_toc(path)
```

```
$title
[1] ""
$children
$children[[1]]
$children[[1]]$title
[1] "Introduction"
$children[[1]]$children
list()
$children[[2]]
$children[[2]]$title
[1] "The class \"zoo\" and its methods"
$children[[2]]$children
$children[[2]]$children[[1]]
$children[[2]]$children[[1]]$title
[1] "Creation of \"zoo\" objects"
$children[[2]]$children[[1]]$children
list()
$children[[2]]$children[[2]]
$children[[2]]$children[[2]]$title
[1] "Creation of \"zooreg\" objects"
$children[[2]]$children[[2]]$children
list()
$children[[2]]$children[[3]]
$children[[2]]$children[[3]]$title
[1] "Plotting"
$children[[2]]$children[[3]]$children
list()
$children[[2]]$children[[4]]
$children[[2]]$children[[4]]$title
[1] "Merging and binding"
$children[[2]]$children[[4]]$children
list()
$children[[2]]$children[[5]]
$children[[2]]$children[[5]]$title
[1] "Mathematical operations"
$children[[2]]$children[[5]]$children
list()
$children[[2]]$children[[6]]
$children[[2]]$children[[6]]$title
[1] "Extracting and replacing the data and the index"
$children[[2]]$children[[6]]$children
list()
$children[[2]]$children[[7]]
$children[[2]]$children[[7]]$title
[1] "Coercion to and from \"zoo\""
$children[[2]]$children[[7]]$children
list()
$children[[2]]$children[[8]]
$children[[2]]$children[[8]]$title
[1] "NA handling"
$children[[2]]$children[[8]]$children
list()
$children[[2]]$children[[9]]
$children[[2]]$children[[9]]$title
[1] "Rolling functions"
$children[[2]]$children[[9]]$children
list()
$children[[3]]
$children[[3]]$title
[1] "Combining zoo with other packages"
$children[[3]]$children
$children[[3]]$children[[1]]
$children[[3]]$children[[1]]$title
[1] "strucchange: Empirical fluctuation processes"
$children[[3]]$children[[1]]$children
list()
$children[[3]]$children[[2]]
$children[[3]]$children[[2]]$title
[1] "tseries: Historical financial data"
$children[[3]]$children[[2]]$children
list()
$children[[3]]$children[[3]]
$children[[3]]$children[[3]]$title
[1] "timeDate/fCalendar: Indexes of class \"timeDate\""
$children[[3]]$children[[3]]$children
list()
$children[[3]]$children[[4]]
$children[[3]]$children[[4]]$title
[1] "The classes \"yearmon\" and \"yearqtr\": Roll your own index"
$children[[3]]$children[[4]]$children
list()
$children[[4]]
$children[[4]]$title
[1] "Summary and outlook"
$children[[4]]$children
list()
$children[[5]]
$children[[5]]$title
[1] "Reference card"
$children[[5]]$children
list()
```
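
The table of contents comes back as a nested list in which every node has a `title` and a list of `children`. A rough sketch of flattening that structure into a data frame with one row per heading is shown here. `flatten_toc()` is a hypothetical helper, not part of pdftools, and `toc` is a trimmed, hand-built copy of the structure printed above.

```r
# Trimmed, hand-built stand-in for pdf_toc(path).
toc <- list(
  title = "",
  children = list(
    list(title = "Introduction", children = list()),
    list(
      title = "The class \"zoo\" and its methods",
      children = list(
        list(title = "Creation of \"zoo\" objects", children = list()),
        list(title = "Plotting", children = list())
      )
    ),
    list(title = "Summary and outlook", children = list())
  )
)

# Hypothetical helper: walk the tree depth-first and collect (depth, title).
flatten_toc <- function(node, depth = 0L) {
  # The root node has an empty title and contributes no row of its own.
  here <- if (nzchar(node$title)) {
    data.frame(depth = depth, title = node$title, stringsAsFactors = FALSE)
  } else {
    NULL
  }
  below <- lapply(node$children, flatten_toc, depth = depth + 1L)
  do.call(rbind, c(list(here), below))
}

outline <- flatten_toc(toc)

# Print an indented outline, two spaces per nesting level.
cat(paste0(strrep("  ", outline$depth - 1L), outline$title), sep = "\n")
```

The flat form is easier to search or print than the nested list, at the cost of keeping only the nesting depth rather than the full parent-child links.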

## pdf_render_page / poppler_config

Render a PDF page to a bitmap.

### Arguments

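- page: which page to render
- dpi: rendering resolution in dots per inch
- numeric: if `TRUE`, return real values between 0 and 1 instead of raw bytes
- opw: owner password for opening the PDF
- upw: user password for opening the PDF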

```
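> library(magrittr)  # provides the %>% pipe
>
> # Render the first page at 300 dpi and write the bitmap in three formats
> pdf_render_page(path, dpi = 300) %>% {
+   png::writePNG(., "page.png")
+   jpeg::writeJPEG(., "page.jpeg")
+   webp::write_webp(., "page.webp")
+ }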
```
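
The `dpi` argument determines how large the resulting bitmap is. PDF page sizes are measured in points (1 point = 1/72 inch), so the pixel dimensions should be roughly the page size divided by 72 and multiplied by `dpi`. The helper below is a hypothetical illustration of that arithmetic, not a pdftools function, and the renderer's exact rounding may differ.

```r
# Hypothetical helper (not part of pdftools): expected bitmap size in pixels
# for a page whose size is given in PDF points (1 point = 1/72 inch).
expected_pixels <- function(width_pt, height_pt, dpi = 72) {
  round(c(width = width_pt, height = height_pt) / 72 * dpi)
}

# A US Letter page is 612 x 792 points (8.5 x 11 inches).
expected_pixels(612, 792, dpi = 300)
#  width height
#   2550   3300
```

Since both dimensions scale with `dpi`, doubling the resolution quadruples the number of pixels, which is worth keeping in mind when rendering many pages.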