gpk: 100 Data Sets for Statistics Education


> library(gpk)
> data("Butterflies")

AIDS AIDS Data Set
AirPollution Air Pollution Data
AizawlCancer Sex-wise differences in cancer types
Allergy Allergy Data Set
Asthma1 Testing Effect of Curcuma Longa
Asthma2 Testing effect of treatment on milk induced Eosinophilia in mice
Asthma3 Effect of curcuma longa on de-granulation of mast cells in mice
Asthma4 Testing effect of Curcuma longa on paw inflammation in rats
BANK Bank Churn data set
BPSYS Two drug comparison
Bacteria A multi-factorial experiment on the bacteria growth in the packaged foods
BambooGrowth Data set relating growth of bamboo to geographic location
Bamboolife Preparing a life table for the Bamboo plant
Bananabats The Bat Census data
Barleyheight Comparison of genotypes and checking time trend
BatGroup Fitting distributions to the bat group size data
Batcapture Understanding seasonality and species composition of bat population
Batrecapture Fitting a model to bat recapture data
Biodegradation Biodegradation of Dimethoate in Industrial Effluents by Brevundimonas species
BirthDeath Changes in Human birth and death rates in India over the 20th century
Butterflies Study of distribution of butterfly species count among 5 groups and in different localities in India
COWSDATA Crossbreeding of Cows
Chitalparasite Understanding the correlation of occurrence of a parasite
Cosmetic1 Testing efficacy of a cosmetic product
Crack Healing the heel
Crime Relation between crime and intelligence
DroughtStress Modeling Genotypic variation in photosynthetic competence of Sorghum bicolor
Dunglife Dung decay data
Earthquake Modeling earthquake aftershocks
EarthwormSeason Population dynamics of earthworms
Earthwormbiomass Earthworms in cultivated soils
Euphorbiaceae Relationship between tree height and girth of Euphorbiaceae
Extruder Understanding effect of manufacturing conditions on product characteristics
FAMILY Understand relationship between height of parents and child
Fairness Comparison of formulations and sample size determination of a fairness product
FilariasisSex Sex related prevalence in human filariasis
Filariasisage Infection among Filariasis
Filariasistype Filariasis and different parasites causing it
Fish Fish species interaction
Frog_survival Fitting Ricker curve to frog survival data
Frogfood Study of growth and food preference over age in frogs
Frogmating Relation between body size and number of mates for the frogs
GDS Modeling Trends in Gross Domestic Savings
Geometricbirds Rank abundance distribution of bird species
Heart Comparison of Test drug with Placebo for Heart Attack
Highjump Guessing the gold medal score for 2004 Olympics
IMR Changes in Infant mortality over last century across countries
IOCSharePrice Modeling share price series of IOC
IslandSpArea Species area relationship
Ivoryweight Trends in illegal ivory trade
Lognormalbirds Species abundance distribution
Logseriesbirds Species abundance distribution
Loops Loops of the finger prints
Lung Smoking and Lung capacity study
Mammals Birth weight and brain size of mammals
Mice Protein intake and lifespan of mice
Microgrow Fit sigmoidal model to bacterial growth
Mimosaceae Relationship between tree height and girth
OralCancer Comparison of two chemotherapy treatments for oral cancer
Plaque Studying effect of toothpaste on plaque accumulation
Plastic Seasonality in sales of plastic granules
Poliocases The number of polio cases
Preserve Predicting fungal growth
Production Quality control for examining consistency in weight
Pureforsure Detection of adulteration
Rabbit Relating Foot length to Body mass
Rat Study of rat burrow architecture
RiceWheat Modeling Rice and Wheat production
Sheeplife Fitting probability distribution to life data of Sheeps
SholapurWeather Has the weather in Sholapur changed over 3 decades?
Sorghumheight Modeling sorghum plant growth
SpaccHerb Species accumulation curve
SpaccShrubs Species accumulation curve
Spaceshuttle Modeling Space shuttle O-ring failure data
Spareabirds Species area curve
StemDensity Vegetation types and tree density
TeethNormal Modeling indicators of dental health
Tiger7 Identification of individual tigers from pugmarks
TigerIdentity Tiger census using scat samples
Timber Genetic and environmental components of tree characteristics
Valvefailure Valve characteristics and numbers of failures in a nuclear reactor
Waterquality Water quality analysis using clustering
atombomb Cancer deaths of atomic bomb survivors
birdextinct Bird extinct at a national park
cloudseed Cloud Seeding
elephant Age and mating success for Elephants
fishtoxin Toxicity effect on fish
hundredmrun Guessing the gold medal score for 2004 Olympics
magazine Time trends in authorship distribution
mammalsize Correlates of brain size for the mammals
moth Natural selection
salamander Habitat preference of salamander
widowbird Mate selection by females
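
The index above can also be pulled from within R rather than copied by hand. The following is a minimal sketch, assuming gpk is installed locally; `list_datasets` is a hypothetical helper name introduced here for illustration, not part of gpk.

```r
# Hypothetical helper: return the data set index of an installed package
# as a data frame with the columns Item (data set name) and Title.
list_datasets <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("package '", pkg, "' is not installed")
  }
  # data(package = ...) returns an index object whose $results element is a
  # character matrix with the columns Package, LibPath, Item and Title.
  idx <- utils::data(package = pkg)$results
  as.data.frame(idx[, c("Item", "Title"), drop = FALSE],
                stringsAsFactors = FALSE)
}

# Only runs when gpk is available on this machine.
if (requireNamespace("gpk", quietly = TRUE)) {
  catalogue <- list_datasets("gpk")
  print(head(catalogue))
}
```

The helper works for any installed package that ships data, for example `list_datasets("datasets")`.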







> library(magrittr)  # provides the %>% pipe used below
> data("Butterflies")
> Butterflies %>% {
+   print(class(.))
+   print(dim(.))
+   print(VIM::countNA(.))
+   dplyr::tbl_df(.) %>% dplyr::sample_n(., 4)
+ }
[1] "data.frame"
[1] 44  9
[1] 0
Source: local data frame [4 x 9]

  Serial_Number                 Area         Locality Total_Species_count
          (int)               (fctr)           (fctr)               (int)
1             2     Western Himalaya Western Himalaya                 417
2            20 Other parts of India      South Bihar                 124
3            39 Other parts of India  Andaman Nikobar                 217
4            22 Other parts of India          Lucknow                 109
Variables not shown: Skippers (int), Swallow_tails (int), Whites_Yellows
  (int), Blues (int), Brush_Footed (int)
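
The same quick inspection (class, dimensions, number of missing values, a random sample of rows) can be sketched in base R alone. This avoids the dependency on VIM and on `dplyr::tbl_df`, which later dplyr releases deprecate in favour of `tibble::as_tibble`. `inspect_df` is a hypothetical helper name, and `sum(is.na(.))` is used here as a stand-in for `VIM::countNA`.

```r
# Hypothetical helper: print a short summary of a data frame and
# return n randomly chosen rows (fewer if the data frame is smaller).
inspect_df <- function(df, n = 4) {
  stopifnot(is.data.frame(df))
  print(class(df))        # object class
  print(dim(df))          # rows and columns
  print(sum(is.na(df)))   # total number of missing values
  rows <- sample.int(nrow(df), size = min(n, nrow(df)))
  df[rows, , drop = FALSE]
}

# Only runs when gpk is available on this machine.
if (requireNamespace("gpk", quietly = TRUE)) {
  data("Butterflies", package = "gpk")
  print(inspect_df(Butterflies))
}
```

Because the rows are sampled at random, the four rows printed will differ from run to run unless `set.seed()` is called first.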