RecordLinkage: Record Linkage in R
- CRAN: http://cran.r-project.org/web/packages/RecordLinkage/index.html
- URL: https://r-forge.r-project.org/projects/recordlinkage/, http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Sariyar+Borg.pdf
> library(RecordLinkage)
Loading required package: DBI
Attaching package: 'DBI'
The following object is masked from 'package:git2r':
fetch
Loading required package: RSQLite
Loading required package: ff
Loading required package: bit
Attaching package bit
package:bit (c) 2008-2012 Jens Oehlschlaegel (GPL-2)
creators: bit bitwhich
coercion: as.logical as.integer as.bit as.bitwhich which
operator: ! & | xor != ==
querying: print length any all min max range sum summary
bit access: length<- [ [<- [[ [[<-
for more help type ?bit
Attaching package: 'bit'
The following object is masked from 'package:git2r':
clone
The following object is masked from 'package:base':
xor
Attaching package ff
- getOption("fftempdir")=="/var/folders/8f/s_lbgwks6q7g3lz52q93ngph0000gn/T//RtmpOF72Zx"
- getOption("ffextension")=="ff"
- getOption("ffdrop")==TRUE
- getOption("fffinonexit")==TRUE
- getOption("ffpagesize")==65536
- getOption("ffcaching")=="mmnoflush" -- consider "ffeachflush" if your system stalls on large writes
- getOption("ffbatchbytes")==16777216 -- consider a different value for tuning your system
- getOption("ffmaxbytes")==536870912 -- consider a different value for tuning your system
Attaching package: 'ff'
The following objects are masked from 'package:bit':
clone, clone.default, clone.list
The following objects are masked from 'package:utils':
write.csv, write.csv2
The following objects are masked from 'package:git2r':
add, clone
The following object is masked from 'package:magrittr':
add
The following objects are masked from 'package:base':
is.factor, is.ordered
Loading required package: ffbase
Attaching package: 'ffbase'
The following objects are masked from 'package:ff':
[.ff, [.ffdf, [<-.ff, [<-.ffdf
The following objects are masked from 'package:base':
%in%, table
RecordLinkage library
[c] IMBEI Mainz
Attaching package: 'RecordLinkage'
The following object is masked from 'package:ff':
clone
The following object is masked from 'package:bit':
clone
The following object is masked from 'package:git2r':
clone
Version: 0.4.8
Function | Description |
---|---|
%append%-methods | Concatenate comparison patterns or classification results |
RLBigData-class | Class "RLBigData" |
RLBigDataDedup | Constructors for big data objects. |
RLBigDataDedup-class | Class "RLBigDataDedup" |
RLBigDataLinkage-class | Class "RLBigDataLinkage" |
RLResult-class | Class "RLResult" |
RLdata500 | Test data for Record Linkage |
RecLinkClassif-class | Class "RecLinkClassif" |
RecLinkData-class | Class "RecLinkData" |
RecLinkData.object | Record Linkage Data Object |
RecLinkResult | Record Linkage Result Object |
RecLinkResult-class | Class "RecLinkResult" |
[.RecLinkData | Subset operator for record linkage objects |
classifySupv | Supervised Classification |
classifyUnsup | Unsupervised Classification |
clone | Serialization of record linkage object. |
compare.dedup | Compare Records |
deleteNULLs | Remove NULL Values |
editMatch | Edit Matching Status |
emClassify | Weight-based Classification of Data Pairs |
emWeights | Calculate weights |
epiClassify | Classify record pairs with EpiLink weights |
epiWeights | Calculate EpiLink weights |
ff_vector-class | Class '"ff_vector"' |
ffdf-class | Class '"ffdf"' |
fsClassify | Stochastic record linkage. |
genSamples | Generate Training Set |
getErrorMeasures-methods | Calculate Error Measures |
getExpectedSize | Estimate number of record pairs. |
getFrequencies-methods | Get attribute frequencies |
getMinimalTrain | Create a minimal training set |
getPairs | Extract Record Pairs |
getParetoThreshold | Estimate Threshold from Pareto Distribution |
getTable-methods | Build contingency table |
gpdEst | Estimate Threshold from Pareto Distribution |
isFALSE | Check for FALSE |
mygllm | Generalized Log-Linear Fitting |
optimalThreshold | Optimal Threshold for Record Linkage |
phonetics | Phonetic Code |
resample | Safe Sampling |
show | Show a RLBigData object |
splitData | Split Data |
strcmp | String Metrics |
summary.RLBigData | summary methods for '"RLBigData"' objects. |
summary.RLResult | Summary method for '"RLResult"' objects. |
summary.RecLinkData | Print Summary of Record Linkage Data |
trainSupv | Train a Classifier |
unorderedPairs | Create Unordered Pairs |
compare.dedup
Arguments
- dataset
- dataset1, dataset2
- phonetic
- strcmp
- strcmpfun
- identity, identity1, identity2
- n_match, n_non_match
> data("RLdata500")  # load the data first; this also provides identity.RLdata500
> rpairs <- compare.dedup(RLdata500, identity=identity.RLdata500, strcmp=TRUE, blockfld=list(1,c(5,6,7)))
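The `blockfld=list(1,c(5,6,7))` argument in the call above restricts which record pairs are compared: as far as the package documentation describes it, a pair is kept if the two records agree on every column of at least one list element. The following base R sketch illustrates that idea on a toy table. The data, the column names, and the helper `agree` are made up for this illustration and are not part of RecordLinkage; the real function handles missing values and many other details that are ignored here.

```r
# Toy records (hypothetical data, not taken from RLdata500).
people <- data.frame(
  fname = c("ANNA", "ANNA", "BERND", "BERNT"),
  by    = c(1970L, 1971L, 1980L, 1980L),
  bm    = c(1L, 1L, 5L, 5L),
  bd    = c(2L, 2L, 9L, 9L),
  stringsAsFactors = FALSE
)

# All unordered pairs of row indices, one pair per row.
pairs <- t(combn(nrow(people), 2))

# Hypothetical helper: TRUE for pairs that agree on every column in `cols`.
agree <- function(cols) {
  same <- vapply(
    cols,
    function(col) people[[col]][pairs[, 1]] == people[[col]][pairs[, 2]],
    logical(nrow(pairs))
  )
  rowSums(same) == length(cols)
}

# Keep a pair if it agrees on fname OR on the full date of birth.
keep <- agree("fname") | agree(c("by", "bm", "bd"))
blocked <- pairs[keep, , drop = FALSE]
print(blocked)
```

Of the six possible pairs, only two survive here: rows 1 and 2 (same first name) and rows 3 and 4 (same date of birth). This reduction is the reason blocking is used on larger data sets.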
RLdata500
> data("RLdata500")
> RLdata500 %>% str()
'data.frame': 500 obs. of 7 variables:
$ fname_c1: Factor w/ 146 levels "ALEXANDER","ANDRE",..: 19 42 114 128 112 77 42 139 26 99 ...
$ fname_c2: Factor w/ 23 levels "ALEXANDER","ANDREAS",..: NA NA NA NA NA NA NA NA NA NA ...
$ lname_c1: Factor w/ 108 levels "ALBRECHT","BAUER",..: 61 2 31 106 50 23 76 61 77 30 ...
$ lname_c2: Factor w/ 8 levels "ENGEL","FISCHER",..: NA NA NA NA NA NA NA NA NA NA ...
$ by : int 1949 1968 1930 1957 1966 1929 1967 1942 1978 1971 ...
$ bm : int 7 7 4 9 1 7 8 9 3 2 ...
$ bd : int 22 27 30 2 13 4 1 20 4 27 ...
strcmp / jarowinkler / levenshteinSim / levenshteinDist
Compute the similarity between strings.
Arguments
- str1, str2
- W_1, W_2, W_3
- r
> jarowinkler("Apple", "Apple")
[1] 1
> jarowinkler("Apple", "Andreas")
[1] 0.6057143
> levenshteinSim("Andreas", c("Anreas", "Andeas"))
[1] 0.8571429 0.8571429
> jarowinkler(c("Andreas", "Borg"), c("Andreas", "Bork"))
[1] 1.0000000 0.8833333
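To see where the `levenshteinSim` values come from, the similarity can be reproduced in base R as one minus the edit distance divided by the length of the longer string. This is a minimal sketch under that assumption; `lev_sim` is a hypothetical helper written for this note, built on `adist()` from the `utils` package, and is not the package's own implementation.

```r
# Hypothetical helper (not part of RecordLinkage):
# similarity = 1 - Levenshtein distance / length of the longer string.
lev_sim <- function(str1, str2) {
  # adist() returns a distance matrix; take the single cell for each pair.
  d <- mapply(function(a, b) adist(a, b)[1, 1], str1, str2, USE.NAMES = FALSE)
  1 - d / pmax(nchar(str1), nchar(str2))
}

sims <- lev_sim("Andreas", c("Anreas", "Andeas"))
print(sims)
```

Both candidates differ from "Andreas" by one deleted character, so each similarity is 1 - 1/7, which matches the 0.8571429 shown in the session.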
trainSupv
> trainSupv(rpairs, method, use.pred = FALSE, omit.possible = TRUE,
+ convert.na = TRUE, include.data = FALSE, ...)
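The signature above is easier to read next to a call. The following is a minimal sketch of one possible workflow, assuming that the RecordLinkage package and the packages it needs for the `"rpart"` method are installed: build comparison patterns with `compare.dedup`, split them with `splitData`, train on one half, and classify the other half with `classifySupv`. The blocking fields, the 50/50 split, and the seed are arbitrary choices for illustration.

```r
library(RecordLinkage)

data(RLdata500)  # provides RLdata500 and identity.RLdata500

# Comparison patterns with known match status (taken from the identity vector).
rpairs <- compare.dedup(RLdata500, identity = identity.RLdata500,
                        blockfld = list(1, c(5, 6, 7)))

# Split into training and validation sets; the seed is an arbitrary choice.
set.seed(1)
l <- splitData(rpairs, prop = 0.5, keep.mprop = TRUE)

# Train a decision tree on the training half, then classify the other half.
model  <- trainSupv(l$train, method = "rpart")
result <- classifySupv(model, newdata = l$valid)

summary(result)
```

Because the match status is known for this test data, the summary of `result` can be used to judge how well the trained classifier separates matches from non-matches.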