Create a data.frame of tokens using Sudachi

tokenize_to_df(x, mode, instance = NULL)

Arguments

x

Character vector of input text.

mode

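Split mode: one of "A", "B", or "C". In Sudachi, A splits text into the shortest units and C into the longest, with B in between.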

instance

Optional. An existing instance of <sudachipy.tokenizer.Tokenizer>. Passing a predefined instance speeds up execution.

Examples

if (FALSE) {
  tokenize_to_df("Tokyo, Japan", mode = "A")
}
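
The split modes can also be compared side by side. The following is a minimal sketch rather than an official example: it assumes the sudachir package and its SudachiPy backend are installed, uses only the signature shown in the usage above, and the sample sentences are hypothetical input chosen for illustration. The tokenizing call is wrapped in if (FALSE), as in the example above, so it does not run unless the backend is available.

```r
# Hypothetical sample input; any character vector should work.
texts <- c("選挙管理委員会", "東京都へ行く")

# The three split modes listed under the `mode` argument.
modes <- c("A", "B", "C")

if (FALSE) {
  library(sudachir)

  # Tokenize the same input once per mode; the list is named by mode.
  results <- lapply(
    setNames(modes, modes),
    function(m) tokenize_to_df(texts, mode = m)
  )

  # Assuming one row per token, mode A should give at least as many
  # rows as mode C, since A splits into the shortest units.
  sapply(results, nrow)
}
```

Comparing row counts is only a rough check. Inspecting the returned data.frames directly shows where the modes place their token boundaries.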