The do_transliterate
option may be helpful when trying to work with multiple files from the same competition, since different text encodings may be used on different files. This can lead to e.g. multiple versions of the same team name. Transliterating can help avoid this, at the cost of losing e.g. diacriticals. Transliteration is applied after converting from the specified text encoding to UTF-8. Common encodings used with DataVolley files include "windows-1252" (western Europe), "windows-1250" (central Europe), "iso-8859-1" (western Europe and Americas), "iso-8859-2" (central/eastern Europe), "iso-8859-13" (Baltic languages)
Usage
read_dv(
filename,
insert_technical_timeouts = TRUE,
do_warn = FALSE,
do_transliterate = FALSE,
encoding = "guess",
date_format = "guess",
extra_validation = 2,
validation_options = list(),
surname_case = "asis",
skill_evaluation_decode = "default",
custom_code_parser,
metadata_only = FALSE,
verbose = FALSE,
edited_meta
)
dv_read(
filename,
insert_technical_timeouts = TRUE,
do_warn = FALSE,
do_transliterate = FALSE,
encoding = "guess",
date_format = "guess",
extra_validation = 2,
validation_options = list(),
surname_case = "asis",
skill_evaluation_decode = "default",
custom_code_parser,
metadata_only = FALSE,
verbose = FALSE,
edited_meta
)
Arguments
- filename
string: file name to read
- insert_technical_timeouts
logical or list: should we insert technical timeouts? If TRUE, technical timeouts are inserted at points 8 and 16 of sets 1--4 (for indoor files) or when the team scores sum to 21 in sets 1--2 (beach). Otherwise a two-element list can be supplied, giving the scores at which technical timeouts will be inserted for sets 1--4, and set 5.
- do_warn
logical: should we issue warnings about the contents of the file as we read it?
- do_transliterate
logical: should we transliterate all text to ASCII? See details
- encoding
character: text encoding to use. Text is converted from this encoding to UTF-8. A vector of multiple encodings can be provided, and this function will attempt to choose the best. If encoding is "guess", the encoding will be guessed
- date_format
string: the expected date format (one of "ymd", "mdy", or "dmy") or "guess". If
date_format
is something other than "guess", that date format will be preferred where dates are ambiguous- extra_validation
numeric: should we run some extra validation checks on the file? 0=no extra validation, 1=check only for major errors, 2=somewhat more extensive, 3=the most extra checking
- validation_options
list: additional options to pass to the validation step. See
help('validate_dv')
for details- surname_case
string or function: should we change the case of player surnames? If
surname_case
is a string, valid values are "upper","lower","title", or "asis"; otherwisesurname_case
may be a function that will be applied to the player surname strings- skill_evaluation_decode
function or string: if
skill_evaluation_decode
is a string, it can be either "default" (use the default DataVolley conventions for dvw or vsm files), "volleymetrics" (to follow the scouting conventions used by VolleyMetrics), "german" (same as "default" but with B/ and B= swapped), or "guess" (use volleymetrics if it looks like a VolleyMetrics file, otherwise default). Ifskill_evaluation_decode
is a function, it should convert skill evaluation codes into meaningful phrases. Seeskill_evaluation_decoder
- custom_code_parser
function: function to process any custom codes that might be present in the datavolley file. This function takes one input (the
datavolley
object) and should return a list with two named components:plays
andmessages
- metadata_only
logical: don't process the plays component of the file, just the match and player metadata
- verbose
logical: if TRUE, show progress
- edited_meta
list: [very much experimental] if supplied, will be used in place of the metadata present in the file itself. This makes it possible to, for example, read a file, edit the metadata, and re-parse the file but using the modified metadata
Value
A named list with several elements. meta
provides match metadata, plays
is the main play-by-play data in the form of a data.frame. raw
is the line-by-line content of the datavolley file. messages
is a data.frame describing any inconsistencies found in the file.
Examples
if (FALSE) {
## to read the example file bundled with the package
myfile <- dv_example_file()
x <- dv_read(myfile, insert_technical_timeouts=FALSE)
summary(x)
## or to read your own file:
x <- dv_read("c:/some/path/myfile.dvw", insert_technical_timeouts=FALSE)
## Insert a technical timeout at point 12 in sets 1 to 4:
x <- dv_read(myfile, insert_technical_timeouts=list(c(12),NULL))
## to read a VolleyMetrics file
x <- dv_read(myfile, skill_evaluation_decode = "volleymetrics")
}