I'm playing around with some box score data that I downloaded from retrosheet.org. Instead of providing a run total for the home and away teams, the data provides a line score in the following format: "10030(11)02x", where each character represents an inning. A number in parentheses indicates more than 9 runs scored in that inning, and x marks a half inning in which the team did not bat (the home team was already ahead going into the bottom of the 9th inning).
I'm trying to figure out a way to systematically sum up the total runs using a function. Ideally I could run something like this:
f("10030(11)02x") = 17
I'm using sum(sapply(strsplit("10001000x", ""), as.numeric), na.rm=T) to compute the sum for all observations that don't contain a double-digit inning, but I'm struggling to figure out how to deal with the double-digit innings and the parentheses.
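For reference, here is a minimal sketch of the kind of function I have in mind, assuming the line score only ever contains single digits, parenthesised numbers, and x. The name f is just the placeholder from the example above, and the regular expression is my own guess at the format rather than anything documented by Retrosheet:

```r
# Hypothetical helper: total the runs in a Retrosheet-style line score.
# Assumes each inning is a single digit, a number in parentheses, or "x".
f <- function(line_score) {
  # Pull out every inning token: "(11)"-style groups first, then lone digits.
  # An "x" matches neither pattern, so unplayed half innings are skipped.
  tokens <- regmatches(line_score, gregexpr("\\([0-9]+\\)|[0-9]", line_score))
  # Drop the parentheses, convert to numbers, and sum once per input string.
  vapply(tokens, function(tok) sum(as.numeric(gsub("[()]", "", tok))), numeric(1))
}

f("10030(11)02x")  # 17
```

Because gregexpr and regmatches both work on character vectors, this should also accept a whole column of line scores at once, e.g. f(c("10030(11)02x", "10001000x")).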