Wednesday, February 25, 2015

Sum list of characters using R

I'm playing around with some box score data that I downloaded from retrosheet.org. Instead of providing a run total for the home and away team, the data provides a line score in the following format: "10030(11)02x"


where each digit represents the runs scored in one inning. A number in parentheses indicates an inning in which more than 9 runs were scored, and x represents a half inning in which the team did not bat (the home team was already ahead going into the bottom of the 9th inning).


I'm trying to figure out a way to systematically sum up the total runs using a function. Ideally I could run something like this:



f("10030(11)02x") = 17


I'm using sum(sapply(strsplit("10001000x", ""), as.numeric), na.rm=TRUE) to compute the sum for line scores that don't contain a double-digit inning, but I'm struggling to figure out how to handle the double-digit innings and the parentheses.
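One possible approach, offered as a sketch rather than a definitive answer: instead of splitting the string into single characters, extract tokens with a regular expression that matches either a parenthesised number or a single digit, then strip the parentheses and sum. The function name sum_line_score is hypothetical (it stands in for f above), and the sketch assumes the line score contains only digits, parenthesised numbers, and x.

```r
# Hypothetical helper (stands in for f above): total runs in a line score.
# Assumes the input contains only digits, "(NN)" groups, and "x".
sum_line_score <- function(line_score) {
  # Each token is either a parenthesised inning such as "(11)" or one digit.
  # "x" matches neither alternative, so it is simply skipped.
  tokens <- regmatches(line_score, gregexpr("\\([0-9]+\\)|[0-9]", line_score))

  # Strip the parentheses, convert to numbers, and sum per line score.
  vapply(tokens, function(tok) {
    sum(as.numeric(gsub("[()]", "", tok)))
  }, numeric(1))
}

sum_line_score("10030(11)02x")  # 17
```

Because gregexpr and regmatches work element-wise on a character vector, the same function can be applied to a whole column of line scores at once, for example sum_line_score(c("10030(11)02x", "10001000x")).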

