Friday, 27 February 2015

R regex to remove all except letters, apostrophes and specified multi-character strings

Is there an R regex to remove everything except letters, apostrophes and specified multi-character strings? The "specified multi-character strings" are arbitrary and of arbitrary length. Let's say "~~" and "&&" in this case (so a lone ~ or a lone & should be removed, but ~~ and && should be kept).


Here is what I have so far:



gsub("[^ a-zA-Z']", "", "I like~~cake~too&&much&now.")


Which gives:



## [1] "I likecaketoomuchnow"


And...



gsub("[^ a-zA-Z'~&]", "", "I like~~cake~too&&much&now.")


gives...



## [1] "I like~~cake~too&&much&now"


How can I write an R regex to give:



"I like~~caketoo&&muchnow"


EDIT: Corner cases from Casimir and BrodieG...


I'd expect this behavior:



x <- c("I like~~cake~too&&much&now.", "a~~~b", "a~~~~b", "a~~~~~b", "a~&a")

## [1] "I like~~caketoo&&muchnow" "a~~b"
## [3] "a~~~~b" "a~~~~b"
## [5] "aa"


Neither of the current approaches gives this.
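
For reference, here is a minimal sketch of one pattern that seems to fit the corner cases above. It is not one of the approaches mentioned in the post. The idea is to capture the protected strings in a group and put them back, while any other unwanted character is replaced with nothing. The names `strip_except` and `keep` are hypothetical, and the sketch assumes `perl = TRUE` (PCRE) so that `\Q...\E` can quote the protected strings literally.

```r
# Hypothetical helper (not from the original post): drop everything except
# letters, spaces, apostrophes and the literal strings listed in `keep`.
strip_except <- function(x, keep = c("~~", "&&")) {
  # Try longer strings first so a shorter one cannot pre-empt a longer one.
  keep <- keep[order(nchar(keep), decreasing = TRUE)]
  # \Q...\E makes PCRE treat each string literally
  # (assumption: no string in `keep` contains "\E").
  protected <- paste0("\\Q", keep, "\\E", collapse = "|")
  # Group 1 captures a protected string; otherwise a single unwanted
  # character matches and group 1 stays empty.
  pattern <- paste0("(", protected, ")|[^a-zA-Z' ]")
  # "\\1" puts a protected string back and deletes everything else matched.
  gsub(pattern, "\\1", x, perl = TRUE)
}

x <- c("I like~~cake~too&&much&now.", "a~~~b", "a~~~~b", "a~~~~~b", "a~&a")
strip_except(x)
```

Because the regex engine consumes the input left to right, a run such as "~~~" is read as one protected "~~" followed by a lone "~", which is then dropped. That matches the expected "a~~b" above, but a different pairing rule would need a different pattern.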

