Is there an R regex to remove everything except letters, apostrophes, and specified multi-character strings? The "specified multi-character strings" are arbitrary and of arbitrary length. Let's say "~~" and "&&" in this case (so a single "~" or "&" should be removed, but "~~" and "&&" should not).
Here is what I have tried:
gsub("[^ a-zA-Z']", "", "I like~~cake~too&&much&now.")
Which gives:
## [1] "I likecaketoomuchnow"
And...
gsub("[^ a-zA-Z'~&]", "", "I like~~cake~too&&much&now.")
gives...
## "I like~~cake~too&&much&now"
How can I write an R regex to give:
"I like~~caketoo&&muchnow"
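One possible approach, offered as a sketch rather than as a known answer: with perl = TRUE, R uses PCRE, which supports the backtracking verbs (*SKIP)(*FAIL). The first alternative matches a protected string ("~~" or "&&") and then deliberately fails. (*SKIP) makes the engine resume after that string instead of retrying inside it. Only what the second alternative (the original negated class) matches gets removed.

```r
x <- "I like~~cake~too&&much&now."
# "~~" and "&&" are matched first and skipped over, so they survive;
# any other character outside [ a-zA-Z'] is deleted.
gsub("(?:~~|&&)(*SKIP)(*FAIL)|[^ a-zA-Z']", "", x, perl = TRUE)
## [1] "I like~~caketoo&&muchnow"
```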
EDIT: corner cases raised by Casimir and BrodieG.
I'd expect this behavior:
x <- c("I like~~cake~too&&much&now.", "a~~~b", "a~~~~b", "a~~~~~b", "a~&a")
## [1] "I like~~caketoo&&muchnow" "a~~b"
## [3] "a~~~~b" "a~~~~b"
## [5] "aa"
Neither of the current approaches gives this.
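For arbitrary strings, the same (*SKIP)(*FAIL) idea can be wrapped in a small function. The name strip_except() is hypothetical, and it makes two assumptions: each protected string is quoted literally with \Q...\E (so none of them may itself contain "\E"), and longer strings are tried first so that, say, "---" is kept whole rather than split into "--" plus a stray "-". Because matching proceeds left to right, a run like "~~~" keeps the first "~~" and drops the leftover "~", which matches the corner cases above.

```r
# Hypothetical helper: keep spaces, ASCII letters, apostrophes and the
# literal strings in `keep`; delete every other character.
strip_except <- function(x, keep) {
  # An empty `keep` would produce an empty alternative that blocks all matches
  stopifnot(length(keep) > 0)
  # Try longer protected strings first
  keep <- keep[order(nchar(keep), decreasing = TRUE)]
  # \Q...\E quotes each string literally (assumes no string contains "\E")
  protected <- paste0("\\Q", keep, "\\E", collapse = "|")
  # Protected strings are matched, then skipped; anything the negated
  # class matches is replaced with ""
  pattern <- paste0("(?:", protected, ")(*SKIP)(*FAIL)|[^ a-zA-Z']")
  gsub(pattern, "", x, perl = TRUE)
}

x <- c("I like~~cake~too&&much&now.", "a~~~b", "a~~~~b", "a~~~~~b", "a~&a")
strip_except(x, keep = c("~~", "&&"))
# returns c("I like~~caketoo&&muchnow", "a~~b", "a~~~~b", "a~~~~b", "aa")
```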