Friday, 17 April 2015

Extract the difference between two strings in R

I can't find a way to do this...



raw_string <- "\"+001\", la bonne surprise de M. Jenn M. Ayache http://goo.gl/3EXxy6 via @MYTF1News"

clean_string <- "+001, la bonne surprise de Jenn Ayache"

desired_string <- "\"\"M. M. http://goo.gl/3EXxy6 via @MYTF1News"


I am not sure what to call this transformation. I would say "difference" (as in set theory, as opposed to "union" and "intersection").


My desired string contains all the characters missing from clean_string, and only those, in their original order, once for each time they appear, including spaces and punctuation.


The best I have managed so far isn't good enough:



> a <- paste(Reduce(setdiff, strsplit(c(raw_string, clean_string), split = " ")), collapse = " ")
> a
[1] "\"+001\", M. http://goo.gl/3EXxy6 via @MYTF1News"
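
The attempt above works word by word, so it keeps "\"+001\"," whole and, because setdiff() drops duplicates, returns only one "M.". A character-level walk may get closer to the goal. The following is only a minimal sketch: string_diff is a hypothetical helper name, and it assumes that clean_string can be obtained from raw_string purely by deleting characters. It matches greedily from left to right, so where the same character (a space, for example) could be attributed to several positions, it picks the earliest one.

```r
# Hypothetical helper (not part of base R): walk through `raw` and keep
# every character that cannot be matched, in order, against `clean`.
string_diff <- function(raw, clean) {
  raw_chars   <- strsplit(raw, "")[[1]]
  clean_chars <- strsplit(clean, "")[[1]]
  keep <- logical(length(raw_chars))  # TRUE marks characters missing from `clean`
  j <- 1                              # index of the next unmatched character in `clean`
  for (i in seq_along(raw_chars)) {
    if (j <= length(clean_chars) && raw_chars[i] == clean_chars[j]) {
      j <- j + 1                      # matched: consume this character of `clean`
    } else {
      keep[i] <- TRUE                 # unmatched: belongs to the difference
    }
  }
  paste(raw_chars[keep], collapse = "")
}

raw_string <- "\"+001\", la bonne surprise de M. Jenn M. Ayache http://goo.gl/3EXxy6 via @MYTF1News"
clean_string <- "+001, la bonne surprise de Jenn Ayache"

result <- string_diff(raw_string, clean_string)
result
```

On the example strings this keeps both quotes, both "M. " groups and the trailing URL part. It also keeps the space that precedes the URL in raw_string, so the result has two spaces before "http" where desired_string shows one.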
