Monday, March 2, 2015

python: How to calculate the cosine similarity of two lists?

I want to calculate the cosine similarity of two lists like the following:

A = [u'home (private)', u'bank', u'bank', u'building(condo/apartment)', 'factory']
B = [u'home (private)', u'school', u'bank', u'shopping mall']


I know the cosine similarity of A and B should be 3/(sqrt(7)*sqrt(4)). I tried converting each list into a sentence-like string such as 'home bank bank building factory'. However, some elements contain spaces (e.g. 'home (private)') and some contain brackets, so I find it difficult to count the word occurrences.
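The expected value follows from the standard cosine definition. Each distinct list element is one dimension, and its count in a list is that list's coordinate. Elements found in only one list ('building(condo/apartment)', 'factory', 'school', 'shopping mall') add nothing to the dot product. A worked version:

```latex
\cos(A, B) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
= \frac{\underbrace{1 \cdot 1}_{\text{home (private)}} + \underbrace{2 \cdot 1}_{\text{bank}}}
       {\sqrt{1^2 + 2^2 + 1^2 + 1^2}\;\sqrt{1^2 + 1^2 + 1^2 + 1^2}}
= \frac{3}{\sqrt{7}\,\sqrt{4}} \approx 0.567
```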


Do you know how to count the occurrences of each element in a list like this, so that for list B the counts can be represented as {'home (private)': 1, 'school': 1, 'bank': 1, 'shopping mall': 1}?
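Joining the elements into a string and splitting on spaces breaks multi-word items such as 'home (private)' into pieces. One possible approach, sketched here, skips the string conversion entirely. It counts the list elements directly with `collections.Counter` from the standard library, so each element is treated as a single token, spaces and brackets included:

```python
from collections import Counter

# Each list element is counted as one token; inner spaces and brackets are kept.
A = [u'home (private)', u'bank', u'bank', u'building(condo/apartment)', u'factory']
B = [u'home (private)', u'school', u'bank', u'shopping mall']

count_a = Counter(A)
count_b = Counter(B)

print(dict(count_a))
print(dict(count_b))
```

A `Counter` behaves like a dict and returns 0 for missing keys, which is convenient when comparing two lists that only partly overlap.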


Or do you know how to calculate the cosine similarity of these two lists?
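With both lists turned into counts, cosine similarity is the dot product of the two count vectors divided by the product of their lengths. The following is a minimal standard-library sketch. The function name `cosine_similarity` is a hypothetical choice for illustration, not an existing library API, and returning 0.0 for an empty list is an assumed convention:

```python
import math
from collections import Counter


def cosine_similarity(list_a, list_b):
    """Cosine similarity of two lists, treating each distinct element as one dimension."""
    count_a = Counter(list_a)
    count_b = Counter(list_b)
    # Counter returns 0 for missing keys, so elements absent from list_b contribute 0.
    dot = sum(count_a[key] * count_b[key] for key in count_a)
    norm_a = math.sqrt(sum(c * c for c in count_a.values()))
    norm_b = math.sqrt(sum(c * c for c in count_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Undefined for an empty list; this sketch returns 0.0 by convention.
        return 0.0
    return dot / (norm_a * norm_b)


A = [u'home (private)', u'bank', u'bank', u'building(condo/apartment)', u'factory']
B = [u'home (private)', u'school', u'bank', u'shopping mall']

print(cosine_similarity(A, B))  # 3 / (sqrt(7) * sqrt(4)), roughly 0.567
```

Summing over only the keys of `count_a` is enough, because any element missing from either list contributes zero to the dot product.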


Thank you very much.

