mangoes.utils.metrics module
Utility metrics functions.
-
mangoes.utils.metrics.rowwise_cosine_similarity(A, B)
Compute the cosine similarity between each pair of corresponding rows of A and B.
- Parameters
- A: matrix-like object
- B: matrix-like object
- Returns
- list of float
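The row-wise computation can be sketched in NumPy as follows. This is an illustrative reimplementation, not the mangoes source: it assumes A and B are nested lists or NumPy arrays of the same shape rather than mangoes matrix objects, and it does not guard against all-zero rows (those produce nan).

```python
import numpy as np


def rowwise_cosine_similarity(A, B):
    """Sketch: cosine similarity between A[i] and B[i] for every row i."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape")
    # Dot product of each pair of corresponding rows: one value per row
    dots = np.einsum("ij,ij->i", A, B)
    # Product of the L2 norms of each pair of corresponding rows
    norms = np.linalg.norm(A, axis=1) * np.linalg.norm(B, axis=1)
    return (dots / norms).tolist()


A = [[1.0, 0.0], [1.0, 1.0]]
B = [[0.0, 1.0], [2.0, 2.0]]
print(rowwise_cosine_similarity(A, B))  # orthogonal rows, then parallel rows
```

Only matching rows are compared, so the result has one entry per row instead of the full n x k matrix produced by the pairwise functions.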
-
mangoes.utils.metrics.pairwise_non_negative_cosine_similarity(first, second, normalize=True)
Compute the non-negative cosine similarity between all pairs of vectors in two matrices.
- Parameters
- first: matrix-like object
a mangoes.utils.arrays.Matrix with n vectors
- second: matrix-like object
a mangoes.utils.arrays.Matrix with k vectors
- normalize: bool
Whether to normalize the matrices before computing the similarities, since cosine similarity requires normalized vectors. If both matrices are already normalized, set this parameter to False.
- Returns
- matrix-like object
mangoes.utils.arrays.Matrix of shape (n x k)
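The docstring does not say how negative similarities are removed. One common convention, used for instance in Levy and Goldberg's 3CosMul analogy measure, maps cosine similarity from [-1, 1] to [0, 1] with (cos + 1) / 2. The sketch below assumes that mapping, which may differ from what mangoes actually does, and takes nested lists or NumPy arrays in place of mangoes matrices.

```python
import numpy as np


def pairwise_non_negative_cosine_similarity(first, second, normalize=True):
    """Sketch: (n x k) matrix of cosine similarities rescaled to [0, 1]."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    if normalize:
        # Scale every row to unit L2 norm so that dot products are cosines
        first = first / np.linalg.norm(first, axis=1, keepdims=True)
        second = second / np.linalg.norm(second, axis=1, keepdims=True)
    cosine = first @ second.T  # values in [-1, 1] when rows are unit vectors
    # ASSUMPTION: shift and rescale into [0, 1]; mangoes may use another mapping
    return (cosine + 1.0) / 2.0


s = pairwise_non_negative_cosine_similarity(
    [[1.0, 0.0]], [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
)
print(s)  # same, opposite and orthogonal directions
```

Under this mapping, opposite vectors score 0, orthogonal vectors 0.5 and identical directions 1. Passing normalize=False with vectors that are not unit length would push values outside [0, 1].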
-
mangoes.utils.metrics.pairwise_cosine_similarity(first, second, normalize=True)
Compute the cosine similarity between all pairs of vectors in two matrices.
- Parameters
- first: matrix-like object
a mangoes.utils.arrays.Matrix with n vectors
- second: matrix-like object
a mangoes.utils.arrays.Matrix with k vectors
- normalize: bool
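Whether to normalize the matrices before computing the similarities, since cosine similarity requires normalized vectors. If both matrices are already normalized, set this parameter to False.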
- Returns
- matrix-like object
mangoes.utils.arrays.Matrix of shape (n x k)
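The role of normalize can be illustrated with a short NumPy sketch, assuming nested lists or NumPy arrays instead of mangoes.utils.arrays.Matrix objects. Once every row has unit L2 norm, the whole (n x k) similarity matrix is a single matrix product, which is why the normalization step can be skipped when the inputs are already normalized.

```python
import numpy as np


def pairwise_cosine_similarity(first, second, normalize=True):
    """Sketch: (n x k) matrix of cosine similarities between rows."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    if normalize:
        # Scale every row to unit L2 norm so the dot product equals the cosine
        first = first / np.linalg.norm(first, axis=1, keepdims=True)
        second = second / np.linalg.norm(second, axis=1, keepdims=True)
    # (n x d) @ (d x k) -> (n x k)
    return first @ second.T


first = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # n = 3 vectors
second = [[1.0, 0.0], [0.0, 2.0]]             # k = 2 vectors
print(pairwise_cosine_similarity(first, second).shape)  # (3, 2)
```

Setting normalize=False on inputs that are not unit length returns raw dot products rather than cosines, so the flag is only safe to turn off for matrices that are already normalized.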
-
mangoes.utils.metrics.word_mover_distance(representation, sentence1, sentence2, stopwords=None, metric='euclidean', return_flow=False, emd=None)
Compute the Word Mover’s Distance between two phrases.
- Parameters
- representation: mangoes.Representation
Word vectors
- sentence1: str or list of str
- sentence2: str or list of str
The two sentences, phrases or documents to compare
- stopwords: list of str (optional)
List of words to ignore
- metric: str
Metric to use to compute distances between words (see Representation.pairwise_distances())
- return_flow: boolean (optional)
If True, also return the flows between the words of sentence1 and sentence2 together with the distance
- emd: None (default), “pot”, “pyemd” or callable
Implementation of the Earth Mover’s Distance to use. If None, use POT or pyemd if either can be imported; if neither is installed, fall back to the implementation in this module. You can also pass your own implementation (see _earth_mover_distance())
- Returns
- float or (float, dict)
The computed Word Mover’s Distance. If return_flow is True, also returns a dictionary of the outgoing flows from the words of sentence1 to the words of sentence2
References
Matt Kusner et al. “From Word Embeddings To Document Distances”, ICML 2015
POT: Python Optimal Transport: http://pot.readthedocs.io
pyemd: https://github.com/wmayner/pyemd
Examples
>>> import mangoes
>>> import nltk.corpus
>>> representation = mangoes.Embedding(...)
>>> sentence_obama = 'Obama speaks to the media in Illinois'
>>> sentence_president = 'The president greets the press in Chicago'
>>> stopwords = nltk.corpus.stopwords.words('english')
>>> distance = mangoes.utils.metrics.word_mover_distance(representation,
...     sentence_obama, sentence_president, stopwords=stopwords)
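In Kusner et al.'s formulation, the Word Mover’s Distance is the minimum total cost of moving the normalized bag-of-words weights of one sentence onto those of the other, where moving weight between two words costs the distance between their vectors. The sketch below solves that transport problem directly as a linear program with scipy.optimize.linprog; it is not the mangoes implementation and uses none of the emd backends. It assumes a plain dict mapping words to vectors in place of a mangoes.Representation, whitespace tokenization with lowercasing, and SciPy's cdist metric names for metric, and it omits return_flow.

```python
from collections import Counter

import numpy as np
from scipy.optimize import linprog
from scipy.spatial.distance import cdist


def word_mover_distance(vectors, sentence1, sentence2, stopwords=None, metric="euclidean"):
    """Sketch of WMD; `vectors` is a hypothetical dict {word: vector}."""
    stopwords = {w.lower() for w in (stopwords or [])}

    def tokens(sentence):
        # ASSUMPTION: whitespace tokenization and lowercasing
        if isinstance(sentence, str):
            sentence = sentence.split()
        words = [w.lower() for w in sentence]
        return [w for w in words if w not in stopwords and w in vectors]

    words1, words2 = tokens(sentence1), tokens(sentence2)
    if not words1 or not words2:
        raise ValueError("each sentence needs at least one known, non-stopword word")

    # Normalized bag-of-words (nBOW) weights for each sentence
    count1, count2 = Counter(words1), Counter(words2)
    vocab1, vocab2 = sorted(count1), sorted(count2)
    d1 = np.array([count1[w] for w in vocab1], dtype=float)
    d2 = np.array([count2[w] for w in vocab2], dtype=float)
    d1 /= d1.sum()
    d2 /= d2.sum()

    # Cost of moving weight from each word of sentence1 to each word of sentence2
    cost = cdist(np.array([vectors[w] for w in vocab1]),
                 np.array([vectors[w] for w in vocab2]), metric=metric)
    n, k = cost.shape

    # Flow T[i, j] is flattened row-major into variable i * k + j.
    # Constraints: row i sums to d1[i] (outgoing), column j sums to d2[j] (incoming).
    A_eq = np.zeros((n + k, n * k))
    for i in range(n):
        A_eq[i, i * k:(i + 1) * k] = 1.0
    for j in range(k):
        A_eq[n + j, j::k] = 1.0
    b_eq = np.concatenate([d1, d2])

    result = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.fun  # minimum total transport cost


toy = {"cat": [0.0, 0.0], "dog": [3.0, 4.0], "the": [1.0, 1.0]}
print(word_mover_distance(toy, "the cat", "the dog", stopwords=["the"]))
```

With "the" removed, the example reduces to moving all the weight of "cat" onto "dog", so the distance is the Euclidean distance between those two toy vectors. A dedicated solver such as POT or pyemd scales much better than a dense linear program once sentences get long.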