mangoes.utils.metrics module

Utility metrics functions.

mangoes.utils.metrics.rowwise_cosine_similarity(A, B)

Compute the cosine similarity between each pair of corresponding rows of A and B

Parameters
A: matrix-like object
B: matrix-like object
Returns
list of float
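
As an illustration of what a row-wise cosine similarity computes, here is a minimal NumPy sketch. The helper name rowwise_cosine_sketch is hypothetical and this is not the mangoes implementation; it assumes dense inputs with the same shape and no all-zero rows.

```python
import numpy as np


def rowwise_cosine_sketch(a, b):
    """Cosine similarity between row i of a and row i of b, for every i."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dot product of each pair of corresponding rows
    dots = np.sum(a * b, axis=1)
    # Product of the Euclidean norms of the corresponding rows
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return (dots / norms).tolist()


A = [[1.0, 0.0], [1.0, 1.0]]
B = [[2.0, 0.0], [1.0, -1.0]]
# First rows are parallel, second rows are orthogonal
similarities = rowwise_cosine_sketch(A, B)
```

The result has one value per row, which matches the "list of float" return type documented above.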
mangoes.utils.metrics.pairwise_non_negative_cosine_similarity(first, second, normalize=True)

Compute the non-negative cosine similarity between all pairs of vectors in two matrices

Parameters
first: matrix-like object

mangoes.utils.arrays.Matrix with n vectors

second: matrix-like object

mangoes.utils.arrays.Matrix with k vectors

normalize: bool

Whether to normalize the matrices before computing the similarities. The computation requires normalized matrices: set this parameter to False only if both are already normalized.

Returns
matrix-like object

mangoes.utils.arrays.Matrix of shape (n x k)
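
One common way to obtain a non-negative cosine similarity is to map the cosine from [-1, 1] to [0, 1] with (cos + 1) / 2. The sketch below uses that mapping as an assumption; this page does not state which transformation the function applies. The helper name is hypothetical and dense NumPy arrays stand in for mangoes matrices.

```python
import numpy as np


def non_negative_cosine_sketch(first, second, normalize=True):
    """Pairwise cosine similarities rescaled to [0, 1] (assumed mapping)."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    if normalize:
        # Scale every row to unit Euclidean length
        first = first / np.linalg.norm(first, axis=1, keepdims=True)
        second = second / np.linalg.norm(second, axis=1, keepdims=True)
    cosine = first @ second.T  # shape (n, k)
    # Assumption: shift and scale so that -1 maps to 0 and 1 maps to 1
    return (cosine + 1.0) / 2.0


first = np.array([[1.0, 0.0], [0.0, 2.0]])    # n = 2 vectors
second = np.array([[3.0, 0.0], [-1.0, 0.0]])  # k = 2 vectors
scores = non_negative_cosine_sketch(first, second)
```

With this mapping, identical directions score 1, opposite directions score 0, and orthogonal vectors score 0.5.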

mangoes.utils.metrics.pairwise_cosine_similarity(first, second, normalize=True)

Compute the cosine similarity between all pairs of vectors in two matrices

Parameters
first: matrix-like object

a mangoes.utils.arrays.Matrix with n vectors

second: matrix-like object

a mangoes.utils.arrays.Matrix with k vectors

normalize: bool

Whether to normalize the matrices before computing the similarities. The computation requires normalized matrices: set this parameter to False only if both are already normalized.

Returns
matrix-like object

mangoes.utils.arrays.Matrix of shape (n x k)
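
The role of the normalize parameter can be illustrated with a minimal NumPy sketch. The helper name pairwise_cosine_sketch is hypothetical and this is not the mangoes implementation; it assumes dense inputs without all-zero rows.

```python
import numpy as np


def pairwise_cosine_sketch(first, second, normalize=True):
    """Cosine similarity between every row of first and every row of second."""
    first = np.asarray(first, dtype=float)
    second = np.asarray(second, dtype=float)
    if normalize:
        # Scale every row to unit Euclidean length
        first = first / np.linalg.norm(first, axis=1, keepdims=True)
        second = second / np.linalg.norm(second, axis=1, keepdims=True)
    # For unit-length rows, the dot product is the cosine similarity
    return first @ second.T


first = np.array([[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])  # n = 3 vectors
second = np.array([[1.0, 0.0], [-1.0, 0.0]])            # k = 2 vectors
sims = pairwise_cosine_sketch(first, second)            # shape (3, 2)
```

Once the rows have unit length, a single matrix product yields all n x k similarities, which is why normalization can be skipped when both inputs are already normalized.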

mangoes.utils.metrics.word_mover_distance(representation, sentence1, sentence2, stopwords=None, metric='euclidean', return_flow=False, emd=None)

Compute the Word Mover’s Distance between two phrases

Parameters
representation: mangoes.Representation

Words vectors

sentence1: str or list of str
sentence2: str or list of str

The two sentences, phrases or documents to compare

stopwords: list of str (optional)

List of words to ignore

metric: str

Metric to use to compute distances between words (see Representation.pairwise_distances())

return_flow: boolean (optional)

If True, also return the flow matrix and the corresponding words along with the distance

emd: None (default), “pot”, “pyemd” or callable

Implementation of Earth Mover’s distance to use. If None, try to import POT or pyemd and use it. If neither POT nor pyemd is installed, use the implementation in this module. You can also use your own implementation (see _earth_mover_distance())

Returns
float or (float, dict)

Returns the computed word mover distance. If return_flow, also returns a dictionary with the outgoing flows from the words in sentence1 to the words of sentence2

References

Matt Kusner et al. “From Word Embeddings To Document Distances”

POT: Python Optimal Transport: http://pot.readthedocs.io

pyemd: https://github.com/wmayner/pyemd

Examples

>>> representation = mangoes.Embedding(...)
>>> sentence_obama = 'Obama speaks to the media in Illinois'
>>> sentence_president = 'The president greets the press in Chicago'
>>> import nltk.corpus
>>> stopwords = nltk.corpus.stopwords.words('english')
>>> distance = mangoes.utils.metrics.word_mover_distance(representation,
...                                                      sentence_obama, sentence_president, stopwords=stopwords)
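
To make the underlying computation more concrete, the following sketch solves the Word Mover's Distance transport problem for two tokenized sentences as a linear program. It is an illustration only, not the implementation in this module: the helper name wmd_sketch and the toy vectors are hypothetical, it uses scipy.optimize.linprog instead of POT or pyemd, it assumes the Euclidean metric, and it ignores stopwords.

```python
import numpy as np
from scipy.optimize import linprog


def wmd_sketch(vectors, words1, words2):
    """Return (distance, flow) for two lists of words, given a dict of word vectors."""
    vocab1 = sorted(set(words1))
    vocab2 = sorted(set(words2))
    # Normalized bag-of-words weights: each sentence carries a total mass of 1
    w1 = np.array([words1.count(w) for w in vocab1], dtype=float)
    w1 /= w1.sum()
    w2 = np.array([words2.count(w) for w in vocab2], dtype=float)
    w2 /= w2.sum()
    x1 = np.array([vectors[w] for w in vocab1], dtype=float)
    x2 = np.array([vectors[w] for w in vocab2], dtype=float)
    # cost[i, j]: Euclidean distance between word i of sentence 1 and word j of sentence 2
    cost = np.linalg.norm(x1[:, None, :] - x2[None, :, :], axis=2)
    n, k = cost.shape
    # The flow matrix is flattened row by row: flow[i, j] sits at index i * k + j
    a_eq = np.zeros((n + k, n * k))
    for i in range(n):
        a_eq[i, i * k:(i + 1) * k] = 1.0  # word i sends out exactly w1[i]
    for j in range(k):
        a_eq[n + j, j::k] = 1.0           # word j receives exactly w2[j]
    b_eq = np.concatenate([w1, w2])
    # Minimize the total transport cost subject to non-negative flows
    result = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq, bounds=(0, None))
    flow = result.x.reshape(n, k)
    return result.fun, dict(rows=vocab1, columns=vocab2, matrix=flow)


# Hypothetical 2-dimensional word vectors, for illustration only
toy_vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0], "d": [1.0, 1.0]}
toy_distance, toy_flow = wmd_sketch(toy_vectors, ["a", "b"], ["c", "d"])
same_distance, _ = wmd_sketch(toy_vectors, ["a", "b"], ["a", "b"])
```

In this toy case each word moves its whole mass to its nearest counterpart, and two identical sentences are at distance zero. Dedicated solvers such as POT or pyemd are generally preferable to a generic linear program for realistic vocabularies.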