mangoes.evaluation package

mangoes.evaluation.analogy module

Classes and functions to evaluate embeddings according to the “Analogy” task.

The Analogy task tries to predict the answer to questions of the form: a is to b as c is to … It uses both the 3CosAdd [2] and 3CosMul [3] methods to solve them.
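For reference, here is a minimal sketch of the two scoring functions (the function names are illustrative only, not part of the mangoes API):

>>> import numpy as np
>>> def cos(u, v):
...     return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
>>> def cos_add(a, b, c, d):
...     # 3CosAdd: the best answer d maximizes cos(d, b) - cos(d, a) + cos(d, c)
...     return cos(d, b) - cos(d, a) + cos(d, c)
>>> def cos_mul(a, b, c, d, epsilon=0.001):
...     # 3CosMul: the best answer d maximizes this ratio; epsilon avoids
...     # division by zero (the paper shifts cosines to [0, 1] beforehand)
...     return cos(d, b) * cos(d, c) / (cos(d, a) + epsilon)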

Datasets available in this module:

  • GOOGLE for Mikolov et al.'s (2013) Google dataset [1]. Also partitioned into:

    • GOOGLE_SEMANTIC for semantic analogies

    • GOOGLE_SYNTACTIC for syntactic analogies

  • MSR for Mikolov et al.'s (2013) Microsoft Research dataset [2]

References

[1] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

[2] Mikolov, T., Yih, W. T., & Zweig, G. (2013, June). Linguistic regularities in continuous space word representations. In HLT-NAACL (Vol. 13, pp. 746-751).

[3] Levy, O., & Goldberg, Y. (2014). Linguistic regularities in sparse and explicit word representations. In CoNLL (pp. 171-180).

class mangoes.evaluation.analogy.Dataset(name, data)

Bases: mangoes.evaluation.base.BaseDataset

Class to create a Dataset of analogies, to be used in the Evaluation class

Examples

>>> from mangoes.evaluation.analogy import Dataset
>>> user_dataset = Dataset("user dataset", ['paris france london england', 'get gets do does'])
>>> capitals = Dataset("google", "../resources/en/analogy/google/semantic/capital-world.txt")

Two analogy datasets are available in this module:

  • the GOOGLE dataset, also split into GOOGLE_SEMANTIC and GOOGLE_SYNTACTIC:

>>> import mangoes.evaluation.analogy
>>> google = mangoes.evaluation.analogy.GOOGLE
>>> google_sem = mangoes.evaluation.analogy.GOOGLE_SEMANTIC
>>> google_syn = mangoes.evaluation.analogy.GOOGLE_SYNTACTIC
  • the MSR dataset:

>>> import mangoes.evaluation.analogy
>>> msr = mangoes.evaluation.analogy.MSR
Attributes

  data

Methods

  parse_question(question)
  get_subset
  parse_file

classmethod parse_question(question)
Parameters
question: str

A splittable string with the 4 terms of the analogy

Returns
namedtuple

Examples

>>> Dataset.parse_question('paris france london england')
Analogy(abc='paris france london', gold='england')
class mangoes.evaluation.analogy.Evaluator(representation, threshold=300000)

Bases: mangoes.evaluation.base.BaseEvaluator

Methods

predict(analogies[, allowed_answers, …])

Predict the answer for the given analogy question(s).

predict(analogies, allowed_answers=1, epsilon=0.001, batch=1000)

Predict the answer for the given analogy question(s).

Parameters
analogies: str or list of str

an analogy or a list of analogies to resolve, in the form 'a b c': a is to b as c is to …

allowed_answers

number of answers to predict

epsilon

value to use as epsilon when computing 3CosMul

batch

As this function needs to compute the similarities between all the words in the analogies and all the words in the vocabulary, it can be memory-consuming. This parameter allows slicing the list into batches: increase it to run faster, or decrease it if you run out of memory (see the sketch below).
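As a rough illustration of the idea (not the actual mangoes implementation), processing the questions slice by slice bounds the size of the similarity matrix held in memory at any one time:

>>> import numpy as np
>>> vocab_vectors = np.random.rand(10000, 50)    # toy stand-in for the vocabulary
>>> question_vectors = np.random.rand(3000, 50)  # toy stand-in for the analogies
>>> batch, best = 1000, []
>>> for start in range(0, len(question_vectors), batch):
...     chunk = question_vectors[start:start + batch]
...     similarities = chunk.dot(vocab_vectors.T)  # (batch, vocab) instead of (3000, vocab)
...     best.extend(np.argmax(similarities, axis=1))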

Returns
namedtuple or dict

If the input is a single analogy, returns a namedtuple with both predictions, using 3CosAdd and 3CosMul. If the input is a list of analogies, returns a dictionary with the analogies as keys and the predictions as values.

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['paris', 'france', 'london', 'england', 'belgium', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.analogy
>>> evaluator = mangoes.evaluation.analogy.Evaluator(representation)
>>> evaluator.predict('paris france london')
Prediction(using_cosadd=['england'], using_cosmul=['england'])
class mangoes.evaluation.analogy.Evaluation(representation, *datasets, lower=True, allowed_answers=1, epsilon=0.001, threshold=30000)

Bases: mangoes.evaluation.base.BaseEvaluation

Class to evaluate a representation on a dataset or a list of datasets

Parameters
representation: mangoes.Representation

The representation to evaluate

datasets: Dataset

The dataset(s) to use

lower: bool

Whether or not the analogies in the dataset should be lowercased

allowed_answers: int

Number of answers to consider when predicting an analogy (the analogy is considered correct if the expected answer is among the allowed_answers best answers)

epsilon: float

Value to be used as epsilon when computing 3CosMul

threshold: int

A threshold to reduce the size of the vocabulary of the representation, for fast approximate evaluation (default is 30000, as in word2vec)

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['paris', 'france', 'london', 'england', 'berlin', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # evaluate
>>> import mangoes.evaluation.analogy
>>> dataset = Dataset("test", ['paris france london england', 'paris france berlin germany'])
>>> evaluation = mangoes.evaluation.analogy.Evaluation(representation, dataset)
>>> evaluation.get_score()
Score(cosadd=1.0, cosmul=0.5, nb=2)
>>> print(evaluation.get_report()) 
                                                            Nb questions      cosadd      cosmul
================================================================================================
test                                                                 2/2     100.00%      50.00%
------------------------------------------------------------------------------------------------

Methods

get_report([keep_duplicates, show_subsets, …])

Gets a PrintableReport for this evaluation

get_score([dataset, keep_duplicates])

Return the score(s) of the evaluation

mangoes.evaluation.outlier module

Classes and functions to evaluate embeddings according to the “Outlier Detection” task.

This module implements the evaluation task defined in [1].

Datasets available in this module:

  • OD_8_8_8 [1]

  • WIKI_SEM_500 [2]

References

[1] José Camacho-Collados and Roberto Navigli. Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations. In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany, August 12, 2016.

[2]
class mangoes.evaluation.outlier.Dataset(name, data)

Bases: mangoes.evaluation.base.BaseDataset

Class to create a Dataset for the outlier detection task, to be used in the Evaluation class

The outlier is the last word of the group.

Examples

>>> from mangoes.evaluation.outlier import Dataset
>>> user_dataset = Dataset("user dataset", ['january february march saturn', 'monday tuesday friday phone'])
>>> cats_dataset = Dataset("cats", "../resources/en/outlier_detection/8-8-8/Big_cats.txt")

Two outlier detection datasets are available in this module:

  • the 8-8-8 dataset:

>>> import mangoes.evaluation.outlier
>>> _8_8_8 = mangoes.evaluation.outlier._8_8_8
  • the Wiki Sem 500 dataset:

>>> import mangoes.evaluation.outlier
>>> wiki_sem_500 = mangoes.evaluation.outlier.WIKI_SEM_500
Attributes

  data

Methods

  parse_question(question)
  get_subset
  parse_file

classmethod parse_question(question)
Parameters
question: str

A splittable string with the group of words, the outlier in last position

Returns
namedtuple

Examples

>>> Dataset.parse_question('january february march saturn')
'january february march saturn'
classmethod parse_file(file_content)
class mangoes.evaluation.outlier.Evaluator(representation)

Bases: mangoes.evaluation.base.BaseEvaluator

Evaluator to detect outliers in a group of words according to the given representation

Parameters
representation: mangoes.Representation

The Representation to use

Methods

predict(data)

Given a group of words or a set of groups of words, predict the “outlier position” within each group

predict(data)

Given a group of words or a set of groups of words, predict the “outlier position” within each group

The “outlier position” (OP) is defined in [1]:

Given a set W of n + 1 words, OP is defined as the position of the outlier w_{n+1} according to the compactness score, which ranges from 0 to n (position 0 indicates the lowest overall score among all words in W, and position n indicates the highest overall score).
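As an illustration of this definition, here is a minimal sketch (assuming unit-normalized vectors in a dict named vectors; the names are illustrative, not the mangoes implementation):

>>> import itertools
>>> import numpy as np
>>> def outlier_position(words, vectors):
...     def compactness(w):
...         # average pairwise similarity of the group without w: removing the
...         # true outlier should leave the most compact set, hence the highest score
...         rest = [vectors[v] for v in words if v != w]
...         return np.mean([u.dot(v) for u, v in itertools.combinations(rest, 2)])
...     ranking = sorted(words, key=compactness)  # increasing score
...     return ranking.index(words[-1])           # position of the outlier, 0..n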

Parameters
data: str or iterable of str
Returns
int or dict

If a string was given, the outlier position according to the compactness score. If a list of strings was given, a dict with the strings as keys and the outlier positions as values.

References

[1] José Camacho-Collados and Roberto Navigli. Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations. In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany, August 12, 2016.

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['january', 'february', 'march', 'pluto', 'mars', 'saturn'])
>>> matrix = np.array([[1.0, 0.2], [0.9, 0.1], [1.1, 0.1], [0.3, 0.9], [0.2, 1.0], [0.1, 0.9]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.outlier
>>> evaluator = mangoes.evaluation.outlier.Evaluator(representation)
>>> evaluator.predict('january february march saturn')
4
>>> evaluator.predict(['january february march saturn', 'pluto saturn march'])
{'january february march saturn': 4, 'pluto saturn march': 3}
class mangoes.evaluation.outlier.Evaluation(representation, *datasets, lower=True)

Bases: mangoes.evaluation.base.BaseEvaluation

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['january', 'february', 'march', 'pluto', 'mars', 'saturn'])
>>> matrix = np.array([[1.0, 0.2], [0.9, 0.1], [1.1, 0.1], [0.3, 0.9], [0.2, 1.0], [0.1, 0.9]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> import mangoes.evaluation.outlier
>>> # evaluate
>>> dataset = Dataset("test", ['january february march pluto', 'mars saturn pluto march'])
>>> evaluation = mangoes.evaluation.outlier.Evaluation(representation, dataset)
>>> print(evaluation.get_score())
Score(opp=1.0, accuracy=1.0, nb=2)
>>> print(evaluation.get_report()) 
                                                            Nb questions         OPP    accuracy
================================================================================================
test                                                                 2/2     100.00%     100.00%
------------------------------------------------------------------------------------------------

Methods

get_report([keep_duplicates, show_subsets, …])

Gets a PrintableReport for this evaluation

get_score([dataset, keep_duplicates])

Return the score(s) of the evaluation

mangoes.evaluation.similarity module

Classes and functions to evaluate embeddings according to the “Similarity” task.

The Similarity task computes the correlation between the similarities of word pairs according to their representation and according to human-assigned scores.
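As a minimal sketch of this principle (with toy vectors and scores; the names are illustrative, not the mangoes API):

>>> import numpy as np
>>> import scipy.stats
>>> vectors = {'lion': np.array([1.0, 0.0]), 'tiger': np.array([1.0, 0.2]),
...            'sun': np.array([0.0, 1.0]), 'moon': np.array([0.0, 1.2]),
...            'phone': np.array([0.7, 0.7])}
>>> def cosine(u, v):
...     return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
>>> pairs, gold = [('lion', 'tiger'), ('sun', 'moon'), ('sun', 'phone')], [0.8, 0.8, 0.1]
>>> predicted = [cosine(vectors[a], vectors[b]) for a, b in pairs]
>>> # the evaluation scores are the correlations between both sets of scores
>>> pearson = scipy.stats.pearsonr(predicted, gold)
>>> spearman = scipy.stats.spearmanr(predicted, gold)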

Datasets available in this module:

  • WS353 for the WordSim353 dataset (Finkelstein et al., 2002) [1]. Also partitioned by [2] into:

    • WS_SIM: WordSim Similarity

    • WS_REL: WordSim Relatedness

  • RG65 for the Rubenstein and Goodenough (1965) dataset [3]

  • RAREWORD for Luong et al.'s (2013) Rare Word (RW) Similarity Dataset [4]

  • MEN for Bruni et al.'s (2012) MEN dataset [5]

  • MTURK for Radinsky et al.'s (2011) Mechanical Turk dataset [6]

References

[1] Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., & Ruppin, E. (2001, April). Placing search in context: The concept revisited. In Proceedings of the 10th international conference on World Wide Web (pp. 406-414). ACM.

[2] Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pasca, M., & Soroa, A. (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of NAACL-HLT 2009.

[3] Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627-633.

[4] Luong, T., Socher, R., & Manning, C. D. (2013, August). Better word representations with recursive neural networks for morphology. In CoNLL (pp. 104-113).

[5] Bruni, E., Boleda, G., Baroni, M., & Tran, N. K. (2012, July). Distributional semantics in technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1 (pp. 136-145). Association for Computational Linguistics.

[6] Radinsky, K., Agichtein, E., Gabrilovich, E., & Markovitch, S. (2011, March). A word at a time: computing word relatedness using temporal semantic analysis. In Proceedings of the 20th international conference on World Wide Web (pp. 337-346). ACM.

class mangoes.evaluation.similarity.Dataset(name, data)

Bases: mangoes.evaluation.base.BaseDataset

Class to create a Dataset of word pair similarities, to be used in the Evaluation class

Examples

>>> from mangoes.evaluation.similarity import Dataset
>>> user_dataset = Dataset("user dataset", ['lion tiger 0.8', 'sun phone 0.1'])

Predefined datasets are available in this module:

>>> import mangoes.evaluation.similarity
>>> ws353 = mangoes.evaluation.similarity.WS353
Attributes

  data

Methods

  parse_question(question)
  get_subset
  parse_file

classmethod parse_question(question)
Parameters
question: str

A splittable string with the word pair and a score

Returns
namedtuple

Examples

>>> Dataset.parse_question('lion tiger 0.8')
Similarity(word_pair=('lion', 'tiger'), gold=0.8)
class mangoes.evaluation.similarity.Evaluator(representation)

Bases: mangoes.evaluation.base.BaseEvaluator

Methods

predict(word_pairs[, metric])

Predict the similarity scores for the given word pair(s).

predict(word_pairs, metric=<function rowwise_cosine_similarity>)

Predict the similarity scores for the given word pair(s).

Parameters
word_pairs: tuple of 2 str or list of tuples of 2 str

a word pair or a list of word pairs

metric

the metric to use to compute the similarity (default: cosine)

Returns
ndarray or dict

If a single word pair was given, an array with the predicted similarity. If a list of word pairs was given, a dictionary with the word pairs as keys and the predicted similarities as values.

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['lion', 'tiger', 'sun', 'moon', 'phone', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.similarity
>>> evaluator = mangoes.evaluation.similarity.Evaluator(representation)
>>> evaluator.predict(('lion', 'tiger'))
array([ 0.98058068])
>>> evaluator.predict([('lion', 'tiger'), ('sun', 'phone')])
{('lion', 'tiger'): 0.98058067569092011, ('sun', 'phone'): 0.70710678118654757}
class mangoes.evaluation.similarity.Evaluation(representation, *datasets, lower=True, metric=<function rowwise_cosine_similarity>)

Bases: mangoes.evaluation.base.BaseEvaluation

Class to evaluate a representation on a dataset or a list of datasets

Both the Pearson and Spearman coefficients are given.

Parameters
representation: mangoes.Representation

The representation to evaluate

datasets: Dataset

The dataset(s) to use

lower: bool

Whether or not the word pairs in the dataset should be lowercased

metric

the metric to use to compute the similarity (default: cosine)

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['lion', 'tiger', 'sun', 'moon', 'phone', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # evaluate
>>> import mangoes.evaluation.similarity
>>> dataset = Dataset("test", ['lion tiger 0.8', 'sun moon 0.8', 'phone germany 0.3'])
>>> evaluation = mangoes.evaluation.similarity.Evaluation(representation, dataset)
>>> evaluation.get_score() 
Score(pearson=Coeff(coeff=-0.40705977800644011, pvalue=0.73310813349301363),
      spearman=Coeff(coeff=0.0, pvalue=1.0), nb=3)
>>> print(evaluation.get_report()) 
                                                                          pearson       spearman
                                                      Nb questions        (p-val)        (p-val)
================================================================================================
test                                                           3/3  -0.407(7e-01)     0.0(1e+00)
------------------------------------------------------------------------------------------------

Methods

get_report([keep_duplicates, show_subsets, …])

Gets a PrintableReport for this evaluation

get_score([dataset, keep_duplicates])

Return the score(s) of the evaluation