mangoes.evaluation package

mangoes.evaluation.analogy module

Classes and functions to evaluate embeddings according to the “Analogy” task.

The Analogy task tries to predict the answer to questions of the form: a is to b as c is to … It uses both the 3CosAdd [2] and 3CosMul [3] methods to solve them.
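For reference, here is a minimal sketch of the two scoring functions (the function names are illustrative only, not part of the mangoes API):

>>> import numpy as np
>>> def cos(u, v):
...     return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
>>> def cos_add(a, b, c, d):
...     # 3CosAdd: the best answer d maximizes cos(d, b) - cos(d, a) + cos(d, c)
...     return cos(d, b) - cos(d, a) + cos(d, c)
>>> def cos_mul(a, b, c, d, epsilon=0.001):
...     # 3CosMul: the best answer d maximizes this ratio; epsilon avoids
...     # division by zero (the paper shifts cosines to [0, 1] beforehand)
...     return cos(d, b) * cos(d, c) / (cos(d, a) + epsilon)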

Datasets available in this module:

  • GOOGLE for Mikolov et al.'s (2013) Google dataset [1]. Also partitioned into:

    • GOOGLE_SEMANTIC for semantic analogies

    • GOOGLE_SYNTACTIC for syntactic analogies

  • MSR for Mikolov et al.'s (2013) Microsoft Research dataset [2]

References

[1] Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

[2] Mikolov, T., Yih, W. T., & Zweig, G. (2013, June). Linguistic regularities in continuous space word representations. In HLT-NAACL (Vol. 13, pp. 746-751).

[3] Levy, O., & Goldberg, Y. (2014). Linguistic regularities in sparse and explicit word representations. In CoNLL (pp. 171-180).

class mangoes.evaluation.analogy.Dataset(name, data)

Bases: mangoes.evaluation.base.BaseDataset

Class to create a Dataset of analogies, to be used in the Evaluation class

Examples

>>> from mangoes.evaluation.analogy import Dataset
>>> user_dataset = Dataset("user dataset", ['paris france london england', 'get gets do does'])
>>> capitals = Dataset("google", "../resources/en/analogy/google/semantic/capital-world.txt")

Two analogy datasets are available in this module:

  • the GOOGLE dataset, also split into GOOGLE_SEMANTIC and GOOGLE_SYNTACTIC:

>>> import mangoes.evaluation.analogy
>>> google = mangoes.evaluation.analogy.GOOGLE
>>> google_sem = mangoes.evaluation.analogy.GOOGLE_SEMANTIC
>>> google_syn = mangoes.evaluation.analogy.GOOGLE_SYNTACTIC
  • the MSR dataset:

>>> import mangoes.evaluation.analogy
>>> msr = mangoes.evaluation.analogy.MSR
Attributes

  data

Methods

  parse_question(question)
  get_subset
  parse_file

classmethod parse_question(question)
Parameters
question: str

A splittable string with the 4 terms of the analogy

Returns
namedtuple

Examples

>>> Dataset.parse_question('paris france london england')
Analogy(abc='paris france london', gold='england')
class mangoes.evaluation.analogy.Evaluator(representation, threshold=300000)

Bases: mangoes.evaluation.base.BaseEvaluator

Methods

predict(analogies[, allowed_answers, …])

Predict the answer for the given analogy question(s).

predict(analogies, allowed_answers=1, epsilon=0.001, batch=1000)

Predict the answer for the given analogy question(s).

Parameters
analogies: str or list of str

an analogy or a list of analogies to resolve, in the form 'a b c': a is to b as c is to …

allowed_answers

number of answers to predict

epsilon

value to use as epsilon when computing 3CosMul

batch

As this function needs to compute the similarities between all the words in the analogies and all the words in the vocabulary, it can be memory-consuming. This parameter allows slicing the list into batches: increase it to run faster, or decrease it if you run out of memory (see the sketch below).
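As a rough illustration of the idea (not the actual mangoes implementation), processing the questions slice by slice bounds the size of the similarity matrix held in memory at any one time:

>>> import numpy as np
>>> vocab_vectors = np.random.rand(10000, 50)    # toy stand-in for the vocabulary
>>> question_vectors = np.random.rand(3000, 50)  # toy stand-in for the analogies
>>> batch, best = 1000, []
>>> for start in range(0, len(question_vectors), batch):
...     chunk = question_vectors[start:start + batch]
...     similarities = chunk.dot(vocab_vectors.T)  # (batch, vocab) instead of (3000, vocab)
...     best.extend(np.argmax(similarities, axis=1))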

Returns
namedtuple or dict

If the input is a single analogy, returns a namedtuple with both predictions, using 3CosAdd and 3CosMul. If the input is a list of analogies, returns a dictionary with the analogies as keys and the predictions as values.

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['paris', 'france', 'london', 'england', 'belgium', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.analogy
>>> evaluator = mangoes.evaluation.analogy.Evaluator(representation)
>>> evaluator.predict('paris france london')
Prediction(using_cosadd=['england'], using_cosmul=['england'])
class mangoes.evaluation.analogy.Evaluation(representation, *datasets, lower=True, allowed_answers=1, epsilon=0.001, threshold=30000)

Bases: mangoes.evaluation.base.BaseEvaluation

Class to evaluate a representation on a dataset or a list of datasets

Parameters
representation: mangoes.Representation

The representation to evaluate

datasets: Dataset

The dataset(s) to use

lower: bool

Whether or not the analogies in the dataset should be lowercased

allowed_answers: int

Number of answers to consider when predicting an analogy (the analogy is considered correct if the expected answer is among the allowed_answers best answers)

epsilon: float

Value to be used as epsilon when computing 3CosMul

threshold: int

A threshold to reduce the size of the vocabulary of the representation, for fast approximate evaluation (default is 30000, as in word2vec)

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['paris', 'france', 'london', 'england', 'berlin', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # evaluate
>>> import mangoes.evaluation.analogy
>>> dataset = Dataset("test", ['paris france london england', 'paris france berlin germany'])
>>> evaluation = mangoes.evaluation.analogy.Evaluation(representation, dataset)
>>> evaluation.get_score()
Score(cosadd=1.0, cosmul=0.5, nb=2)
>>> print(evaluation.get_report()) 
                                                            Nb questions      cosadd      cosmul
================================================================================================
test                                                                 2/2     100.00%      50.00%
------------------------------------------------------------------------------------------------

Methods

get_report([keep_duplicates, show_subsets, …])

Gets a PrintableReport for this evaluation

get_score([dataset, keep_duplicates])

Return the score(s) of the evaluation

mangoes.evaluation.outlier module

Classes and functions to evaluate embeddings according to the “Outlier Detection” task.

This module implements the evaluation task defined in [1].

Datasets available in this module:

  • OD_8_8_8 [1]

  • WIKI_SEM_500 [2]

References

[1] José Camacho-Collados and Roberto Navigli. Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations. In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany, August 12, 2016.

[2]
class mangoes.evaluation.outlier.Dataset(name, data)

Bases: mangoes.evaluation.base.BaseDataset

Class to create a Dataset for the outlier detection task, to be used in the Evaluation class

The outlier is the last word of the group.

Examples

>>> from mangoes.evaluation.outlier import Dataset
>>> user_dataset = Dataset("user dataset", ['january february march saturn', 'monday tuesday friday phone'])
>>> cats_dataset = Dataset("cats", "../resources/en/outlier_detection/8-8-8/Big_cats.txt")

Two outlier detection datasets are available in this module:

  • the 8-8-8 dataset:

>>> import mangoes.evaluation.outlier
>>> _8_8_8 = mangoes.evaluation.outlier._8_8_8
  • the Wiki Sem 500 dataset:

>>> import mangoes.evaluation.outlier
>>> wiki_sem_500 = mangoes.evaluation.outlier.WIKI_SEM_500
Attributes

  data

Methods

  parse_question(question)
  get_subset
  parse_file

classmethod parse_question(question)
Parameters
question: str

A splittable string with the group of words, the outlier in last position

Returns
namedtuple

Examples

>>> Dataset.parse_question('january february march saturn')
'january february march saturn'
classmethod parse_file(file_content)
class mangoes.evaluation.outlier.Evaluator(representation)

Bases: mangoes.evaluation.base.BaseEvaluator

Evaluator to detect outliers in a group of words according to the given representation

Parameters
representation: mangoes.Representation

The Representation to use

Methods

predict(data)

Given a group of words or a set of groups of words, predict the “outlier position” within each group

predict(data)

Given a group of words or a set of groups of words, predict the “outlier position” within each group

The “outlier position” (OP) is defined in [1]:

Given a set W of n + 1 words, OP is defined as the position of the outlier w_{n+1} according to the compactness score, which ranges from 0 to n (position 0 indicates the lowest overall score among all words in W, and position n indicates the highest overall score).
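As an illustration of this definition, here is a minimal sketch (assuming unit-normalized vectors in a dict named vectors; the names are illustrative, not the mangoes implementation):

>>> import itertools
>>> import numpy as np
>>> def outlier_position(words, vectors):
...     def compactness(w):
...         # average pairwise similarity of the group without w: removing the
...         # true outlier should leave the most compact set, hence the highest score
...         rest = [vectors[v] for v in words if v != w]
...         return np.mean([u.dot(v) for u, v in itertools.combinations(rest, 2)])
...     ranking = sorted(words, key=compactness)  # increasing score
...     return ranking.index(words[-1])           # position of the outlier, 0..n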

Parameters
data: str or iterable of str
Returns
int or dict

If a string was given, the outlier position according to the compactness score. If a list of strings was given, a dict with the strings as keys and the outlier positions as values.

References

[1] José Camacho-Collados and Roberto Navigli. Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations. In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany, August 12, 2016.

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['january', 'february', 'march', 'pluto', 'mars', 'saturn'])
>>> matrix = np.array([[1.0, 0.2], [0.9, 0.1], [1.1, 0.1], [0.3, 0.9], [0.2, 1.0], [0.1, 0.9]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.outlier
>>> evaluator = mangoes.evaluation.outlier.Evaluator(representation)
>>> evaluator.predict('january february march saturn')
4
>>> evaluator.predict(['january february march saturn', 'pluto saturn march'])
{'january february march saturn': 4, 'pluto saturn march': 3}
class mangoes.evaluation.outlier.Evaluation(representation, *datasets, lower=True)

Bases: mangoes.evaluation.base.BaseEvaluation

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['january', 'february', 'march', 'pluto', 'mars', 'saturn'])
>>> matrix = np.array([[1.0, 0.2], [0.9, 0.1], [1.1, 0.1], [0.3, 0.9], [0.2, 1.0], [0.1, 0.9]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> import mangoes.evaluation.outlier
>>> # evaluate
>>> dataset = Dataset("test", ['january february march pluto', 'mars saturn pluto march'])
>>> evaluation = mangoes.evaluation.outlier.Evaluation(representation, dataset)
>>> print(evaluation.get_score())
Score(opp=1.0, accuracy=1.0, nb=2)
>>> print(evaluation.get_report()) 
                                                            Nb questions         OPP    accuracy
================================================================================================
test                                                                 2/2     100.00%     100.00%
------------------------------------------------------------------------------------------------

Methods

get_report([keep_duplicates, show_subsets, …])

Gets a PrintableReport for this evaluation

get_score([dataset, keep_duplicates])

Return the score(s) of the evaluation

mangoes.evaluation.similarity module

Classes and functions to evaluate embeddings according to the “Similarity” task.

The Similarity task computes the correlation between the similarities of word pairs according to their representation and according to human-assigned scores.
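As a minimal sketch of this principle (with toy vectors and scores; the names are illustrative, not the mangoes API):

>>> import numpy as np
>>> import scipy.stats
>>> vectors = {'lion': np.array([1.0, 0.0]), 'tiger': np.array([1.0, 0.2]),
...            'sun': np.array([0.0, 1.0]), 'moon': np.array([0.0, 1.2]),
...            'phone': np.array([0.7, 0.7])}
>>> def cosine(u, v):
...     return u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
>>> pairs, gold = [('lion', 'tiger'), ('sun', 'moon'), ('sun', 'phone')], [0.8, 0.8, 0.1]
>>> predicted = [cosine(vectors[a], vectors[b]) for a, b in pairs]
>>> # the evaluation scores are the correlations between both sets of scores
>>> pearson = scipy.stats.pearsonr(predicted, gold)
>>> spearman = scipy.stats.spearmanr(predicted, gold)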

Datasets available in this module:

  • WS353 for the WordSim353 dataset (Finkelstein et al., 2002) [1]. Also partitioned by [2] into:

    • WS_SIM: WordSim Similarity

    • WS_REL: WordSim Relatedness

  • RG65 for the Rubenstein and Goodenough (1965) dataset [3]

  • RAREWORD for Luong et al.'s (2013) Rare Word (RW) Similarity Dataset [4]

  • MEN for Bruni et al.'s (2012) MEN dataset [5]

  • MTURK for Radinsky et al.'s (2011) Mechanical Turk dataset [6]

References

[1] Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., & Ruppin, E. (2001, April). Placing search in context: The concept revisited. In Proceedings of the 10th international conference on World Wide Web (pp. 406-414). ACM.

[2] Agirre, E., Alfonseca, E., Hall, K., Kravalova, J., Pasca, M., & Soroa, A. (2009). A study on similarity and relatedness using distributional and WordNet-based approaches. In Proceedings of NAACL-HLT 2009.

[3] Rubenstein, H., & Goodenough, J. B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627-633.

[4] Luong, T., Socher, R., & Manning, C. D. (2013, August). Better word representations with recursive neural networks for morphology. In CoNLL (pp. 104-113).

[5] Bruni, E., Boleda, G., Baroni, M., & Tran, N. K. (2012, July). Distributional semantics in technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1 (pp. 136-145). Association for Computational Linguistics.

[6] Radinsky, K., Agichtein, E., Gabrilovich, E., & Markovitch, S. (2011, March). A word at a time: computing word relatedness using temporal semantic analysis. In Proceedings of the 20th international conference on World Wide Web (pp. 337-346). ACM.

class mangoes.evaluation.similarity.Dataset(name, data)

Bases: mangoes.evaluation.base.BaseDataset

Class to create a Dataset of word pair similarities, to be used in the Evaluation class

Examples

>>> from mangoes.evaluation.similarity import Dataset
>>> user_dataset = Dataset("user dataset", ['lion tiger 0.8', 'sun phone 0.1'])

Predefined datasets are available in this module:

>>> import mangoes.evaluation.similarity
>>> ws353 = mangoes.evaluation.similarity.WS353
Attributes

  data

Methods

  parse_question(question)
  get_subset
  parse_file

classmethod parse_question(question)
Parameters
question: str

A splittable string with the word pair and a score

Returns
namedtuple

Examples

>>> Dataset.parse_question('lion tiger 0.8')
Similarity(word_pair=('lion', 'tiger'), gold=0.8)
class mangoes.evaluation.similarity.Evaluator(representation)

Bases: mangoes.evaluation.base.BaseEvaluator

Methods

predict(word_pairs[, metric])

Predict the similarity scores for the given word pair(s).

predict(word_pairs, metric=<function rowwise_cosine_similarity>)

Predict the similarity scores for the given word pair(s).

Parameters
word_pairs: tuple of 2 str or list of tuples of 2 str

a word pair or a list of word pairs

metric

the metric to use to compute the similarity (default: cosine)

Returns
ndarray or dict

If a single word pair was given, an array with the predicted similarity. If a list of word pairs was given, a dictionary with the word pairs as keys and the predicted similarities as values.

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['lion', 'tiger', 'sun', 'moon', 'phone', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.similarity
>>> evaluator = mangoes.evaluation.similarity.Evaluator(representation)
>>> evaluator.predict(('lion', 'tiger'))
array([ 0.98058068])
>>> evaluator.predict([('lion', 'tiger'), ('sun', 'phone')])
{('lion', 'tiger'): 0.98058067569092011, ('sun', 'phone'): 0.70710678118654757}
class mangoes.evaluation.similarity.Evaluation(representation, *datasets, lower=True, metric=<function rowwise_cosine_similarity>)

Bases: mangoes.evaluation.base.BaseEvaluation

Class to evaluate a representation on a dataset or a list of datasets

Both the Pearson and Spearman coefficients are given.

Parameters
representation: mangoes.Representation

The representation to evaluate

datasets: Dataset

The dataset(s) to use

lower: bool

Whether or not the word pairs in the dataset should be lowercased

metric

the metric to use to compute the similarity (default: cosine)

Examples

>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['lion', 'tiger', 'sun', 'moon', 'phone', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # evaluate
>>> import mangoes.evaluation.similarity
>>> dataset = Dataset("test", ['lion tiger 0.8', 'sun moon 0.8', 'phone germany 0.3'])
>>> evaluation = mangoes.evaluation.similarity.Evaluation(representation, dataset)
>>> evaluation.get_score() 
Score(pearson=Coeff(coeff=-0.40705977800644011, pvalue=0.73310813349301363),
      spearman=Coeff(coeff=0.0, pvalue=1.0), nb=3)
>>> print(evaluation.get_report()) 
                                                                          pearson       spearman
                                                      Nb questions        (p-val)        (p-val)
================================================================================================
test                                                           3/3  -0.407(7e-01)     0.0(1e+00)
------------------------------------------------------------------------------------------------

Methods

get_report([keep_duplicates, show_subsets, …])

Gets a PrintableReport for this evaluation

get_score([dataset, keep_duplicates])

Return the score(s) of the evaluation