mangoes.evaluation package
mangoes.evaluation.analogy module
Classes and functions to evaluate embeddings according to the “Analogy” task.
The Analogy task consists of predicting the answer to questions of the form: a is to b as c is to … Both the 3CosAdd [2] and 3CosMul [3] methods are used to solve them.
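As a rough illustration of the two scoring rules, the sketch below follows the formulations given in [2] and [3], using NumPy only. It is not the code used by this module: the helper names (`solve_analogy`, `_unit_rows`) and the toy vectors are assumptions made for the example, and the cosines are shifted to [0, 1] before 3CosMul is applied, as suggested in [3].

```python
import numpy as np


def _unit_rows(matrix):
    """Scale every row to unit length so that dot products are cosines."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)


def solve_analogy(words, matrix, a, b, c, epsilon=0.001):
    """Answer 'a is to b as c is to ?' with 3CosAdd and 3CosMul.

    Hypothetical helper: returns a (cosadd_answer, cosmul_answer) pair.
    """
    index = {word: i for i, word in enumerate(words)}
    unit = _unit_rows(np.asarray(matrix, dtype=float))

    # Cosine similarity of every vocabulary word with a, b and c.
    cos_a, cos_b, cos_c = (unit @ unit[index[w]] for w in (a, b, c))

    # 3CosAdd: cos(d, b) - cos(d, a) + cos(d, c)
    cosadd = cos_b - cos_a + cos_c

    # 3CosMul: cos(d, b) * cos(d, c) / (cos(d, a) + epsilon),
    # with cosines shifted from [-1, 1] to [0, 1].
    pos_a, pos_b, pos_c = ((s + 1) / 2 for s in (cos_a, cos_b, cos_c))
    cosmul = pos_b * pos_c / (pos_a + epsilon)

    # The three question words are not allowed as answers.
    excluded = [index[w] for w in (a, b, c)]
    cosadd[excluded] = -np.inf
    cosmul[excluded] = -np.inf

    return words[int(np.argmax(cosadd))], words[int(np.argmax(cosmul))]


words = ['paris', 'france', 'london', 'england', 'belgium', 'germany']
matrix = [[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]]
print(solve_analogy(words, matrix, 'paris', 'france', 'london'))
```

The additive rule sums the three similarities, while the multiplicative rule amplifies small differences between them; epsilon only prevents a division by zero.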
Datasets available in this module:
GOOGLE for the Google dataset of Mikolov et al. (2013) [1]. Also partitioned into:
GOOGLE_SEMANTIC for semantic analogies
GOOGLE_SYNTACTIC for syntactic analogies
MSR for the Microsoft Research dataset of Mikolov et al. (2013) [2]
References
- 1
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
- 2
Mikolov, T., Yih, W. T., & Zweig, G. (2013, June). Linguistic regularities in continuous space word representations. In HLT-NAACL (Vol. 13, pp. 746-751).
- 3
Levy, O., & Goldberg, Y. (2014). Linguistic Regularities in Sparse and Explicit Word Representations. In CoNLL (pp. 171–180).
class mangoes.evaluation.analogy.Dataset(name, data)
Bases: mangoes.evaluation.base.BaseDataset
Class to create a Dataset of analogies, to be used in Evaluation class
Examples
>>> from mangoes.evaluation.analogy import Dataset
>>> user_dataset = Dataset("user dataset", ['paris france london england', 'get gets do does'])
>>> capitals = Dataset("google", "../resources/en/analogy/google/semantic/capital-world.txt")
Two analogy datasets are available in this module:
the GOOGLE dataset, also split into GOOGLE_SEMANTIC and GOOGLE_SYNTACTIC:
>>> import mangoes.evaluation.analogy
>>> google = mangoes.evaluation.analogy.GOOGLE
>>> google_sem = mangoes.evaluation.analogy.GOOGLE_SEMANTIC
>>> google_syn = mangoes.evaluation.analogy.GOOGLE_SYNTACTIC
the MSR dataset:
>>> import mangoes.evaluation.analogy
>>> msr = mangoes.evaluation.analogy.MSR
- Attributes
- data
Methods
parse_question(question)
get_subset
parse_file
classmethod parse_question(question)
- Parameters
- question: str
A splittable string with the 4 terms of the analogies
- Returns
- namedtuple
Examples
>>> Dataset.parse_question('paris france london england')
Analogy(abc='paris france london', gold='england')
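The parsing step can be pictured with the following minimal sketch. It is an assumption about the behaviour, inferred only from the example above, and not the library's implementation: the first three terms form the question and the last one is the expected answer. The function name `parse_analogy` is hypothetical.

```python
from collections import namedtuple

# Same field names as in the documented output; the helper itself is hypothetical.
Analogy = namedtuple('Analogy', ['abc', 'gold'])


def parse_analogy(question):
    """Split 'a b c d' into the question 'a b c' and the gold answer 'd'."""
    terms = question.split()
    if len(terms) != 4:
        raise ValueError('expected 4 terms, got {}'.format(len(terms)))
    return Analogy(abc=' '.join(terms[:3]), gold=terms[3])


print(parse_analogy('paris france london england'))
```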
class mangoes.evaluation.analogy.Evaluator(representation, threshold=300000)
Bases: mangoes.evaluation.base.BaseEvaluator
Methods
predict(analogies[, allowed_answers, …]): Predict the answer for the given analogy question(s).
predict(analogies, allowed_answers=1, epsilon=0.001, batch=1000)
Predict the answer for the given analogy question(s).
- Parameters
- analogies: str or list of str
an analogy or a list of analogies to resolve in the form ‘a b c’ : a is to b as c is to …
- allowed_answers
number of answers to predict
- epsilon
value to use as epsilon when computing 3CosMul
- batch
As this function needs to compute the similarities between all the words in the analogies and all the words in the vocabulary, it can be memory-consuming. This parameter allows the list to be sliced into batches. You can increase it to run faster or decrease it if you run out of memory.
- Returns
- namedtuple or dict
If the input is a single analogy, returns a tuple with both predictions using cosadd and cosmul. If the input is a list of analogies, returns a dictionary with analogies as keys and the predictions as values.
Examples
>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['paris', 'france', 'london', 'england', 'belgium', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.analogy
>>> evaluator = mangoes.evaluation.analogy.Evaluator(representation)
>>> evaluator.predict('paris france london')
Prediction(using_cosadd=['england'], using_cosmul=['england'])
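To give an idea of why batching limits memory use, here is a sketch that computes cosine similarities between query vectors and the whole vocabulary one slice at a time. The function `batched_cosine` and its arguments are hypothetical and are not part of mangoes; only the idea of slicing the queries into batches comes from the description of the batch parameter.

```python
import numpy as np


def batched_cosine(queries, vocabulary_matrix, batch=1000):
    """Yield (start, similarities) for successive slices of `queries`.

    Each yielded block has shape (len(slice), vocabulary_size), so at most
    `batch` rows of similarities are held in memory at a time.
    """
    unit_vocab = vocabulary_matrix / np.linalg.norm(vocabulary_matrix, axis=1, keepdims=True)
    for start in range(0, len(queries), batch):
        chunk = queries[start:start + batch]
        unit_chunk = chunk / np.linalg.norm(chunk, axis=1, keepdims=True)
        yield start, unit_chunk @ unit_vocab.T


rng = np.random.default_rng(0)
vocab = rng.normal(size=(50, 8))     # made-up vocabulary vectors
queries = rng.normal(size=(7, 8))    # made-up query vectors

# Keep only the best match per query; each block can be discarded afterwards.
best = np.empty(len(queries), dtype=int)
for start, sims in batched_cosine(queries, vocab, batch=3):
    best[start:start + len(sims)] = sims.argmax(axis=1)
```

A larger batch means fewer, bigger matrix products (usually faster); a smaller one keeps the intermediate similarity block small.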
class mangoes.evaluation.analogy.Evaluation(representation, *datasets, lower=True, allowed_answers=1, epsilon=0.001, threshold=30000)
Bases: mangoes.evaluation.base.BaseEvaluation
Class to evaluate a representation on a dataset or a list of datasets
- Parameters
- representation: mangoes.Representation
The representation to evaluate
- datasets: Dataset
The dataset(s) to use
- lower: bool
Whether or not the analogies in the dataset should be lowercased
- allowed_answers: int
Number of answers to consider when predicting an analogy (an analogy is considered correct if the expected answer is among the allowed_answers best answers)
- epsilon: float
Value to be used as epsilon when computing 3CosMul
- threshold: int
A threshold to reduce the size of vocabulary of the representation for fast approximate evaluation (default is 300000 as in word2vec)
Examples
>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['paris', 'france', 'london', 'england', 'berlin', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # evaluate
>>> import mangoes.evaluation.analogy
>>> dataset = mangoes.evaluation.analogy.Dataset("test", ['paris france london england', 'paris france berlin germany'])
>>> evaluation = mangoes.evaluation.analogy.Evaluation(representation, dataset)
>>> evaluation.get_score()
Score(cosadd=1.0, cosmul=0.5, nb=2)
>>> print(evaluation.get_report())
                                                                          Nb questions      cosadd      cosmul
================================================================================================
test                                                                               2/2     100.00%      50.00%
------------------------------------------------------------------------------------------------
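The role of allowed_answers in the score can be sketched as follows, assuming the predictions for each question are already ranked from best to worst. `top_k_accuracy` is a hypothetical helper written for this illustration, not a mangoes function, and the ranked lists are made up.

```python
def top_k_accuracy(ranked_predictions, gold_answers, allowed_answers=1):
    """Fraction of questions whose gold answer is among the first k predictions."""
    if not gold_answers:
        return 0.0
    correct = sum(
        gold in ranked[:allowed_answers]
        for ranked, gold in zip(ranked_predictions, gold_answers)
    )
    return correct / len(gold_answers)


ranked = [['england', 'germany'], ['belgium', 'germany']]
gold = ['england', 'germany']
print(top_k_accuracy(ranked, gold, allowed_answers=1))  # 0.5
print(top_k_accuracy(ranked, gold, allowed_answers=2))  # 1.0
```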
Methods
get_report([keep_duplicates, show_subsets, …]): Gets a PrintableReport for this evaluation
get_score([dataset, keep_duplicates]): Returns the score(s) of the evaluation
mangoes.evaluation.outlier module
Classes and functions to evaluate embeddings according to the “Outlier Detection” task.
This module implements the evaluation task defined in [1]
Datasets available in this module:
_8_8_8 for the 8-8-8 dataset
WIKI_SEM_500 for the Wiki Sem 500 dataset
References
- 1
José Camacho-Collados and Roberto Navigli. Find the word that does not belong: A Framework for an Intrinsic Evaluation of Word Vector Representations. In Proceedings of the ACL Workshop on Evaluating Vector Space Representations for NLP, Berlin, Germany, August 12, 2016.
- 2
class mangoes.evaluation.outlier.Dataset(name, data)
Bases: mangoes.evaluation.base.BaseDataset
Class to create a Dataset for outlier detection task, to be used in Evaluation class
The outlier is the last word of the group
Examples
>>> from mangoes.evaluation.outlier import Dataset
>>> user_dataset = Dataset("user dataset", ['january february march saturn', 'monday tuesday friday phone'])
>>> cats_dataset = Dataset("cats", "../resources/en/outlier_detection/8-8-8/Big_cats.txt")
Two outlier detection datasets are available in this module:
the 8-8-8 dataset:
>>> import mangoes.evaluation.outlier
>>> _8_8_8 = mangoes.evaluation.outlier._8_8_8
the Wiki Sem 500 dataset:
>>> import mangoes.evaluation.outlier
>>> wiki_sem_500 = mangoes.evaluation.outlier.WIKI_SEM_500
- Attributes
- data
Methods
parse_question(question)
get_subset
parse_file
classmethod parse_question(question)
- Parameters
- question: str
A splittable string with the group of words, outlier in last position
- Returns
- namedtuple
Examples
>>> Dataset.parse_question('january february march saturn')
'january february march saturn'
classmethod parse_file(file_content)
class mangoes.evaluation.outlier.Evaluator(representation)
Bases: mangoes.evaluation.base.BaseEvaluator
Evaluator to detect outliers in a group of words according to the given representation
- Parameters
- representation: mangoes.Representation
The Representation to use
Methods
predict(data): Given a group of words or a set of groups of words, predict the “outlier position” within each group
predict(data)
Given a group of words or a set of groups of words, predict the “outlier position” within each group
The “outlier position” (OP) refers to [1] :
Given a set W of n + 1 words, OP is defined as the position of the outlier w_{n+1} according to the compactness score, which ranges from 0 to n (position 0 indicates the lowest overall score among all words in W, and position n indicates the highest overall score).
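Read literally, this definition can be sketched in a few lines of plain Python. The sketch assumes that the compactness score of a word is the average pairwise cosine similarity among the remaining words, which is one reading of [1]. The function names are made up for the illustration, the position is counted from 0 as in the quoted definition, and the scoring actually used by this module may differ.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


def outlier_position(vectors):
    """Return the 0-based rank of the last vector's compactness score.

    Assumption for this sketch: the compactness score of a word is the mean
    pairwise cosine similarity of the *other* words in the group.
    Needs at least 3 vectors so that every word leaves one pair behind.
    """
    scores = []
    for i in range(len(vectors)):
        others = vectors[:i] + vectors[i + 1:]
        pairs = list(combinations(others, 2))
        scores.append(sum(cosine(u, v) for u, v in pairs) / len(pairs))
    # Position = number of words with a strictly lower score than the outlier.
    return sum(score < scores[-1] for score in scores[:-1])


# Toy vectors for: january, february, march, saturn (outlier in last position)
group = [[1.0, 0.2], [0.9, 0.1], [1.1, 0.1], [0.1, 0.9]]
print(outlier_position(group))
```

Removing the true outlier leaves the most compact group, so a well-detected outlier gets the highest score and therefore the last position n.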
- Parameters
- data: str or iterable of str
- Returns
- int or dict
If a string was given, the outlier position according to the compactness score. If a list of strings was given, a dict with strings as keys and outlier positions as values
Examples
>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['january', 'february', 'march', 'pluto', 'mars', 'saturn'])
>>> matrix = np.array([[1.0, 0.2], [0.9, 0.1], [1.1, 0.1], [0.3, 0.9], [0.2, 1.0], [0.1, 0.9]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.outlier
>>> evaluator = mangoes.evaluation.outlier.Evaluator(representation)
>>> evaluator.predict('january february march saturn')
4
>>> evaluator.predict(['january february march saturn', 'pluto saturn march'])
{'january february march saturn': 4, 'pluto saturn march': 3}
class mangoes.evaluation.outlier.Evaluation(representation, *datasets, lower=True)
Bases: mangoes.evaluation.base.BaseEvaluation
Examples
>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['january', 'february', 'march', 'pluto', 'mars', 'saturn'])
>>> matrix = np.array([[1.0, 0.2], [0.9, 0.1], [1.1, 0.1], [0.3, 0.9], [0.2, 1.0], [0.1, 0.9]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> import mangoes.evaluation.outlier
>>> # evaluate
>>> dataset = mangoes.evaluation.outlier.Dataset("test", ['january february march pluto', 'mars saturn pluto march'])
>>> evaluation = mangoes.evaluation.outlier.Evaluation(representation, dataset)
>>> print(evaluation.get_score())
Score(opp=1.0, accuracy=1.0, nb=2)
>>> print(evaluation.get_report())
                                                                          Nb questions         OPP    accuracy
================================================================================================
test                                                                               2/2     100.00%     100.00%
------------------------------------------------------------------------------------------------
Methods
get_report([keep_duplicates, show_subsets, …]): Gets a PrintableReport for this evaluation
get_score([dataset, keep_duplicates]): Returns the score(s) of the evaluation
mangoes.evaluation.similarity module
Classes and functions to evaluate embeddings according to the “Similarity” task.
The Similarity task computes the correlation between the similarities of word pairs according to their representation and according to human-assigned scores.
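The two correlation measures reported for this task can be sketched in plain Python as below. This is a dependency-free illustration, not the module's code: the function names and the scores are made up, and in practice `scipy.stats.pearsonr` and `scipy.stats.spearmanr` compute the same coefficients (together with p-values).

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


model_scores = [0.98, 0.35, 0.71, 0.10]   # made-up cosine similarities
human_scores = [0.90, 0.40, 0.60, 0.05]   # made-up human judgements
print(pearson(model_scores, human_scores), spearman(model_scores, human_scores))
```

Pearson measures how linearly the two score lists relate, while Spearman only compares the orderings, so it is insensitive to the scale of the human ratings.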
Datasets available in this module:
WS353 for the WordSim353 dataset (Finkelstein et al., 2001) [1]. Also partitioned by [2] into:
WS_SIM: WordSim Similarity
WS_REL: WordSim Relatedness
RG65 for the Rubenstein and Goodenough (1965) dataset [3]
RAREWORD for the Rare Word (RW) Similarity dataset of Luong et al. (2013) [4]
MEN for the MEN dataset of Bruni et al. (2012) [5]
MTURK for the Mechanical Turk dataset of Radinsky et al. (2011) [6]
References
- 1
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z., Wolfman, G., & Ruppin, E. (2001, April). Placing search in context: The concept revisited. In Proceedings of the 10th international conference on World Wide Web (pp. 406-414). ACM.
- 2
Eneko Agirre, Enrique Alfonseca, Keith Hall, Jana Kravalova, Marius Pasca, Aitor Soroa, A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches, In Proceedings of NAACL-HLT 2009.
- 3
Rubenstein, Herbert, and John B. Goodenough. Contextual correlates of synonymy. Communications of the ACM, 8(10):627–633, 1965.
- 4
Luong, T., Socher, R., & Manning, C. D. (2013, August). Better word representations with recursive neural networks for morphology. In CoNLL (pp. 104-113).
- 5
Bruni, E., Boleda, G., Baroni, M., & Tran, N. K. (2012, July). Distributional semantics in technicolor. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers-Volume 1 (pp. 136-145). Association for Computational Linguistics.
- 6
Radinsky, K., Agichtein, E., Gabrilovich, E., & Markovitch, S. (2011, March). A word at a time: computing word relatedness using temporal semantic analysis. In Proceedings of the 20th international conference on World wide web (pp. 337-346). ACM.
class mangoes.evaluation.similarity.Dataset(name, data)
Bases: mangoes.evaluation.base.BaseDataset
Class to create a Dataset of word pair similarities, to be used in the Evaluation class
Examples
>>> from mangoes.evaluation.similarity import Dataset
>>> user_dataset = Dataset("user dataset", ['lion tiger 0.8', 'sun phone 0.1'])
Predefined datasets are available in this module:
>>> import mangoes.evaluation.similarity
>>> ws353 = mangoes.evaluation.similarity.WS353
- Attributes
- data
Methods
parse_question(question)
get_subset
parse_file
classmethod parse_question(question)
- Parameters
- question: str
A splittable string with the word pair and a score
- Returns
- namedtuple
Examples
>>> Dataset.parse_question('lion tiger 0.8')
Similarity(word_pair=('lion', 'tiger'), gold=0.8)
class mangoes.evaluation.similarity.Evaluator(representation)
Bases: mangoes.evaluation.base.BaseEvaluator
Methods
predict(word_pairs[, metric]): Predict the similarity scores for the given word pair(s).
predict(word_pairs, metric=<function rowwise_cosine_similarity>)
Predict the similarity scores for the given word pair(s).
- Parameters
- word_pairs: tuple of 2 str or list of tuples of 2 str
a word pair or a list of word pairs
- metric
the metric to use to compute the similarity (default : cosine)
- Returns
- dict
A dictionary with word pairs as keys and the predicted similarity scores as values
Examples
>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['lion', 'tiger', 'sun', 'moon', 'phone', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # predict
>>> import mangoes.evaluation.similarity
>>> evaluator = mangoes.evaluation.similarity.Evaluator(representation)
>>> evaluator.predict(('lion', 'tiger'))
array([ 0.98058068])
>>> evaluator.predict([('lion', 'tiger'), ('sun', 'phone')])
{('lion', 'tiger'): 0.98058067569092011, ('sun', 'phone'): 0.70710678118654757}
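The values in the example above are plain cosine similarities between the two word vectors of each pair. The sketch below reproduces them with NumPy alone; `rowwise_cosine` is a stand-in written for this illustration and is not claimed to match the library's rowwise_cosine_similarity function.

```python
import numpy as np


def rowwise_cosine(first, second):
    """Cosine similarity between corresponding rows of two 2-D arrays."""
    first = np.atleast_2d(np.asarray(first, dtype=float))
    second = np.atleast_2d(np.asarray(second, dtype=float))
    dots = np.sum(first * second, axis=1)
    norms = np.linalg.norm(first, axis=1) * np.linalg.norm(second, axis=1)
    return dots / norms


# Same toy vectors as in the example above.
vectors = {'lion': [1, 0], 'tiger': [1, 0.2], 'sun': [0, 1], 'phone': [0.7, 0.7]}
pairs = [('lion', 'tiger'), ('sun', 'phone')]
similarities = rowwise_cosine([vectors[a] for a, _ in pairs],
                              [vectors[b] for _, b in pairs])
print(dict(zip(pairs, similarities.round(4).tolist())))
```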
class mangoes.evaluation.similarity.Evaluation(representation, *datasets, lower=True, metric=<function rowwise_cosine_similarity>)
Bases: mangoes.evaluation.base.BaseEvaluation
Class to evaluate a representation on a dataset or a list of datasets
Both the Pearson and Spearman coefficients are given.
- Parameters
- representation: mangoes.Representation
The representation to evaluate
- datasets: Dataset
The dataset(s) to use
- lower: bool
Whether or not the word pairs in the dataset should be lowercased
- metric
the metric to use to compute the similarity (default : cosine)
Examples
>>> # create a representation
>>> import numpy as np
>>> import mangoes
>>> vocabulary = mangoes.Vocabulary(['lion', 'tiger', 'sun', 'moon', 'phone', 'germany'])
>>> matrix = np.array([[1, 0], [1, 0.2], [0, 1], [0, 1.2], [0.7, 0.7], [0.7, 0.8]])
>>> representation = mangoes.Embeddings(vocabulary, matrix)
>>> # evaluate
>>> import mangoes.evaluation.similarity
>>> dataset = mangoes.evaluation.similarity.Dataset("test", ['lion tiger 0.8', 'sun moon 0.8', 'phone germany 0.3'])
>>> evaluation = mangoes.evaluation.similarity.Evaluation(representation, dataset)
>>> evaluation.get_score()
Score(pearson=Coeff(coeff=-0.40705977800644011, pvalue=0.73310813349301363), spearman=Coeff(coeff=0.0, pvalue=1.0), nb=3)
>>> print(evaluation.get_report())
                                                                          pearson       spearman
                                                      Nb questions        (p-val)        (p-val)
================================================================================================
test                                                           3/3  -0.407(7e-01)     0.0(1e+00)
------------------------------------------------------------------------------------------------
Methods
get_report([keep_duplicates, show_subsets, …]): Gets a PrintableReport for this evaluation
get_score([dataset, keep_duplicates]): Returns the score(s) of the evaluation