mangoes.context module

Defines the context of a word in a sentence.

This module defines classes to be used as the context parameter of the mangoes.counting.count_cooccurrence() function.

Examples

>>> import mangoes.context
>>> window_3 = mangoes.context.Window(window_half_size=3)
>>> contexts = window_3(['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'])
>>> contexts[0]
['quick', 'brown', 'fox']
>>> contexts[4]
['quick', 'brown', 'fox', 'over', 'the', 'lazy']
>>> contexts[5]
['brown', 'fox', 'jumps', 'the', 'lazy', 'dog']
>>> contexts[7]
['jumps', 'over', 'the', 'dog']
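
The output above can be approximated in plain Python. This is a minimal sketch, not the mangoes implementation: `window_contexts` is a hypothetical helper, and it assumes a symmetric window that is simply truncated at the sentence boundaries.

```python
def window_contexts(sentence, half_size):
    """For each position, return the tokens at most `half_size` positions away."""
    contexts = []
    for i in range(len(sentence)):
        # Tokens to the left of position i, clipped at the start of the sentence.
        left = sentence[max(0, i - half_size):i]
        # Tokens to the right of position i; slicing clips at the end automatically.
        right = sentence[i + 1:i + 1 + half_size]
        contexts.append(left + right)
    return contexts


sentence = ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
contexts = window_contexts(sentence, 3)
print(contexts[4])  # ['quick', 'brown', 'fox', 'over', 'the', 'lazy']
```

The target word itself is never part of its own context, which matches the examples in this module.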

class mangoes.context.Context(vocabulary=None)

Bases: object

Base callable class to define the context of a word in a sentence

Parameters

vocabulary: mangoes.Vocabulary or list of strings (default: None): vocabulary of the words to consider in the context. Words outside the vocabulary are ignored.

See also

mangoes.counting.count_cooccurrence()

Attributes

params: Parameters of the context

Methods

__call__(sentence[, mask])

Returns the elements in the defined context for each word in sentence

property params: Parameters of the context

class mangoes.context.Sentence(vocabulary=None)

Bases: mangoes.context.Context

Implements sentence context

For each word, this context extracts the list of all the other tokens in the sentence, excluding the word itself.
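
As an illustration only, the sentence context can be sketched in a few lines of plain Python. `sentence_contexts` is a hypothetical name, and the sketch assumes no vocabulary filtering and no mask; how mangoes handles those cases is not covered here.

```python
def sentence_contexts(sentence):
    """For each position, return every other token of the sentence, in order."""
    return [sentence[:i] + sentence[i + 1:] for i in range(len(sentence))]


contexts = sentence_contexts(['Beautiful', 'is', 'better'])
print(contexts)
# [['is', 'better'], ['Beautiful', 'better'], ['Beautiful', 'is']]
```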

Attributes

params: Parameters of the context

Methods

__call__(sentence[, mask])

Returns the elements in the defined context for each word in sentence

class mangoes.context.Window(vocabulary=None, size=1, dirty=False, dynamic=False, n_grams=1, distance=False, symmetric=None, window_half_size=None)

Bases: mangoes.context.Context

Implements window-type context

This context extracts the list of elements found in a window defined around each word, barring the word itself.

Parameters

size: int or tuple of 2 int (default: 1): size of the search space to the left and to the right of each word. If an integer is given, the window is symmetric around the word; if a pair of integers is given, the window is asymmetric.
dirty: boolean (default: False): if True and some of the words in the window are not valid ids (e.g. -1), they are not fetched, but the window is extended further so that the quota of words to fetch (2 * window_half_size for a symmetric window) can still be met.
dynamic: boolean (default: False): if True, the size of the actual window is sampled between 1 and window_half_size.
n_grams: int (default: 1): if n_grams > 1, each element of the list is an n-gram of words instead of a single word.
distance: boolean (default: False): if True, the distance from the target word is added to each element.

Examples

>>> from mangoes.context import Window
>>> sentence = 'Beautiful is better than ugly'.split()
>>> Window()(sentence)
[['is'], ['Beautiful', 'better'], ['is', 'than'], ['better', 'ugly'], ['than']]
>>> Window(size=2)(sentence) 
[['is', 'better'],
 ['Beautiful', 'better', 'than'],
 ['Beautiful', 'is', 'than', 'ugly'],
 ['is', 'better', 'ugly'],
 ['better', 'than']]
>>> Window(size=3, n_grams=2)(sentence) 
[[('is', 'better'), ('better', 'than')],
 [('better', 'than'), ('than', 'ugly')],
 [('Beautiful', 'is'), ('than', 'ugly')],
 [('Beautiful', 'is'), ('is', 'better')],
 [('is', 'better'), ('better', 'than')]]
>>> Window(size=2, distance=True)(sentence)
[[('is', 1), ('better', 2)],
 [('Beautiful', -1), ('better', 1), ('than', 2)],
 [('Beautiful', -2), ('is', -1), ('than', 1), ('ugly', 2)],
 [('is', -2), ('better', -1), ('ugly', 1)],
 [('better', -2), ('than', -1)]]
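
The `distance` and `n_grams` outputs shown above can be reproduced with the following plain-Python sketch. It is not the mangoes implementation: `window_with_distance` and `window_n_grams` are hypothetical helpers, and the n-gram behaviour (n-grams built separately on each side of the target word) is inferred from the example output rather than from the library source.

```python
def window_with_distance(sentence, size):
    """Pair each context word with its signed offset from the target word."""
    contexts = []
    for i in range(len(sentence)):
        start = max(0, i - size)
        stop = min(len(sentence), i + size + 1)
        contexts.append([(sentence[j], j - i) for j in range(start, stop) if j != i])
    return contexts


def window_n_grams(sentence, size, n):
    """Build n-grams from the left and right halves of the window separately."""
    def n_grams(tokens):
        # Yields nothing when fewer than n tokens are available.
        return [tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1)]

    contexts = []
    for i in range(len(sentence)):
        left = sentence[max(0, i - size):i]
        right = sentence[i + 1:i + 1 + size]
        contexts.append(n_grams(left) + n_grams(right))
    return contexts


sentence = 'Beautiful is better than ugly'.split()
with_distance = window_with_distance(sentence, 2)
bigrams = window_n_grams(sentence, 3, 2)
print(with_distance[0])  # [('is', 1), ('better', 2)]
print(bigrams[2])        # [('Beautiful', 'is'), ('than', 'ugly')]
```

Negative offsets mark words to the left of the target, positive offsets words to the right.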

Attributes

params: Parameters of the context

Methods

__call__(sentence[, mask])

Returns the elements in the defined context for each word in sentence

call_on_encoded

call_on_encoded(encoded_sentence, mask=False)

class mangoes.context.DependencyBasedContext(vocabulary=None, entity='form', dependencies='universal-dependencies', collapse=False, labels=False, depth=1)

Bases: mangoes.context.Context

Implements Dependency-Based context

Returns the modifiers and the head of each element of a sentence.

Parameters

dependencies: {‘universal-dependencies’, ‘stanford-dependencies’} or callable: Representation used for the dependency annotations. Default is ‘universal-dependencies’. A custom parser can also be provided as a callable.
collapse: bool: Whether or not the preposition relations should be collapsed. Default is False.
labels: bool: Whether or not the labels should be added to the output contexts. Default is False.

References

1: Levy, O., & Goldberg, Y. (2014, June). Dependency-Based Word Embeddings. In ACL (2) (pp. 302-308).

Examples

>>> import mangoes.context
>>> import mangoes.corpus
>>> source = mangoes.corpus.CONLLU(["1      australian      australian      ADJ     JJ      _       2       amod    _       _",
...                                 "2      scientist       scientist       NOUN    NN      _       3       nsubj   _       _",
...                                 "3      discovers       discover        VERB    VBZ     _       0       root    _       _",
...                                 "4      star    star    NOUN    NN      _       3       dobj    _       _",
...                                 "5      with    with    ADP     IN      _       6       case    _       _",
...                                 "6      telescope       telescope       NOUN    NN      _       3       nmod    _       _"])
>>> sentence = next(source.sentences())
>>> context = mangoes.context.DependencyBasedContext(labels=True)
>>> context(sentence)[1] # scientist
{"australian/amod", "discovers/nsubj-"}
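
To make the labelled output concrete, here is a minimal plain-Python sketch that derives the same contexts from head indices and relation names. It is an illustration under stated assumptions, not the mangoes code: `labelled_contexts` is a hypothetical helper, tokens are given as (form, head, relation) triples with 1-based heads as in CoNLL-U, only depth 1 without collapsing is handled, and the trailing "-" for the head relation follows the output shown above.

```python
# (form, head, relation) triples; head is 1-based and 0 marks the root.
tokens = [
    ('australian', 2, 'amod'),
    ('scientist', 3, 'nsubj'),
    ('discovers', 0, 'root'),
    ('star', 3, 'dobj'),
    ('with', 6, 'case'),
    ('telescope', 3, 'nmod'),
]


def labelled_contexts(tokens):
    """Return, for each token, its modifiers and its head as 'form/relation' strings."""
    contexts = [set() for _ in tokens]
    for index, (form, head, relation) in enumerate(tokens):
        if head == 0:
            continue  # the root has no head
        head_index = head - 1
        head_form = tokens[head_index][0]
        # The head sees this token as a modifier.
        contexts[head_index].add(f"{form}/{relation}")
        # This token sees its head through the inverse relation, marked with "-".
        contexts[index].add(f"{head_form}/{relation}-")
    return contexts


contexts = labelled_contexts(tokens)
print(sorted(contexts[1]))  # ['australian/amod', 'discovers/nsubj-']
```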

Attributes

collapse
depth
labels
params: Parameters of the context

Methods

__call__(sentence[, mask])

Returns the elements in the defined context for each word in sentence

stanford_dependencies_sentence_parser(sentence[, collapse])

Returns an adjacency list from a sentence annotated with Stanford Dependencies

ud_sentence_parser(sentence[, collapse])

Returns an adjacency list from a sentence annotated with Universal Dependencies

add_children

property collapse

property labels

property depth

static add_children(sentence_tree)

static ud_sentence_parser(sentence, collapse=False)

Returns an adjacency list from a sentence annotated with Universal Dependencies

Parameters

sentence: list of Tokens
collapse: boolean: Whether or not to collapse prepositions

Returns

list of same size as sentence: Returns the dependents of each token in the sentence.

Examples

>>> import mangoes.context
>>> import mangoes.corpus
>>> source = mangoes.corpus.CONLLU(["1  australian      australian      ADJ     JJ      _       2       amod    _       _",
...                                 "2  scientist       scientist       NOUN    NN      _       3       nsubj   _       _",
...                                 "3  discovers       discover        VERB    VBZ     _       0       root    _       _",
...                                 "4  star    star    NOUN    NN      _       3       dobj    _       _",
...                                 "5  with    with    ADP     IN      _       6       case    _       _",
...                                 "6  telescope       telescope       NOUN    NN      _       3       nmod    _       _"])
>>> sentence = next(source.sentences())
>>> mangoes.context.DependencyBasedContext.ud_sentence_parser(sentence)
[set(),
 {(0, 'amod')},
 {(1, 'nsubj'), (3, 'dobj'), (5, 'nmod')},
 set(),
 set(),
 {(4, 'case')}]
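
The adjacency list above maps each token position to the set of (dependent index, relation) pairs it governs. A minimal sketch of that construction, assuming only the head and relation columns of the annotation are needed and that heads are 1-based with 0 for the root; `adjacency_list` is a hypothetical helper, not part of mangoes.

```python
def adjacency_list(heads_and_relations):
    """Build, for each token, the set of (dependent_index, relation) it governs."""
    dependents = [set() for _ in heads_and_relations]
    for index, (head, relation) in enumerate(heads_and_relations):
        if head > 0:  # skip the root, which has no governor
            dependents[head - 1].add((index, relation))
    return dependents


# (head, relation) for: australian scientist discovers star with telescope
annotations = [(2, 'amod'), (3, 'nsubj'), (0, 'root'), (3, 'dobj'), (6, 'case'), (3, 'nmod')]
result = adjacency_list(annotations)
print(result[2] == {(1, 'nsubj'), (3, 'dobj'), (5, 'nmod')})  # True
```

Indices in the result are 0-based positions in the sentence, which is why the dependent "australian" (token 1 in the annotation) appears as index 0.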

static stanford_dependencies_sentence_parser(sentence, collapse=False)

Returns an adjacency list from a sentence annotated with Stanford Dependencies

Parameters

sentence: list of Tokens
collapse: boolean: Whether or not to collapse prepositions

Returns

list of same size as sentence: Returns the dependents of each token in the sentence.