mangoes.context module

Definitions of the context of a word in a sentence.

This module defines classes to be used as the context parameter of the mangoes.counting.count_cooccurrence() function.
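To make the role of the context parameter concrete, here is a minimal sketch of how any callable that maps a sentence to one list of context elements per word can drive co-occurrence counting. `count_pairs` and `neighbours` are hypothetical helpers written for this illustration. They are not mangoes.counting.count_cooccurrence(), whose actual signature and return type may differ.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence

# A context is any callable mapping a sentence to one list of context words per token.
ContextFn = Callable[[Sequence[str]], List[List[str]]]


def count_pairs(sentences: Iterable[Sequence[str]], context: ContextFn) -> Counter:
    """Count (word, context word) pairs; hypothetical helper, not mangoes.counting."""
    counts: Counter = Counter()
    for sentence in sentences:
        # context(sentence)[i] holds the context elements of sentence[i]
        for word, elements in zip(sentence, context(sentence)):
            for element in elements:
                counts[(word, element)] += 1
    return counts


def neighbours(sentence: Sequence[str]) -> List[List[str]]:
    """Toy context: the previous and the next word of each token."""
    return [
        [sentence[j] for j in (i - 1, i + 1) if 0 <= j < len(sentence)]
        for i in range(len(sentence))
    ]


counts = count_pairs([["a", "b", "a"], ["b", "c"]], neighbours)
print(counts[("a", "b")], counts[("b", "a")], counts[("b", "c")])  # 2 2 1
```

Because the counting loop only calls `context(sentence)`, any of the context classes below could be swapped in for `neighbours` in this sketch.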

Examples

>>> import mangoes.context
>>> window_3 = mangoes.context.Window(window_half_size=3)
>>> contexts = window_3(['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'])
>>> contexts[0]
['quick', 'brown', 'fox']
>>> contexts[4]
['quick', 'brown', 'fox', 'over', 'the', 'lazy']
>>> contexts[5]
['brown', 'fox', 'jumps', 'the', 'lazy', 'dog']
>>> contexts[7]
['jumps', 'over', 'the', 'dog']
class mangoes.context.Context(vocabulary=None)

Bases: object

Base callable class to define the context of a word in a sentence

Parameters
vocabulary: mangoes.Vocabulary or list of str

Vocabulary of the words to consider in the context. Other words are ignored.

Attributes
params

Parameters of the context

Methods

__call__(sentence[, mask])

Returns the elements in the defined context for each word in sentence

property params

Parameters of the context
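The callable design can be illustrated with a small stand-in class. This is a sketch under assumptions, not the mangoes source: `ContextSketch`, `NextWordContext` and the `_raw_contexts` hook are hypothetical names, and only the vocabulary filtering (words outside the vocabulary are dropped) follows the description above.

```python
from typing import Iterable, List, Optional, Sequence


class ContextSketch:
    """Hypothetical stand-in for a callable context class (not mangoes.context.Context)."""

    def __init__(self, vocabulary: Optional[Iterable[str]] = None):
        # None means "keep every word"; otherwise only these words may appear in contexts
        self.vocabulary = None if vocabulary is None else set(vocabulary)

    @property
    def params(self) -> dict:
        # Describes the settings of the context, in the spirit of the params attribute
        return {"vocabulary": self.vocabulary}

    def _raw_contexts(self, sentence: Sequence[str]) -> List[List[str]]:
        raise NotImplementedError  # subclasses define the actual context

    def __call__(self, sentence: Sequence[str]) -> List[List[str]]:
        contexts = self._raw_contexts(sentence)
        if self.vocabulary is None:
            return contexts
        # Words outside the vocabulary are ignored
        return [[w for w in ctx if w in self.vocabulary] for ctx in contexts]


class NextWordContext(ContextSketch):
    """Toy subclass: the context of a word is the word that follows it."""

    def _raw_contexts(self, sentence: Sequence[str]) -> List[List[str]]:
        return [list(sentence[i + 1:i + 2]) for i in range(len(sentence))]


print(NextWordContext(vocabulary=["fox", "dog"])(["the", "fox", "saw", "the", "dog"]))
# [['fox'], [], [], ['dog'], []]
```

Keeping the filtering in the base class means each subclass only has to describe which positions belong to the context.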

class mangoes.context.Sentence(vocabulary=None)

Bases: mangoes.context.Context

Implements sentence context

For each word, this context extracts all the other tokens of the sentence, excluding the word itself.

Attributes
params

Parameters of the context

Methods

__call__(sentence[, mask])

Returns the elements in the defined context for each word in sentence
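A plain-Python sketch of the sentence context follows. `sentence_contexts` is a hypothetical function; it assumes the word is excluded by its position, so another occurrence of the same word form elsewhere in the sentence stays in the context.

```python
from typing import List, Sequence


def sentence_contexts(sentence: Sequence[str]) -> List[List[str]]:
    """For each position, every other token of the sentence (hypothetical sketch)."""
    return [
        # exclude by position, so a repeated word still sees its other occurrences
        [word for j, word in enumerate(sentence) if j != i]
        for i in range(len(sentence))
    ]


print(sentence_contexts(["the", "cat", "saw", "the", "dog"])[0])
# ['cat', 'saw', 'the', 'dog']
```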

class mangoes.context.Window(vocabulary=None, size=1, dirty=False, dynamic=False, n_grams=1, distance=False, symmetric=None, window_half_size=None)

Bases: mangoes.context.Context

Implements window-type context

This context extracts the list of elements found in a window defined around each word, barring the word itself.

Parameters
size: int for a symmetric window, tuple of 2 ints for an asymmetric one

Size of the search space to the left and to the right of each word (default: 1). If an int is given, the window is centered on the word; if a tuple of two ints is given, the window is asymmetric.

dirty: boolean (default: False)

If True and some of the words in the window are not valid ids (e.g. -1), they are not fetched, but the window is extended further so that the quota of 2 * size words (for a symmetric window) can still be met.

dynamic: boolean (default: False)

If True, the size of the actual window is sampled between 1 and size.

n_grams: int (default: 1)

If n_grams > 1, each element of the context is an n-gram of words (a tuple) instead of a single word.

distance: boolean (default: False)

If True, each context element is paired with its signed distance from the target word (negative to the left, positive to the right).
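The size, distance and dynamic parameters can be sketched in plain Python. `window_contexts` is a hypothetical function, not the mangoes implementation. It reproduces the outputs documented in the Examples for size and distance, but two details are assumptions: a tuple size is read as (left, right), and the dynamic mode draws a single effective size per word, word2vec style, clipped to each side's maximum.

```python
import random
from typing import List, Optional, Sequence, Tuple, Union


def window_contexts(
    sentence: Sequence[str],
    size: Union[int, Tuple[int, int]] = 1,
    distance: bool = False,
    dynamic: bool = False,
    rng: Optional[random.Random] = None,
) -> List[list]:
    """Hypothetical window context; not the mangoes implementation."""
    # An int gives a symmetric window; a tuple is ASSUMED to mean (left, right).
    left, right = (size, size) if isinstance(size, int) else size
    rng = rng if rng is not None else random.Random()
    contexts = []
    for i in range(len(sentence)):
        lo, hi = left, right
        if dynamic and max(left, right) > 0:
            # Sample one effective size in [1, max(left, right)] (word2vec style)
            # and clip each side to its configured maximum.
            k = rng.randint(1, max(left, right))
            lo, hi = min(left, k), min(right, k)
        context = []
        for j in range(max(0, i - lo), min(len(sentence), i + hi + 1)):
            if j == i:
                continue  # the word itself is never part of its own context
            context.append((sentence[j], j - i) if distance else sentence[j])
        contexts.append(context)
    return contexts


sentence = "Beautiful is better than ugly".split()
print(window_contexts(sentence, size=2, distance=True)[1])
# [('Beautiful', -1), ('better', 1), ('than', 2)]
```

Passing a seeded `random.Random` as `rng` makes the dynamic mode reproducible in this sketch.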

Examples

>>> from mangoes.context import Window
>>> sentence = 'Beautiful is better than ugly'.split()
>>> Window()(sentence)
[['is'], ['Beautiful', 'better'], ['is', 'than'], ['better', 'ugly'], ['than']]
>>> Window(size=2)(sentence) 
[['is', 'better'],
 ['Beautiful', 'better', 'than'],
 ['Beautiful', 'is', 'than', 'ugly'],
 ['is', 'better', 'ugly'],
 ['better', 'than']]
>>> Window(size=3, n_grams=2)(sentence) 
[[('is', 'better'), ('better', 'than')],
 [('better', 'than'), ('than', 'ugly')],
 [('Beautiful', 'is'), ('than', 'ugly')],
 [('Beautiful', 'is'), ('is', 'better')],
 [('is', 'better'), ('better', 'than')]]
>>> Window(size=2, distance=True)(sentence)
[[('is', 1), ('better', 2)],
 [('Beautiful', -1), ('better', 1), ('than', 2)],
 [('Beautiful', -2), ('is', -1), ('than', 1), ('ugly', 2)],
 [('is', -2), ('better', -1), ('ugly', 1)],
 [('better', -2), ('than', -1)]]
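The n_grams output above is consistent with n-grams being built separately inside the left part and the right part of the window, never spanning the target word. The sketch below assumes that reading; `ngram_window_contexts` is a hypothetical name.

```python
from typing import List, Sequence, Tuple, Union

Element = Union[str, Tuple[str, ...]]


def ngram_window_contexts(
    sentence: Sequence[str], size: int = 1, n_grams: int = 1
) -> List[List[Element]]:
    """Hypothetical sketch: n-grams built separately on each side of the target."""
    contexts = []
    for i in range(len(sentence)):
        left = list(sentence[max(0, i - size):i])
        right = list(sentence[i + 1:i + 1 + size])
        context: List[Element] = []
        for side in (left, right):
            if n_grams == 1:
                context.extend(side)  # plain words, as in the default window
            else:
                # consecutive n-grams that fit entirely inside this side
                context.extend(
                    tuple(side[k:k + n_grams]) for k in range(len(side) - n_grams + 1)
                )
        contexts.append(context)
    return contexts


sentence = "Beautiful is better than ugly".split()
for context in ngram_window_contexts(sentence, size=3, n_grams=2):
    print(context)
```

Running it prints the same five rows as the Window(size=3, n_grams=2) example.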
Attributes
params

Parameters of the context

Methods

__call__(sentence[, mask])

Returns the elements in the defined context for each word in sentence

call_on_encoded

call_on_encoded(encoded_sentence, mask=False)
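call_on_encoded works on sentences of word ids, which is where the dirty option matters. The sketch below illustrates one reading of dirty on a list of ids, assuming -1 marks an invalid id and that the quota is met independently on each side; with dirty=False it simply drops invalid ids without widening the window. `encoded_window_contexts` and `_collect` are hypothetical helpers, not part of mangoes.

```python
from typing import List, Sequence

INVALID = -1  # assumed marker for an invalid word id, following the "-1" example


def _collect(encoded: Sequence[int], i: int, step: int, size: int, dirty: bool) -> List[int]:
    """Walk from position i in direction `step`, gathering up to `size` valid ids."""
    found: List[int] = []
    j = i + step
    while 0 <= j < len(encoded) and len(found) < size:
        if not dirty and abs(j - i) > size:
            break  # fixed window: never look past the nominal boundary
        if encoded[j] != INVALID:
            found.append(encoded[j])
        j += step
    return found


def encoded_window_contexts(
    encoded: Sequence[int], size: int = 1, dirty: bool = False
) -> List[List[int]]:
    contexts = []
    for i in range(len(encoded)):
        left = _collect(encoded, i, -1, size, dirty)[::-1]  # back to sentence order
        right = _collect(encoded, i, +1, size, dirty)
        contexts.append(left + right)
    return contexts


ids = [3, INVALID, 7, 5, INVALID, 2]
print(encoded_window_contexts(ids, size=1))              # [[], [3, 7], [5], [7], [5, 2], []]
print(encoded_window_contexts(ids, size=1, dirty=True))  # [[7], [3, 7], [3, 5], [7, 2], [5, 2], [5]]
```

With dirty=True every position that has a valid id somewhere on a side gets one, at the cost of reaching further than the nominal window.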
class mangoes.context.DependencyBasedContext(vocabulary=None, entity='form', dependencies='universal-dependencies', collapse=False, labels=False, depth=1)

Bases: mangoes.context.Context

Implements the dependency-based context [1]

For each word of a sentence, returns its modifiers and its head.

Parameters
dependencies: {‘universal-dependencies’, ‘stanford-dependencies’} or callable

Representation used for the dependency annotation. Default is ‘universal-dependencies’. You can also provide your own parser as a callable.

collapse: bool

Whether or not the preposition relations should be collapsed. Default is False.

labels: bool

Whether or not the labels should be added to the output contexts. Default is False.

References

[1] Levy, O., & Goldberg, Y. (2014, June). Dependency-Based Word Embeddings. In ACL (2) (pp. 302-308).

Examples

>>> source = mangoes.corpus.CONLLU(["1      australian      australian      ADJ     JJ      _       2       amod    _       _",
...                                 "2      scientist       scientist       NOUN    NN      _       3       nsubj   _       _",
...                                 "3      discovers       discover        VERB    VBZ     _       0       root    _       _",
...                                 "4      star    star    NOUN    NN      _       3       dobj    _       _",
...                                 "5      with    with    ADP     IN      _       6       case    _       _",
...                                 "6      telescope       telescope       NOUN    NN      _       3       nmod    _       _"])
>>> sentence = next(source.sentences())
>>> context = mangoes.context.DependencyBasedContext(labels=True)
>>> context(sentence)[1]  # scientist
{'australian/amod', 'discovers/nsubj-'}
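The contexts in this example can be reproduced with a short sketch in the spirit of Levy and Goldberg [1]: every dependency arc gives the head a context "modifier/label" and gives the modifier a context "head/label-", the trailing "-" marking the inverse direction as in the output above. `Token` and `dependency_contexts` are hypothetical; the sketch ignores the entity, collapse and depth parameters, and its labels=False branch (bare forms) is an assumption.

```python
from typing import List, NamedTuple, Set


class Token(NamedTuple):
    form: str
    head: int    # 1-based index of the head, 0 for the root (CoNLL-U HEAD column)
    deprel: str  # relation to the head (CoNLL-U DEPREL column)


def dependency_contexts(sentence: List[Token], labels: bool = True) -> List[Set[str]]:
    """Hypothetical dependency-based contexts (depth 1, no collapsing)."""
    contexts: List[Set[str]] = [set() for _ in sentence]
    for i, token in enumerate(sentence):
        if token.head == 0:
            continue  # the root has no head, so it yields no arc
        h = token.head - 1
        if labels:
            contexts[h].add(f"{token.form}/{token.deprel}")         # modifier, seen from its head
            contexts[i].add(f"{sentence[h].form}/{token.deprel}-")  # head, inverse relation
        else:
            contexts[h].add(token.form)
            contexts[i].add(sentence[h].form)
    return contexts


sentence = [
    Token("australian", 2, "amod"),
    Token("scientist", 3, "nsubj"),
    Token("discovers", 0, "root"),
    Token("star", 3, "dobj"),
    Token("with", 6, "case"),
    Token("telescope", 3, "nmod"),
]
print(sorted(dependency_contexts(sentence)[1]))  # ['australian/amod', 'discovers/nsubj-']
```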
Attributes
collapse
depth
labels
params

Parameters of the context

Methods

__call__(sentence[, mask])

Returns the elements in the defined context for each word in sentence

stanford_dependencies_sentence_parser(sentence[, collapse])

Returns an adjacency list from a sentence annotated with Stanford Dependencies

ud_sentence_parser(sentence[, collapse])

Returns an adjacency list from a sentence annotated with Universal Dependencies

add_children

property collapse
property labels
property depth
static add_children(sentence_tree)
static ud_sentence_parser(sentence, collapse=False)

Returns an adjacency list from a sentence annotated with Universal Dependencies

Parameters
sentence: list of Tokens
collapse: boolean

Whether or not to collapse prepositions

Returns
list of same size as sentence

Returns the dependents of each token in the sentence.

Examples

>>> source = mangoes.corpus.CONLLU(["1  australian      australian      ADJ     JJ      _       2       amod    _       _",
...                                 "2  scientist       scientist       NOUN    NN      _       3       nsubj   _       _",
...                                 "3  discovers       discover        VERB    VBZ     _       0       root    _       _",
...                                 "4  star    star    NOUN    NN      _       3       dobj    _       _",
...                                 "5  with    with    ADP     IN      _       6       case    _       _",
...                                 "6  telescope       telescope       NOUN    NN      _       3       nmod    _       _"])
>>> sentence = next(source.sentences())
>>> mangoes.context.DependencyBasedContext.ud_sentence_parser(sentence)
[set(),
 {(0, 'amod')},
 {(1, 'nsubj'), (3, 'dobj'), (5, 'nmod')},
 set(),
 set(),
 {(4, 'case')}]
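The adjacency list above can be derived from the CoNLL-U HEAD (1-based, 0 for the root) and DEPREL columns alone. The following is a minimal sketch of that computation for the collapse=False case; `Token` and `adjacency_list` are hypothetical names, not the mangoes Token class or parser.

```python
from typing import List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    form: str
    head: int    # 1-based head index, 0 for the root (CoNLL-U HEAD column)
    deprel: str  # CoNLL-U DEPREL column


def adjacency_list(sentence: List[Token]) -> List[Set[Tuple[int, str]]]:
    """Dependents of each token as (0-based index, relation) pairs; hypothetical sketch."""
    dependents: List[Set[Tuple[int, str]]] = [set() for _ in sentence]
    for i, token in enumerate(sentence):
        if token.head > 0:  # the root (head 0) is nobody's dependent
            dependents[token.head - 1].add((i, token.deprel))
    return dependents


sentence = [
    Token("australian", 2, "amod"),
    Token("scientist", 3, "nsubj"),
    Token("discovers", 0, "root"),
    Token("star", 3, "dobj"),
    Token("with", 6, "case"),
    Token("telescope", 3, "nmod"),
]
print(adjacency_list(sentence)[2] == {(1, "nsubj"), (3, "dobj"), (5, "nmod")})  # True
```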
static stanford_dependencies_sentence_parser(sentence, collapse=False)

Returns an adjacency list from a sentence annotated with Stanford Dependencies

Parameters
sentence: list of Tokens
collapse: boolean

Whether or not to collapse prepositions

Returns
list of same size as sentence

Returns the dependents of each token in the sentence.
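In Stanford basic dependencies a preposition hangs from its head through a prep arc and its object from the preposition through a pobj arc; collapsing, as used by Levy and Goldberg [1], merges the two into a single prep_<preposition> arc, making the object a direct dependent of the head. The sketch below illustrates that transformation on a toy annotation. `stanford_adjacency` and `Token` are hypothetical, and how mangoes treats other relation types is not covered.

```python
from typing import List, NamedTuple, Set, Tuple


class Token(NamedTuple):
    form: str
    head: int    # 1-based head index, 0 for the root
    deprel: str  # Stanford dependency label


def stanford_adjacency(
    sentence: List[Token], collapse: bool = False
) -> List[Set[Tuple[int, str]]]:
    """Hypothetical adjacency list with optional preposition collapsing."""
    deps: List[Set[Tuple[int, str]]] = [set() for _ in sentence]
    for i, token in enumerate(sentence):
        if token.head > 0:
            deps[token.head - 1].add((i, token.deprel))
    if collapse:
        for h in range(len(deps)):
            for p, rel in list(deps[h]):  # iterate over a copy: deps[h] changes below
                if rel != "prep":
                    continue
                objects = [(o, r) for o, r in deps[p] if r == "pobj"]
                if not objects:
                    continue  # a preposition without an object is left untouched
                deps[h].discard((p, rel))
                for o, r in objects:
                    deps[p].discard((o, r))
                    deps[h].add((o, "prep_" + sentence[p].form))
    return deps


sentence = [
    Token("australian", 2, "amod"),
    Token("scientist", 3, "nsubj"),
    Token("discovers", 0, "root"),
    Token("star", 3, "dobj"),
    Token("with", 3, "prep"),
    Token("telescope", 5, "pobj"),
]
print(sorted(stanford_adjacency(sentence, collapse=True)[2]))
# [(1, 'nsubj'), (3, 'dobj'), (5, 'prep_with')]
```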