mangoes.context module
Definitions of the context of a word in a sentence.
This module defines classes to be used as the context parameter of the mangoes.counting.count_cooccurrence() function.
Examples
>>> import mangoes.context
>>> window_3 = mangoes.context.Window(window_half_size=3)
>>> contexts = window_3(['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog'])
>>> contexts[0]
['quick', 'brown', 'fox']
>>> contexts[4]
['quick', 'brown', 'fox', 'over', 'the', 'lazy']
>>> contexts[5]
['brown', 'fox', 'jumps', 'the', 'lazy', 'dog']
>>> contexts[7]
['jumps', 'over', 'the', 'dog']
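The outputs above can be reproduced without the library. The following is a minimal pure-Python sketch of a window context, not the mangoes implementation; the function name `window_contexts` and its `left` and `right` parameters are hypothetical and only serve the illustration.

```python
def window_contexts(sentence, left=3, right=3):
    """Return, for each position, the tokens inside the window, excluding the word itself."""
    contexts = []
    for i in range(len(sentence)):
        start = max(0, i - left)                 # clip the window at the start of the sentence
        before = sentence[start:i]
        after = sentence[i + 1:i + 1 + right]    # slicing clips at the end of the sentence
        contexts.append(before + after)
    return contexts


sentence = ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
contexts = window_contexts(sentence)
print(contexts[4])
print(contexts[7])
```

Passing different values for `left` and `right` gives an asymmetric window, in the spirit of the tuple form of the `size` parameter of `Window`.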
class mangoes.context.Context(vocabulary=None)
Bases: object
Base callable class to define the context of a word in a sentence
- Parameters
- vocabulary: mangoes.Vocabulary or list of string
vocabulary of the words to consider in the context. Other words are ignored.
- Attributes
params
Parameters of the context
Methods
__call__(sentence[, mask])
Returns the elements in the defined context for each word in sentence
property params
Parameters of the context
class mangoes.context.Sentence(vocabulary=None)
Bases: mangoes.context.Context
Implements sentence context
This context extracts the list of tokens in a sentence, around each word, barring the word itself.
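As an illustration of this behaviour, here is a minimal sketch in plain Python. It is not the library's code: `sentence_contexts` is a hypothetical helper, and the way out-of-vocabulary words are dropped is an assumption based on the statement that words outside the vocabulary are ignored.

```python
def sentence_contexts(sentence, vocabulary=None):
    """For each position, return every other token of the sentence.

    If `vocabulary` is given, tokens outside it are dropped from the contexts
    (an assumption about how "ignored" words are handled).
    """
    keep = None if vocabulary is None else set(vocabulary)
    contexts = []
    for i in range(len(sentence)):
        others = sentence[:i] + sentence[i + 1:]   # everything except the word itself
        if keep is not None:
            others = [word for word in others if word in keep]
        contexts.append(others)
    return contexts


sentence = 'Beautiful is better than ugly'.split()
all_contexts = sentence_contexts(sentence)
filtered = sentence_contexts(sentence, vocabulary=['Beautiful', 'ugly'])
print(all_contexts[2])   # ['Beautiful', 'is', 'than', 'ugly']
print(filtered[2])       # ['Beautiful', 'ugly']
```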
- Attributes
params
Parameters of the context
Methods
__call__(sentence[, mask])
Returns the elements in the defined context for each word in sentence
class mangoes.context.Window(vocabulary=None, size=1, dirty=False, dynamic=False, n_grams=1, distance=False, symmetric=None, window_half_size=None)
Bases: mangoes.context.Context
Implements window-type context
This context extracts the list of elements found in a window defined around each word, barring the word itself.
- Parameters
- size: int for a symmetric window, tuple of 2 int otherwise (default: 1)
size of the search space to the left and to the right of each word. If an integer is given, the window is centered on the word; if a pair of integers is given, the window is asymmetric.
- dirty: boolean (default: False)
if True, words in the window that are not valid ids (e.g. -1) are not fetched, and the window is extended further so that it can still reach the quota of 2 * window_half_size words to fetch (for a symmetric window).
- dynamic: boolean (default: False)
if True, the actual size of the window is sampled between 1 and window_half_size.
- n_grams: int (default: 1)
if n_grams > 1, each element of the list is an n-gram of words instead of a single word.
- distance: boolean (default: False)
if True, the distance from the target word is added to each element.
Examples
>>> from mangoes.context import Window
>>> sentence = 'Beautiful is better than ugly'.split()
>>> Window()(sentence)
[['is'], ['Beautiful', 'better'], ['is', 'than'], ['better', 'ugly'], ['than']]
>>> Window(size=2)(sentence)
[['is', 'better'], ['Beautiful', 'better', 'than'], ['Beautiful', 'is', 'than', 'ugly'], ['is', 'better', 'ugly'], ['better', 'than']]
>>> Window(size=3, n_grams=2)(sentence)
[[('is', 'better'), ('better', 'than')], [('better', 'than'), ('than', 'ugly')], [('Beautiful', 'is'), ('than', 'ugly')], [('Beautiful', 'is'), ('is', 'better')], [('is', 'better'), ('better', 'than')]]
>>> Window(size=2, distance=True)(sentence)
[[('is', 1), ('better', 2)], [('Beautiful', -1), ('better', 1), ('than', 2)], [('Beautiful', -2), ('is', -1), ('than', 1), ('ugly', 2)], [('is', -2), ('better', -1), ('ugly', 1)], [('better', -2), ('than', -1)]]
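The `distance` and `n_grams` outputs in the examples above can be read as follows: with `distance`, each context word is paired with its signed offset from the target word; with `n_grams`, the n-grams appear to be built separately on each side of the target word. The sketch below reproduces those outputs in plain Python. It is an illustration inferred from the documented outputs, not the library's implementation, and the names `window_with_distance` and `window_ngrams` are hypothetical.

```python
def window_with_distance(sentence, size=2):
    """Pair each context word with its signed offset from the target word."""
    contexts = []
    for i in range(len(sentence)):
        start = max(0, i - size)
        stop = min(len(sentence), i + size + 1)
        contexts.append([(sentence[j], j - i) for j in range(start, stop) if j != i])
    return contexts


def window_ngrams(sentence, size=3, n=2):
    """Build n-grams separately on the left and on the right of each target word."""
    def ngrams(tokens):
        # yields nothing when there are fewer than n tokens on that side
        return [tuple(tokens[k:k + n]) for k in range(len(tokens) - n + 1)]

    contexts = []
    for i in range(len(sentence)):
        left = sentence[max(0, i - size):i]
        right = sentence[i + 1:i + 1 + size]
        contexts.append(ngrams(left) + ngrams(right))
    return contexts


sentence = 'Beautiful is better than ugly'.split()
with_distance = window_with_distance(sentence, size=2)
bigrams = window_ngrams(sentence, size=3, n=2)
print(with_distance[2])
print(bigrams[2])
```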
- Attributes
params
Parameters of the context
Methods
__call__(sentence[, mask])
Returns the elements in the defined context for each word in sentence
call_on_encoded
call_on_encoded(encoded_sentence, mask=False)
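This method is not described further here. As an illustration of what the `dirty` option could mean on a sentence of word ids, the sketch below skips invalid ids (-1) and extends the window until it has collected `size` valid ids on each side, or reaches the end of the sentence. This is a sketch under stated assumptions, not the behaviour of `call_on_encoded`: the helper `dirty_window` is hypothetical, the quota is applied per side, and a context is also computed for positions that hold an invalid id.

```python
def dirty_window(encoded_sentence, size=1, invalid=-1):
    """Collect up to `size` valid ids on each side of every position, skipping invalid ids."""
    n = len(encoded_sentence)
    contexts = []
    for i in range(n):
        left = []
        j = i - 1
        while j >= 0 and len(left) < size:      # walk leftwards past invalid ids
            if encoded_sentence[j] != invalid:
                left.append(encoded_sentence[j])
            j -= 1
        left.reverse()                          # restore sentence order

        right = []
        j = i + 1
        while j < n and len(right) < size:      # walk rightwards past invalid ids
            if encoded_sentence[j] != invalid:
                right.append(encoded_sentence[j])
            j += 1
        contexts.append(left + right)
    return contexts


encoded = [4, -1, 7, 2, -1, -1, 9]
contexts = dirty_window(encoded, size=1)
print(contexts[0])   # [7]
print(contexts[3])   # [7, 9]
```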
class mangoes.context.DependencyBasedContext(vocabulary=None, entity='form', dependencies='universal-dependencies', collapse=False, labels=False, depth=1)
Bases: mangoes.context.Context
Implements Dependency-Based context
Returns the modifiers and the head of each element of a sentence.
- Parameters
- dependencies: {'universal-dependencies', 'stanford-dependencies'} or callable
Representation used for the dependency annotations. Default is 'universal-dependencies'. You can also provide your own parser.
- collapse: bool
Whether or not the preposition relations should be collapsed. Default is False.
- labels: bool
Whether or not the labels should be added to the output contexts. Default is False.
References
[1] Levy, O., & Goldberg, Y. (2014, June). Dependency-Based Word Embeddings. In ACL (2) (pp. 302-308).
Examples
>>> import mangoes.corpus
>>> import mangoes.context
>>> source = mangoes.corpus.CONLLU(["1 australian australian ADJ JJ _ 2 amod _ _",
...                                 "2 scientist scientist NOUN NN _ 3 nsubj _ _",
...                                 "3 discovers discover VERB VBZ _ 0 root _ _",
...                                 "4 star star NOUN NN _ 3 dobj _ _",
...                                 "5 with with ADP IN _ 6 case _ _",
...                                 "6 telescope telescope NOUN NN _ 3 nmod _ _"])
>>> sentence = next(source.sentences())
>>> context = mangoes.context.DependencyBasedContext(labels=True)
>>> context(sentence)[1]  # scientist
{"australian/amod", "discovers/nsubj-"}
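To make the labelled output above concrete, the following sketch derives the same kind of contexts from plain (form, head, relation) triples: each word receives its modifiers as "modifier/relation" and its head as "head/relation-", where the trailing "-" marks the inverse relation. This is a minimal illustration, not the library's code; `dependency_contexts` and the triple format are hypothetical, and the unlabelled output format is an assumption.

```python
def dependency_contexts(rows, labels=True):
    """rows: list of (form, head, relation); head is a 1-based index, 0 for the root."""
    contexts = [set() for _ in rows]
    for index, (form, head, relation) in enumerate(rows):
        if head == 0:
            continue                                   # the root has no head
        head_index = head - 1
        head_form = rows[head_index][0]
        if labels:
            contexts[head_index].add(form + "/" + relation)          # modifier, seen from its head
            contexts[index].add(head_form + "/" + relation + "-")    # head, seen from its modifier
        else:
            contexts[head_index].add(form)
            contexts[index].add(head_form)
    return contexts


rows = [
    ("australian", 2, "amod"),
    ("scientist", 3, "nsubj"),
    ("discovers", 0, "root"),
    ("star", 3, "dobj"),
    ("with", 6, "case"),
    ("telescope", 3, "nmod"),
]
contexts = dependency_contexts(rows)
print(sorted(contexts[1]))   # ['australian/amod', 'discovers/nsubj-']
```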
- Attributes
- collapse
- depth
- labels
params
Parameters of the context
Methods
__call__(sentence[, mask])
Returns the elements in the defined context for each word in sentence
stanford_dependencies_sentence_parser(sentence[, collapse])
Returns an adjacency list from a sentence annotated with Stanford Dependencies
ud_sentence_parser(sentence[, collapse])
Returns an adjacency list from a sentence annotated with Universal Dependencies
add_children
property collapse
property labels
property depth
static add_children(sentence_tree)
static ud_sentence_parser(sentence, collapse=False)
Returns an adjacency list from a sentence annotated with Universal Dependencies
- Parameters
- sentence: list of Tokens
- collapse: boolean
Whether or not to collapse prepositions
- Returns
- list of same size as sentence
Returns the dependents of each token in the sentence.
Examples
>>> import mangoes.corpus
>>> import mangoes.context
>>> source = mangoes.corpus.CONLLU(["1 australian australian ADJ JJ _ 2 amod _ _",
...                                 "2 scientist scientist NOUN NN _ 3 nsubj _ _",
...                                 "3 discovers discover VERB VBZ _ 0 root _ _",
...                                 "4 star star NOUN NN _ 3 dobj _ _",
...                                 "5 with with ADP IN _ 6 case _ _",
...                                 "6 telescope telescope NOUN NN _ 3 nmod _ _"])
>>> sentence = next(source.sentences())
>>> mangoes.context.DependencyBasedContext.ud_sentence_parser(sentence)
[set(), {(0, 'amod')}, {(1, 'nsubj'), (3, 'dobj'), (5, 'nmod')}, set(), set(), {(4, 'case')}]
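The adjacency list in this example stores, at the position of each token, the set of (dependent index, relation) pairs attached to it, with 0-based indices. A minimal sketch of how such a list can be built from the head and relation columns is given below; the function `adjacency_list` and its input format are hypothetical and stand in for the library's own Token objects.

```python
def adjacency_list(rows):
    """rows: list of (head, relation); head is a 1-based index, 0 for the root."""
    dependents = [set() for _ in rows]
    for index, (head, relation) in enumerate(rows):
        if head > 0:                                  # skip the root, which has no head
            dependents[head - 1].add((index, relation))
    return dependents


# head and relation columns of the six tokens used in the example above
rows = [(2, 'amod'), (3, 'nsubj'), (0, 'root'), (3, 'dobj'), (6, 'case'), (3, 'nmod')]
result = adjacency_list(rows)
print(result[2] == {(1, 'nsubj'), (3, 'dobj'), (5, 'nmod')})   # True
```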
static stanford_dependencies_sentence_parser(sentence, collapse=False)
Returns an adjacency list from a sentence annotated with Stanford Dependencies
- Parameters
- sentence: list of Tokens
- collapse: boolean
Whether or not to collapse prepositions
- Returns
- list of same size as sentence
Returns the dependents of each token in the sentence.
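Collapsing prepositions, in the sense of Levy and Goldberg (2014), connects the head directly to the object of the preposition and folds the preposition into the relation label. The sketch below applies that idea to an adjacency list in a Stanford-style annotation (head -prep-> preposition -pobj-> object). It is an illustration only: `collapse_prepositions` is a hypothetical helper, the `prep_<word>` label format follows the paper rather than anything confirmed for this library, and prepositions without an object are simply dropped.

```python
def collapse_prepositions(forms, dependents):
    """Turn head -prep-> preposition -pobj-> object into head -prep_<word>-> object."""
    collapsed = [set() for _ in forms]
    for head, deps in enumerate(dependents):
        for child, label in deps:
            if label == "prep":
                # connect the head directly to the object(s) of the preposition
                for grandchild, sub_label in dependents[child]:
                    if sub_label == "pobj":
                        collapsed[head].add((grandchild, "prep_" + forms[child]))
            elif label == "pobj":
                continue                      # folded into the collapsed relation above
            else:
                collapsed[head].add((child, label))
    return collapsed


forms = ["australian", "scientist", "discovers", "star", "with", "telescope"]
dependents = [set(), {(0, "amod")}, {(1, "nsubj"), (3, "dobj"), (4, "prep")},
              set(), {(5, "pobj")}, set()]
collapsed = collapse_prepositions(forms, dependents)
print(sorted(collapsed[2]))
```

With the collapsed list, "telescope" becomes a direct dependent of "discovers", and the preposition "with" no longer has dependents of its own.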