mangoes.utils.reader module
class mangoes.utils.reader.SentenceGenerator(source, lower=False, digit=False, ignore_punctuation=False)
Bases: object
Base class for sentence generators
A sentence generator yields sentences from a source, which can be an iterable or a set of files.
- Parameters
- source: a string or an iterable
An iterable of sentences, or a path to a file or a directory
- lower: boolean, optional
If True, converts sentences to lower case. Default: False
- digit: boolean, optional
If True, replaces numeric values with DIGIT_TOKEN in sentences. Default: False
- ignore_punctuation: boolean, optional
If True, the punctuation will be ignored when reading the corpus. Default: False
Warning
This class should not be used directly. Use derived classes instead.
See also
TextSentenceGenerator
BrownSentenceGenerator
XmlSentenceGenerator
ConllSentenceGenerator
Methods
sentences(): Yields sentences from the source
abstract sentences()
Yields sentences from the source
- Yields
- list of str
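As a rough illustration of the `lower` and `digit` options described above, the following sketch applies the same kind of normalization to an in-memory iterable. It does not use mangoes itself: `BaseSentences`, `SimpleSentences`, the whitespace tokenization, the number pattern, and the `"0"` value given to `DIGIT_TOKEN` are all assumptions made for this example, not the library's implementation.

```python
import re
from abc import ABC, abstractmethod

DIGIT_TOKEN = "0"  # assumed placeholder; the real value is defined by mangoes
_NUMBER = re.compile(r"^\d+([.,]\d+)*$")  # assumed definition of "numeric value"


class BaseSentences(ABC):
    """Hypothetical stand-in for an abstract sentence generator."""

    def __init__(self, source, lower=False, digit=False):
        self.source = source
        self.lower = lower
        self.digit = digit

    @abstractmethod
    def sentences(self):
        """Yield each sentence as a list of str."""


class SimpleSentences(BaseSentences):
    """Splits each item of an iterable on whitespace (an assumption)."""

    def sentences(self):
        for line in self.source:
            tokens = line.split()
            if self.lower:
                tokens = [t.lower() for t in tokens]
            if self.digit:
                tokens = [DIGIT_TOKEN if _NUMBER.match(t) else t for t in tokens]
            yield tokens


corpus = ["The cat ate 3 mice", "It was 2019"]
result = list(SimpleSentences(corpus, lower=True, digit=True).sentences())
```

Because `sentences()` is abstract in the base class, only the derived class can be instantiated, which mirrors the warning above about not using the base class directly.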
class mangoes.utils.reader.TextSentenceGenerator(source, lower=False, digit=False, ignore_punctuation=False)
Bases: mangoes.utils.reader.SentenceGenerator
Sentence generator for a simple text source
Methods
sentences(): Yields sentences from the source
sentences()
Yields sentences from the source
- Yields
- list of str
class mangoes.utils.reader.AnnotatedSentenceGenerator(source, lower=False, digit=False, ignore_punctuation=True)
Bases: mangoes.utils.reader.SentenceGenerator
Base class for sentence generators that read from an annotated source
A sentence generator yields sentences from a source, which can be an iterable or a set of files.
Warning
This class should not be used directly. Use derived classes instead.
Methods
Token(form, lemma, POS)
sentences(): Yields sentences from the source
FIELDS = ('form', 'lemma', 'POS')
NUM_TAG = 'NUM'
PUNCTUATION_TAG = 'PUNCT'
class Token(form, lemma, POS)
Bases: mangoes.utils.reader.Token
- Attributes
POS
Alias for field number 2
form
Alias for field number 0
lemma
Alias for field number 1
Methods
count(value, /): Return number of occurrences of value.
index(value[, start, stop]): Return first index of value.
lower()
replace(value)
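The "Alias for field number N" descriptions are the ones Python generates for `collections.namedtuple` fields, so a token can be read by name or by position. The sketch below uses a plain namedtuple with the same field order, not the mangoes `Token` class itself. The library's own `lower()` and `replace(value)` methods are left out because their behavior is not described here; the standard `_replace` is used instead.

```python
from collections import namedtuple

# Same field order as FIELDS = ('form', 'lemma', 'POS')
Token = namedtuple("Token", ("form", "lemma", "POS"))

token = Token(form="Cats", lemma="cat", POS="NOUN")

# Fields are reachable by name or by position
by_name = token.POS
by_index = token[2]

# Tuple methods listed in the Methods table
position = token.index("cat")
occurrences = token.count("NOUN")

# namedtuple's built-in _replace returns a new tuple; the original is unchanged
lowered = token._replace(form=token.form.lower())
```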
abstract sentences()
Yields sentences from the source
- Yields
- list of str
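One plausible way the `NUM_TAG` and `PUNCTUATION_TAG` constants could relate to the `digit` and `ignore_punctuation` options is to filter or mask tokens by their POS tag. The sketch below illustrates that idea only. The `normalize` helper, the `"0"` value for `DIGIT_TOKEN`, and the choice to mask both form and lemma are assumptions, not documented mangoes behavior.

```python
from collections import namedtuple

Token = namedtuple("Token", ("form", "lemma", "POS"))

NUM_TAG = "NUM"
PUNCTUATION_TAG = "PUNCT"
DIGIT_TOKEN = "0"  # assumed placeholder value


def normalize(sentence, digit=False, ignore_punctuation=True):
    """Hypothetical helper: drop punctuation tokens and mask numbers by POS tag."""
    for token in sentence:
        if ignore_punctuation and token.POS == PUNCTUATION_TAG:
            continue
        if digit and token.POS == NUM_TAG:
            token = token._replace(form=DIGIT_TOKEN, lemma=DIGIT_TOKEN)
        yield token


sentence = [
    Token("I", "I", "PRON"),
    Token("saw", "see", "VERB"),
    Token("42", "42", "NUM"),
    Token("birds", "bird", "NOUN"),
    Token(".", ".", "PUNCT"),
]
forms = [t.form for t in normalize(sentence, digit=True)]
```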
class mangoes.utils.reader.BrownSentenceGenerator(source, lower=False, digit=False, ignore_punctuation=True)
Bases: mangoes.utils.reader.AnnotatedSentenceGenerator
Sentence generator for a text source annotated in Brown format
Methods
Token(form, lemma, POS)
sentences(): Yields sentences from the source
sentences()
Yields sentences from the source
- Yields
- list of str
class mangoes.utils.reader.XmlSentenceGenerator(source, lower=False, digit=False, ignore_punctuation=False)
Bases: mangoes.utils.reader.AnnotatedSentenceGenerator
Sentence generator for an XML source
Methods
Token(id, form, lemma, POS, features, head, …)
sentences(): Yields sentences from the source
FIELDS = ('id', 'form', 'lemma', 'POS', 'features', 'head', 'dependency_relation')
class Token(id, form, lemma, POS, features, head, dependency_relation)
Bases: mangoes.utils.reader.Token
- Attributes
POS
Alias for field number 3
dependency_relation
Alias for field number 6
features
Alias for field number 4
form
Alias for field number 1
head
Alias for field number 5
id
Alias for field number 0
lemma
Alias for field number 2
Methods
count(value, /): Return number of occurrences of value.
index(value[, start, stop]): Return first index of value.
lower()
replace(value)
class mangoes.utils.reader.ConllSentenceGenerator(source, lower=False, digit=False, ignore_punctuation=True)
Bases: mangoes.utils.reader.AnnotatedSentenceGenerator
Sentence generator for a source annotated in CoNLL format
Methods
Token(id, form, lemma, POS, NER, head, …)
sentences(): Yields sentences from the source
FIELDS = ('id', 'form', 'lemma', 'POS', 'NER', 'head', 'dependency_relation')
class Token(id, form, lemma, POS, NER, head, dependency_relation)
Bases: mangoes.utils.reader.Token
- Attributes
NER
Alias for field number 4
POS
Alias for field number 3
dependency_relation
Alias for field number 6
form
Alias for field number 1
head
Alias for field number 5
id
Alias for field number 0
lemma
Alias for field number 2
Methods
count(value, /): Return number of occurrences of value.
index(value[, start, stop]): Return first index of value.
lower()
replace(value)
sentences()
Yields sentences from the source
- Yields
- list of str
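To make the seven-column `FIELDS` layout concrete, here is a minimal sketch of reading such a source into tokens grouped by sentence. It assumes tab-separated columns, one token per line, and a blank line between sentences. Those conventions and the `read_sentences` helper are assumptions for illustration, not the mangoes parser.

```python
from collections import namedtuple

FIELDS = ("id", "form", "lemma", "POS", "NER", "head", "dependency_relation")
Token = namedtuple("Token", FIELDS)


def read_sentences(lines):
    """Hypothetical reader: one token per line, blank line ends a sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # Blank line: emit the sentence collected so far, if any
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(Token(*line.split("\t")))
    # Emit the last sentence when the input does not end with a blank line
    if sentence:
        yield sentence


sample = [
    "1\tParis\tParis\tPROPN\tLOC\t2\tnsubj",
    "2\tsleeps\tsleep\tVERB\tO\t0\troot",
    "",
    "1\tHello\thello\tINTJ\tO\t0\troot",
]
parsed = list(read_sentences(sample))
```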
class mangoes.utils.reader.ConllUSentenceGenerator(source, lower=False, digit=False, ignore_punctuation=True)
Bases: mangoes.utils.reader.ConllSentenceGenerator
Methods
Token(id, form, lemma, POS, xpostag, feats, …)
sentences(): Yields sentences from the source
FIELDS = ('id', 'form', 'lemma', 'POS', 'xpostag', 'feats', 'head', 'dependency_relation', 'deps', 'misc')
class Token(id, form, lemma, POS, xpostag, feats, head, dependency_relation, deps, misc)
Bases: mangoes.utils.reader.Token
- Attributes
POS
Alias for field number 3
dependency_relation
Alias for field number 7
deps
Alias for field number 8
feats
Alias for field number 5
form
Alias for field number 1
head
Alias for field number 6
id
Alias for field number 0
lemma
Alias for field number 2
misc
Alias for field number 9
xpostag
Alias for field number 4
Methods
count(value, /): Return number of occurrences of value.
index(value[, start, stop]): Return first index of value.
lower()
replace(value)
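The ten fields above line up with the ten tab-separated columns of a CoNLL-U token line, so `head` and `dependency_relation` sit at positions 6 and 7 rather than 5 and 6 as in the seven-column layout. The sketch below parses a single line under that assumption and skips `#` comment lines. The `parse_line` helper is hypothetical and is not part of mangoes.

```python
from collections import namedtuple

FIELDS = ("id", "form", "lemma", "POS", "xpostag", "feats",
          "head", "dependency_relation", "deps", "misc")
Token = namedtuple("Token", FIELDS)


def parse_line(line):
    """Hypothetical helper: return a Token, or None for comments and blank lines."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None
    columns = line.split("\t")
    if len(columns) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
    return Token(*columns)


comment = parse_line("# text = Dogs bark")
token = parse_line("1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_")
```

Accessing fields by name (`token.head`) rather than by index keeps calling code independent of which column layout a given generator uses.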