mangoes.reduction module
Dimensionality reduction to apply to the co-occurrence count matrix.
This module provides reductions that can be used in the transformations parameter of
the mangoes.create_representation()
function to create an Embeddings from a CooccurrenceCount.
Examples
>>> import mangoes.base
>>> pca = mangoes.reduction.PCA(dimensions=50)
>>> embeddings = mangoes.base.create_representation(cc, transformations=pca)
See Also
mangoes.create_representation()
mangoes.Transformation
class mangoes.reduction.PCA(dimensions)
Bases: mangoes.base.Transformation
Defines a transformation that applies PCA
- Parameters
- dimensions: int
desired dimensionality of the returned matrix
- Attributes
- dimensions
- params
Methods
__call__(matrix): Performs the PCA on the matrix and returns representations for target words
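As a rough illustration of what a PCA reduction of a count matrix involves, here is a minimal NumPy sketch. The helper name `pca_reduce` is hypothetical, and the column centering followed by a dense SVD is an assumption about one standard way to compute PCA, not a description of how mangoes implements it.

```python
import numpy as np


def pca_reduce(matrix, dimensions):
    """Hypothetical helper: project the rows of `matrix` onto its top principal axes."""
    matrix = np.asarray(matrix, dtype=float)
    # Center each column (context) so the principal axes describe variance
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # One row per target word, reduced to `dimensions` columns
    return centered @ vt[:dimensions].T
```

Each row of the result corresponds to a target word (a row of the input matrix), which matches the "representations for target words" wording above.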
property dimensions
class mangoes.reduction.SVD(dimensions, weight=1, add_context_vectors=False, symmetric=False)
Bases: mangoes.base.Transformation
Defines a transformation to reduce dimensionality using SVD
Given a count matrix M, the truncated SVD gives:
M_d = U_d.\Sigma_d.V_d^\top
by keeping the top d singular values in \Sigma, where d = dimensions.
If parameter `add_context_vectors` is False (default) :
We get W_d, a matrix representing the original target words (rows of the matrix) reduced to given dimensions :
W_d = U_d.\Sigma_d^{weight}
The function will return W_d
If parameter `add_context_vectors` is True :
If the same vocabulary has been used for both the rows and the columns of the matrix, you can also set add_context_vectors to True to add the “context vectors” C_d to W_d.
The symmetric parameter defines how C_d is constructed:
If symmetric = False :
C_d = V_d.\Sigma_d^{1-weight}
So weight = 1 corresponds to the traditional SVD factorization W_d = U_d.\Sigma_d, C_d = V_d
If symmetric = True :
C_d = V_d.\Sigma_d^{weight}
So with weight = 0 : W_d = U_d, C_d = V_d
The function will return W_d + C_d
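The formulas above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the mangoes implementation: the helper name `svd_reduce` is hypothetical, it uses a dense `np.linalg.svd` rather than whatever solver the library relies on, and the square-matrix check is an added guard, not documented behaviour.

```python
import numpy as np


def svd_reduce(matrix, dimensions, weight=1.0, add_context_vectors=False, symmetric=False):
    """Hypothetical helper mirroring the W_d and C_d formulas above."""
    matrix = np.asarray(matrix, dtype=float)
    if not 0 < dimensions < min(matrix.shape):
        raise ValueError("dimensions must be lower than both dimensions of the matrix")

    # Thin SVD; singular values come back sorted in decreasing order
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u_d, s_d, v_d = u[:, :dimensions], s[:dimensions], vt[:dimensions].T

    # W_d = U_d . Sigma_d^weight (scales each column by its singular value)
    w_d = u_d * s_d ** weight
    if not add_context_vectors:
        return w_d

    # Assumption: rows and columns share one vocabulary, so the matrix is square
    if matrix.shape[0] != matrix.shape[1]:
        raise ValueError("add_context_vectors needs the same words as rows and columns")

    # C_d = V_d . Sigma_d^weight if symmetric, else V_d . Sigma_d^(1 - weight)
    exponent = weight if symmetric else 1.0 - weight
    c_d = v_d * s_d ** exponent
    return w_d + c_d
```

Note that with weight = 0.5 the two constructions of C_d coincide, since 1 - weight equals weight.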
Warning
add_context_vectors should be used only if represented words and words used as contexts are the same
weight should be a value between 0 and 1
Notes
dimensions has to be lower than both dimensions of the input matrix
- Attributes
- dimensions: int
desired dimensionality of the returned matrix. Must be lower than both dimensions of the input matrix.
- weight: {1, 0, 0.5}
a parameter that defines the way to compute the matrix (see above)
- add_context_vectors: boolean
Use the context vectors in addition to the word vectors (should be used only if represented words and words used as contexts are the same). Default = False.
- symmetric: boolean
if True, the two matrices W_d and C_d will be built symmetrically (see above)
Methods
__call__(matrix): Performs the reduction
property dimensions
property weight
property add_context_vectors
property symmetric