mangoes.reduction module

Dimensionality reductions to apply to the co-occurrence count matrix.

This module provides reductions that can be passed as the transformations parameter of the mangoes.create_representation() function to create an Embeddings object from a CooccurrenceCount object.

Examples

>>> import mangoes.base
>>> pca = mangoes.reduction.PCA(dimensions=50)
>>> embeddings = mangoes.base.create_representation(cc, transformations=pca)

See Also

mangoes.create_representation()
mangoes.Transformation

class mangoes.reduction.PCA(dimensions)

Bases: mangoes.base.Transformation

Defines a transformation that applies PCA

Parameters
dimensions: int

desired dimensionality of the returned matrix

Attributes
dimensions
params

Methods

__call__(matrix)

Performs the PCA on the matrix and returns representations for target words
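
As a rough illustration of what a PCA reduction computes, the sketch below centers the columns of a small dense matrix and projects the rows onto the leading principal axes using NumPy. The function name pca_reduce and the sample data are hypothetical, and this is not necessarily how mangoes.reduction.PCA is implemented (it may, for example, rely on another library or treat sparse matrices differently).

```python
import numpy as np


def pca_reduce(matrix, dimensions):
    """Hypothetical helper: project rows onto the top principal axes."""
    x = np.asarray(matrix, dtype=float)
    # PCA works on centered data: subtract the mean of each column.
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the first `dimensions` axes and project every row onto them.
    return centered @ vt[:dimensions].T


# Made-up counts: 4 target words (rows) x 3 contexts (columns).
sample = np.array([[2.0, 0.0, 1.0],
                   [0.0, 1.0, 3.0],
                   [4.0, 1.0, 0.0],
                   [1.0, 5.0, 2.0]])
reduced = pca_reduce(sample, dimensions=2)
```

Each row of reduced is the low-dimensional representation of the corresponding row (target word) of the input.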

property dimensions

class mangoes.reduction.SVD(dimensions, weight=1, add_context_vectors=False, symmetric=False)

Bases: mangoes.base.Transformation

Defines a transformation to reduce dimensionality using SVD

Given a count matrix M, the truncated singular value decomposition gives:

M_d = U_d.\Sigma_d.V_d^\top

by keeping the top d singular values in \Sigma, where d = dimensions

If parameter `add_context_vectors` is False (default):

W_d is a matrix representing the original target words (rows of the matrix), reduced to the given number of dimensions:

W_d = U_d.\Sigma_d^{weight}

The function will return W_d
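
The formula for W_d can be sketched with NumPy on a small dense matrix. This is only an illustration under assumptions: the helper name svd_target_vectors and the sample counts are hypothetical, and the library itself may compute the decomposition differently (for instance with a sparse solver).

```python
import numpy as np


def svd_target_vectors(matrix, dimensions, weight=1.0):
    """Hypothetical helper: W_d = U_d . Sigma_d^weight for a dense matrix."""
    # numpy returns the singular values in descending order.
    u, s, _ = np.linalg.svd(np.asarray(matrix, dtype=float),
                            full_matrices=False)
    u_d = u[:, :dimensions]   # first d left singular vectors
    s_d = s[:dimensions]      # top d singular values
    # Broadcasting scales column j of U_d by s_j ** weight.
    return u_d * s_d ** weight


# Made-up 3 x 3 count matrix; dimensions=2 is lower than both of its sizes.
counts = np.array([[4.0, 1.0, 0.0],
                   [1.0, 3.0, 2.0],
                   [0.0, 2.0, 5.0]])
w_d = svd_target_vectors(counts, dimensions=2, weight=0.5)
```

With weight = 0 the result reduces to U_d, whose columns are orthonormal; with weight = 1 each column is scaled by its full singular value.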

If parameter `add_context_vectors` is True:

If the same vocabulary has been used for both the rows and the columns of the matrix, you can set add_context_vectors to True to add the “context vectors” C_d to W_d.

The symmetric parameter defines how C_d is constructed:

If symmetric = False:

C_d = V_d.\Sigma_d^{1-weight}

So weight = 1 corresponds to the traditional SVD factorization W_d = U_d.\Sigma_d, C_d = V_d

If symmetric = True:

C_d = V_d.\Sigma_d^{weight}

So with weight = 0: W_d = U_d, C_d = V_d

The function will return W_d + C_d
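
The two ways of building C_d can be compared in a short NumPy sketch. The helper svd_with_context and the sample matrix are hypothetical names introduced for illustration; this follows the formulas above on a dense square matrix and is not claimed to match the library's internal code.

```python
import numpy as np


def svd_with_context(matrix, dimensions, weight=1.0, symmetric=False):
    """Hypothetical helper returning (W_d, C_d) as defined above."""
    u, s, vt = np.linalg.svd(np.asarray(matrix, dtype=float),
                             full_matrices=False)
    u_d, s_d = u[:, :dimensions], s[:dimensions]
    v_d = vt[:dimensions].T   # first d right singular vectors as columns
    w_d = u_d * s_d ** weight
    # symmetric=True reuses the same exponent; otherwise use 1 - weight.
    context_exponent = weight if symmetric else 1.0 - weight
    c_d = v_d * s_d ** context_exponent
    return w_d, c_d


# Square matrix: the same (made-up) vocabulary indexes rows and columns,
# which is required for W_d + C_d to be meaningful.
counts = np.array([[4.0, 1.0, 0.0],
                   [1.0, 3.0, 2.0],
                   [0.0, 2.0, 5.0]])
w_d, c_d = svd_with_context(counts, dimensions=2, weight=0.5)
combined = w_d + c_d   # what is returned when add_context_vectors is True
```

One consequence of the non-symmetric definition is that the product W_d C_d^\top equals the rank-d approximation M_d for any weight, because the exponents of \Sigma_d always sum to 1. With weight = 0.5 the symmetric and non-symmetric constructions coincide.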

Warning

  • add_context_vectors should be used only if represented words and words used as contexts are the same

  • weight should be a value between 0 and 1

Notes

dimensions must be lower than both dimensions of the input matrix (its number of rows and its number of columns)

Attributes
dimensions: int

desired dimensionality of the returned matrix. Must be less than both dimensions of the original.

weight: {1, 0, 0.5}

a parameter that defines the way to compute the matrix (see above)

add_context_vectors: boolean

Use the context vectors in addition to the word vectors (should be used only if represented words and words used as contexts are the same). Default = False.

symmetric: boolean

if True, the two matrices W_d and C_d will be built symmetrically (see above)

Methods

__call__(matrix)

Performs the reduction

property dimensions
property weight
property add_context_vectors
property symmetric