Home
API
Examples
Github

feature_extraction.CountVectorizer

Usage

import { CountVectorizer } from 'machinelearn/feature_extraction';

const cv = new CountVectorizer();
const corpus = ['deep learning ian good fellow learning jason shin shin', 'yoshua bengio'];

// Learn the vocabulary and count tokens per document in one step
const vocabCounts = cv.fit_transform(corpus);
console.log(vocabCounts); // [ [ 0, 1, 1, 1, 1, 1, 2, 2, 0 ], [ 1, 0, 0, 0, 0, 0, 0, 0, 1 ] ]
console.log(cv.vocabulary); // { bengio: 0, deep: 1, fellow: 2, good: 3, ian: 4, jason: 5, learning: 6, shin: 7, yoshua: 8 }
console.log(cv.getFeatureNames()); // [ 'bengio', 'deep', 'fellow', 'good', 'ian', 'jason', 'learning', 'shin', 'yoshua' ]

// Reuse the fitted vocabulary; tokens not seen during fitting (here 'duuog') are not counted
const newVocabCounts = cv.transform(['ian good fellow jason duuog']);
console.log(newVocabCounts); // [ [ 0, 0, 1, 1, 1, 1, 0, 0, 0 ] ]

Constructors

  • constructor

Properties

  • vocabulary

Methods

  • fit

  • fit_transform

  • getFeatureNames

  • transform

Constructors


constructor

⊕ CountVectorizer()

Defined in

Parameters: none documented.

Returns: CountVectorizer

Properties


▸ vocabulary

Defined in feature_extraction/text.ts:26

Methods


λ fit

Learn a vocabulary dictionary of all tokens in the raw documents.

Defined in feature_extraction/text.ts:35

Parameters:

Param | Type     | Default | Description
doc   | string[] | null    | An array of strings

Returns:

this
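As a rough illustration of what fitting involves, the sketch below builds a vocabulary the way the usage example's output suggests: every distinct token across all documents is collected and indexed in alphabetical order. It is a hypothetical stand-alone re-implementation, not machinelearn's source. The names `tokenize` and `fitVocabulary` are invented here, and the tokenization (lowercasing, splitting on whitespace) is an assumption, since this page does not document the library's tokenizer.

```typescript
// Hypothetical sketch of what CountVectorizer.fit computes; NOT the machinelearn source.
// Assumption: tokens come from lowercasing and splitting on whitespace.
function tokenize(doc: string): string[] {
  return doc.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Collect every distinct token across all documents and index it alphabetically.
function fitVocabulary(docs: string[]): Record<string, number> {
  const unique = new Set<string>();
  for (const doc of docs) {
    tokenize(doc).forEach((token) => unique.add(token));
  }
  // Null-prototype object so tokens such as "constructor" cannot clash with Object.prototype.
  const vocabulary: Record<string, number> = Object.create(null);
  Array.from(unique)
    .sort()
    .forEach((token, index) => {
      vocabulary[token] = index;
    });
  return vocabulary;
}

const vocab = fitVocabulary([
  'deep learning ian good fellow learning jason shin shin',
  'yoshua bengio',
]);
console.log(JSON.stringify(vocab));
// {"bengio":0,"deep":1,"fellow":2,"good":3,"ian":4,"jason":5,"learning":6,"shin":7,"yoshua":8}
```

The sketch returns the vocabulary directly to keep it small; per the signature above, the library's `fit` instead returns `this`, which allows chaining.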

λ fit_transform

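Learn the vocabulary dictionary from the raw documents and return their document-term matrix, i.e. fit followed by transform on the same documents.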

Defined in feature_extraction/text.ts:46

Parameters:

Param | Type     | Default | Description
doc   | string[] | null    | An array of strings

Returns:

number[][]
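To make the "fit, then transform" reading concrete, here is a hypothetical sketch that performs both steps in one function and reproduces the counts from the usage example. It is not machinelearn's implementation: `tokenize` and `fitTransform` are invented names, and lowercase/whitespace tokenization is an assumption.

```typescript
// Hypothetical sketch of fit_transform as "fit, then transform the same documents".
// NOT the machinelearn source; lowercase/whitespace tokenization is an assumption.
function tokenize(doc: string): string[] {
  return doc.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

function fitTransform(docs: string[]): { vocabulary: Record<string, number>; counts: number[][] } {
  // Fit step: index every distinct token in alphabetical order.
  const unique = new Set<string>();
  docs.forEach((doc) => tokenize(doc).forEach((token) => unique.add(token)));
  const names = Array.from(unique).sort();
  const vocabulary: Record<string, number> = Object.create(null);
  names.forEach((name, index) => {
    vocabulary[name] = index;
  });

  // Transform step: one row per document, one column per vocabulary entry.
  const counts = docs.map((doc) => {
    const row: number[] = new Array(names.length).fill(0);
    tokenize(doc).forEach((token) => {
      row[vocabulary[token]] += 1; // every token is in the vocabulary just fitted
    });
    return row;
  });
  return { vocabulary, counts };
}

const result = fitTransform(['deep learning ian good fellow learning jason shin shin', 'yoshua bengio']);
console.log(JSON.stringify(result.counts));
// [[0,1,1,1,1,1,2,2,0],[1,0,0,0,0,0,0,0,1]]
```

Because the documents being transformed are the ones the vocabulary was built from, no unknown-token check is needed in this combined version.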

λ getFeatureNames

Array mapping from feature integer indices to feature names.

Defined in feature_extraction/text.ts:70

Returns:

object
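The mapping described above is the inverse of the `vocabulary` dictionary: position i holds the token whose index is i. The page lists the return type as `object`, while the usage example prints an array of strings; the hypothetical sketch below assumes the array form. `featureNames` is an invented name, not part of machinelearn.

```typescript
// Hypothetical sketch: invert a { token: index } vocabulary into an array of names.
// NOT the machinelearn source.
function featureNames(vocabulary: Record<string, number>): string[] {
  const tokens = Object.keys(vocabulary);
  const names: string[] = new Array(tokens.length);
  for (const token of tokens) {
    names[vocabulary[token]] = token; // array position = feature index
  }
  return names;
}

const vocabulary = { bengio: 0, deep: 1, fellow: 2, good: 3, ian: 4, jason: 5, learning: 6, shin: 7, yoshua: 8 };
console.log(featureNames(vocabulary).join(', '));
// bengio, deep, fellow, good, ian, jason, learning, shin, yoshua
```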

λ transform

Transform documents to a document-term matrix: extract token counts from raw text documents using the vocabulary learned by fit or the one provided to the constructor.

Defined in feature_extraction/text.ts:61

Parameters:

Param | Type     | Default | Description
doc   | string[] | null    | An array of strings

Returns:

number[][]
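The usage example shows that transform only counts tokens already in the vocabulary: 'duuog' adds nothing to the row. A hypothetical sketch of that behaviour, taking an already fitted vocabulary as input, follows. `transformDocs` is an invented name, the tokenization (lowercase, split on whitespace) is an assumption, and this is not machinelearn's source.

```typescript
// Hypothetical sketch of CountVectorizer.transform; NOT the machinelearn source.
// Assumption: the same lowercase/whitespace tokenization used during fitting.
function transformDocs(docs: string[], vocabulary: Record<string, number>): number[][] {
  const width = Object.keys(vocabulary).length;
  return docs.map((doc) => {
    const row: number[] = new Array(width).fill(0);
    for (const token of doc.toLowerCase().split(/\s+/)) {
      // Tokens never seen during fit (e.g. 'duuog') have no column and are skipped.
      if (Object.prototype.hasOwnProperty.call(vocabulary, token)) {
        row[vocabulary[token]] += 1;
      }
    }
    return row;
  });
}

const fitted = { bengio: 0, deep: 1, fellow: 2, good: 3, ian: 4, jason: 5, learning: 6, shin: 7, yoshua: 8 };
console.log(JSON.stringify(transformDocs(['ian good fellow jason duuog'], fitted)));
// [[0,0,1,1,1,1,0,0,0]]
```

The `hasOwnProperty` check, rather than a plain lookup, keeps tokens such as "constructor" from matching properties inherited from `Object.prototype`.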