feature_extraction.CountVectorizer
Usage
import { CountVectorizer } from 'machinelearn/feature_extraction';
const corpus = ['deep learning ian good fellow learning jason shin shin', 'yoshua bengio'];
// Create the vectorizer before fitting it on the corpus
const cv = new CountVectorizer();
const vocabCounts = cv.fit_transform(corpus);
console.log(vocabCounts); // [ [ 0, 1, 1, 1, 1, 1, 2, 2, 0 ], [ 1, 0, 0, 0, 0, 0, 0, 0, 1 ] ]
console.log(cv.vocabulary); // { bengio: 0, deep: 1, fellow: 2, good: 3, ian: 4, jason: 5, learning: 6, shin: 7, yoshua: 8 }
console.log(cv.getFeatureNames()); // [ 'bengio', 'deep', 'fellow', 'good', 'ian', 'jason', 'learning', 'shin', 'yoshua' ]
const newVocabCounts = cv.transform(['ian good fellow jason duuog']);
console.log(newVocabCounts); // [ [ 0, 0, 1, 1, 1, 1, 0, 0, 0 ] ]
Constructors
constructor
⊕ CountVectorizer()
Defined in
Parameters:
Param | Type | Default | Description |
---|---|---|---|
Returns: CountVectorizer
Properties
▸ vocabulary
Defined in feature_extraction/text.ts:26
Methods
λ fit
Learn a vocabulary dictionary of all tokens in the raw documents.
Defined in feature_extraction/text.ts:35
Parameters:
Param | Type | Default | Description |
---|---|---|---|
doc | string[] | null | An array of strings |
Returns:
this
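To make the idea of a vocabulary dictionary concrete, here is a minimal standalone sketch of how such a dictionary could be built. It is not the library's implementation: `buildVocabulary` is a hypothetical helper, and lowercasing plus whitespace tokenisation is an assumption chosen because it reproduces the alphabetical indices shown in the Usage example.

```typescript
// Hypothetical helper, not part of machinelearn: maps each distinct token to an index.
function buildVocabulary(docs: string[]): Record<string, number> {
  const seen: Record<string, boolean> = {};
  docs.forEach((doc) => {
    // Assumption: tokens are lowercased words separated by whitespace.
    doc
      .toLowerCase()
      .split(/\s+/)
      .filter((token) => token.length > 0)
      .forEach((token) => {
        seen[token] = true;
      });
  });

  // Sort the distinct tokens and assign consecutive indices in that order.
  const vocabulary: Record<string, number> = {};
  Object.keys(seen)
    .sort()
    .forEach((token, index) => {
      vocabulary[token] = index;
    });
  return vocabulary;
}

const sketchVocabulary = buildVocabulary([
  'deep learning ian good fellow learning jason shin shin',
  'yoshua bengio',
]);
console.log(sketchVocabulary);
// { bengio: 0, deep: 1, fellow: 2, good: 3, ian: 4, jason: 5, learning: 6, shin: 7, yoshua: 8 }
```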
λ fit_transform
Learns the vocabulary from the raw documents and returns the resulting document-term matrix of token counts.
Defined in feature_extraction/text.ts:46
Parameters:
Param | Type | Default | Description |
---|---|---|---|
doc | string[] | null | An array of strings |
Returns:
number[][]
λ getFeatureNames
Returns an array mapping each feature integer index to its feature name.
Defined in feature_extraction/text.ts:70
Returns:
object
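The relationship between the vocabulary and the feature names can be sketched as an inversion of the token-to-index map. This is an illustration only: `featureNamesFrom` is a hypothetical helper, and the sample vocabulary is made up for the example.

```typescript
// Hypothetical helper, not part of machinelearn: inverts a token -> index map
// into an array where position i holds the token with index i.
function featureNamesFrom(vocabulary: Record<string, number>): string[] {
  const featureNames: string[] = [];
  Object.keys(vocabulary).forEach((token) => {
    featureNames[vocabulary[token]] = token;
  });
  return featureNames;
}

const sketchFeatureNames = featureNamesFrom({ yoshua: 2, bengio: 0, deep: 1 });
console.log(sketchFeatureNames); // [ 'bengio', 'deep', 'yoshua' ]
```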
λ transform
Transform documents into a document-term matrix. Extracts token counts from raw text documents using the vocabulary fitted with fit or the one provided to the constructor.
Defined in feature_extraction/text.ts:61
Parameters:
Param | Type | Default | Description |
---|---|---|---|
doc | string[] | null | An array of strings |
Returns:
number[][]
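The counting step can be sketched without the library as follows. `countTokens` is a hypothetical helper, and whitespace tokenisation is an assumption. The sketch ignores tokens missing from the vocabulary, which matches the Usage example, where `duuog` contributes nothing to the output row.

```typescript
// Hypothetical helper, not part of machinelearn: counts vocabulary tokens per document.
function countTokens(docs: string[], vocabulary: Record<string, number>): number[][] {
  const size = Object.keys(vocabulary).length;
  return docs.map((doc) => {
    // One zero-initialised slot per vocabulary entry.
    const row: number[] = [];
    for (let i = 0; i < size; i++) {
      row.push(0);
    }
    // Assumption: tokens are lowercased words separated by whitespace.
    doc
      .toLowerCase()
      .split(/\s+/)
      .forEach((token) => {
        // Tokens that were never seen during fitting are skipped.
        if (Object.prototype.hasOwnProperty.call(vocabulary, token)) {
          row[vocabulary[token]] += 1;
        }
      });
    return row;
  });
}

const fittedVocabulary: Record<string, number> = {
  bengio: 0, deep: 1, fellow: 2, good: 3, ian: 4, jason: 5, learning: 6, shin: 7, yoshua: 8,
};
const sketchCounts = countTokens(['ian good fellow jason duuog'], fittedVocabulary);
console.log(sketchCounts); // [ [ 0, 0, 1, 1, 1, 1, 0, 0, 0 ] ]
```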