# Principal Components Analysis in JavaScript
npm install gs-pca-no-console

- The original version has an issue with generating lots of logs, which makes it expensive to run in some cloud environments: https://github.com/bitanath/pca/issues/14
- This fork fixes just that, and will be removed once the original is fixed.
A JS library to compute principal components from a given matrix of data. Use it in either Node.js or the browser. Look below for the API and some ideas 💡.
CDN: https://cdn.jsdelivr.net/npm/pca-js@1.0.0/pca.min.js
NPM: npm install --save pca-js
Usage:
Node 🛠: var PCA = require('pca-js')
Browser 🌎: PCA (global)
All methods are exposed through the PCA global variable.
Say you have data for the marks of a class of 4 students in 3 examinations on the same subject:
```
Student 1: 40,50,60
Student 2: 50,70,60
Student 3: 80,70,90
Student 4: 50,60,80
```
You want to examine whether it is possible to come up with a single descriptive set of scores that explains performance across the class. Alternatively, whether it would make sense to replace the 3 exams with just one (and reduce stress on students).
First, get the set of eigenvectors and eigenvalues (principal components and adjusted loadings):
```js
var data = [[40,50,60],[50,70,60],[80,70,90],[50,60,80]];
var vectors = PCA.getEigenVectors(data);
//Outputs
// [{
// "eigenvalue": 520.0992658908312,
// "vector": [0.744899700771276, 0.2849796479974595, 0.6032503924724023]
// }, {
// "eigenvalue": 78.10455398035167,
// "vector": [0.2313199078283626, 0.7377809866160473, -0.6341689964277106]
// }, {
// "eigenvalue": 18.462846795484058,
// "vector": [0.6257919271076777, -0.6119361208615616, -0.4836513702572988]
// }]
```
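Each eigenvalue is proportional to the share of the total variance its eigenvector captures, so you can sanity-check the next step by hand (a quick sketch using the output above):
```js
// Total variance is the sum of all eigenvalues
var total = vectors.reduce(function (sum, v) { return sum + v.eigenvalue; }, 0);

// Share of variance captured by the first principal component
var firstShare = vectors[0].eigenvalue / total;
// 520.099... / 616.666... ≈ 0.8434
```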
Now you need to find a set of eigenvectors that explains a decent amount of the variance across your exams (thus telling you whether 1 or 2 tests would suffice instead of three):
```js
var first = PCA.computePercentageExplained(vectors,vectors[0])
// 0.8434042149581044
var topTwo = PCA.computePercentageExplained(vectors,vectors[0],vectors[1])
// 0.9700602484397556
```
So if you wanted 97% of the variance explained, so that someone wouldn't just flunk out accidentally, you'd keep 2 exams. But say you want to give just 1, and explaining 84% of the variance is good enough; instead of holding the examination again, you just want a normalized score:
```js
var adData = PCA.computeAdjustedData(data,vectors[0])
// {
// "adjustedData": [
// [-22.27637101744241, -9.127781049780463, 31.316721747529886, 0.08743031969298887]
// ],
// "formattedAdjustedData": [
// [-22.28, -9.13, 31.32, 0.09]
// ],
// "avgData": [
// [-55, -62.5, -72.5],
// [-55, -62.5, -72.5],
// [-55, -62.5, -72.5],
// [-55, -62.5, -72.5]
// ],
// "selectedVectors": [
// [0.744899700771276, 0.2849796479974595, 0.6032503924724023]
// ]
// }
```
The adjustedData is centered (mean = 0), but you could always shift the mean to something like 50: mapping each score through `Math.round(score + 50)` gives `[28, 41, 81, 50]`, and that's how well your students would have done, in the order of the students.
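A minimal sketch of that shift, reusing the `adData` result from above:
```js
// Shift the centered scores to sit around a mean of 50, then round them
var normalized = adData.adjustedData[0].map(function (score) {
  return Math.round(score + 50);
});
// normalized is [28, 41, 81, 50]
```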
#### Compression (lossy):
```js
var compressed = adData.formattedAdjustedData;
//[
// [-22.28, -9.13, 31.32, 0.09]
// ]
var uncompressed = PCA.computeOriginalData(compressed,adData.selectedVectors,adData.avgData);
//uncompressed.formattedOriginalData (lossy since 2 eigenvectors are removed)
// [
// [38.4, 56.15, 59.06],
// [48.2, 59.9, 66.99],
// [78.33, 71.43, 91.39],
// [55.07, 62.53, 72.55]
// ]
```
Compare this to the original data to understand just how lossy the compression was:
```js
//Original Data
[
[40, 50, 60],
[50, 70, 60],
[80, 70, 90],
[50, 60, 80]
]
//Uncompressed Data
[
[38.4, 56.15, 59.06],
[48.2, 59.9, 66.99],
[78.33, 71.43, 91.39],
[55.07, 62.53, 72.55]
]
```
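To put a number on the loss, you can compute the root-mean-square error between the two matrices; a plain-JS sketch (no library calls assumed):
```js
var original = [[40, 50, 60], [50, 70, 60], [80, 70, 90], [50, 60, 80]];
var restored = uncompressed.formattedOriginalData;

// Root-mean-square error over all 12 entries
var sumSq = 0, count = 0;
for (var i = 0; i < original.length; i++) {
  for (var j = 0; j < original[i].length; j++) {
    var diff = original[i][j] - restored[i][j];
    sumSq += diff * diff;
    count++;
  }
}
var rmse = Math.sqrt(sumSq / count);
// ≈ 4.9 marks on the original 0-100 scale, the price of dropping 2 eigenvectors
```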
### List of Methods

#### computeDeviationMatrix(data)
Find centered matrix from original data
#### computeDeviationScores(centeredMatrix)
Find deviation from mean for values in matrix
#### computeSVD(deviationScores)
Singular Value Decomposition of matrix
#### computePercentageExplained(allvectors, ...selected)
Find the percentage of variance explained by the selected vectors as opposed to the whole set
#### computeOriginalData(compressedData,selectedVectors,avgData)
Get original data from the adjusted data after selecting a few eigenvectors
#### computeVarianceCovariance(devSumOfSquares,isSample)
Get the variance-covariance matrix from the deviation sum of squares; pass isSample as true to adjust n by one when the data is from a sample
#### computeAdjustedData(initialData, ...selectedVectors)
Get adjusted data using principal components as selected
#### getEigenVectors(initialData)
Get the principal components of data using the steps outlined above.
#### analyseTopResult(initialData)
Same as computeAdjustedData(initialData, vectors[0]): selects only the top eigenvector, which explains the most variance.
#### transpose(A)
Utility function to transpose a matrix A to Aᵀ
#### multiply(A,B)
Utility function to multiply matrices A and B
#### clone(A)
Utility function to clone a matrix A
#### scale(A,n)
Utility function to scale all elements in A by a factor of n
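For reference, the lower-level methods chain together in the order their parameter names suggest; a rough sketch of the pipeline that getEigenVectors runs internally (the exact intermediate shapes are an assumption, check the source before relying on them):
```js
var PCA = require('pca-js');

var data = [[40, 50, 60], [50, 70, 60], [80, 70, 90], [50, 60, 80]];

// Step through the pipeline outlined above (shapes assumed, not guaranteed)
var deviation = PCA.computeDeviationMatrix(data);              // center the data
var scores = PCA.computeDeviationScores(deviation);            // deviation sum of squares
var covariance = PCA.computeVarianceCovariance(scores, false); // population covariance
var svd = PCA.computeSVD(scores);                              // eigenvectors/eigenvalues via SVD
```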
LICENSE: MIT