TFLite support for Tensorflow.js

_WORK IN PROGRESS_

This package enables users to run arbitary TFLite models on the web. Users can
load a TFLite model from a URL, use TFJS tensors to set the model's input
data, run inference, and get the output back in TFJS tensors. Under the hood,
the TFLite C++ runtime is packaged in a set of WASM modules, and the one with
the best performance will be automatically loaded based on user's current
environment (e.g. whether WebAssembly SIMD and/or multi-threading is supported
or not).

Check out this [demo][demo] where we use this package to run a
[CartoonGAN][model] TFLite model on the web.

Usage

Import the packages

To use this package, you will need a TFJS backend installed. We recommend the
CPU backend. You will also need to import @tensorflow/tfjs-core for
manipulating tensors.

$3

``js // Adds the CPU backend. import '@tensorflow/tfjs-backend-cpu'; // Import @tensorflow/tfjs-core import * as tf from '@tensorflow/tfjs-core'; // Import @tensorflow/tfjs-tflite. import * as tflite from '@tensorflow/tfjs-tflite';`

`$3`

`html`

`Set WASM modules location (optional)`

By default, it will try to load the WASM modules from the same location where the package or your own script is served. UsesetWasmPathto set your own location. Seesrc/tflite_web_api_client.d.ts for more details.

`js tflite.setWasmPath('https://your-server/path');`

`Load a TFLite model`

js
const tfliteModel = await tflite.loadTFLiteModel('url/to/your/model.tflite');


Run inference

js
// Prepare input tensors.
const img = tf.browser.fromPixels(document.querySelector('img'));
const input = tf.sub(tf.div(tf.expandDims(img), 127.5), 1);

// Run inference and get output tensors. let outputTensor = tfliteModel.predict(input) as tf.Tensor; console.log(outputTensor.dataSync());`

`Performance`

This package uses [XNNPACK][xnnpack] to accelerate inference for floating-point and quantized models. See [XNNPACK documentation][xnnpack doc] for the full list of supported floating-point and quantized operators.

To achieve the best performance, use a browser that supports "WebAssembly SIMD" and "WebAssembly threads". In Chrome 92+, these features are enabled by default. In older versions of Chrome, they can be enabled inchrome://flags/.

Starting from Chrome 92, cross-origin isolation needs to be set up in your site in order to take advantage of the multi-threading support. Without this, it will fallback to the WASM binary with SIMD-only support (or the vanila version if SIMD is not enabled). Without multi-threading support, certain models might not achieve the best performance. See [here][cross origin setup steps] for the high-level steps to set up the cross-origin isolation.

By default, the runtime uses the number of physical cores as the thread count. You can tune this number by setting thenumThreadsoption when loading the TFLite model:

`js const tfliteModel = await tflite.loadTFLiteModel( 'path/to/your/my_model.tflite', {numThreads: navigator.hardwareConcurrency / 2});`

`Profiling`

Profiling can be enabled by setting the enableProfilingoption to true when loading the TFLite model:

`js const tfliteModel = await tflite.loadTFLiteModel( 'path/to/your/my_model.tflite', {enableProfiling: true});`

Once it is enabled, the runtime will record per-op latency data when thepredict method is called. The profiling results can be retrieved in two ways:

- tfliteModel.getProfilingResults(): this method will return an array of{nodeType, nodeName, execTimeInMs}. -tfliteModel.getProfilingSummary(): this method will return a human-readable profiling result summary that looks like [this][profiling summary].

`Development`

`Building`

`sh $ yarn $ yarn build`

`Testing`

`sh $ yarn test`

`Deployment`

sh
$ yarn build-npm
(TODO): publish

[demo]: https://storage.googleapis.com/tfweb/demos/cartoonizer/index.html
[model]: https://blog.tensorflow.org/2020/09/how-to-create-cartoonizer-with-tf-lite.html
[xnnpack]: https://github.com/google/XNNPACK
[xnnpack doc]: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/delegates/xnnpack/README.md#limitations-and-supported-operators
[cross origin setup steps]: https://github.com/tensorflow/tfjs/tree/master/tfjs-backend-wasm#setting-up-cross-origin-isolation
[profiling summary]: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/tools/benchmark/README.md#profiling-model-operators