TFLite is the deployment format for edge ML. Today you'll convert, optimize, and run .tflite models on Raspberry Pi and Arduino Nano 33.
TensorFlow Lite has two runtimes. TFLite proper targets Android/iOS/Linux (Raspberry Pi, Coral): a full interpreter that supports most ops, with C++ and Python APIs. TFLite Micro (TFLM) is the microcontroller runtime: no dynamic memory allocation, a 16–256KB footprint, and it runs on the Arduino Nano 33 BLE Sense, STM32, and ESP32-S3. The workflow: train in TensorFlow/Keras → convert to FlatBuffer format (.tflite) → deploy to the edge device.
tf.lite.TFLiteConverter.from_keras_model(model) converts a Keras model. For optimization, set converter.optimizations = [tf.lite.Optimize.DEFAULT] — on its own this applies dynamic-range quantization (int8 weights, float activations). For full integer quantization (int8 weights and activations), also provide a representative dataset function so the converter can calibrate activation ranges. The output is a .tflite file — a single FlatBuffer containing the model graph and quantized weights. Use xxd -i model.tflite > model_data.h to embed the model as a C array for TFLM.
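Under the hood, full integer quantization is an affine mapping: pick a scale and zero point from the observed value range, then map floats onto uint8 as q = round(x/scale) + zero_point. A numpy sketch of that mapping (an illustrative range-based parameter choice, not the converter's exact calibration algorithm):

```python
import numpy as np

# Affine (asymmetric) quantization: real ≈ (q - zero_point) * scale.
def quant_params(x, qmin=0, qmax=255):
    lo, hi = min(x.min(), 0.0), max(x.max(), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zp):
    return np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)

def dequantize(q, scale, zp):
    return (q.astype(np.float32) - zp) * scale

x = np.random.uniform(-1.0, 1.0, 1000).astype(np.float32)
scale, zp = quant_params(x)
# Round-trip error is bounded by half a quantization step (scale / 2).
err = np.abs(dequantize(quantize(x, scale, zp), scale, zp) - x).max()
print(f"scale={scale:.5f} zero_point={zp} max_error={err:.5f}")
```

Note that real 0.0 maps exactly to the zero point — TFLite requires this so that zero-padding in convolutions stays exact after quantization.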
Python TFLite interpreter: interpreter = tf.lite.Interpreter('model.tflite'), interpreter.allocate_tensors(), set input tensor, interpreter.invoke(), get output tensor. In C++ (TFLM): define an arena (static memory buffer), create resolver with needed ops, load model from flash, create interpreter, allocate, run. Total C++ code: ~30 lines. The model runs with zero heap allocation.
# TFLite: convert, quantize, and run inference
# pip install tensorflow numpy Pillow
import tensorflow as tf
import numpy as np
import time
# ── 1. Build and train a simple model ────────────────────
def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28,28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0
model = build_model()
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=3, validation_split=0.1, verbose=0)
print(f"Base accuracy: {model.evaluate(x_test, y_test, verbose=0)[1]:.3f}")
# ── 2. Convert to TFLite (float32) ───────────────────────
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open('model_fp32.tflite', 'wb') as f: f.write(tflite_model)
print(f"FP32 size: {len(tflite_model)/1024:.1f} KB")
# ── 3. Convert with INT8 quantization ────────────────────
converter_q = tf.lite.TFLiteConverter.from_keras_model(model)
converter_q.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_data():
    for i in range(100):
        yield [x_train[i:i+1].astype(np.float32)]
converter_q.representative_dataset = representative_data
converter_q.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter_q.inference_input_type = tf.uint8
converter_q.inference_output_type = tf.uint8
tflite_quant = converter_q.convert()
with open('model_int8.tflite', 'wb') as f: f.write(tflite_quant)
print(f"INT8 size: {len(tflite_quant)/1024:.1f} KB")
# ── 4. Run inference with TFLite interpreter ─────────────
def tflite_predict(model_path, input_data):
    interp = tf.lite.Interpreter(model_path=model_path)
    interp.allocate_tensors()
    in_idx = interp.get_input_details()[0]['index']
    out_idx = interp.get_output_details()[0]['index']
    interp.set_tensor(in_idx, input_data)
    interp.invoke()
    return interp.get_tensor(out_idx)
# Benchmark
sample = x_test[:1].astype(np.float32)
t0 = time.perf_counter()
for _ in range(1000):
    tflite_predict('model_fp32.tflite', sample)  # re-creates the interpreter each call
fp32_ms = (time.perf_counter() - t0) * 1000
print(f"\nFP32 inference: {fp32_ms:.1f}ms for 1000 runs")
print(f"INT8 model is {len(tflite_model)/len(tflite_quant):.1f}x smaller")
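The benchmark above runs only the FP32 model. Running the INT8 model takes one extra step: its input is uint8, so float inputs must be quantized with the scale and zero point stored in the model's input details, and the uint8 output dequantized the same way. A minimal self-contained sketch (tiny untrained model and random representative data, so the predictions are meaningless — only the I/O handling is the point):

```python
import numpy as np
import tensorflow as tf

# Tiny untrained model, just to produce a full-integer .tflite in memory.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,)),
    tf.keras.layers.Dense(3, activation='softmax'),
])
conv = tf.lite.TFLiteConverter.from_keras_model(model)
conv.optimizations = [tf.lite.Optimize.DEFAULT]
conv.representative_dataset = lambda: (
    [np.random.rand(1, 4).astype(np.float32)] for _ in range(50))
conv.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
conv.inference_input_type = tf.uint8
conv.inference_output_type = tf.uint8
tflite_int8 = conv.convert()

interp = tf.lite.Interpreter(model_content=tflite_int8)
interp.allocate_tensors()
inp, out = interp.get_input_details()[0], interp.get_output_details()[0]

# Quantize the float input with the model's own input scale/zero-point.
scale, zp = inp['quantization']
x = np.random.rand(1, 4).astype(np.float32)
q = np.clip(np.round(x / scale) + zp, 0, 255).astype(np.uint8)
interp.set_tensor(inp['index'], q)
interp.invoke()
y_q = interp.get_tensor(out['index'])        # raw uint8 scores
oscale, ozp = out['quantization']
y = (y_q.astype(np.float32) - ozp) * oscale  # dequantized probabilities
print(y, y.sum())
```

The same input/output details carry over to the MNIST model from step 3: read `quantization` from `get_input_details()` rather than hard-coding a scale.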
Use interpreter.get_tensor_details() to find the peak activation memory during inference. On the Pi, pip3 install tflite-runtime (lighter than the full TF package).

Exercise: deploy a keyword spotting model (TensorFlow Speech Commands dataset) to an Arduino Nano 33 BLE Sense. The board has a microphone and runs TFLM. Use the pre-built hello_edge_impulse or Arduino_TensorFlowLite library. Train a model to recognize 'yes' and 'no'. What is the false positive rate? How does adding more keywords affect accuracy?
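The get_tensor_details() tip can be sketched as follows — self-contained with a tiny untrained model. Summing every tensor's size gives an upper bound on the activation memory (the interpreter reuses buffers, so the true peak is lower), which is a useful first estimate when sizing a TFLM arena:

```python
import numpy as np
import tensorflow as tf

# Tiny untrained model, just to have something to inspect.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
tflite_model = tf.lite.TFLiteConverter.from_keras_model(model).convert()
interp = tf.lite.Interpreter(model_content=tflite_model)
interp.allocate_tensors()

# Each entry reports the tensor's name, shape, and dtype.
total = 0
for t in interp.get_tensor_details():
    nbytes = int(np.prod(t['shape'])) * np.dtype(t['dtype']).itemsize
    total += nbytes
    print(f"{t['name'][:40]:40s} {t['shape']} {nbytes} bytes")
print(f"Sum of tensor sizes: {total / 1024:.1f} KB (upper bound)")
```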