Packaging models

Any machine-learning model built with one of the mainstream data-science frameworks (e.g. Scikit-Learn, TensorFlow or PyTorch) can be served using Flama. This is what we explained in the previous sections on the Flama CLI commands run, serve, and start. For this to work, we need one of the following two options:

  • A model packaged as a binary file (.flm files)
  • A model embedded in a Flama App

The second option is explained in detail in the following sections: add models, model resource, and model components. The first option, which is the one discussed here, requires us to save the models following a certain procedure. To make it quick and convenient to integrate these models into an API, Flama comes with the functionality required to serialise and package them, automatically adding the metadata that makes the resulting files operational.

FLM files

The binary files needed by the Flama CLI are typically named with the suffix .flm. We call them flama files for simplicity, but FLM stands for Flama Lightweight Model. The name reflects the fact that FLM files are a lightweight representation of ML models, bundled with the metadata needed later on, e.g. for building a wrapper Flama app containing the model.

FLM file structure

The structure of an FLM file is designed to be as simple as possible, keeping in a single file all the information needed to load and use the model. It is as follows:

├── model.flm
│   └── model
│       ├── model (python object)
│       └── meta
│           ├── id
│           ├── timestamp
│           ├── framework
│           ├── model
│           │   ├── obj
│           │   ├── info
│           │   ├── params
│           │   └── metrics
│           └── extra
└── artifacts
    ├── foo.json
    └── bar.csv
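
To make the tree easier to read, the sketch below loosely mirrors it with plain Python dataclasses. This is only an illustration: the class names (ModelInfo, Meta, PackagedModel) and all field types are assumptions made for this example, and they are not Flama's own classes or its on-disk format.

```python
import datetime
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ModelInfo:
    # Mirrors meta/model in the tree: obj, info, params, metrics (types assumed).
    obj: str
    info: Optional[Dict[str, Any]] = None
    params: Optional[Dict[str, Any]] = None
    metrics: Optional[Dict[str, Any]] = None


@dataclass
class Meta:
    # Mirrors the meta node: id, timestamp, framework, model, extra (types assumed).
    id: uuid.UUID
    timestamp: datetime.datetime
    framework: str
    model: ModelInfo
    extra: Optional[Dict[str, Any]] = None


@dataclass
class PackagedModel:
    # Groups the Python object, its metadata, and the artifact name -> path mapping.
    model: Any
    meta: Meta
    artifacts: Dict[str, str] = field(default_factory=dict)


example = PackagedModel(
    model=object(),  # stand-in for a trained estimator
    meta=Meta(
        id=uuid.uuid4(),
        timestamp=datetime.datetime(2023, 3, 10, 11, 30, 0),
        framework="sklearn",
        model=ModelInfo(obj="MLPClassifier", params={"activation": "tanh"}),
    ),
    artifacts={"foo.json": "path/to/artifact.json"},
)
```

Reading the nesting this way shows where each piece of metadata would live, e.g. hyper-parameters under meta, then model, then params.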

Dump & load

Let's consider a familiar situation, the day-to-day routine of many data scientists. After careful experimentation, cross-validation, testing, and so on, we have found the optimal ML model for our problem. Great job! Now we want to take the model out of our Jupyter Notebook and offer it as a service that makes predictions on demand. The first thing that comes to mind is pickling the model (i.e., using pickle.dump) and passing the resulting file to the team or colleague in charge of developing the wrapper API, which will eventually have to unpickle the object (i.e., using pickle.load) and expose its predict method. It seems like a very repetitive and boring task, doesn't it?
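
The manual hand-off described above can be sketched with the standard library alone. In this minimal example a statistics.NormalDist fitted to three samples is a hypothetical stand-in for a trained model, and its cdf method plays the role of predict; the file name model.pkl is likewise just an illustration.

```python
import pickle
import tempfile
from pathlib import Path
from statistics import NormalDist

# Stand-in for a trained model: a normal distribution fitted to a few samples.
model = NormalDist.from_samples([1.0, 2.0, 3.0])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"

    # Data-science side: serialise the fitted object to a file.
    with path.open("wb") as f:
        pickle.dump(model, f)

    # API side: read the file back and rebuild the object.
    with path.open("rb") as f:
        restored = pickle.load(f)

# The restored object is then queried, much as a wrapper API would call predict.
probability = restored.cdf(2.0)
```

Note that a plain pickle carries no identifier, timestamp, metrics, or artifacts, so any such bookkeeping has to be handled separately by whoever writes the wrapper.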

As we saw when we introduced serve and start, Flama comes equipped with a very convenient CLI which does all the boring part for you seamlessly, with a single line of code. For this, we only need our models to be packaged with the Flama counterparts of pickle's dump and load commands, namely flama.dump and flama.load.

Dump method

Flama's dump method compresses the model to make the packing process efficient and fast. The packing step can live completely outside any Flama application. Indeed, the natural place to package your models is the model-building stage, which will very likely happen in your Jupyter notebook. An example of how to use this method:

import datetime

import flama

# `model` is the trained model object produced earlier, e.g. in a notebook
flama.dump(
    model,
    "path/to/file.flm",
    timestamp=datetime.datetime(2023, 3, 10, 11, 30, 0),
    params={"optimizer": "adam"},
    metrics={"recall": "0.95"},
    extra={
        "model_version": "1.0.0",
        "model_description": "This is a test model",
        "model_author": "John Doe",
        "model_license": "MIT",
        "tags": ["test", "example"],
    },
    artifacts={"foo.json": "path/to/artifact.json"},
)

The first two parameters are the model object itself, and the path where the resulting file will be stored. The remaining parameters are optional, and are used to add metadata to the resulting file which might be quite useful for model management purposes:

  • model_id: a unique identifier for the model. If not provided, a random UUID will be generated.
  • compression: the compression level to be used. It can be one of the following: "fast", "standard", or "high". The default value is "standard".
  • timestamp: the timestamp of the model. If not provided, the current timestamp will be used.
  • params: a dictionary containing hyper-parameters used to train the model.
  • metrics: a dictionary containing metrics of the model, e.g. accuracy, recall, precision, etc.
  • extra: a dictionary containing any other metadata you might want to add to the model. This is a good place to add information about model version, description, author, license, tags, etc.
  • artifacts: a dictionary containing any artifacts associated with the model. The keys are the names of the artifacts, and the values are the paths to the files containing the artifacts. These files will be automatically packed and unpacked when the model is loaded.
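
As an illustration of how these optional arguments might be gathered in one place before packaging, here is a minimal sketch. The helper build_dump_options is hypothetical (it is not part of Flama); it only collects the keyword arguments listed above and checks the compression level against the three documented values. model_id and timestamp are left out so that the defaults described above apply.

```python
from typing import Any, Dict, Optional

# The three compression levels listed above.
ALLOWED_COMPRESSION = {"fast", "standard", "high"}


def build_dump_options(
    params: Dict[str, Any],
    metrics: Dict[str, Any],
    extra: Optional[Dict[str, Any]] = None,
    artifacts: Optional[Dict[str, str]] = None,
    compression: str = "standard",
) -> Dict[str, Any]:
    """Hypothetical helper: collect the optional metadata arguments in a dict."""
    if compression not in ALLOWED_COMPRESSION:
        raise ValueError(f"unknown compression level: {compression!r}")
    return {
        "compression": compression,
        "params": params,
        "metrics": metrics,
        "extra": extra or {},
        "artifacts": artifacts or {},
    }


options = build_dump_options(
    params={"activation": "tanh", "max_iter": 2000},
    metrics={"recall": 0.95},
    extra={"model_version": "1.0.0", "model_author": "John Doe"},
    artifacts={"foo.json": "path/to/artifact.json"},
)

# With Flama installed and a trained `model` in scope, the call would then be:
# flama.dump(model, "path/to/file.flm", **options)
```

Keeping the metadata in a single dictionary makes it easy to reuse the same description across several models, or to log it alongside the experiment.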

Load method

Flama's load method is responsible for efficiently unpacking the model file. The unpacking stage will typically happen within the context of a Flama application. If you are not planning to develop one because you will be using the Flama CLI instead, you won't need the load method at all. An example of how to use this method:

import flama
model_artifact = flama.load("path/to/file.flm")

The only parameter is the path to the file containing the model. The method returns a ModelArtifact object, which contains the attributes used with the dump method, plus the model itself. The model can be accessed through the model attribute of the ModelArtifact object. As you can easily check, this object contains the artifacts dictionary, which you can inspect to find the path where the artifacts were unpacked automatically. This is a very convenient feature, which allows you to keep track of the artifacts associated with your model, and access them easily, all within the same binary file.
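
A short sketch of how the returned object might be inspected, assuming only the two attributes mentioned above (model and artifacts). The function describe_artifact is hypothetical, and a SimpleNamespace with an invented unpack path stands in for the loaded object so that the example runs without Flama or a real .flm file.

```python
from types import SimpleNamespace
from typing import Any, Dict


def describe_artifact(model_artifact: Any) -> Dict[str, Any]:
    """Summarise a loaded object through its `model` and `artifacts` attributes."""
    return {
        "model_type": type(model_artifact.model).__name__,
        "artifact_names": sorted(model_artifact.artifacts),
        "artifact_paths": dict(model_artifact.artifacts),
    }


# Stand-in exposing the same two attributes; the unpack path is made up.
fake_artifact = SimpleNamespace(
    model=[0.1, 0.2],
    artifacts={"foo.json": "/tmp/unpacked/foo.json"},
)
summary = describe_artifact(fake_artifact)

# With Flama installed, the same function could be applied to a real file:
# summary = describe_artifact(flama.load("path/to/file.flm"))
```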

Having introduced the methods for packing (flama.dump) and loading (flama.load) models, we can now show how the example files we've been using so far were generated. These files were:

  • sklearn_model.flm
  • tensorflow_model.flm
  • pytorch_model.flm

Let's see how to pack Scikit-Learn, TensorFlow, and PyTorch models, respectively. The following examples are not intended to be complete or fully functional pieces of code. They show only the steps relevant to packaging models, so they leave out the usual stages of data loading and cleansing, training, and testing.


# Scikit-Learn: fit a small MLP on XOR and package it
import flama
import numpy as np
from sklearn.neural_network import MLPClassifier

model = MLPClassifier(activation="tanh", max_iter=2000, hidden_layer_sizes=(10,))
model.fit(
    np.array([[0, 0], [0, 1], [1, 0], [1, 1]]),
    np.array([0, 1, 1, 0]),
)
flama.dump(model, "sklearn_model.flm")


# TensorFlow: fit a small Keras network on XOR and package it
import flama
import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Dense(10, activation="tanh"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ]
)
model.compile(optimizer="adam", loss="mse")
model.fit(
    np.array([[0, 0], [0, 1], [1, 0], [1, 1]]),
    np.array([[0], [1], [1], [0]]),
)
flama.dump(model, "tensorflow_model.flm")


# PyTorch: train a small network on XOR and package it
import flama
import numpy as np
import torch


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = torch.nn.Linear(2, 10)
        self.l2 = torch.nn.Linear(10, 1)

    def forward(self, x):
        x = torch.tanh(self.l1(x))
        x = torch.sigmoid(self.l2(x))
        return x

    def _train(self, X, Y, loss, optimizer):
        # Initialise the weights of every linear layer from N(0, 1)
        for m in self.modules():
            if isinstance(m, torch.nn.Linear):
                m.weight.data.normal_(0, 1)

        steps = X.size(0)
        for i in range(2000):
            for j in range(steps):
                # One stochastic update on a randomly chosen data point
                data_point = np.random.randint(steps)
                x_var, y_var = X[data_point], Y[data_point]
                optimizer.zero_grad()
                y_hat = self(x_var)
                loss_result = loss(y_hat, y_var)
                loss_result.backward()
                optimizer.step()

        return self


X = torch.Tensor([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = torch.Tensor([0, 1, 1, 0]).view(-1, 1)

model = Model()
model._train(X, Y, loss=torch.nn.BCELoss(), optimizer=torch.optim.Adam(model.parameters()))
flama.dump(model, "pytorch_model.flm")