Get
Fetching models from a hub
So far we have served models that were already packaged as .flm files. But where do those files come from when the model lives on a remote hub rather than on your disk? The get command downloads a model from a supported source and serialises it into Flama's .flm format in a single step, ready for use with serve or model.
This command works for both traditional predictive models and generative models. The artifact family is declared explicitly and recorded in the manifest, so it drives runtime dispatch when the model is later loaded. To inspect the command options, run:
flama get --help
Usage: flama get [OPTIONS] MODEL_NAME
Download and package a model as .flm.
Download a model from a supported source and serialize it into Flama's .flm format, ready for serving with 'flama serve' or interaction with 'flama model'. The artifact family must be declared explicitly via ``--family``: ML artifacts run through the framework recorded in the manifest, while LLM artifacts are dispatched to vLLM or MLX at load time depending on what is installed.
Example:
  flama get --source huggingface --family ml scikit-learn/Fish-Weight
  flama get --source huggingface --family llm Qwen/Qwen2.5-0.5B
Options:
  --source                        Model source provider.
  --family                        Artifact family recorded in the manifest. Use 'ml' for traditional ML models and 'llm' for large language models. The choice is persisted in the .flm manifest and drives runtime dispatch at load time; it cannot be changed without repacking.
  -o, --output TEXT               Output .flm path (default: <model-name>.flm).
  --max-concurrent INTEGER RANGE  Maximum number of files to download concurrently.
  --help                          Show this message and exit.
Parameters
Selecting the model
The model to fetch is identified by a single positional argument together with two required options that describe where it comes from and how it should be treated:
- MODEL_NAME (required): the source-specific model identifier, e.g. a HuggingFace repository such as google/gemma-4-E2B-it.
- --source (required): the provider to download from. Currently huggingface is supported.
- --family (required): ml for traditional predictive models, or llm for generative models. This choice is persisted in the .flm manifest and drives runtime dispatch, so it cannot be changed without repacking.
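To make the dispatch rule concrete, here is a minimal Python sketch of how a loader could act on the recorded family. It is a hypothetical illustration, not Flama's implementation: the manifest is modelled as a plain dict with assumed "family" and "framework" keys, select_runtime is an invented helper, and preferring vLLM over MLX when both are installed is an assumption, since the help text only says the choice depends on what is installed.

```python
import importlib.util
from typing import Callable, Mapping

VALID_FAMILIES = {"ml", "llm"}


def module_installed(name: str) -> bool:
    """Return True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(name) is not None


def select_runtime(
    manifest: Mapping[str, str],
    is_installed: Callable[[str], bool] = module_installed,
) -> str:
    """Pick a runtime for a packaged artifact (hypothetical sketch).

    Assumptions: the manifest stores the family under "family" and, for ML
    artifacts, the serialising framework under "framework"; vLLM is tried
    before MLX when both are available.
    """
    family = manifest.get("family")
    if family not in VALID_FAMILIES:
        raise ValueError(f"unknown artifact family: {family!r}")
    if family == "ml":
        # ML artifacts run through the framework recorded at packaging time.
        return manifest["framework"]
    # LLM artifacts: dispatch to whichever generative backend is installed.
    for backend in ("vllm", "mlx"):
        if is_installed(backend):
            return backend
    raise RuntimeError("no LLM backend installed (expected vLLM or MLX)")


if __name__ == "__main__":
    print(select_runtime({"family": "ml", "framework": "sklearn"}))  # sklearn
    print(select_runtime({"family": "llm"}, is_installed=lambda name: name == "mlx"))  # mlx
```

Injecting is_installed keeps the backend probe testable on machines where neither vLLM nor MLX is present.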
Output options
The remaining options control where the artifact is written and how many files are fetched at once:
- --output (-o): the destination .flm path. Defaults to <model-name>.flm with slashes replaced by underscores, so google/gemma-4-E2B-it becomes google_gemma-4-E2B-it.flm.
- --max-concurrent: the maximum number of files to download in parallel (default: 8).
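Both output options are simple to express in code. The Python sketch below is hypothetical and not taken from Flama's source: default_output_path reproduces the documented naming rule, and download_all shows one common way to cap parallel downloads, an asyncio.Semaphore sized by max_concurrent. The fetch coroutine is a placeholder for a real file download.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, Iterable, List, Optional


def default_output_path(model_name: str, output: Optional[str] = None) -> Path:
    """Return --output if given, else <model-name>.flm with '/' replaced by '_'."""
    if output is not None:
        return Path(output)
    return Path(model_name.replace("/", "_") + ".flm")


async def download_all(
    files: Iterable[str],
    fetch: Callable[[str], Awaitable[bytes]],  # placeholder for a real downloader
    max_concurrent: int = 8,
) -> List[bytes]:
    """Fetch every file, never running more than max_concurrent fetches at once."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(name: str) -> bytes:
        async with semaphore:  # waits while max_concurrent fetches are in flight
            return await fetch(name)

    # gather returns results in the same order as the input files.
    return await asyncio.gather(*(bounded(name) for name in files))


if __name__ == "__main__":
    print(default_output_path("google/gemma-4-E2B-it"))  # google_gemma-4-E2B-it.flm
```

A semaphore keeps the limit independent of how many files a repository contains: all tasks are scheduled up front, but only max_concurrent of them hold a slot at any moment.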
Examples
Predictive model
To download and package a predictive model from HuggingFace as an ml artifact:
flama get --source huggingface --family ml scikit-learn/Fish-Weight
Downloading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 4.1 MB 12.0 MB/s 0:00:00
Packaging...
Model saved to scikit-learn_Fish-Weight.flm
Generative model
To download and package a generative model as an llm artifact:
flama get --source huggingface --family llm google/gemma-4-E2B-it
Downloading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 5.1 GB 24.3 MB/s 0:00:00
Packaging...
Model saved to google_gemma-4-E2B-it.flm
Choosing the output path
By default the artifact is named after the model. Pass --output to write it elsewhere:
flama get --source huggingface --family llm google/gemma-4-E2B-it --output models/assistant.flm
Downloading ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 5.1 GB 31.0 MB/s 0:00:00
Packaging...
Model saved to models/assistant.flm
Once a model is packaged, verify it with model inspect and serve it with serve. For more on serving generative models, see the Generative AI section.
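Since get is an ordinary CLI command, it can also be driven from a script, for instance to fetch several models in one pass. This sketch only assembles argument lists from the options documented above and passes them to subprocess.run; build_get_command and fetch_all are illustrative helper names, and the script assumes the flama executable is on PATH.

```python
import subprocess
from typing import List, Optional, Sequence, Tuple


def build_get_command(
    model_name: str,
    family: str,
    source: str = "huggingface",
    output: Optional[str] = None,
    max_concurrent: Optional[int] = None,
) -> List[str]:
    """Assemble a `flama get` argument list using only the documented options."""
    if family not in ("ml", "llm"):
        raise ValueError("family must be 'ml' or 'llm'")
    cmd = ["flama", "get", "--source", source, "--family", family]
    if output is not None:
        cmd += ["--output", output]
    if max_concurrent is not None:
        cmd += ["--max-concurrent", str(max_concurrent)]
    cmd.append(model_name)  # MODEL_NAME is the positional argument
    return cmd


def fetch_all(models: Sequence[Tuple[str, str]]) -> None:
    """Run `flama get` for each (model_name, family) pair, stopping on the first failure."""
    for model_name, family in models:
        # check=True raises CalledProcessError if flama exits with a non-zero status.
        subprocess.run(build_get_command(model_name, family), check=True)


if __name__ == "__main__":
    # Only prints the command; call fetch_all(...) to actually download.
    print(build_get_command("google/gemma-4-E2B-it", "llm", output="models/assistant.flm"))
```

Passing a list rather than a shell string avoids quoting problems with repository names and output paths.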