🚀 The feature, motivation and pitch
Motivation
PTE files currently carry model weights and delegate data, but lack a standard way to embed model-level metadata: tokenizer config, chat templates, architecture info, etc. Users must ship these as separate files or hardcode them in application code.
A self-contained PTE file that includes all metadata needed to run inference would simplify deployment and reduce integration errors.
PTE already has the infrastructure to support this: NamedData.
Proposal
Use the existing `NamedData` mechanism (flatbuffer `Program.named_data`) to store model metadata as key-value pairs. No schema changes needed.
Key naming convention
Follow a `namespace.field` pattern:

```
general.name = "Llama-3.2-1B"
general.architecture = "llama"
tokenizer.model = "BPE"
tokenizer.vocab_size = 128256
tokenizer.bos_token_id = 128000
tokenizer.eos_token_id = 128001
tokenizer.chat_template = "{% for message in messages %}..."
context.length = 8192
```
All metadata keys are prefixed with `metadata.` in the NamedData store to avoid collisions with backend entries (e.g., XNNPACK constant tensors).
Value encoding
Values are stored as raw bytes:
- string → UTF-8 encoded bytes
- int → int64, little-endian
- float → float64, little-endian
- bytes → raw bytes
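As a concrete illustration of this convention, here is a minimal encoding sketch. The `encode_value` helper is hypothetical (not part of the ExecuTorch API); it only demonstrates the byte layouts listed above using the standard `struct` module.

```python
import struct

def encode_value(value):
    """Encode a metadata value into raw bytes per the proposed convention.

    Hypothetical helper for illustration; not an ExecuTorch API.
    """
    if isinstance(value, str):
        return value.encode("utf-8")      # string -> UTF-8 bytes
    if isinstance(value, bool):
        # bool is a subclass of int; the proposal does not define an encoding
        raise TypeError("bool encoding is not defined by the proposal")
    if isinstance(value, int):
        return struct.pack("<q", value)   # int -> int64, little-endian
    if isinstance(value, float):
        return struct.pack("<d", value)   # float -> float64, little-endian
    if isinstance(value, bytes):
        return value                      # bytes -> stored as-is
    raise TypeError(f"unsupported metadata type: {type(value)!r}")

# Round-trip check for the int encoding:
assert struct.unpack("<q", encode_value(128256))[0] == 128256
```

Since the stored bytes carry no type tag (see the open questions below), the reader must know each key's expected type to decode it.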
Python API (export side)
```python
from executorch.extension.llm.export.metadata import add_metadata

edge_manager = to_edge_transform_and_lower(exported_program, ...)
add_metadata(edge_manager, {
    "tokenizer.model": "BPE",
    "tokenizer.vocab_size": 128256,
    "tokenizer.chat_template": template_str,
    "general.architecture": "llama",
})
et_program = edge_manager.to_executorch()
```
C++ API (runtime side)
```cpp
auto* map = program->get_named_data_map();
auto result = map->get_data("metadata.tokenizer.chat_template");
std::string_view chat_template(
    static_cast<const char*>(result->data()), result->size());
```
Why NamedData over a new schema field?
- Zero schema changes: works with the existing PTE format
- Already battle-tested: XNNPACK, AOTI, and CoreML all use NamedData in production
- Linear key lookup is fine for ~20-50 metadata keys
- Prefix-based namespace isolation keeps metadata separate from backend data
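To make the last two points concrete, here is a small sketch using a plain dict as a stand-in for the NamedData store (the keys and values are made up). A linear scan with a prefix filter is all that's needed to separate metadata from backend entries:

```python
# Hypothetical in-memory stand-in for the NamedData store. Real stores mix
# metadata entries with backend-owned entries (e.g., XNNPACK tensors).
store = {
    "metadata.general.architecture": b"llama",
    "metadata.tokenizer.model": b"BPE",
    "XNNPACK/weight_0": b"\x00\x01\x02\x03",  # backend-owned entry
}

# Prefix-based isolation: a linear scan over ~20-50 keys is cheap at load time.
metadata_keys = [k for k in store if k.startswith("metadata.")]
```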
Open questions
- Should we define a standard set of well-known keys, or leave it fully user-defined?
- Should `EdgeProgramManager` expose a public `add_metadata()` method instead of going through `_named_data_store`?
- Do we need type tags in the stored bytes, or is type interpretation the caller's responsibility?
Alternatives
No response
Additional context
No response
RFC (Optional)
No response
cc @larryliu0820 @mergennachin @cccclai @helunwencser @jackzhxng