DEVELOPERS.md (15 changes: 8 additions & 7 deletions)
@@ -118,8 +118,7 @@ jax / pytorch / keras for NN deployments.

### Gemma struct contains all the state of the inference engine - tokenizer, weights, and activations

-`Gemma(...)` - constructor, creates a gemma model object, which is a wrapper
-around 3 things - the tokenizer object, weights, activations, and KV Cache.
+`Gemma(...)` - constructor, creates a gemma model object.

In a standard LLM chat app, you'll probably use a Gemma object directly; in
more exotic data processing or research applications, you might decompose
@@ -129,11 +128,13 @@ only using a Gemma object.
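For orientation, a minimal construction sketch follows. The header name, constructor arguments, and `Model` enum value are assumptions based on this description, not the actual API; see `gemma.h` for the real signatures.

```cpp
#include "gemma.h"  // assumed header declaring gcpp::Gemma
#include "hwy/contrib/thread_pool/thread_pool.h"

int main() {
  hwy::ThreadPool pool(/*num_threads=*/4);

  // Hypothetical argument list: tokenizer model, compressed weights, and
  // model type; the real constructor may differ.
  gcpp::Gemma model(/*tokenizer_path=*/"tokenizer.spm",
                    /*weights_path=*/"2b-it-sfp.sbs",
                    /*model_type=*/gcpp::Model::GEMMA_2B, pool);

  // `model` now bundles the tokenizer, weights, activations, and KV cache
  // used by the generation functions described below.
  return 0;
}
```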

### Use the tokenizer in the Gemma object (or interact with the Tokenizer object directly)

-You pretty much only do things with the tokenizer, call `Encode()` to go from
-string prompts to token id vectors, or `Decode()` to go from token id vector
-outputs from the model back to strings.
+The Gemma object contains a pointer to a Tokenizer object. The main
+operations performed on the tokenizer are to load the tokenizer model from a
+file (usually `tokenizer.spm`), call `Encode()` to go from string prompts to
+token id vectors, or `Decode()` to go from token id vector outputs from the
+model back to strings.
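Since `tokenizer.spm` is a standard SentencePiece model file, the encode/decode round trip can be sketched standalone against the SentencePiece C++ API (bypassing the Gemma wrapper, whose accessor names may differ):

```cpp
#include <iostream>
#include <string>
#include <vector>

#include <sentencepiece_processor.h>

int main() {
  sentencepiece::SentencePieceProcessor sp;
  if (!sp.Load("tokenizer.spm").ok()) return 1;  // load the tokenizer model

  std::vector<int> ids;
  sp.Encode("Write a poem about the Atlantic Ocean.", &ids);  // string -> token ids

  std::string text;
  sp.Decode(ids, &text);  // token ids -> string
  std::cout << text << "\n";
  return 0;
}
```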

-### The main entrypoint for generation is `GenerateGemma()`
+### `GenerateGemma()` is the entrypoint for token generation

Calling into `GenerateGemma` with a tokenized prompt will 1) mutate the
activation values in `model` and 2) invoke StreamFunc - a lambda callback for
@@ -150,7 +151,7 @@ constrained decoding type of use cases where you want to force the generation
to fit a grammar. If you're not doing this, you can send an empty lambda as a
no-op, which is what `run.cc` does.
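A rough sketch of the call shape with both lambdas spelled out; the argument order and parameters of `GenerateGemma` below are assumptions, so consult `gemma.h` for the real signature.

```cpp
std::mt19937 gen(42);  // randomness source for sampling

// StreamFunc: invoked once per generated token; decode and print it.
auto stream_token = [&](int token, float /*probability*/) {
  std::string piece;
  // The Tokenizer() accessor and Decode() overload are assumed names.
  model.Tokenizer()->Decode(std::vector<int>{token}, &piece);
  std::cout << piece << std::flush;
  return true;  // returning false would stop generation early
};

// Accept function: empty no-op that accepts every token, as run.cc does.
auto accept_token = [](int /*token*/) { return true; };

// Hypothetical call shape; see gemma.h for the actual declaration.
gcpp::GenerateGemma(model, inference_args, prompt_tokens, /*start_pos=*/0,
                    pool, inner_pool, stream_token, accept_token, gen,
                    /*verbosity=*/0);
```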

-### If you want to invoke the neural network forward function directly call the `Transformer()` function
+### `Transformer()` implements the inference computation of the neural network (the `forward()` method in PyTorch or JAX terms)

For high-level applications, you might only call `GenerateGemma()` and never
interact directly with the neural network, but if you're doing something a bit
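As a sketch of the shape of a direct forward call: one token and its position go in per step, and the activations and KV cache in the model state are mutated in place. Every identifier below is a hypothetical placeholder for whatever the sources actually declare.

```cpp
// Hypothetical single-step forward calls over a prompt: each call processes
// one token at position `pos`, updating activations and the KV cache.
for (size_t pos = 0; pos < prompt_tokens.size(); ++pos) {
  gcpp::Transformer(prompt_tokens[pos], pos, weights, activations, kv_cache, pool);
}
```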
README.md (7 changes: 6 additions & 1 deletion)
@@ -36,7 +36,12 @@ For production-oriented edge deployments we recommend standard deployment
pathways using Python frameworks like JAX, Keras, PyTorch, and Transformers
([all model variations here](https://www.kaggle.com/models/google/gemma)).

-Community contributions large and small are welcome. This project follows
+## Contributing
+
+Community contributions large and small are welcome. See
+[DEVELOPERS.md](https://github.com/google/gemma.cpp/blob/main/DEVELOPERS.md)
+for additional notes for contributing developers, and [join the Discord by
+following this invite link](https://discord.gg/H5jCBAWxAe). This project follows
[Google's Open Source Community
Guidelines](https://opensource.google.com/conduct/).

examples/README.md (7 changes: 7 additions & 0 deletions)
@@ -0,0 +1,7 @@
+# Examples
+
+In this directory are some simple examples illustrating usage of `gemma.cpp` as
+a library beyond the interactive `gemma` app implemented in `run.cc`.
+
+- `hello_world/` - minimal/template project for using `gemma.cpp` as a library.
+  It sets up the model state and generates text for a single hard-coded prompt.
experimental/.gitkeep (empty file added)
experimental/README.md (3 changes: 3 additions & 0 deletions)
@@ -0,0 +1,3 @@
+# Experimental
+
+This directory is for experimental code and features.