
Inference Configuration

LLMs have parameters that can be configured to change how the model behaves, known as inference configuration or inference parameters. LLMs predict text based on the text input; this prediction is probabilistic and can be tuned by adjusting the inference configuration to produce more creative or more deterministic outputs. The right configuration depends on your use case.

Bedrock documentation on inference configuration

What is inference?

Inference refers to the process of using a model to generate or predict output based on input data; it is what you do with a model after it has been trained on a data set.

Setting inference configuration

All generative AI routes in Amplify accept inference configuration as optional parameters. If you do not provide any inference configuration options, Bedrock uses the default values for that particular model (see the Default values table below).

a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `You are a helpful assistant`,
  inferenceConfiguration: {
    temperature: 0.2, // lower values favor higher-probability (more deterministic) tokens
    topP: 0.2,        // sample only from the most likely candidate tokens
    maxTokens: 1000,  // cap the number of tokens in the response
  },
})

Definitions

Temperature

Temperature affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs. Temperature is usually* a number from 0 to 1, where a lower value steers the model toward higher-probability options. Another way to think about temperature is creativity: a low value (close to zero) produces the least creative and most deterministic response, while a higher value allows more varied output.

* AI21 Labs Jamba models use a temperature range of 0–2.0
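For example, here is a minimal sketch contrasting a low and a high temperature on the same model; the system prompts and values are illustrative, not Amplify defaults.

// Low temperature: favor the highest-probability tokens for deterministic output
a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `Summarize the provided text`,
  inferenceConfiguration: {
    temperature: 0.1,
  },
})

// Higher temperature: lower-probability tokens are chosen more often, giving more varied output
a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `Suggest creative product names`,
  inferenceConfiguration: {
    temperature: 0.9,
  },
})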

Top P

Top p refers to the percentage of token candidates the model can choose from for the next token in the response. A lower value will decrease the size of the pool and limit the options to more likely outputs. A higher value will increase the size of the pool and allow for lower-probability tokens.
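As a sketch, a route that should stick to the most likely phrasing could set a small topP; the system prompt and value here are illustrative only.

// Small topP: restrict sampling to the small pool of most likely candidate tokens
a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `Extract the key facts from the provided text`,
  inferenceConfiguration: {
    topP: 0.1,
  },
})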

Max Tokens

This parameter limits the maximum number of tokens the model can generate in its response.
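For instance, if you want to keep responses short and bound cost, you could cap maxTokens as in the sketch below; the value and prompt are illustrative.

// Cap response length: generation stops once this many tokens have been produced
a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `Answer in one short paragraph`,
  inferenceConfiguration: {
    maxTokens: 200,
  },
})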

Default values

Model               Temperature   Top P   Max Tokens
AI21 Labs Jamba     1.0*          0.5     4096
Meta Llama          0.5           0.9     512
Amazon Titan        0.7           0.9     512
Anthropic Claude    1             0.999   512
Cohere Command R    0.3           0.75    512
Mistral Large       0.7           1       8192

Bedrock documentation on model default inference configuration

* AI21 Labs Jamba models use a temperature range of 0–2.0