Inference Configuration
LLMs have parameters that can be configured to change how the model behaves, referred to as inference configuration or inference parameters. An LLM predicts the next piece of text based on the text input. Because this prediction is probabilistic, you can tweak the inference configuration to get more creative or more deterministic outputs. The right configuration depends on your use case.
Bedrock documentation on inference configuration
What is inference?
Inference refers to the process of using a model to generate or predict output based on input data. In other words, inference is what happens when you use a model after it has been trained on a data set.
Setting inference configuration
All generative AI routes in Amplify accept inference configuration as optional parameters. If you do not provide any inference configuration options, Bedrock will use the default values for that particular model.
```ts
a.generation({
  aiModel: a.ai.model("Claude 3.5 Haiku"),
  systemPrompt: `You are a helpful assistant`,
  inferenceConfiguration: {
    temperature: 0.2,
    topP: 0.2,
    maxTokens: 1000,
  },
})
```
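The same inferenceConfiguration object can be passed to other generative AI routes. As a rough sketch (the owner-based authorization rule here is an assumption for illustration), a conversation route might look like this:

```ts
// Sketch only: a conversation route using the same inference configuration.
a.conversation({
  aiModel: a.ai.model("Claude 3.5 Haiku"),
  systemPrompt: `You are a helpful assistant`,
  inferenceConfiguration: {
    temperature: 0.2,
    topP: 0.2,
    maxTokens: 1000,
  },
})
  // Assumed authorization rule for illustration; adjust to your app's needs.
  .authorization((allow) => allow.owner())
```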
Definitions
Temperature
Affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs. Temperature is usually a number from 0 to 1*, where a lower value steers the model toward higher-probability options. Another way to think about temperature is creativity: a low value (close to zero) produces the least creative and most deterministic responses, while a higher value allows more varied output.
* AI21 Labs Jamba models use a temperature range of 0 – 2.0
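To build intuition for how temperature reshapes the output distribution, here is a small illustrative sketch. The logits are made-up numbers, not real model output:

```ts
// Illustrative only: how temperature reshapes a softmax distribution.
function softmaxWithTemperature(logits: number[], temperature: number): number[] {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((l) => Math.exp(l - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [2.0, 1.0, 0.5]; // made-up scores for three candidate tokens
console.log(softmaxWithTemperature(logits, 0.2)); // sharply peaked: the top token dominates
console.log(softmaxWithTemperature(logits, 1.0)); // flatter: lower-probability tokens keep a share
```

At a temperature of 0.2 almost all of the probability mass sits on the top token, while at 1.0 the lower-probability tokens remain realistic choices.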
Top P
Top P limits the pool of token candidates the model can choose from for the next token in the response: only the most likely tokens whose cumulative probability fits within the Top P value are considered. A lower value will decrease the size of the pool and limit the options to more likely outputs. A higher value will increase the size of the pool and allow for lower-probability tokens.
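As an intuition aid, the following sketch filters a made-up candidate distribution the way nucleus (top-p) sampling does: keep the most likely tokens until their cumulative probability reaches the Top P value.

```ts
// Illustrative only: nucleus (top-p) filtering over a made-up token distribution.
type Candidate = { token: string; prob: number };

function topPFilter(candidates: Candidate[], topP: number): Candidate[] {
  const sorted = [...candidates].sort((a, b) => b.prob - a.prob);
  const kept: Candidate[] = [];
  let cumulative = 0;
  for (const c of sorted) {
    kept.push(c);
    cumulative += c.prob;
    if (cumulative >= topP) break; // stop once the pool covers enough probability mass
  }
  return kept;
}

const candidates: Candidate[] = [
  { token: "dog", prob: 0.5 },
  { token: "cat", prob: 0.3 },
  { token: "fox", prob: 0.15 },
  { token: "yak", prob: 0.05 },
];
console.log(topPFilter(candidates, 0.2).map((c) => c.token)); // ["dog"] — small pool
console.log(topPFilter(candidates, 0.9).map((c) => c.token)); // ["dog", "cat", "fox"] — larger pool
```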
Max Tokens
This parameter limits the maximum number of tokens the model can generate in its response; generation stops once the limit is reached.
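For example, a route intended for short summaries might cap the response length with a small maxTokens value. This is only a sketch; the prompt and the exact limit are illustrative:

```ts
// Sketch only: cap response length for a summarization route.
a.generation({
  aiModel: a.ai.model("Claude 3.5 Haiku"),
  systemPrompt: `Summarize the provided text in a few sentences`,
  inferenceConfiguration: {
    maxTokens: 200, // illustrative cap; the response is cut off once this limit is reached
  },
})
```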
Default values
| Model | Temperature | Top P | Max Tokens |
| --- | --- | --- | --- |
| AI21 Labs Jamba | 1.0* | 0.5 | 4096 |
| Meta Llama | 0.5 | 0.9 | 512 |
| Amazon Titan | 0.7 | 0.9 | 512 |
| Anthropic Claude | 1 | 0.999 | 512 |
| Cohere Command R | 0.3 | 0.75 | 512 |
| Mistral Large | 0.7 | 1 | 8192 |
Bedrock documentation on model default inference configuration
* AI21 Labs Jamba models use a temperature range of 0 – 2.0