
Inference Configuration

LLMs have parameters that can be configured to change how the model behaves, known as inference configuration or inference parameters. LLMs predict text based on the text input; this prediction is probabilistic and can be tuned by adjusting the inference configuration to produce more creative or more deterministic outputs. The right configuration depends on your use case.

Bedrock documentation on inference configuration

What is inference?

Inference refers to the process of using a model to generate or predict output based on input data; it is what you do with a model after it has been trained on a data set.

Setting inference configuration

All generative AI routes in Amplify accept inference configuration as optional parameters. If you do not provide any inference configuration options, Bedrock uses the default values for that particular model (see the Default values table below).

a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `You are a helpful assistant`,
  inferenceConfiguration: {
    temperature: 0.2, // lower values favor higher-probability (more deterministic) tokens
    topP: 0.2,        // sample only from the most likely candidate tokens
    maxTokens: 1000,  // cap the number of tokens in the response
  },
})

Definitions

Temperature

Temperature affects the shape of the probability distribution for the predicted output and influences the likelihood of the model selecting lower-probability outputs. Temperature is usually* a number from 0 to 1, where a lower value steers the model toward higher-probability options. Another way to think about temperature is creativity: a low value (close to zero) produces the least creative and most deterministic response, while a higher value allows more varied output.

* AI21 Labs Jamba models use a temperature range of 0–2.0
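For example, here is a minimal sketch contrasting a low and a high temperature on the same model; the system prompts and values are illustrative, not Amplify defaults.

// Low temperature: favor the highest-probability tokens for deterministic output
a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `Summarize the provided text`,
  inferenceConfiguration: {
    temperature: 0.1,
  },
})

// Higher temperature: lower-probability tokens are chosen more often, giving more varied output
a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `Suggest creative product names`,
  inferenceConfiguration: {
    temperature: 0.9,
  },
})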

Top P

Top p refers to the percentage of token candidates the model can choose from for the next token in the response. A lower value will decrease the size of the pool and limit the options to more likely outputs. A higher value will increase the size of the pool and allow for lower-probability tokens.
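As a sketch, a route that should stick to the most likely phrasing could set a small topP; the system prompt and value here are illustrative only.

// Small topP: restrict sampling to the small pool of most likely candidate tokens
a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `Extract the key facts from the provided text`,
  inferenceConfiguration: {
    topP: 0.1,
  },
})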

Max Tokens

This parameter limits the maximum number of tokens the model can generate in its response.
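For instance, if you want to keep responses short and bound cost, you could cap maxTokens as in the sketch below; the value and prompt are illustrative.

// Cap response length: generation stops once this many tokens have been produced
a.generation({
  aiModel: a.ai.model("Claude 3 Haiku"),
  systemPrompt: `Answer in one short paragraph`,
  inferenceConfiguration: {
    maxTokens: 200,
  },
})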

Default values

Model               Temperature   Top P   Max Tokens
AI21 Labs Jamba     1.0*          0.5     4096
Meta Llama          0.5           0.9     512
Amazon Titan        0.7           0.9     512
Anthropic Claude    1             0.999   512
Cohere Command R    0.3           0.75    512
Mistral Large       0.7           1       8192

Bedrock documentation on model default inference configuration

* AI21 Labs Jamba models use a temperature range of 0–2.0