To track how much you are using GPT-5.4, check the metrics published to Amazon CloudWatch. The source is the bedrock-mantle endpoint, which emits customer-facing metrics for the number of inferences you have made, the tokens you have used, and any errors that have occurred.
For billing purposes, these are the metrics to watch:
– TotalInputTokens and TotalOutputTokens report the total number of billable tokens processed during each period. They are available at the account, project, and model levels, which makes them the right choice for an aggregate view.
– InputTokens and OutputTokens report token counts for individual inferences at the project and model levels. They are useful for understanding per-inference costs and statistics.
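As a rough sketch of how the aggregate token metrics above could be pulled through the CloudWatch API, the Python below builds a request for the `GetMetricData` operation. The metric names come from the list above; the namespace value is an assumption (a placeholder, not confirmed by this article), so look up the real one in the CloudWatch console before using it.

```python
from datetime import datetime, timedelta, timezone

# Assumption: placeholder namespace. Replace it with the namespace the
# bedrock-mantle metrics actually appear under in your CloudWatch console.
NAMESPACE = "AWS/Bedrock"  # hypothetical


def build_token_queries(namespace=NAMESPACE, period=3600, dimensions=None):
    """Build MetricDataQueries entries that sum billable tokens per period."""
    queries = []
    for query_id, metric_name in (
        ("input_tokens", "TotalInputTokens"),
        ("output_tokens", "TotalOutputTokens"),
    ):
        queries.append({
            "Id": query_id,  # must start with a lowercase letter
            "MetricStat": {
                "Metric": {
                    "Namespace": namespace,
                    "MetricName": metric_name,
                    "Dimensions": list(dimensions or []),
                },
                "Period": period,  # seconds per data point
                "Stat": "Sum",
            },
            "ReturnData": True,
        })
    return queries


def build_request(hours=24, now=None, **query_kwargs):
    """Build the keyword arguments for a GetMetricData call over the last `hours`."""
    end = now or datetime.now(timezone.utc)
    return {
        "MetricDataQueries": build_token_queries(**query_kwargs),
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
    }
```

With boto3 installed and credentials configured, the request could then be sent with `boto3.client("cloudwatch").get_metric_data(**build_request())`. Keeping the request builder separate from the network call makes it easy to inspect the query before running it.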
You should also keep an eye on your total inference volume and error rates:
– Inferences tracks the total number of completed requests.
– InferenceClientErrors counts client-side errors, helping you identify issues that might affect your usage.
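One way to combine these two metrics is to compute client errors per completed inference for each period. The helper below is a minimal sketch that works on plain lists of per-period sums. It assumes the two series are already aligned by timestamp, and it makes no claim about whether failed requests are also counted in Inferences, which this article does not state.

```python
def client_error_ratio(inferences, client_errors):
    """Return client errors per completed inference for each period.

    Both arguments are per-period sums in the same order. A period with
    zero inferences yields None instead of dividing by zero.
    """
    if len(inferences) != len(client_errors):
        raise ValueError("series must have the same number of periods")
    return [
        errors / count if count else None
        for count, errors in zip(inferences, client_errors)
    ]


def periods_over_threshold(ratios, threshold):
    """Return the indexes of periods whose ratio exceeds the threshold."""
    return [i for i, r in enumerate(ratios) if r is not None and r > threshold]
```

For example, feeding in hourly sums of Inferences and InferenceClientErrors and a threshold of 0.03 would flag the hours where more than three client errors occurred per hundred completed requests.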
The metrics are available at four levels: Account, Project, Model, and a combined Project+Model view. This flexibility makes it easier to see where your usage is coming from and how costs are distributed.
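The four levels could map onto CloudWatch dimension sets roughly as sketched below. The `Project` dimension name is taken from this article; the `Model` dimension name, the idea that account-level metrics carry no dimensions, and the example values are all assumptions to verify against the metrics you actually see in CloudWatch.

```python
def dimensions_for(project=None, model=None):
    """Build a CloudWatch Dimensions list for one of the four levels.

    Assumption: "Model" is a guessed dimension name, and the account level
    is assumed to use no dimensions at all.
    """
    dims = []
    if project is not None:
        dims.append({"Name": "Project", "Value": project})
    if model is not None:
        dims.append({"Name": "Model", "Value": model})
    return dims


# Hypothetical project and model identifiers, for illustration only.
LEVELS = {
    "account": dimensions_for(),
    "project": dimensions_for(project="my-project"),
    "model": dimensions_for(model="my-model"),
    "project+model": dimensions_for(project="my-project", model="my-model"),
}
```

Each value in `LEVELS` has the shape that the `Dimensions` field of a CloudWatch metric query expects, so the same query can be reused across levels by swapping the list.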
Because the metrics include a Project dimension, costs can be attributed to individual projects. Metrics are emitted in the region that handles your requests, so you can also break usage down by region.
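To illustrate per-project attribution, the sketch below turns token totals grouped by project into an estimated cost. The prices and project names are placeholders invented for the example, not actual GPT-5.4 rates; substitute the rates from your own pricing agreement.

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_costs(usage, input_price_per_million, output_price_per_million):
    """Estimate cost per project from (input_tokens, output_tokens) totals.

    `usage` maps a project name to a tuple of token sums, for example the
    TotalInputTokens and TotalOutputTokens sums for that project.
    Prices are Decimal amounts per one million tokens (placeholder values).
    """
    costs = {}
    for project, (input_tokens, output_tokens) in usage.items():
        input_cost = Decimal(input_tokens) * input_price_per_million / TOKENS_PER_MILLION
        output_cost = Decimal(output_tokens) * output_price_per_million / TOKENS_PER_MILLION
        costs[project] = input_cost + output_cost
    return costs


# Hypothetical usage and prices, for illustration only.
example = estimate_costs(
    {"proj-a": (1_000_000, 500_000), "proj-b": (250_000, 0)},
    input_price_per_million=Decimal("2.00"),
    output_price_per_million=Decimal("8.00"),
)
```

`Decimal` is used rather than floats so that currency amounts do not pick up binary rounding errors.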
You can view and analyze these metrics using the CloudWatch console, the AWS CLI, or the CloudWatch APIs. This gives you detailed data about your GPT-5.4 usage through the bedrock-mantle endpoint, so you can check your bill against your actual consumption.
For more details, you can refer to the official Amazon Bedrock documentation on monitoring mantle metrics and specific information about GPT-5.4.



