Logging Amazon Textract API Calls with AWS CloudTrail

Amazon Textract is integrated with AWS CloudTrail, a service that provides a record of actions taken by a user, role, or an AWS service in Amazon Textract. CloudTrail captures all API calls for Amazon Textract as events. The calls captured include calls from the Amazon Textract console and code calls to the Amazon Textract API operations.

If you create a trail, you can enable continuous delivery of CloudTrail events to an Amazon S3 bucket, including events for Amazon Textract. If you don't configure a trail, you can still view the most recent events in the CloudTrail console in Event history. Using the information collected by CloudTrail, you can determine the request that was made to Amazon Textract, the IP address that the request was made from, who made the request, when it was made, and additional details.

To learn more about CloudTrail, see the AWS CloudTrail User Guide.

Amazon Textract Information in CloudTrail

CloudTrail is enabled on your AWS account when you create the account. When activity occurs in Amazon Textract, that activity is recorded in a CloudTrail event along with other AWS service events in Event history. You can view, search, and download recent events in your AWS account. For more information, see Viewing Events with CloudTrail Event History.

For an ongoing record of events in your AWS account, including events for Amazon Textract, create a trail. A trail enables CloudTrail to deliver log files to an Amazon S3 bucket. By default, when you create a trail in the console, the trail applies to all AWS Regions. The trail logs events from all Regions in the AWS partition and delivers the log files to the Amazon S3 bucket that you specify. Additionally, you can configure other AWS services to further analyze and act upon the event data that's collected in CloudTrail logs. For more information, see the following:

All Amazon Textract operations are logged by CloudTrail and are documented in the API Reference. For example, calls to the DetectDocumentText, AnalyzeDocument, and GetDocumentText actions generate entries in the CloudTrail log files.

Every event or log entry contains information about who generated the request. The identity information helps you determine the following:

  • Whether the request was made with root or user credentials.

  • Whether the request was made with temporary security credentials for a role or federated user.

  • Whether the request was made by another AWS service.

For more information, see the CloudTrail userIdentity Element.

Request Parameters and Response Fields That Aren't Logged

For privacy purposes, certain request parameters and response fields aren't logged—for example, request image bytes or response bounding box information. Amazon S3 bucket names and file names supplied in request parameters are provided in CloudTrail log entries. No information about image bytes passed in a request is provided in a CloudTrail log. The following table shows the input parameters and response parameters that aren't logged for each Amazon Textract operation.

Operation Request Parameters Response Fields

AnalyzeDocument

Bytes

All

DetectDocumentText

Bytes

All

StartDocumentAnalysis

None

None

GetDocumentAnalysis

None

All

StartDocumentTextDetection

None

None

GetDocumentTextDetection

None

All

Understanding Amazon Textract Log File Entries

A trail is a configuration that enables delivery of events as log files to an Amazon S3 bucket that you specify. CloudTrail log files contain one or more log entries. An event represents a single request from any source and includes information about the requested operation, the date and time of the operation, request parameters, and so on. CloudTrail log files aren't an ordered stack trace of the public API calls, so they don't appear in any specific order.

The following example shows a CloudTrail log entry that demonstrates the AnalyzeDocument operation. The image bytes for the input document and the analysis results (responseElements) aren't logged.

{ "eventVersion": "1.05", "userIdentity": { "type": "IAMUser", "principalId": "AIDACKCEVSQ6C2EXAMPLE", "arn": "arn:aws:iam::111111111111:user/janedoe", "accountId": "111111111111", "accessKeyId": "AIDACKCEVSQ6C2EXAMPLE", "userName": "janedoe" }, "eventTime": "2019-04-03T23:56:31Z", "eventSource": "textract.amazonaws.com", "eventName": "AnalyzeDocument", "awsRegion": "us-east-1", "sourceIPAddress": "198.51.100.0", "userAgent": "", "requestParameters": { "document": {}, "featureTypes": [ "TABLES" ] }, "responseElements": null, "requestID": "e387676b-d1f0-4ea7-85d6-f5a344052dce", "eventID": "c5db79ce-e4ea-4401-8517-784481d559f7", "eventType": "AwsApiCall", "recipientAccountId": "111111111111" }

The following example shows a CloudTrail log entry for the StartDocumentAnalysis operation. The log entry includes the Amazon S3 bucket name and image file name in documentLocation. The log also includes the operation response.

{ "Records": [ { "eventVersion": "1.05", "userIdentity": { "type": "IAMUser", "principalId": "AIDACKCEVSQ6C2EXAMPLE", "arn": "arn:aws:iam::111111111111:user/janedoe", "accountId": "11111111111", "accessKeyId": "AKIAIOSFODNN7EXAMPLE", "userName": "janedoe" }, "eventTime": "2019-04-04T01:42:24Z", "eventSource": "textract.amazonaws.com", "eventName": "StartDocumentAnalysis", "awsRegion": "us-east-1", "sourceIPAddress": "198.51.100.0", "userAgent": "", "requestParameters": { "documentLocation": { "s3Object": { "bucket": "bucket", "name": "document.png" } }, "featureTypes": [ "TABLES" ] }, "responseElements": { "jobId": "f3c718b444fa603d5d625ab967008f4b620d4650c9db8ca1cae01ef7efe51373" }, "requestID": "9ae352e8-9de1-41ad-b77b-85aa348c2e82", "eventID": "f741bca0-c3cb-4805-82ea-baf76439deef", "eventType": "AwsApiCall", "recipientAccountId": "111111111111" } ] }