CloudFront is a proxy that sits between the users and the backend servers, called origins. When a request comes in, CloudFront forwards it to one of the origins.
Let’s see which parts of the distribution configuration decide how the routing happens!
Without CloudFront, each origin has its own name or IP address where it can be accessed, and clients connect to them directly. For example, EC2 servers can have Elastic IPs, and an API Gateway has its own domain under execute-api.amazonaws.com.
Figure: Direct access. The visitor connects separately to the EC2 servers, the S3 bucket with static assets, and the API Gateway API, each at its own address (10.0.0.1, example.com, …execute-api.amazonaws.com).
But when these services are behind CloudFront, they are all reached through a single domain, either the default <id>.cloudfront.net or a custom one. Because of this, host-based routing is not possible.
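Figure: Path-based routing. The visitor connects only to CloudFront edge locations at example.com, and CloudFront forwards each request to the EC2 servers, the S3 bucket with static assets, or the API Gateway API based on paths such as /server/, / and /api.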
Routing in CloudFront is based on the path of the request. A request that goes to https://<id>.cloudfront.net/api/users has the path /api/users.
This is the distinguishing factor that decides which backend server the request goes to.
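To make the routing key concrete, here is a minimal Python sketch that extracts the path from a request URL with the standard library's urllib.parse.urlsplit. The distribution domain d111111abcdef8.cloudfront.net and the helper name request_path are hypothetical placeholders; in this sketch the query string is split off and is not treated as part of the path.

```python
from urllib.parse import urlsplit


def request_path(url: str) -> str:
    """Return the path component of a URL, e.g. '/api/users'."""
    # urlsplit separates scheme, host, path, query and fragment;
    # only the path is used as the routing key in this sketch.
    return urlsplit(url).path


# Hypothetical distribution domain, used only for illustration.
url = "https://d111111abcdef8.cloudfront.net/api/users?page=2"
print(request_path(url))  # /api/users
```

The host part is identical for every request to the distribution, which is why it cannot be used to pick an origin; only the path varies.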
Cache behavior configuration
Cache behaviors are the unit of configuration that decides what happens with an incoming request. They define how to transform the request and the response,
how to cache, what to include or exclude, and, most importantly, which origin to forward the request to.
Each behavior has a path pattern that defines which paths it handles. This is a filter expression: an incoming request either matches the pattern or it does not.
A path pattern supports the * and ? wildcards, where the former matches 0 or more characters and the latter exactly one. This is not a regex engine,
so don’t plan to write complicated patterns here.
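The matching rule can be sketched in a few lines of Python. The function path_matches below is a hypothetical helper, not part of any AWS SDK. It assumes * matches zero or more characters (including /), ? matches exactly one character, and every other character is compared literally and case-sensitively. It uses the classic greedy wildcard algorithm that backtracks to the last * when a literal comparison fails.

```python
def path_matches(pattern: str, path: str) -> bool:
    """Wildcard match supporting only '*' (0+ chars) and '?' (exactly 1 char)."""
    p = s = 0      # current positions in pattern and path
    star = -1      # position of the last '*' seen in the pattern
    star_s = 0     # path position where that '*' started matching
    while s < len(path):
        if p < len(pattern) and pattern[p] == "*":
            # Remember the '*' and first try letting it match zero characters
            star, star_s = p, s
            p += 1
        elif p < len(pattern) and pattern[p] in ("?", path[s]):
            # '?' or an identical literal character consumes one path character
            p += 1
            s += 1
        elif star != -1:
            # Backtrack: let the last '*' absorb one more path character
            star_s += 1
            p, s = star + 1, star_s
        else:
            return False
    # Any remaining pattern characters must all be '*' (matching nothing)
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)


print(path_matches("*.jpg", "/api/image.jpg"))  # True
print(path_matches("/v?/*", "/v10/users"))      # False
```

The second call fails because ? stands for exactly one character, so /v? cannot cover /v10.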
Usually, path patterns fall into one of three categories:
- Exact matching:
- Start of the path:
- End of the path, usually the extension:
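When a pattern has no wildcard, or a single * at one end, each of these categories reduces to a plain string comparison: equality, a prefix check, or a suffix check. The sketch below classifies a pattern and applies the matching check. The function names and the patterns /index.html, /api/* and *.jpg are hypothetical examples chosen for illustration.

```python
def classify(pattern: str) -> str:
    """Sort a path pattern into one of the three common categories."""
    if "?" in pattern or pattern.count("*") > 1:
        return "other"    # needs a general wildcard matcher
    if "*" not in pattern:
        return "exact"    # e.g. /index.html
    if pattern.endswith("*"):
        return "prefix"   # e.g. /api/*
    if pattern.startswith("*"):
        return "suffix"   # e.g. *.jpg
    return "other"        # '*' in the middle, e.g. /img/*.png


def simple_match(pattern: str, path: str) -> bool:
    """Match a path against an exact, prefix, or suffix pattern."""
    kind = classify(pattern)
    if kind == "exact":
        return path == pattern
    if kind == "prefix":
        return path.startswith(pattern[:-1])  # drop the trailing '*'
    if kind == "suffix":
        return path.endswith(pattern[1:])     # drop the leading '*'
    raise ValueError(f"pattern {pattern!r} needs a general wildcard matcher")


for pattern in ["/index.html", "/api/*", "*.jpg"]:
    print(pattern, classify(pattern), simple_match(pattern, "/api/image.jpg"))
# /index.html exact False
# /api/* prefix True
# *.jpg suffix True
```

Note that the single path /api/image.jpg satisfies two of these example patterns at once.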
As there can be more than one cache behavior that matches a given path (
/api/image.jpg is matched by both
*.jpg), CloudFront needs
to break this tie. Because of this, there is an ordering between the behaviors.
There is exactly one behavior with the default (*) path pattern, called the default cache behavior. It matches all requests and is always the last one.
When a request reaches the distribution, CloudFront starts from the top and tries to match the path pattern of each cache behavior in turn. The first one that matches handles the request.
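The first-match rule can be modeled as a loop over an ordered list. In this sketch the list of patterns and the function names are hypothetical, and wildcards are handled by translating * to .* and ? to . in a regular expression, a shortcut chosen for this sketch. The default * pattern sits at the end so every path finds a behavior, and swapping the first two entries changes which behavior handles /api/image.jpg.

```python
import re


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a CloudFront-style path pattern into a regex."""
    # re.escape makes every character literal; then re-enable the two wildcards.
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.compile(regex)


def first_match(path: str, patterns: list[str]) -> str:
    """Return the first pattern, in order, that matches the whole path."""
    for pattern in patterns:
        if compile_pattern(pattern).fullmatch(path):
            return pattern
    raise LookupError("no behavior matched; the default '*' should be last")


# Hypothetical behaviors, in precedence order; '*' is the default and comes last.
ordered = ["/api/*", "*.jpg", "*"]
print(first_match("/api/image.jpg", ordered))                   # /api/*
print(first_match("/api/image.jpg", ["*.jpg", "/api/*", "*"]))  # *.jpg
print(first_match("/index.html", ordered))                      # *
```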
Each cache behavior defines an origin via its Origin ID. The first matching behavior’s origin will be used for the request.
To find where the request goes, look up the origin with that Origin ID: its configuration contains the domain name to which CloudFront forwards the request.
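Putting the pieces together, the sketch below resolves a path to an origin domain using a heavily trimmed distribution configuration. The key names (Origins, Items, Id, DomainName, CacheBehaviors, PathPattern, TargetOriginId, DefaultCacheBehavior) follow the CloudFront API's DistributionConfig, but everything else is simplified, and the origin IDs, domain names and helper functions are hypothetical.

```python
import re

# Hypothetical, heavily trimmed config; key names follow the CloudFront API's
# DistributionConfig, but IDs and domain names are made up for illustration.
CONFIG = {
    "Origins": {"Items": [
        {"Id": "servers", "DomainName": "servers.example.com"},
        {"Id": "assets", "DomainName": "assets-bucket.s3.amazonaws.com"},
        {"Id": "api", "DomainName": "abc123.execute-api.us-east-1.amazonaws.com"},
    ]},
    "CacheBehaviors": {"Items": [
        {"PathPattern": "/api/*", "TargetOriginId": "api"},
        {"PathPattern": "/server/*", "TargetOriginId": "servers"},
    ]},
    # The default behavior has no PathPattern: it implicitly matches '*'.
    "DefaultCacheBehavior": {"TargetOriginId": "assets"},
}


def matches(pattern: str, path: str) -> bool:
    """'*' = zero or more characters, '?' = exactly one; everything else literal."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, path) is not None


def resolve_origin_domain(config: dict, path: str) -> str:
    # 1. First matching cache behavior, in order; fall back to the default.
    behavior = next(
        (b for b in config["CacheBehaviors"]["Items"]
         if matches(b["PathPattern"], path)),
        config["DefaultCacheBehavior"],
    )
    # 2. Look up the origin whose Id equals the behavior's TargetOriginId.
    origin_id = behavior["TargetOriginId"]
    for origin in config["Origins"]["Items"]:
        if origin["Id"] == origin_id:
            return origin["DomainName"]
    raise KeyError(f"no origin with Id {origin_id!r}")


print(resolve_origin_domain(CONFIG, "/api/users"))   # abc123.execute-api.us-east-1.amazonaws.com
print(resolve_origin_domain(CONFIG, "/index.html"))  # assets-bucket.s3.amazonaws.com
```

Keeping the default behavior outside the ordered list mirrors the configuration's structure: it cannot be reordered and only applies when nothing else matched.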
Source: Advanced Web Machinery