Key Takeaways
- Every AWS account gets a default VPC, but production workloads need a custom VPC with explicit CIDR ranges, public/private subnet separation, and controlled routing.
- Security groups are stateful (resource-level); NACLs are stateless (subnet-level) — use both for layered defense.
- VPC peering suits 2–3 VPCs; Transit Gateway is the better fit for hub-and-spoke architectures with 4+ VPCs or on-premises connectivity.
- S3 and DynamoDB Gateway Endpoints are free and remove NAT Gateway data processing charges for that traffic, which adds up in high-volume pipelines. Enable them by default.
- AI/ML workloads using Bedrock or SageMaker should use Interface Endpoints to keep inference traffic off the public internet.
What Is an AWS VPC?
A Virtual Private Cloud (VPC) is a logically isolated network environment inside AWS. Most resources you launch, such as EC2 instances, RDS databases, and EKS worker nodes, live inside a VPC; Lambda functions run outside your VPC unless you attach them to one. AWS gives every account a default VPC in each region, but production systems almost always use a custom VPC for full control over IP ranges, subnet layout, routing rules, and security boundaries.
The core building block is a CIDR block. A /16 VPC gives 65,536 IP addresses, while a /24 subnet provides 251 usable IPs (AWS reserves 5 of the 256 in every subnet). The default quota is 5 VPCs per region, but this can be raised with a quota increase request. Once created, a VPC's primary CIDR block cannot be changed (you can only add secondary CIDR blocks), so choose carefully upfront.
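The address arithmetic above can be checked with Python's standard `ipaddress` module. This is a minimal sketch; the helper name `usable_ips` and the example CIDR ranges are illustrative choices, not part of any AWS API.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr: str) -> int:
    """Return how many addresses remain assignable in an AWS subnet."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AWS_RESERVED_PER_SUBNET


vpc = ipaddress.ip_network("10.0.0.0/16")
print(vpc.num_addresses)          # 65536
print(usable_ips("10.0.1.0/24"))  # 251
print(usable_ips("10.0.0.0/28"))  # 11
```

The /28 case shows why very small subnets are costly: 5 of only 16 addresses are lost to reservations.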
"A properly designed VPC is the foundational security and networking layer that everything else in AWS sits on top of."
Core VPC Components
Four components form the backbone of every VPC architecture. Understanding each one — and the relationship between them — is prerequisite knowledge for any AWS certification or production role.
Subnets
Subdivisions of your VPC CIDR range. Public subnets route through an Internet Gateway; private subnets don't. Each subnet lives in one Availability Zone.
Route Tables
Rules that determine where network traffic is directed. Every subnet is associated with exactly one route table. Adding a 0.0.0.0/0 → IGW route makes a subnet "public."
Internet Gateway
Horizontally scaled, redundant AWS-managed gateway that provides internet access for resources in public subnets. Attached at the VPC level — one IGW per VPC.
NAT Gateway
Allows private subnet resources to initiate outbound internet connections (for software updates, API calls) without accepting inbound connections. Charged per hour + per GB.
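How a route table decides where a packet goes, and what makes a subnet "public," can be sketched in a few lines of Python. This is a simplified model, not an AWS API: the route tables are plain dictionaries, and the target IDs `igw-example` and `nat-example` are hypothetical placeholders.

```python
import ipaddress
from typing import Dict, Optional


def resolve_target(route_table: Dict[str, str], destination_ip: str) -> Optional[str]:
    """Return the target of the most specific (longest-prefix) matching route."""
    ip = ipaddress.ip_address(destination_ip)
    matches = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if ip in network:
            matches.append((network.prefixlen, target))
    if not matches:
        return None  # no route: the packet is dropped
    return max(matches)[1]


def is_public(route_table: Dict[str, str]) -> bool:
    """A subnet is public when its default route points at an Internet Gateway."""
    return route_table.get("0.0.0.0/0", "").startswith("igw-")


public_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
private_rt = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}

print(resolve_target(public_rt, "10.0.3.15"))      # local
print(resolve_target(public_rt, "203.0.113.10"))   # igw-example
print(resolve_target(private_rt, "203.0.113.10"))  # nat-example
print(is_public(public_rt), is_public(private_rt)) # True False
```

The only difference between the two tables is the target of the default route, which is exactly the difference between a public and a private subnet.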
Default VPC vs Custom VPC
AWS creates a default VPC in every region with public subnets, an attached IGW, and a permissive security group. This is intentionally convenient for experimentation — everything works without configuration. Production workloads require a custom VPC for security, compliance, and network control.
Default VPC
- All subnets are public (auto-assign public IPs)
- IGW already attached and routed
- Great for learning and quick demos
- No subnet separation (public vs private)
- Can accidentally expose resources to the internet
- Permissive default security group (all traffic allowed internally)
Custom VPC
- Private subnets isolated by default (no IGW route)
- Explicit route table control per subnet tier
- Standard practice for RDS, EKS, and ECS in production
- Supports multi-AZ private subnet layouts
- Custom CIDR prevents overlap with on-premises networks
- Typically expected for compliance programs (PCI-DSS, HIPAA, FedRAMP)
Subnet CIDR Planning
Proper CIDR planning at the start avoids painful re-architecting later. A standard 3-tier layout uses a /16 VPC subdivided into /24 subnets — one pair per tier (public, app, data) across two or three Availability Zones.
    # VPC
    VPC CIDR:        10.0.0.0/16   # 65,536 IPs total

    # Public subnets (load balancers, NAT GW, bastion)
    Public-A:        10.0.1.0/24   # us-east-1a
    Public-B:        10.0.2.0/24   # us-east-1b

    # Private app subnets (EC2, Lambda, ECS, EKS)
    Private-App-A:   10.0.3.0/24   # us-east-1a
    Private-App-B:   10.0.4.0/24   # us-east-1b

    # Private data subnets (RDS, ElastiCache, OpenSearch)
    Private-Data-A:  10.0.5.0/24   # us-east-1a
    Private-Data-B:  10.0.6.0/24   # us-east-1b

    # Reserve 10.0.10-20.x for future services / peered VPCs
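A plan like this is easy to validate before anything is created. The sketch below uses only the standard library to confirm that every subnet sits inside the VPC range and that no two subnets overlap; the function name `validate_plan` is an illustrative choice.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List

PLAN = {
    "Public-A": "10.0.1.0/24",
    "Public-B": "10.0.2.0/24",
    "Private-App-A": "10.0.3.0/24",
    "Private-App-B": "10.0.4.0/24",
    "Private-Data-A": "10.0.5.0/24",
    "Private-Data-B": "10.0.6.0/24",
}


def validate_plan(vpc_cidr: str, plan: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the plan is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    problems = []
    # Every subnet must fall inside the VPC's CIDR block.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside {vpc}")
    # No two subnets may share addresses.
    for (a, net_a), (b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{a} overlaps {b}")
    return problems


print(validate_plan("10.0.0.0/16", PLAN))  # []
```

The same `overlaps` check works for comparing the VPC range against an on-premises range before setting up hybrid connectivity.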
Security Groups vs NACLs
Both security groups and Network Access Control Lists (NACLs) control traffic flow, but they operate at different levels and with different behaviors. Most architectures rely primarily on security groups and use NACLs as a secondary defense layer.
| Dimension | Security Groups | NACLs |
|---|---|---|
| Attachment level | Resource (EC2, RDS, Lambda) | Subnet |
| Stateful? | Yes — return traffic auto-allowed | No — both directions explicit |
| Rule evaluation | All rules evaluated; most permissive wins | Rules evaluated by number; first match wins |
| Allow/Deny | Allow only (implicit deny) | Both Allow and Deny rules supported |
| Default inbound | All traffic denied | All traffic allowed (default NACL); custom NACLs deny all until rules are added |
| Best for | Per-resource port controls | Blocking IP ranges at subnet boundary |
| Rule limit | 60 inbound + 60 outbound per SG | 20 inbound + 20 outbound per NACL |
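The two evaluation models in the table can be sketched side by side. This is a deliberately simplified model (single ports, no protocols, no connection tracking), and the class and function names are hypothetical, not AWS APIs. It shows why blocking one IP range needs a NACL: a security group can only allow.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NaclRule:
    number: int
    cidr: str
    port: int
    action: str  # "allow" or "deny"


def nacl_decision(rules: List[NaclRule], source_ip: str, port: int) -> str:
    """Evaluate rules in ascending number order; the first match wins."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port and ip in ipaddress.ip_network(rule.cidr):
            return rule.action
    return "deny"  # the implicit final rule denies anything unmatched


def sg_allows(allow_rules: List[Tuple[str, int]], source_ip: str, port: int) -> bool:
    """Security groups hold allow rules only; any matching rule permits the traffic."""
    ip = ipaddress.ip_address(source_ip)
    return any(p == port and ip in ipaddress.ip_network(cidr) for cidr, p in allow_rules)


nacl = [
    NaclRule(90, "198.51.100.0/24", 443, "deny"),  # block one range first
    NaclRule(100, "0.0.0.0/0", 443, "allow"),      # then allow HTTPS from anywhere
]
sg = [("0.0.0.0/0", 443)]

print(nacl_decision(nacl, "198.51.100.7", 443))  # deny
print(nacl_decision(nacl, "203.0.113.9", 443))   # allow
print(sg_allows(sg, "198.51.100.7", 443))        # True
```

Swapping the rule numbers (allow at 90, deny at 100) would let the blocked range through, which is why NACL rule ordering deserves care.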
"Security groups are the primary control. NACLs provide defense-in-depth — particularly for blocking specific IP ranges before traffic reaches any resource."
VPC Peering vs Transit Gateway
When multiple VPCs need to communicate — across accounts, across regions, or both — the choice between VPC Peering and Transit Gateway depends on the number of VPCs and whether on-premises connectivity is required.
| Dimension | VPC Peering | Transit Gateway |
|---|---|---|
| Architecture | Point-to-point between 2 VPCs | Hub-and-spoke (many VPCs) |
| Transitive routing | Not supported | Supported |
| On-premises (VPN/DX) | Not supported via peering | Supported natively |
| Cross-region | Supported (inter-region peering) | Supported (inter-region attachment) |
| Data transfer cost | Same-AZ: free; cross-AZ: $0.01/GB | $0.02/GB processed |
| Hourly cost | Free | $0.05/hr per attachment |
| Best for | 2–3 VPCs, simple connections | 4+ VPCs, multi-account, hybrid cloud |
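The "2–3 VPCs versus 4+" rule of thumb follows from simple arithmetic: because peering is non-transitive, a full mesh of n VPCs needs n(n-1)/2 connections, while Transit Gateway needs one attachment per VPC. The sketch below uses the prices from the table above and assumes a 720-hour (30-day) month; actual prices vary by region.

```python
HOURS_PER_MONTH = 720          # assumption: 30-day month
TGW_ATTACHMENT_HOURLY = 0.05   # USD per attachment-hour, from the table above
TGW_PER_GB = 0.02              # USD per GB processed, from the table above


def peering_connections(vpc_count: int) -> int:
    """Peering connections needed for every VPC to reach every other VPC."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_monthly_cost(vpc_count: int, gb_processed: float) -> float:
    """Estimated monthly Transit Gateway cost with one attachment per VPC."""
    fixed = vpc_count * TGW_ATTACHMENT_HOURLY * HOURS_PER_MONTH
    return round(fixed + gb_processed * TGW_PER_GB, 2)


for n in (3, 4, 10):
    print(n, "VPCs:", peering_connections(n), "peerings vs", n, "attachments")

print(tgw_monthly_cost(4, 1000))
```

At 3 VPCs the mesh is 3 connections; at 10 it is 45, each with its own route table entries to maintain. The operational overhead, more than the price, is what tips the choice toward Transit Gateway.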
VPC Endpoints for AWS Services
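By default, AWS API calls from inside a VPC go to the services' public endpoints, so resources in private subnets need a NAT Gateway (or a public IP and an Internet Gateway) to reach them, even for services in the same region. VPC endpoints give those calls a private path inside the VPC, which removes NAT Gateway data processing charges for that traffic and improves security posture.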
S3 Gateway (Free)
Routes S3 traffic from the VPC to a gateway endpoint through a route-table entry instead of a NAT Gateway. The endpoint has no hourly or per-GB charge, so enabling it costs nothing and saves NAT processing fees right away.
DynamoDB Gateway (Free)
Same as S3: free gateway endpoint keeps DynamoDB traffic private. No hourly cost, no GB charge. Every VPC should have both gateway endpoints enabled.
Secrets Manager Interface
$0.01/hr per AZ + $0.01/GB. Keeps secret retrieval off the public internet. Required for compliant architectures handling credentials in private subnets.
Bedrock / SageMaker Interface
$0.01/hr per AZ. Routes all AI inference traffic through the AWS private network. Essential for regulated industries (HIPAA, FedRAMP) using foundation models.
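The endpoint prices above translate into a monthly figure with a little arithmetic. The sketch below assumes a 720-hour month and assumes the $0.01/GB rate quoted for Secrets Manager applies to other interface endpoints as well. The NAT Gateway per-GB price is not given in this article, so it is left as a parameter you supply from the current pricing page; the 0.045 in the example call is only an illustrative value.

```python
HOURS_PER_MONTH = 720             # assumption: 30-day month
INTERFACE_HOURLY_PER_AZ = 0.01    # USD, from the figures above
INTERFACE_PER_GB = 0.01           # USD, assumed to apply to all interface endpoints


def interface_endpoint_monthly(az_count: int, gb: float = 0.0) -> float:
    """Estimated monthly cost of one interface endpoint deployed in az_count AZs."""
    fixed = az_count * INTERFACE_HOURLY_PER_AZ * HOURS_PER_MONTH
    return round(fixed + gb * INTERFACE_PER_GB, 2)


def break_even_gb(az_count: int, nat_per_gb: float) -> float:
    """Monthly GB above which the endpoint is cheaper than sending traffic via NAT."""
    saving_per_gb = nat_per_gb - INTERFACE_PER_GB
    if saving_per_gb <= 0:
        raise ValueError("endpoint never breaks even at this NAT price")
    fixed = az_count * INTERFACE_HOURLY_PER_AZ * HOURS_PER_MONTH
    return round(fixed / saving_per_gb, 2)


print(interface_endpoint_monthly(1))       # 7.2
print(interface_endpoint_monthly(2, 500))  # 19.4
print(break_even_gb(2, nat_per_gb=0.045))  # illustrative NAT price, not from this article
```

Gateway endpoints for S3 and DynamoDB need no such calculation: with no hourly or per-GB charge, they break even at zero traffic.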
AI/ML Networking Best Practices
Teams building AI pipelines on AWS frequently underestimate networking costs and compliance requirements. The architecture decisions below apply whether the workload uses Bedrock, SageMaker, or custom model inference.
Cost Reduction
- Enable S3 and DynamoDB Gateway Endpoints immediately (free)
- Add Bedrock/SageMaker Interface Endpoints for high-volume inference ($0.01/hr)
- Deploy NAT Gateway per AZ to avoid cross-AZ charges
- Use S3 Transfer Acceleration only when actually needed — it adds cost
- Monitor VPC Flow Logs to identify unexpected inter-AZ traffic patterns
Security & Compliance
- Run SageMaker training jobs in private subnets with no internet access
- Store model artifacts in private S3 with VPC endpoint + bucket policy
- Use security groups on SageMaker endpoints (not just NACLs)
- Enforce VPC endpoint conditions in S3 bucket policies for sensitive datasets
- Enable VPC Flow Logs for network forensics and compliance audit trails
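Enforcing a VPC endpoint condition in a bucket policy usually means denying any request that does not arrive through a specific endpoint, using the `aws:SourceVpce` condition key. The sketch below only builds the policy document as JSON; the bucket name and endpoint ID are hypothetical placeholders, and the helper `vpce_only_bucket_policy` is an illustrative name.

```python
import json


def vpce_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Build a bucket policy that denies S3 access from outside one VPC endpoint."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                # Deny unless the request came through the named endpoint.
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


# Hypothetical names for illustration only.
policy = vpce_only_bucket_policy("example-model-artifacts", "vpce-0example")
print(json.dumps(policy, indent=2))
```

A blanket deny like this also blocks console and CI access from outside the VPC, so test it on a non-critical bucket first and consider scoping the statement to the sensitive prefixes.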
Frequently Asked Questions
What is an AWS VPC and why do I need one?
An AWS VPC is a logically isolated section of the AWS cloud where resources launch inside a network you define. Every account gets a default VPC, but production workloads should use a custom VPC to control IP ranges, subnet topology, routing, and security rules. Without explicit VPC design, resources such as EC2 instances and RDS databases land in the default VPC's public subnets, a flat network with unpredictable exposure.
What is the difference between a security group and a NACL?
Security groups are stateful firewalls attached to individual resources. If inbound traffic is allowed on port 443, return traffic is automatically allowed. NACLs are stateless firewalls attached to subnets — inbound and outbound rules must be configured independently. Security groups are the primary control in most architectures; NACLs provide a second layer, especially for blocking IP ranges at the subnet boundary.
When should I use VPC peering vs Transit Gateway?
VPC peering creates a direct point-to-point connection between two VPCs. It's simple and cost-effective for two or three VPCs. The limitation: peering is non-transitive, so if VPC A peers with B and B peers with C, traffic cannot flow A → C through B. Transit Gateway acts as a central hub routing traffic among many VPCs and on-premises networks. Four or more VPCs, or any hybrid-cloud connectivity, almost always justifies Transit Gateway despite its per-attachment hourly cost.
How do VPC endpoints save money and improve security?
Without endpoints, AWS API calls from inside a VPC go to the services' public endpoints, even for services in the same region, so traffic from private subnets passes through a NAT Gateway and incurs its data processing charges. VPC endpoints give that traffic a private path inside the VPC instead. S3 and DynamoDB Gateway Endpoints are free. Interface Endpoints for Bedrock, SageMaker, and Secrets Manager cost $0.01/hr per AZ. For AI/ML pipelines moving large datasets through S3, the savings quickly exceed the endpoint cost, and the security benefit (traffic stays on private paths within AWS) is equally important for regulated workloads.
VPC Architecture Verdict
The 3-tier layout (public → private app → private data) with a /16 CIDR, one NAT Gateway per AZ, both free gateway endpoints (S3 and DynamoDB) enabled, and Interface Endpoints for any AI/ML services is a sound starting architecture for the large majority of production workloads. Add Transit Gateway only when connecting four or more VPCs or integrating on-premises networks. Security groups handle per-resource firewall rules; NACLs add subnet-level defense for compliance requirements. Get this foundation right once, and it rarely needs to change.
VPC knowledge is where AI infrastructure meets real-world enterprise security requirements.
VPC is the piece of AWS that separates developers who can ship personal projects from developers who can work in regulated enterprise environments. Most tutorials treat VPC as a routing and networking concept — subnets, route tables, internet gateways. That understanding is necessary but insufficient for the context where VPC knowledge actually pays off: deploying AI services in environments with data residency requirements, HIPAA or FedRAMP compliance needs, or corporate network policies that require all traffic to stay within a private network perimeter. That describes most of the enterprise customers who are actually paying for AI infrastructure.
The VPC capability that matters most specifically for AI workloads in 2026 is VPC endpoints for AWS services. When your VPC-attached Lambda function calls Bedrock or your SageMaker notebook queries S3, the traffic by default goes to public service endpoints through a NAT Gateway or Internet Gateway. In a regulated environment, that's often not acceptable: all traffic must stay on private paths within the AWS network. VPC endpoints solve this (PrivateLink interface endpoints for Bedrock and SageMaker, a gateway endpoint for S3), but interface endpoints add cost ($7.20/endpoint/month per AZ) and call for planning around which NAT Gateway routes can then be removed. Most AI infrastructure guides skip over this, but it's the first requirement that enterprise customers impose when moving AI pilots to production.
For cloud engineers learning AI infrastructure: understanding VPC endpoint configuration for Bedrock and SageMaker is the skill that gets you into enterprise-grade deployments. Add it to your hands-on practice list alongside the standard Lambda and S3 fundamentals.