AWS Solutions Architect interviews blend whiteboard architecture design with deep service knowledge. The questions below are drawn from real SAA-C03-level interview panels at tech companies and consultancies.
Most AWS architect interviews have three layers: (1) service knowledge — what does X do and when do you use it?, (2) architecture design — given these requirements, how would you build it?, and (3) cost/reliability trade-offs — why not just use Y instead? Candidates who ace the first layer often stumble on layers 2 and 3. This guide focuses on what distinguishes good answers from great ones.
An IAM user is a permanent identity with long-term credentials (access key + secret). An IAM role is an identity with no long-term credentials of its own: a principal (user, service, or account) assumes it and receives temporary security credentials from STS (Security Token Service), scoped to the role's permissions and valid for a limited duration. Best practice: EC2 instances, Lambda functions, and ECS tasks should always use IAM roles (instance profiles / execution roles), never hard-coded access keys. Roles can also be assumed cross-account — a role in Account B can trust Account A, letting Account A's principals access Account B's resources without sharing credentials.
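The cross-account trust relationship above is expressed as a trust policy attached to the role. A minimal sketch of what that policy document looks like — the account ID `111122223333` is a placeholder, not a real account:

```python
import json

def build_cross_account_trust_policy(trusted_account_id: str) -> dict:
    """Trust policy for a role in Account B that lets principals in
    Account A (trusted_account_id) call sts:AssumeRole on it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

# Placeholder account ID for illustration only
policy = build_cross_account_trust_policy("111122223333")
print(json.dumps(policy, indent=2))
```

In practice you would pass this document to `iam create-role` (or the equivalent SDK call) and then grant `sts:AssumeRole` on the role's ARN to the principals in Account A.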
(1) Immediately deactivate (not delete) the access key in IAM Console → Users → Security Credentials. (2) Check CloudTrail for the last 24–72 hours of API calls made with those credentials — look for unusual regions, new IAM users/roles created, new EC2 instances, and S3 GetObject/PutObject on sensitive buckets. (3) Delete the key after verifying it's disabled. (4) Check GuardDuty for alerts tied to those credentials. (5) Rotate credentials anywhere those keys were stored (environment variables, config files, Secrets Manager entries). (6) Enable the AWS Config rule access-keys-rotated and consider AWS Secrets Manager for future key management. Fast action on step 1 matters — automated scanners can detect and exploit exposed keys within minutes.
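The triage in step 2 can be partly automated once CloudTrail events are exported. A sketch of the filtering logic, run here against made-up records in a simplified CloudTrail shape (the key IDs and event fields are illustrative):

```python
def flag_suspicious_events(events, leaked_key_id, expected_regions):
    """Return events made with the leaked key that either occurred
    outside the regions this account normally uses, or performed a
    high-risk action (creating IAM principals is a common persistence
    move after key exposure)."""
    risky_actions = {"CreateUser", "CreateAccessKey", "CreateRole", "RunInstances"}
    flagged = []
    for e in events:
        if e.get("accessKeyId") != leaked_key_id:
            continue  # made with a different key; out of scope
        if e["awsRegion"] not in expected_regions or e["eventName"] in risky_actions:
            flagged.append(e)
    return flagged

# Hypothetical sample records (simplified CloudTrail fields)
sample = [
    {"accessKeyId": "AKIAEXAMPLE", "awsRegion": "us-east-1", "eventName": "GetObject"},
    {"accessKeyId": "AKIAEXAMPLE", "awsRegion": "ap-southeast-2", "eventName": "RunInstances"},
    {"accessKeyId": "AKIAOTHERKEY", "awsRegion": "us-east-1", "eventName": "PutObject"},
]
print(flag_suspicious_events(sample, "AKIAEXAMPLE", {"us-east-1"}))
```

Only the `RunInstances` call from an unexpected region is flagged; routine reads with the leaked key in an expected region would still need manual review.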
Security Groups are stateful — if you allow inbound traffic on port 80, the return traffic is automatically allowed outbound. They operate at the instance/ENI level and only support Allow rules (everything not allowed is denied implicitly). NACLs are stateless — you must explicitly allow both inbound and outbound traffic. They operate at the subnet level, support both Allow and Deny rules, and rules are evaluated in order by rule number (lowest first). Use NACLs for broad subnet-level controls (block a known malicious IP range); use Security Groups for fine-grained instance-level access control. In practice, most security lives in Security Groups; NACLs add a subnet-level backstop.
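The "evaluated in order by rule number, lowest first" behavior of NACLs is worth internalizing, since it differs from Security Groups (which evaluate all rules). A toy model of the evaluation order, simplified to port matching only:

```python
def evaluate_nacl(rules, port):
    """NACL-style evaluation: sort by rule number, first matching rule
    wins; the implicit final rule (*) denies anything unmatched."""
    for rule in sorted(rules, key=lambda r: r["number"]):
        if rule["from_port"] <= port <= rule["to_port"]:
            return rule["action"]
    return "DENY"  # implicit catch-all deny

rules = [
    {"number": 100, "from_port": 80, "to_port": 80, "action": "ALLOW"},
    {"number": 200, "from_port": 0, "to_port": 65535, "action": "DENY"},
]
print(evaluate_nacl(rules, 80))   # matches rule 100 first -> ALLOW
print(evaluate_nacl(rules, 443))  # falls through to rule 200 -> DENY
```

Because rule 100 is evaluated before rule 200, port 80 is allowed even though a broader deny exists — swap the rule numbers and port 80 would be denied.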
Internet Gateway (IGW): attached to a VPC, enables bidirectional internet access for resources with public IPs in public subnets. NAT Gateway: in a public subnet, enables instances in private subnets to initiate outbound internet connections without being reachable from the internet — it translates private IPs to a public IP. Transit Gateway (TGW): a regional router that connects multiple VPCs and on-premises networks through a single hub. Without TGW, connecting 10 VPCs requires 45 peering connections (n*(n-1)/2); with TGW, you need 10 attachments. TGW also supports VPN and Direct Connect attachments.
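The scaling argument for Transit Gateway is just the handshake formula. A quick check of the numbers:

```python
def full_mesh_peering(n: int) -> int:
    """Peering connections needed to fully mesh n VPCs: n*(n-1)/2."""
    return n * (n - 1) // 2

def tgw_attachments(n: int) -> int:
    """With a Transit Gateway hub, each VPC needs exactly one attachment."""
    return n

for n in (3, 10, 50):
    print(f"{n} VPCs: {full_mesh_peering(n)} peerings vs {tgw_attachments(n)} TGW attachments")
```

At 10 VPCs the gap is 45 vs 10; at 50 VPCs it is 1,225 vs 50 — which is why hub-and-spoke wins as the VPC count grows.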
Backup and Restore (highest RTO/RPO, lowest cost): back up data to S3/Glacier, restore from scratch on disaster. Suitable for non-critical systems where hours of downtime are acceptable. Pilot Light: minimal version of the environment runs in the DR region (database replicated, core services running), scale up on disaster. RPO: minutes, RTO: tens of minutes. Warm Standby: scaled-down full environment runs in the DR region (smaller instance types), scale up to production capacity on disaster. RTO: minutes. Active-Active / Multi-Site: full production in both regions simultaneously, with Route 53 (weighted or latency-based routing) distributing load. RPO/RTO: near-zero. Cost: 2× infrastructure. Match the strategy to your SLA — active-active is overkill for non-critical internal apps.
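The "match the strategy to your SLA" advice can be framed as picking the cheapest strategy whose typical RTO fits the target. A sketch — the RTO ceilings below are illustrative assumptions, not AWS-published figures:

```python
# Approximate RTO ceilings in minutes per strategy, cheapest first.
# These numbers are illustrative; real RTOs depend on data volume,
# automation maturity, and restore testing.
STRATEGIES = [
    ("Backup and Restore", 24 * 60),
    ("Pilot Light", 60),
    ("Warm Standby", 10),
    ("Active-Active", 1),
]

def pick_dr_strategy(target_rto_minutes: float) -> str:
    """Cheapest strategy whose typical RTO fits inside the target."""
    for name, max_rto in STRATEGIES:
        if max_rto <= target_rto_minutes:
            return name
    return "Active-Active"  # near-zero RTO is the only remaining option

print(pick_dr_strategy(24 * 60))  # a full day of downtime is tolerable
print(pick_dr_strategy(5))        # single-digit minutes required
```

A 24-hour RTO target lands on Backup and Restore; a 5-minute target forces Active-Active. The point for interviews is the reasoning, not the exact thresholds: you pay for lower RTO/RPO with standing infrastructure.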
Strong answer covers: Route 53 with latency-based or geolocation routing → CloudFront CDN (cache static assets, reduce origin load) → Application Load Balancer in at least two AZs → Auto Scaling Group of EC2 instances (or ECS Fargate for containers) across multiple AZs → RDS Aurora (Multi-AZ, with read replicas for read-heavy workloads) → ElastiCache (Redis) for session storage and frequently-read data → S3 for static assets. Key points to mention: stateless application tier (sessions in Redis, not local memory), health checks at every layer, CloudWatch alarms with Auto Scaling policies, and a blue/green deployment strategy. If pushed on cost, mention spot instances for stateless worker fleets with on-demand for the web tier.
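One concrete artifact worth being able to sketch is the Auto Scaling policy behind "CloudWatch alarms with Auto Scaling policies." Below is the parameter shape for a CPU target-tracking policy as boto3's EC2 Auto Scaling `put_scaling_policy` expects it — the group name `web-asg` and the 55% target are placeholders:

```python
def target_tracking_policy(asg_name: str, target_cpu: float) -> dict:
    """Parameters for a target-tracking scaling policy that keeps the
    group's average CPU utilization near target_cpu. Target tracking
    creates and manages the CloudWatch alarms for you."""
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{asg_name}-cpu-target",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": target_cpu,
        },
    }

# Placeholder values; in real use: client.put_scaling_policy(**params)
params = target_tracking_policy("web-asg", 55.0)
print(params["PolicyType"])
```

Target tracking is usually the right default to name in an interview over simple/step scaling: you declare the desired metric value and the service derives the alarms and scaling amounts.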
Test yourself with InterviUni's AWS Cloud Engineer mock interview — real scenario questions, scored answers, hire verdict.
Practice AI mock interviews, check your ATS score, or start a cert course — free.