
Platform Engineering: Building Secure Internal Developer Platforms


A backend developer at a fast-growing fintech company needs to deploy a new microservice. They open a ticket with the DevOps team requesting a Kubernetes namespace, an RDS database, a CI/CD pipeline, and access to monitoring. The ticket sits in a queue for two weeks. When it finally gets processed, the DevOps engineer copy-pastes configuration from another service, forgets to enable encryption on the database, and grants overly permissive IAM roles because figuring out the correct permissions would take another day.

This scenario plays out thousands of times daily across the industry. Developers wait while operations teams become bottlenecks. Corners get cut on security because proper configuration is hard. Shadow IT emerges as frustrated developers spin up resources outside approved channels. Everyone loses.

Platform engineering offers a different path: build self-service infrastructure where doing the right thing is the easy thing. Instead of tickets and wait times, developers get a catalog of pre-approved, secure components they can provision themselves. Instead of copy-pasted configs that drift from security standards, they get golden paths—opinionated templates where security is built in from the start.

The Platform Engineering Revolution

Platform engineering emerged from a simple observation: the DevOps promise of "you build it, you run it" created cognitive overload. Developers are excellent at writing business logic. Expecting them to also be experts in Kubernetes, Terraform, observability, security, and compliance is unrealistic. The result was either teams that moved slowly while learning infrastructure, or teams that moved fast while cutting corners.

An Internal Developer Platform (IDP) solves this by providing a product-like experience for infrastructure. Think of it as the "Amazon shopping experience" for development resources: a curated catalog of products (infrastructure components), one-click provisioning, and standardized delivery (deployment pipelines)—all with guardrails that prevent unsafe choices.

The key insight is that abstraction is not about hiding complexity—it's about encoding expertise. When your platform team configures a database template, they embed decisions about encryption, backup retention, network isolation, and access patterns. Developers consuming that template get all those decisions for free, without needing to understand them.

Golden Paths: The Secure Way is the Easy Way

Golden paths are the core concept in platform engineering. They're not restrictions—they're paved roads. Just as a highway gets you to your destination faster than bushwhacking through the wilderness, golden paths get developers to production faster than ad-hoc configuration.

Consider the difference between these two experiences:

Without golden paths: A developer wants to create a new service. They look at an existing service's repository, copy files they think are relevant, modify them based on half-remembered conversations, open PRs to infrastructure repos they don't fully understand, wait for reviews from overloaded platform engineers, fix issues found in review, deploy to staging, discover they forgot to configure logging, fix that, redeploy, discover the service can't connect to the database because security groups are wrong, file a ticket, wait...

With golden paths: A developer opens the developer portal, clicks "Create New Service," fills in a form (service name, team owner, data classification), and clicks submit. Three minutes later, they have a repository with a working CI/CD pipeline, a Kubernetes namespace with appropriate resource limits, network policies configured, security scanning enabled, logging integrated, and documentation automatically generated. They write their business logic and deploy.

The magic is that the second approach isn't just faster—it's also more secure. Every security control is built into the template. Developers can't forget to enable encryption because the template doesn't allow unencrypted options.

Building a Secure Service Template with Backstage

Backstage, originally developed at Spotify, has become the de facto standard for developer portals. Its templating system lets you define golden paths that generate entire project scaffolds with a few form inputs.

Here's a template that creates a production-ready microservice with security built in from the start:

# Backstage template: Secure Microservice Generator
# This template creates a complete, production-ready service
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: secure-microservice
  title: Secure Microservice
  description: |
    Creates a production-ready microservice with:
    - Hardened container configuration
    - Security scanning in CI/CD
    - Branch protection and code review requirements
    - Automatic registration in service catalog
spec:
  owner: platform-team
  type: service

  # Gather information from the developer
  parameters:
    - title: Service Details
      required: [name, owner]
      properties:
        name:
          title: Service Name
          type: string
          pattern: '^[a-z][a-z0-9-]*$'
          description: Lowercase letters, numbers, and hyphens only
        owner:
          title: Owning Team
          type: string
          ui:field: OwnerPicker
          description: The team responsible for this service
        dataClassification:
          title: Data Classification
          type: string
          enum: [public, internal, confidential, restricted]
          default: internal
          description: |
            Determines security controls applied:
            - Public: No sensitive data
            - Internal: Employee-only data
            - Confidential: Customer data, requires encryption
            - Restricted: Financial/health data, requires audit logging

  # Execute these steps to create the service
  steps:
    # Generate code from secure skeleton template
    - id: fetch
      name: Generate Secure Code Skeleton
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          dataClassification: ${{ parameters.dataClassification }}

    # Create repository with security settings enabled
    - id: publish
      name: Create Repository
      action: publish:github
      input:
        repoUrl: github.com?owner=myorg&repo=${{ parameters.name }}
        defaultBranch: main
        # Security: Require reviews before merging to main
        protectDefaultBranch: true
        requireCodeOwnerReviews: true
        dismissStaleReviews: true
        # Security: Block merges until checks pass
        requiredStatusCheckContexts:
          - security-scan    # Vulnerability scanning
          - secret-scan      # Check for leaked secrets
          - tests            # Unit/integration tests
          - lint             # Code quality

    # Add to service catalog for visibility and governance
    - id: register
      name: Register in Service Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps.publish.output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml

    # Trigger security setup workflow
    - id: security
      name: Configure Security Scanning
      action: github:actions:dispatch
      input:
        repoUrl: github.com?owner=myorg&repo=${{ parameters.name }}
        workflowId: setup-security.yml
        workflowInputs:
          dataClassification: ${{ parameters.dataClassification }}

Notice how security is woven throughout: branch protection is enabled by default, code reviews are required, security scans must pass before merging, and data classification drives what controls get applied. A developer creating a service handling "restricted" data automatically gets stricter encryption requirements and audit logging—without needing to know those requirements exist.
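
The ./skeleton directory referenced in the fetch step holds the files the template renders. As a sketch (this file and its annotation key are hypothetical, not part of the template above), a skeleton catalog-info.yaml might use Backstage's templating syntax to carry the developer's form inputs into the generated service:

# Hypothetical skeleton/catalog-info.yaml - rendered by fetch:template
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: ${{ values.name }}
  annotations:
    # Illustrative annotation key: carries the classification chosen in the form
    company.io/data-classification: ${{ values.dataClassification }}
spec:
  type: service
  lifecycle: experimental
  owner: ${{ values.owner }}

Because this file ships with the skeleton, every generated service lands in the catalog already tagged with its owner and data classification.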

The Secure Container: Dockerfile as a Golden Path

Container security is notoriously easy to get wrong. A typical internet tutorial produces a Dockerfile that runs as root, builds on a full OS image with hundreds of unnecessary packages, and defines no health checks. Each of these is a security issue.

Golden path Dockerfiles encode security best practices:

# Golden Path Dockerfile - Security by Default
# This is what developers get when they use our template

# Build stage: Install dependencies and compile
FROM cgr.dev/chainguard/node:latest-dev AS builder
# Why Chainguard? These images are rebuilt daily with latest patches,
# contain minimal packages (smaller attack surface), and include SBOMs
# (the -dev variant adds npm and a shell, needed only at build time)

WORKDIR /app

# Install dependencies first (better layer caching)
COPY package*.json ./
RUN npm ci --ignore-scripts
# --ignore-scripts: Don't run arbitrary postinstall scripts
# This prevents supply chain attacks via malicious npm scripts
# (devDependencies are installed too - the build step below needs them)

COPY . .
RUN npm run build
# Remove devDependencies so only production packages reach the final image
RUN npm prune --omit=dev

# Production stage: Minimal runtime image
FROM cgr.dev/chainguard/node:latest
# Using same base ensures compatibility and consistency

WORKDIR /app

# Copy only what's needed for production
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules

# Security: Run as non-root user
# Chainguard images use 'nonroot' by default, but be explicit
USER nonroot:nonroot

# Security: Drop all capabilities and prevent privilege escalation
# This is enforced at runtime via Kubernetes, but document the intent

# Observability: container-level health check
# Note: Docker honors HEALTHCHECK, but Kubernetes ignores it and uses its own
# liveness/readiness probes. Exec form avoids needing a shell, which distroless
# images don't ship (path assumes the build emits dist/healthcheck.js)
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s \
  CMD ["node", "dist/healthcheck.js"]

# Document the port (doesn't actually expose it)
EXPOSE 8080

# Start the service; exec (array) form prevents shell interpretation issues
# and overrides any entrypoint the base image sets
ENTRYPOINT ["node", "dist/index.js"]

When developers use the golden path, they get all these security decisions for free. They don't need to know why Chainguard images are better than the official Node.js images, or why running as non-root matters. The platform team made those decisions once, and every service benefits.
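
The Dockerfile comments note that dropping capabilities and blocking privilege escalation are enforced at runtime by Kubernetes. As a sketch of what the golden-path deployment manifest might set (names, image, and limits are illustrative), the corresponding securityContext looks like this:

# Illustrative pod spec fragment the golden path could generate
apiVersion: v1
kind: Pod
metadata:
  name: example-service
spec:
  containers:
    - name: app
      image: ghcr.io/myorg/example-service:1.0.0   # hypothetical image
      securityContext:
        runAsNonRoot: true                  # matches the Dockerfile's USER
        allowPrivilegeEscalation: false     # no setuid/sudo tricks
        readOnlyRootFilesystem: true        # image contents are immutable
        capabilities:
          drop: ["ALL"]                     # no kernel capabilities at all
      resources:
        requests: { cpu: 100m, memory: 128Mi }
        limits: { cpu: 500m, memory: 512Mi }

Because the template emits this alongside the Dockerfile, the image-level and cluster-level controls stay in sync without developers touching either.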

Self-Service with Guardrails: Freedom Within Boundaries

The traditional approach to infrastructure security was gatekeeping: developers request resources, operations approve or deny. This doesn't scale. Platform engineering replaces gatekeeping with guardrails: developers can provision resources themselves, but the platform constrains what's possible.

Consider database provisioning. The old model: developer files a ticket, DBA reviews the request, provisions the database manually, hopefully remembers to enable encryption, sets up backups, configures access. Lots of human decisions, lots of opportunities for mistakes.

The platform engineering model: developer selects "PostgreSQL Database" from the catalog, specifies size and name, clicks create. The platform provisions a database that's automatically encrypted, backed up, isolated to the correct network, and accessible only from their services. No human decisions, no mistakes possible.

Crossplane: Kubernetes-Native Infrastructure Provisioning

Crossplane extends Kubernetes to provision infrastructure across any cloud provider. Combined with compositions, you can create self-service APIs for cloud resources with security built in:

# Crossplane Composition: Secure PostgreSQL Database
# This defines what developers get when they request a database
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: secure-postgres-aws
  labels:
    provider: aws
    database: postgres
spec:
  compositeTypeRef:
    apiVersion: database.company.io/v1alpha1
    kind: PostgresInstance

  resources:
    # The actual RDS instance
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            engine: postgres
            engineVersion: "15"

            # These security settings are ENFORCED - developers can't change them
            storageEncrypted: true              # Data at rest encryption
            deletionProtection: true            # Prevent accidental deletion
            publiclyAccessible: false           # Never expose to internet
            autoMinorVersionUpgrade: true       # Automatic security patches

            # Backup policy - also enforced
            backupRetentionPeriod: 7            # Keep 7 days of backups
            backupWindow: "03:00-04:00"         # Backups during off-hours
            copyTagsToSnapshot: true            # Maintain tagging in backups

            # Monitoring - developers don't need to configure this
            enabledCloudwatchLogsExports:
              - postgresql
              - upgrade
            performanceInsightsEnabled: true
            performanceInsightsRetentionPeriod: 7

      # These fields can be customized by developers
      patches:
        - fromFieldPath: "spec.size"
          toFieldPath: "spec.forProvider.instanceClass"
          transforms:
            - type: map
              map:
                small: db.t3.micro
                medium: db.t3.small
                large: db.t3.medium
        - fromFieldPath: "spec.storageGB"
          toFieldPath: "spec.forProvider.allocatedStorage"

    # Security group - tightly controlled network access
    - name: security-group
      base:
        apiVersion: ec2.aws.upbound.io/v1beta1
        kind: SecurityGroup
        spec:
          forProvider:
            description: "Database SG - managed by platform"
            # Only allow connections from application VPC
            ingress:
              - fromPort: 5432
                toPort: 5432
                protocol: tcp
                # CIDR is patched based on environment
            # No egress - database doesn't need to initiate connections
            egress: []

Now developers can create databases with a simple Kubernetes manifest:

# What developers actually write - the platform handles the rest
apiVersion: database.company.io/v1alpha1
kind: PostgresInstance
metadata:
  name: orders-db
  namespace: orders-team
spec:
  size: medium
  storageGB: 50

That's it. From that short manifest, they get an encrypted, backed-up, monitored, properly networked PostgreSQL database. They can't accidentally make it public, skip encryption, or disable backups—those options simply don't exist in the API they're using.
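
That narrow API comes from a CompositeResourceDefinition (XRD) the platform team publishes alongside the Composition. A minimal sketch (the storage cap is illustrative) shows how the schema itself becomes a guardrail—size and storageGB are the only knobs it exposes:

# Sketch: the XRD defining the PostgresInstance API consumed above
apiVersion: apiextensions.crossplane.io/v1
kind: CompositeResourceDefinition
metadata:
  name: postgresinstances.database.company.io
spec:
  group: database.company.io
  names:
    kind: PostgresInstance
    plural: postgresinstances
  versions:
    - name: v1alpha1
      served: true
      referenceable: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: string
                  enum: [small, medium, large]   # nothing else is accepted
                storageGB:
                  type: integer
                  maximum: 500                   # illustrative cost guardrail
              required: [size, storageGB]

There is no encryption field to forget and no public-access field to misuse; requests outside the schema are rejected by the API server itself.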

Policy as Code: Automated Guardrails

Golden paths work well when developers use them. But what about resources created outside the golden path? Policy as code creates a safety net that catches misconfigurations regardless of how resources were created.

The key insight is that policies should be preventative, not detective. It's better to block a misconfigured resource from deploying than to detect it running in production. Kubernetes admission controllers make this possible—they intercept every resource creation request and can reject those that violate policy.

Gatekeeper: The Kubernetes Policy Enforcer

OPA Gatekeeper integrates Open Policy Agent with Kubernetes admission control. Here are policies that prevent common security mistakes:

# Policy: Containers Must Run as Non-Root
# Why: Running as root means a container escape gives attackers root on the host
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredSecurityContext
metadata:
  name: require-non-root
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    excludedNamespaces: ["kube-system", "gatekeeper-system"]
  parameters:
    runAsNonRoot: true
    allowPrivilegeEscalation: false
    requiredDropCapabilities: ["ALL"]
    readOnlyRootFilesystem: true

# If someone tries to deploy this:
#   securityContext:
#     runAsUser: 0  # root!
#
# Gatekeeper rejects it with:
#   "Container must not run as root user"

---
# Policy: Containers Must Use Approved Base Images
# Why: Prevent developers from using unscanned, vulnerable images
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: approved-images-only
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      # Only allow images from trusted, scanned registries
      - "gcr.io/distroless/"      # Google's minimal images
      - "cgr.dev/chainguard/"      # Chainguard hardened images
      - "ghcr.io/myorg/"           # Our organization's registry

# If someone tries to deploy this:
#   image: some-random-dockerhub/image:latest
#
# Gatekeeper rejects it with:
#   "Image from unauthorized repository"

---
# Policy: All Containers Must Have Resource Limits
# Why: Without limits, one container can starve others (DoS) or cause cost explosion
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sContainerLimits
metadata:
  name: require-resource-limits
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    # Maximum allowed limits (prevents runaway costs)
    cpu: "2"
    memory: "4Gi"

These policies are evaluated on every pod creation—whether the pod comes from a golden path template, a manually written manifest, or a third-party Helm chart. Nothing escapes the policy check.
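
One note on the first example: K8sRequiredSecurityContext is not one of Gatekeeper's built-in kinds—it would come from a ConstraintTemplate the platform team installs, which defines both the parameters schema and the Rego logic. A minimal sketch covering just the runAsNonRoot parameter (the Rego here is illustrative, not the full policy):

# Sketch: ConstraintTemplate backing the K8sRequiredSecurityContext constraint
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredsecuritycontext
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredSecurityContext
      validation:
        openAPIV3Schema:
          type: object
          properties:
            runAsNonRoot:
              type: boolean
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredsecuritycontext

        violation[{"msg": msg}] {
          input.parameters.runAsNonRoot
          container := input.review.object.spec.containers[_]
          not container.securityContext.runAsNonRoot
          msg := sprintf("container %v must set runAsNonRoot: true", [container.name])
        }

Templates are written once by the platform team; application teams only ever see the constraint's simple parameter surface.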

Terraform Sentinel: Catching Infrastructure Misconfigurations

For infrastructure outside Kubernetes (S3 buckets, IAM roles, networking), Terraform Sentinel policies provide similar guardrails:

# Sentinel Policy: S3 Buckets Must Be Secure
import "tfplan/v2" as tfplan

# Find all S3 buckets being created or modified
s3_buckets = filter tfplan.resource_changes as _, rc {
    rc.type is "aws_s3_bucket" and
    rc.mode is "managed" and
    (rc.change.actions contains "create" or
     rc.change.actions contains "update")
}

# Rule: All buckets must have encryption enabled
# (assumes AWS provider v3-style inline attributes; provider v4+ moves
# encryption and versioning to separate resources)
encryption_enabled = rule {
    all s3_buckets as _, bucket {
        bucket.change.after.server_side_encryption_configuration is not null
    }
}

# Rule: All buckets must block public access
# (these settings live on the separate aws_s3_bucket_public_access_block
# resource, so filter for that resource type)
pab_resources = filter tfplan.resource_changes as _, rc {
    rc.type is "aws_s3_bucket_public_access_block" and
    rc.mode is "managed" and
    (rc.change.actions contains "create" or
     rc.change.actions contains "update")
}

public_access_blocked = rule {
    all pab_resources as _, pab {
        pab.change.after.block_public_acls is true and
        pab.change.after.block_public_policy is true and
        pab.change.after.ignore_public_acls is true and
        pab.change.after.restrict_public_buckets is true
    }
}

# Rule: All buckets must have versioning (for recovery from ransomware/deletion)
versioning_enabled = rule {
    all s3_buckets as _, bucket {
        bucket.change.after.versioning[0].enabled is true
    }
}

# Main policy: All rules must pass
main = rule {
    encryption_enabled and
    public_access_blocked and
    versioning_enabled
}

# If someone tries to create a public bucket, Terraform plan fails:
# "Policy check failed: public_access_blocked rule returned false"

This policy runs during `terraform plan`, before any resources are created. A developer can't even see what a misconfigured bucket would look like—the plan itself is rejected.

Integrated Security Scanning: Shift Left, Shift Everywhere

Security scanning shouldn't be something developers think about—it should happen automatically, constantly, invisibly. Platform engineering integrates scanning into every stage of the software lifecycle.

The Unified Security Pipeline

Rather than asking each team to configure their own security tools, the platform provides a reusable workflow that teams can call with one line:

# Platform-provided reusable workflow: Comprehensive Security Scanning
# Teams call it from their own CI with a single `uses:` reference to this file
name: Platform Security Pipeline
on:
  workflow_call:
    inputs:
      image:
        required: true
        type: string
        description: Container image to scan

jobs:
  security-scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write  # For uploading to GitHub Security tab
      contents: read

    steps:
      # Note: in production, pin third-party actions to a release tag or commit
      # SHA; the @main/@master references below are moving targets

      - uses: actions/checkout@v4

      # Secret Detection: Find leaked credentials before they reach production
      - name: Scan for Secrets
        uses: trufflesecurity/trufflehog@main
        with:
          path: ./
          extra_args: --only-verified  # Reduce false positives

      # Static Analysis: Find security bugs in code
      - name: Static Application Security Testing (SAST)
        uses: returntocorp/semgrep-action@v1
        with:
          # Use curated rulesets for security issues
          config: >
            p/security-audit
            p/owasp-top-ten
            p/nodejs
            p/typescript

      # Dependency Scanning: Find vulnerable packages
      - name: Scan Dependencies (SCA)
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'  # Fail build on critical/high

      # Container Scanning: Find issues in the built image
      - name: Scan Container Image
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ inputs.image }}
          severity: 'CRITICAL,HIGH'
          exit-code: '1'

      # Infrastructure as Code Scanning: Find misconfigurations
      - name: Scan IaC (Terraform/Kubernetes)
        uses: bridgecrewio/checkov-action@master
        with:
          directory: ./infrastructure
          framework: terraform,kubernetes,dockerfile
          soft_fail: false  # Fail on any issue

      # Generate SBOM for compliance and incident response
      - name: Generate Software Bill of Materials
        uses: anchore/sbom-action@v0
        with:
          image: ${{ inputs.image }}
          artifact-name: sbom-${{ github.run_id }}

Individual teams don't need to understand each scanning tool, configure rules, or handle updates. They call the platform workflow, and security happens:

# In a team's CI workflow - one line enables comprehensive security
jobs:
  security:
    uses: platform-team/workflows/.github/workflows/security.yml@v1
    with:
      image: ghcr.io/myorg/${{ github.repository }}:${{ github.sha }}

The Developer Experience: Making Security Invisible

The ultimate goal of platform engineering is to make security invisible—not absent, but so seamlessly integrated that developers don't have to think about it. Security becomes a property of the platform, not a task for developers.

This requires more than just automation; it requires excellent developer experience:

  • Security scan results appear in pull requests, not in separate dashboards developers never check
  • Remediation guidance is specific and actionable: "Upgrade lodash from 4.17.15 to 4.17.21 to fix CVE-2020-8203" instead of "Critical vulnerability found"
  • The service catalog shows security posture at a glance: Which services have open vulnerabilities? When was the last security scan?
  • Documentation is generated, not written: Golden paths produce consistent architecture diagrams, runbooks, and security documentation
  • Compliance evidence is collected automatically: SOC 2 auditors get reports, not interview requests

Security Visibility in the Developer Portal

Backstage plugins integrate security information directly into the developer experience:

# Service catalog entry with security annotations
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  annotations:
    # Integrations pull security data into the portal
    snyk.io/org-name: myorg
    snyk.io/project-ids: payment-service
    trivy.dev/image: ghcr.io/myorg/payment-service:latest
    sonarqube.org/project-key: payment-service

    # Compliance tracking
    compliance.company.io/frameworks: "pci-dss,soc2"
    compliance.company.io/data-classification: "restricted"
    compliance.company.io/last-audit: "2025-01-15"
    compliance.company.io/audit-status: "passed"

spec:
  type: service
  lifecycle: production
  owner: payments-team
  dependsOn:
    - resource:payments-database
    - component:auth-service

When developers view the payment-service in Backstage, they see a unified dashboard showing vulnerability counts from Snyk, code quality from SonarQube, container scan results from Trivy, compliance status, dependencies, and team ownership—all in one place.

Continuous Compliance: Security as a Feature

Compliance is often treated as periodic audit preparation—scramble to collect evidence, document controls, and pray you pass. Platform engineering transforms compliance from an event into a continuous process.

When security controls are built into golden paths, compliance evidence generates automatically:

  • Encryption at rest: Proven by Crossplane configurations that don't allow unencrypted databases
  • Access control: Proven by Gatekeeper policies that enforce RBAC
  • Vulnerability management: Proven by CI/CD scan results showing all deployments pass security checks
  • Change management: Proven by GitHub branch protection requiring code reviews
  • Audit logging: Proven by platform configuration that enables logging for all services

When auditors ask "how do you ensure databases are encrypted?", the answer isn't "we train developers to enable encryption." It's "our database provisioning API doesn't have an unencrypted option. Here's the Crossplane composition proving it."
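
Gatekeeper supports this continuous-evidence model directly: setting a constraint's enforcementAction to dryrun records violations in the constraint's status instead of blocking requests, giving auditors a live, queryable record. A sketch reusing the image policy from earlier:

# Audit-only variant of the image policy: evidence without enforcement
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sAllowedRepos
metadata:
  name: approved-images-audit
spec:
  enforcementAction: dryrun   # record violations, don't block
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    repos:
      - "cgr.dev/chainguard/"
      - "ghcr.io/myorg/"

# Violations accumulate under the constraint's status.violations field,
# where they can be scraped into compliance reports

Teams typically run new policies in dryrun first, then flip to enforcement once the violation count reaches zero.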

Building Your Platform: Where to Start

Platform engineering is a journey, not a destination. You don't need to build everything at once. Start with the highest-impact, lowest-effort improvements:

Phase 1: Golden Paths for Common Patterns

Identify your organization's most common deployment patterns. What does 80% of your software look like? Build golden paths for those first. A template for "Node.js microservice with PostgreSQL" might cover half your organization's services.

Phase 2: Policy Guardrails

Deploy policy enforcement for the highest-risk misconfigurations. Running containers as root, using public S3 buckets, skipping encryption—these are the mistakes that cause breaches. Block them with policy before adding more sophisticated controls.

Phase 3: Self-Service Infrastructure

Replace tickets with APIs. If teams frequently request databases, message queues, or cache clusters, build self-service provisioning. Each ticket eliminated is developer time saved and standardization improved.

Phase 4: Developer Portal

Unify the experience with a developer portal. Backstage is the standard choice, but the key is having one place where developers can discover services, create new ones, view security status, and find documentation.

The Platform Mindset

The most important change in platform engineering isn't tooling—it's mindset. Platform teams are product teams. Their customers are internal developers. Their success metric is developer productivity, not infrastructure metrics.

When security is built into the platform, it stops being friction and becomes a feature. Developers don't complain about security requirements; they appreciate that the platform handles compliance for them. Security teams don't fight with developers; they collaborate on building better golden paths.

The goal isn't to control developers—it's to free them. Free them from wrestling with infrastructure. Free them from deciphering compliance requirements. Free them from configuring security tools. Free them to write the business logic that actually matters. That's what platform engineering is about.