Saying "the tools don't matter" means that no tool can overcome a regressive culture or process. It does NOT mean all tools deliver equal results. Used in a supportive culture, with effective team organization and empowerment, certain tools deliver dramatically better results (velocity, sustainability, reliability, cost).
Everything is an Architectural concern.
"Automation" is not the same thing as Infrastructure as Code ("IaC"). IaC is using test-driven development in the continuous integration and delivery of infrastructure. Use IaC if you want to deliver infrastructure that is resilient, secure and capable of safe continuous change (change at scale).
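As a minimal sketch of what "test-driven" can mean for infrastructure code, the following Python checks a declared set of firewall rules before anything is applied. The `IngressRule` shape, the `find_open_ssh` helper, and the sample rules are all hypothetical and chosen only for illustration; a real project would more likely point a dedicated testing tool at the actual infrastructure definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical, simplified ingress rule (not a real provider schema)."""
    port: int
    cidr: str


def find_open_ssh(rules):
    """Return the rules that expose SSH (port 22) to the whole internet."""
    return [r for r in rules if r.port == 22 and r.cidr == "0.0.0.0/0"]


def test_no_ssh_open_to_world():
    # Assumed example declaration; in practice this would be parsed from
    # the infrastructure code under test.
    declared = [
        IngressRule(port=443, cidr="0.0.0.0/0"),
        IngressRule(port=22, cidr="10.0.0.0/8"),
    ]
    assert find_open_ssh(declared) == []


if __name__ == "__main__":
    test_no_ssh_open_to_world()
    print("ok")
```

The point of the sketch is the ordering: the test exists and runs in the pipeline on every commit, so a change that opens SSH to the world fails before it is delivered.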
Architect your infrastructure code in modular, multi-repo, multi-pipeline, domain-bounded patterns, striving to maintain loose coupling both between these pipelines and among the technologies and capabilities delivered. Rationales for not doing so are nearly always anti-patterns (and expensive).
In building Platforms the following characteristics must be sustained...
- complexity constrained
- comprehension
- observability
- IdP-integrated RBAC
- agility
Even though you can feel the impact of failing to maintain those traits in any architecture, within distributed compute there is an amplifier effect. Imagine the tech debt analogies, but with a 50% interest rate instead of 18.5% 😏
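To put rough numbers on the interest-rate analogy, here is a small sketch that compounds the same starting debt at both rates. The starting value of 100 and the assumption that interest compounds once per period are illustrative choices, not something the analogy specifies.

```python
def compounded(principal, rate, periods):
    """Value of a debt after compounding at `rate` once per period."""
    return principal * (1 + rate) ** periods


if __name__ == "__main__":
    # Assumed starting debt of 100 units, compared over a few periods.
    for periods in (1, 3, 5):
        low = compounded(100, 0.185, periods)
        high = compounded(100, 0.50, periods)
        print(f"period {periods}: 18.5% -> {low:.1f}, 50% -> {high:.1f}")
```

Under these assumptions, after five periods the debt is roughly 234 at 18.5% and roughly 759 at 50%, which is the amplifier effect in miniature.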
Scope always includes (whether you want it to or not):
- Tests: with each commit and nightly (everything not tested should be considered broken)
- Security: Encryption at rest and in transit, AuthN and AuthZ (Identity), secure secrets management including rotation, configuration hardening, audit logging
- Networking: VPC, subnets, EIP, NAT gateway, transit gateway, firewalls, DNS, SSH access, VPN/Direct Connect
- Provisioning: managed services, instances, storage, load balancers, firewalls
- Software Installation and Configuration: deployments (zero-downtime, B/G, Canary), dependencies, environment config management
- Resiliency and Availability: Zones, regions, master/slave, scalability (both vertical and horizontal), disaster recovery
- Observability: Comprehensive aggregated logging; metrics covering health, performance, events, and tracing, including measures of the business purposes guiding investment; and comprehensive monitoring/alerting
- Documentation: Architecture, conventions, practices, incident runbooks, and knowledge transfer
- for each story: Definition of Done
- Preservation of Investment: Culture of refactoring, maintaining current versions of software/tools/best practices, technical debt is tracked (socialized)/estimated/prioritized (irresponsible not to)
- Backups: e.g., lose data, lose your job
- Cost Optimization: Justifying level of observability, instance sizing, reserved vs spot, utilization levels, cleanup of underused resources - arguably among the top priorities
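As one small illustration of the cost-optimization item, the sketch below flags underused instances as cleanup candidates. The `InstanceUsage` record, the 10% CPU threshold, and the sample fleet are hypothetical; real utilization data would come from whatever monitoring system is in place, and CPU alone is rarely a sufficient signal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceUsage:
    """Hypothetical utilization record; not the schema of any real API."""
    name: str
    avg_cpu_percent: float
    monthly_cost: float


def underused(instances, cpu_threshold=10.0):
    """Return instances below the CPU threshold, costliest first."""
    candidates = [i for i in instances if i.avg_cpu_percent < cpu_threshold]
    return sorted(candidates, key=lambda i: i.monthly_cost, reverse=True)


if __name__ == "__main__":
    # Assumed sample data for illustration only.
    fleet = [
        InstanceUsage("api-1", 55.0, 140.0),
        InstanceUsage("batch-old", 2.5, 310.0),
        InstanceUsage("test-box", 4.0, 70.0),
    ]
    for inst in underused(fleet):
        print(f"{inst.name}: {inst.avg_cpu_percent}% CPU, {inst.monthly_cost}/month")
```

Sorting by cost puts the most expensive idle resources at the top of the review list, which is usually where the conversation should start.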
To the uninitiated, refactoring looks like yak shaving, but it isn't.
"A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system." - John Gall
“You don’t have to be an engineer to be a racing driver, but you do have to have Mechanical Sympathy.” - Jackie Stewart
"infrastructure code without automated tests is broken"
Every manager: "We don't have to have that at the start." (But by the first milestone they will simultaneously expect it to be done and not remember agreeing to defer... even if it is in writing.)
terraform
aws-cli
boto3
gcloud
backstage
github
circleci
buildkite
consul
secrethub
vault
packer
kubernetes
docker
kops
kubespray
istio
spiffe
datadog
honeycomb
prometheus
grafana
fluentd
cachet
auth0
styra
open policy agent
twistlock
snyk
sonobuoy
kube-bench
docker-bench
hawkeye
kube-hunter
operator-sdk
awspec
inspec
bats
tflint
hadolint
ShellCheck
... too long, continue to local and pipeline tools for extensive language, GPL, DSL, and related tools
stern
homebrew
nevergreen
smashing
velero (formerly ark)
envconsul
krew kubectl plugin manager
(some useful plugins)
- access-matrix Show an access matrix for server resources
- config-cleanup Automatically clean up your kubeconfig
- exec-as Like kubectl exec, but offers a user flag to ...
- get-all Like 'kubectl get all', but really everything
- iexec Interactive selection tool for kubectl exec
- kubesec-scan Scan Kubernetes resources with kubesec.io.
- match-name Match names of pods and other API objects
- mtail Tail logs from multiple pods matching label sel...
- oidc-login Login for OpenID Connect authentication
- pod-logs Display a list of pods to get logs from
- pod-shell Display a list of pods to execute a shell in
- rbac-lookup Reverse lookup for RBAC
- rbac-view A tool to visualize your RBAC permissions.
- resource-capacity Provides an overview of resource requests, limi...
- restart Restarts a pod with the given name
- view-secret Decode secrets
- view-serviceaccount-kubeconfig Show a kubeconfig setting to access the apiserv...
- view-utilization Shows cluster cpu and memory utilization
- warp Sync and execute local files in Pod
- who-can like can-i but evaluates who at a permission level
evaluating the effectiveness/efficiency/quality, etc.
an evolving list...
brief comments about selected items above