Nas Taibi

Building resilient, secure cloud architectures for regulated industries.


The Cloud Security Gap: Too Many Tools, Not Enough Engineers


If you look at most cloud security dashboards nowadays, you see the same thing: fancy responsive layouts, lots of widgets, lots of alerts, plenty of vendor integrations, and every one of them claiming to magically use AI.

Yet when you read the breach reports, the root causes are almost boring.

Exciting stuff like open buckets, over-permissive IAM, exposed management interfaces, missing logs, hard-coded secrets and plenty of internal ports exposed to the outside world. Now add a new category on top of all that: LLM features wired into production with no real security model, with vibe coding as the primary culprit.

So clearly, the problem is not that teams do not build or buy enough tools. The problem is that the average engineering team is overly reliant on these tools and does not necessarily possess a good grasp of what makes a secure cloud architecture in practice.

How SaaS teams actually get breached

Without naming any recent major incidents or outages: if you strip away the brand names involved and read the technical details (a sordid affair), the root causes almost always fall into a small set of buckets.

1. Misconfigurations that never got fixed

The classic example is still the public storage bucket with sensitive data in it. Variations of this show up constantly.
These misconfigurations are pervasive if you care to look through a security lens. Below are some of the examples I have seen in at least four of the top Fortune 500 companies:

A small SaaS team with a busy roadmap and one overworked senior who “owns” the cloud account.

They ship a new feature. To get it working, someone opens up a security group to 0.0.0.0/0 “for a moment”. The moment lasts six months.

A bucket that holds support exports is made public “so the tool can read it”. Nobody writes down which tool. Nobody remembers to close it.

An internal admin panel for operations is exposed on the internet behind nothing but a weak login, because putting it behind VPN or SSO felt like a chore at the time.

None of this feels like a dramatic decision while it is happening. It feels like “we just need to get this out and we will tidy it later”.

Later never comes.

Misconfigurations look trivial in a report. In reality they are a symptom of something deeper. The team has no shared baseline for what “safe enough” looks like, no habit of reviewing changes, and no one whose job it is to say “no, we are not doing it that way”.

The result is a production account that works, mostly, and is quietly full of landmines. Not because the engineers are bad, but because nobody taught them a better way and nobody forced the conversation.

Other things I have found:

  • Storage readable from the public internet when it should be private
  • A self-hosted git server, courtesy of some clever cookie, with unsecured ports exposing the production code to the entire world
  • Security groups that allow the world to hit internal services
  • Management interfaces exposed to the internet
  • Test environments with production data and no hardening
  • Phishing emails that land in the mailboxes of employees with elevated permissions, a Trojan horse from within

These are not glamorous bugs. They are not exotic exploits; there are no zero-days to be seen here. They are the sort of problem any reasonably trained engineer could catch if someone had actually trained them and if the team had a minimum baseline of secure defaults.
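
As a minimal sketch of what catching them can look like (assuming an AWS account, boto3 installed and read-only credentials configured; this is illustrative, not a complete audit), a few lines of Python are enough to flag two of the offenders above: buckets with no public access block and security groups open to the whole internet.

```python
# Minimal audit sketch: flag public-ish S3 buckets and world-open security groups.
# Assumes boto3 is installed and read-only AWS credentials are configured.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")

# Buckets whose "block public access" settings are missing or only partially on.
for bucket in s3.list_buckets()["Buckets"]:
    name = bucket["Name"]
    try:
        conf = s3.get_public_access_block(Bucket=name)["PublicAccessBlockConfiguration"]
        if not all(conf.values()):
            print(f"[check] {name}: public access block only partially enabled")
    except ClientError as err:
        if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
            print(f"[check] {name}: no public access block configured")
        else:
            raise

# Security group rules that let the whole internet in.
for group in ec2.describe_security_groups()["SecurityGroups"]:
    for rule in group["IpPermissions"]:
        for ip_range in rule.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                print(f"[check] {group['GroupId']}: ingress open to 0.0.0.0/0 "
                      f"on ports {rule.get('FromPort', 'all')}-{rule.get('ToPort', 'all')}")
```

A check like this is no substitute for proper tooling, but run on a schedule it catches the “for a moment” changes before they turn six months old.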

2. Identity and Access Management that grew like a pile of cables

Identity and access management is where many cloud environments quietly rot.

Some of the classic ones are the “god mode” roles used for everything, IAM policies with the easy-to-remember "Action": "*", "Resource": "*", and the long-lived keys that never got rotated.
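
To make the contrast concrete, here is roughly what the two ends of that spectrum look like as policy documents, written out as Python dicts (the bucket name is an illustrative placeholder, not a real system):

```python
# The "easy to remember" policy: anyone holding this role can do anything, anywhere.
god_mode_policy = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

# A scoped alternative: one workload, the exact actions it needs, on the exact resources.
# The bucket name below is an illustrative placeholder.
export_reader_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-support-exports",
                "arn:aws:s3:::example-support-exports/*",
            ],
        }
    ],
}
```

The second one takes ten more minutes and a conversation about what the workload actually does, and that conversation is exactly the design skill in question.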

Designing clean IAM is not something you get from memorising exam facts. It is a design skill. It needs patterns, review, and repetition. Merely going through a best-practice list from your favourite cloud provider is not enough.

Without that, teams end up with a tangle of permissions that nobody fully understands. At that point a single leaked credential becomes catastrophic.

3. SaaS and LLM features bolted on with no guardrails

Modern SaaS companies are built out of other SaaS products and internal services glued together.

Now layer in AI and the new fad, vibe coding… it is not as cool as it sounds, unfortunately.

Agents can browse documentation, tickets, knowledge bases, even parts of production data. Most of that will inevitably hold sensitive information, which then ends up in an embedding store maintained by yet another third-party provider.

There is an inherent risk in letting a third-party black-box LLM go through your internal systems, reading internal files and accessing external tools with your API keys.

If these features are built on top of an already weak cloud foundation, you get:

  • Prompt injection paths where an attacker can get the model to reveal secrets or perform actions
  • LLM components that can be tricked into calling internal tools in unsafe ways (a guardrail sketch follows this list)
  • Data stores that hold sensitive content but are treated like harmless caches
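
The second bullet above is the easiest one to put a concrete guardrail around. A minimal sketch (plain Python; the tool names, schemas and implementations are hypothetical) is an explicit allowlist between the model and your internal tools, so the model can propose a call but never executes an arbitrary one:

```python
# Guardrail sketch: the model proposes tool calls, the application decides.
# Tool names, argument schemas and implementations here are hypothetical.
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm.tools")

def search_docs(query: str) -> dict:
    return {"results": []}           # placeholder implementation

def get_ticket(ticket_id: str) -> dict:
    return {"ticket_id": ticket_id}  # placeholder implementation

# Explicit allowlist: tool name -> (argument schema, implementation).
ALLOWED_TOOLS = {
    "search_docs": ({"query": str}, search_docs),
    "get_ticket": ({"ticket_id": str}, get_ticket),
    # deliberately absent: "run_sql", "call_admin_api", ...
}

def execute_tool_call(name: str, arguments: dict) -> dict:
    """Run a model-proposed tool call only if it passes the allowlist and schema check."""
    entry = ALLOWED_TOOLS.get(name)
    if entry is None:
        log.warning("denied tool call %s %s", name, json.dumps(arguments))
        return {"error": f"tool '{name}' is not allowed"}

    schema, impl = entry
    if set(arguments) != set(schema) or any(
        not isinstance(arguments[k], t) for k, t in schema.items()
    ):
        log.warning("rejected arguments for %s %s", name, json.dumps(arguments))
        return {"error": "invalid arguments"}

    log.info("executing tool call %s %s", name, json.dumps(arguments))
    return impl(**arguments)
```

The specific checks matter less than where they live: in your code, under your logging, instead of in a prompt politely asking the model to behave.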

In many teams, LLM features are shipped as experiments. That is fine from a product perspective. It is dangerous from a security perspective if the underlying cloud patterns were never strong in the first place.

4. Third-party providers

Supply chain complexity has become one of the biggest cyber worries for large organisations. Engineering teams rely on a long list of third- and fourth-party providers, yet have limited control and visibility over the security of the software and services they use.

Hidden vulnerabilities, weak security practices and attacks such as malware pushed through trusted update channels are becoming a serious worry for CISOs; the 2024 incident at one of the major security companies listed on the NASDAQ is a case in point.

5. Resilience and safety as an afterthought

Every major cloud outage is a real time exam in architecture.

When a large region or availability zone has trouble, you immediately see which teams actually run in more than one zone or region and which have thought about what their application does in partial failure.

You also see who assumed “the cloud” was a magic resilience box. This is not only about uptime. Weak resilience usually correlates with weak security. Both are symptoms of shallow understanding of the platform and its guarantees.
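
“Thinking about partial failure” can be as small as deciding, per dependency, what the application does when that dependency is slow or gone. A minimal sketch (the pricing endpoint and the cached fallback are hypothetical):

```python
# Partial-failure sketch: a dependency call with a hard timeout and a degraded fallback.
# The URL and the cached value are hypothetical placeholders.
import json
import urllib.request

LAST_KNOWN_PRICES = {"default_plan": 49}  # stale but safe fallback data

def get_prices() -> dict:
    """Fetch live prices, but never let a slow dependency take the page down."""
    try:
        with urllib.request.urlopen(
            "https://pricing.internal.example.com/v1/prices", timeout=2
        ) as resp:
            return json.load(resp)
    except OSError:  # covers connection errors, URLError and timeouts
        # Dependency is degraded: serve stale data and keep the product usable,
        # instead of letting one zone's trouble become a full outage.
        return dict(LAST_KNOWN_PRICES)
```

Deciding which data is safe to serve stale is an architecture decision; no dashboard makes it for you.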

Why the current education market is not fixing this

The problem is not that people are refusing to learn. It is that almost everything around cloud and AI training is tuned for the wrong outcome.

1. Certification driven learning

Cloud certifications are useful as a rough map of a provider’s services, but they sit at the center of a huge industry of shallow material. Most exam prep pushes you to memorise service names, remember which product fits which textbook scenario, and recall default limits. That is enough to get through multiple choice questions.

It is nowhere near enough to design a secure VPC for a specific SaaS product, lock down IAM for a multi tenant system, or build logging that actually supports an incident investigation. You can easily end up with a team that looks impressive on paper and still runs a dangerously fragile environment.

2. Vendor centric training

Security and observability vendors often produce slick workshops, but the exercises revolve around their dashboards and features. You spend your time turning on options and enabling detections instead of thinking about the architecture underneath.

When you start from the tool, the conversation narrows to “what else can we tick on” instead of “what are we actually protecting and where are we exposed”. Teams still need to understand what is truly at risk in their system, which threats matter for their business, where sensitive data moves, and how their identity model really works. A two day product training does not create that understanding.

3. Fragmented free content

At the other end of the spectrum you have free talks, conference videos, blog posts and YouTube series. Some of them are excellent. Taken as a whole, they are a firehose with no structure. An engineer can watch a talk on S3, then a video on Kubernetes, then something about “AI security”, and come away with ten disconnected ideas and no coherent mental model.

What most people need is not another clever talk. They need a clear path through the material and repeated practice on realistic problems, with someone telling them whether they actually did it right.

What teams actually need instead

If you want to build education that is not slop, it has to line up with the real ways teams fail.

1. Opinionated secure defaults for common SaaS patterns

Most SaaS products fall into a few predictable shapes: a single tenant web app with some internal services, a multi tenant API with workers and a warehouse, an event driven system built on queues and functions.

For each of these you can define a baseline: how the network is laid out, where the trust boundaries sit, which IAM roles exist and what they can do, how keys are handled, what must be logged, how backups and recovery work.

When a team starts from that kind of baseline and understands the reasons behind it, the temptation to improvise unsafe shortcuts drops. Good education should teach those baselines and the trade offs behind them, not just show where to click in the console.
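
As one small slice of such a baseline (a sketch assuming AWS and boto3; the bucket name is a placeholder), “storage is private, encrypted and recoverable by default” can be written down as code and applied to every new bucket:

```python
# Baseline sketch: opinionated defaults applied to every new S3 bucket.
# Assumes boto3 and credentials allowed to manage the bucket; the name is a placeholder.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-tenant-exports"

# Private by default: block every flavour of public access.
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Encrypted by default with a managed KMS key.
s3.put_bucket_encryption(
    Bucket=BUCKET,
    ServerSideEncryptionConfiguration={
        "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "aws:kms"}}]
    },
)

# Versioned, so accidental deletes and overwrites are recoverable.
s3.put_bucket_versioning(Bucket=BUCKET, VersioningConfiguration={"Status": "Enabled"})
```

Whether this lives in Terraform, a shared module or a script matters less than the fact that it is written down, reviewed and applied the same way every time.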

2. Hands on work in environments that can actually break

Real skill comes from touching systems that behave like production, even if they live in a lab account. People need to harden deliberately insecure environments, edit real infrastructure as code instead of screenshots, fix pipelines that fail because a policy is too open, and work through simulated incidents where the only way forward is to dig through logs and make decisions. This is harder to create than slide decks and quizzes, but it is the only way to build instincts that survive contact with a real outage or breach.
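
A pipeline that fails because a policy is too open can be as blunt as the following check, run against the policy documents in the repository before anything is applied (the policies/ directory layout is a hypothetical convention):

```python
# CI gate sketch: fail the build if any IAM policy in the repo grants "*" on "*".
# The policies/ directory layout is a hypothetical convention.
import json
import pathlib
import sys

def has_wildcard_admin(policy: dict) -> bool:
    for stmt in policy.get("Statement", []):
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources
        if stmt.get("Effect") == "Allow" and "*" in actions and "*" in resources:
            return True
    return False

failed = False
for path in pathlib.Path("policies").glob("**/*.json"):
    if has_wildcard_admin(json.loads(path.read_text())):
        print(f"FAIL {path}: allows '*' on '*'")
        failed = True

sys.exit(1 if failed else 0)
```

Crude, but a build that goes red teaches the pattern faster than a slide deck ever will.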

3. LLM and AI security as part of cloud security

“AI security” is often treated like a separate circus. In practice, if your LLM feature runs in your cloud account and talks to your services, it is just another piece of your system and should follow the same rules. It needs a clear identity, minimal permissions, tightened access to data, real logging of inputs, outputs and tool calls, and guardrails at the boundaries instead of magical prompts.
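
The logging part is the piece teams most often skip, and it is also the cheapest to add. A minimal sketch (the redaction pattern and the event shape are hypothetical; in practice this would feed your normal log pipeline):

```python
# Logging sketch: every LLM interaction is recorded as one structured event.
# The redaction pattern and the event shape are hypothetical placeholders.
import json
import logging
import re
import time
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("llm.audit")

# Example patterns only: long-lived key formats you never want echoed into logs.
SECRET_PATTERN = re.compile(r"(AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,})")

def redact(text: str) -> str:
    return SECRET_PATTERN.sub("[REDACTED]", text)

def log_llm_event(user_id: str, prompt: str, response: str, tool_calls: list) -> None:
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "user_id": user_id,
        "prompt": redact(prompt),
        "response": redact(response),
        "tool_calls": tool_calls,  # names and arguments, as approved by the guardrail
    }
    log.info(json.dumps(event))
```

When something does go wrong, this is the difference between reconstructing what the model was told and asked to do, and guessing.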

Training has to show engineers how a specific LLM feature changes their threat model and what to change in the design. Lists of attack names are not enough. They need concrete designs and countermeasures on the platform they already use.

4. Real feedback and accountability

Reading someone else’s solution is not the same as having your own work torn apart. Engineers get better at secure design when a reviewer points at a diagram or a Terraform file and says: this role is over privileged and here is the path an attacker would take, or you forgot this log source and you will regret it during an incident.

The same goes for LLM components that can still reach internal tools they should never see. Any education that claims to build real competence has to include this kind of review and iteration. Without it, you are just adding more words to a pile that nobody remembers.

To sum it all up…

The problem is not a lack of tools, content or features. It is the absence of deliberate practice, sharp constraints and honest review. Any training that refuses to confront that reality is just set dressing. Like a soldier's drills, training is a daily habit: it needs to be reinforced often and in many forms. The whole point of a serious cloud security program is simple: fewer surprises, fewer excuses, and engineers who can look at a messy production account and know how to make it safer by tomorrow.