SQL Injection Is Still Everywhere in 2024

@safarslife·July 7, 2024·— views

Every few months there's another breach report where the root cause is SQL injection. Not a sophisticated zero-day. Not a supply chain attack. Someone put a single quote into a search field and got back a database dump. The vulnerability is older than most of the engineers who introduced it, the fix has been standard practice for two decades, and it keeps appearing in breach reports anyway.

I've been thinking about why this keeps happening, and I think the answer is more interesting than "developers don't know about it." Most developers do know about it. The problem is structural.

Why ORMs don't fully solve this

Modern web frameworks and ORMs handle SQL injection protection automatically when you use them correctly. If you're using Django's ORM, or ActiveRecord, or Hibernate, and you're querying through the ORM's query builder, your inputs are parameterized by default. The ORM constructs the query with placeholders and passes your values separately to the database driver, which handles the escaping. The user's input is never interpreted as SQL.

This is genuinely good protection for the common case. The problem is the edges.

The first edge is raw queries. Every ORM provides an escape hatch for complex queries that the query builder can't express - things like complex joins, window functions, or database-specific features. When you drop into raw SQL, you're responsible for parameterization yourself. The ORM won't protect you. If you write something like "SELECT * FROM orders WHERE status = '" + status + "'" instead of using a parameterized placeholder, you've introduced the vulnerability. The ORM is right there, but you bypassed it.

The second edge is dynamic table or column names. Parameterized queries protect values - the data you're filtering or inserting. They don't protect structural elements like table names or column names, because the database driver can't parameterize those. If you're building a query where the table name comes from user input - say, a reporting feature where users can select which table to query - you have to validate and whitelist that input yourself. Parameterization won't save you.

The third edge is stored procedures. If your database has stored procedures that concatenate user input into dynamic SQL internally, the vulnerability lives in the database layer, not the application layer. Your ORM can parameterize the call to the stored procedure perfectly, and the stored procedure can still be vulnerable. I've reviewed PRs where the application code looked clean and the vulnerability was buried in a stored procedure that nobody had looked at in years.

The internal tool problem

The pattern I see most often in breach reports isn't ignorance - it's inconsistency. The main application is built carefully, with code review, with security-conscious engineers who know the patterns. Then someone builds an internal admin tool in a hurry, or a quick reporting script, or a legacy integration that predates the current standards. That code doesn't get the same scrutiny. It works, nobody touches it, and the vulnerability sits there quietly.

The "it's internal" reasoning is part of this. Internal tools feel lower risk because they're not publicly accessible. But internal doesn't mean inaccessible - it means accessible to anyone who gets past the first layer of authentication. And internal tools typically have access to the most sensitive data, because that's what internal tools are for. The admin panel that the ops team uses to manage orders at Uzum has access to customer data, payment information, and order history. The fact that it requires a VPN and an internal account doesn't change what an attacker can do if they get in.

⚠️

"It's internal" is not a security argument. Internal tools typically have access to the most sensitive data in your system. An attacker who gets past authentication has the same access your ops team does.

I've pushed back on the "it's internal, it doesn't need the same security review" argument multiple times. The data doesn't know it's being accessed through an internal tool. The customer whose order data is in that database trusted us with it regardless of which interface we use to access it.

What this means for how I review work

I have a CS background and I've written production code, so I can read a PR diff. When I'm reviewing something that touches database queries, I'm looking for a few specific things. Are raw queries parameterized? Are there any places where user input is being concatenated into a query string rather than passed as a parameter? Are there stored procedures involved, and if so, have they been reviewed?

I'm also looking at the scope of what the query can access. A query that returns only the data the current user should see is different from a query that returns all records and filters in the application layer. The second pattern is more vulnerable to injection because a successful attack can bypass the application-layer filter and return everything.

The frustrating thing about SQL injection is that it's not a hard problem to solve. Parameterized queries are not complicated. Every major database library supports them. The pattern has been standard practice for over twenty years. The gap isn't knowledge - it's the consistency of applying that knowledge everywhere, including the internal tools and the legacy integrations and the quick scripts that someone wrote on a Friday afternoon and that have been running in production ever since.

It's 2024. The knowledge exists, the tools exist, the patterns are well established. The question is whether teams care enough to apply them consistently - not just to the parts of the product that users see, but to everything that touches the data users trust you with.