Recon Before You Hack: OSINT Basics

@safarslife · November 12, 2024

I've been reading security research for a couple of years now, mostly because breach post-mortems keep showing up in my feed and I find them genuinely interesting. The pattern that keeps appearing is how much information organizations give away before an attacker does anything technically sophisticated. The reconnaissance phase - open-source intelligence gathering, OSINT - is often where the real work happens. By the time someone starts active exploitation, a thorough attacker already has a detailed picture of the target's infrastructure, technology stack, and people.

What strikes me about this from a product perspective is how much of it is just reading. No hacking required. The information is public. The skill is knowing where to look and what the pieces imply about each other.

What certificate transparency logs actually reveal

TLS certificates are public record. Nearly every certificate issued by a publicly trusted certificate authority gets logged in certificate transparency (CT) logs - this is a security feature, not a bug, designed to make it harder to issue fraudulent certificates without detection. The side effect is that almost every hostname a company has gotten a publicly trusted certificate for is discoverable through public logs. The main exceptions are names covered only by a wildcard certificate and hosts that use an internal CA. Tools like crt.sh let you query these logs for any domain.
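To make that concrete, here is a minimal sketch of what querying crt.sh might look like from Python. It assumes crt.sh's JSON output mode (`output=json`) and a `name_value` field holding newline-separated names, which is how the service has behaved when I've looked at it but is not a guaranteed contract. The function names are my own, and `example.com` is a placeholder.

```python
import json
import urllib.parse
import urllib.request


def crtsh_url(domain: str) -> str:
    """Build a crt.sh query URL matching every name under `domain`."""
    # "%" is crt.sh's wildcard; quote() percent-encodes it as %25.
    query = urllib.parse.quote(f"%.{domain}")
    return f"https://crt.sh/?q={query}&output=json"


def extract_hostnames(records: list[dict]) -> set[str]:
    """Collect unique, lowercased hostnames from crt.sh-style records.

    Assumption: each record carries a `name_value` string that may hold
    several newline-separated names.
    """
    names = set()
    for record in records:
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name:
                names.add(name)
    return names


def fetch_hostnames(domain: str, timeout: float = 30.0) -> set[str]:
    """Query crt.sh over the network and return the names it has logged."""
    with urllib.request.urlopen(crtsh_url(domain), timeout=timeout) as resp:
        return extract_hostnames(json.load(resp))
```

Calling `fetch_hostnames("example.com")` needs network access and can be slow for large domains. The parsing step is kept separate so it can be exercised on saved responses.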

This matters because subdomains reveal infrastructure. The main domain is obvious. But api.company.com tells you there's an API. staging.company.com tells you there's a staging environment. admin.company.com tells you there's an admin panel. jenkins.company.com tells you they're using Jenkins for CI/CD. Each of these is a potential attack surface, and none of them required any active probing to discover - they're in the public certificate logs.
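The "reading" step can be sketched in a few lines: take a list of hostnames and note what the first label suggests. The hint table below is hypothetical and deliberately tiny. A name is only a hint, not proof of what is running there.

```python
# Hypothetical hint table: first DNS label -> what it suggests.
LABEL_HINTS = {
    "api": "public or partner API",
    "staging": "pre-production environment",
    "admin": "administrative panel",
    "jenkins": "Jenkins CI/CD server",
}


def infer_hints(hostnames):
    """Map each hostname to a hint when its first label is in LABEL_HINTS."""
    hints = {}
    for host in sorted(hostnames):
        first_label = host.split(".", 1)[0].lower()
        if first_label in LABEL_HINTS:
            hints[host] = LABEL_HINTS[first_label]
    return hints


if __name__ == "__main__":
    names = {"www.company.com", "api.company.com", "jenkins.company.com"}
    for host, hint in infer_hints(names).items():
        print(f"{host}: {hint}")
```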

I looked up Uzum's certificate history once, just to understand what this looks like in practice. The number of subdomains that appear in the logs is significant. Most of them are legitimate and expected. But the exercise of looking at them through an attacker's lens - which of these might have weaker authentication? which might be running older software? which might have been set up for a project that ended and is now unmaintained? - is a useful mental model for thinking about attack surface.

Job postings as a technology inventory

Job postings are an underrated intelligence source. A company posting for "Senior Backend Engineer with experience in Kafka, PostgreSQL, and Kubernetes on AWS" has just told you their message queue, their database, their container orchestration platform, and their cloud provider. That's a significant amount of infrastructure context.

This matters for attackers because it narrows the search space. If you know a company is running PostgreSQL, you're not wasting time on MySQL-specific attack paths. If you know they're on AWS, you're thinking about S3 bucket misconfigurations and IAM policy issues, not Azure-specific problems. The job posting is essentially a technology inventory that the company published voluntarily.
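As a rough illustration of how mechanical this is, here is a small sketch that pulls a technology inventory out of posting text. The keyword table is an assumption of mine and covers only the names mentioned above. A real list would be much longer.

```python
import re

# Hypothetical keyword table: technology name -> what it tells a reader.
TECH_CATEGORIES = {
    "Kafka": "message queue",
    "PostgreSQL": "database",
    "MySQL": "database",
    "Kubernetes": "container orchestration",
    "AWS": "cloud provider",
    "Azure": "cloud provider",
}


def technology_inventory(posting: str) -> dict[str, str]:
    """Return the technologies named in `posting`, keyed by name."""
    found = {}
    for name, category in TECH_CATEGORIES.items():
        # Whole-word, case-insensitive match so "aws" and "AWS" both count.
        if re.search(rf"\b{re.escape(name)}\b", posting, flags=re.IGNORECASE):
            found[name] = category
    return found


if __name__ == "__main__":
    text = ("Senior Backend Engineer with experience in Kafka, "
            "PostgreSQL, and Kubernetes on AWS")
    for name, category in technology_inventory(text).items():
        print(f"{category}: {name}")
```

The same function works as a self-check: run it over your own careers page before publishing and see what inventory falls out.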

From a product perspective, this changes how I think about what we put in job postings. There's a real tension here - you need to be specific enough to attract qualified candidates, but every specific technology you name is information you're giving to anyone who's interested. I'm not suggesting companies should write vague job postings. I'm suggesting that the people writing them should be aware of what they're publishing.

ℹ️ A job posting for "Senior Backend Engineer with Kafka, PostgreSQL, and Kubernetes on AWS" is also a technology inventory for anyone who wants to research your attack surface. The information is public by design. The question is whether you're aware of what you're publishing.

GitHub and the accidental exposure problem

GitHub is where many of the most damaging accidental exposures happen. Developers push things they shouldn't - API keys, internal hostnames, database connection strings in configuration files, private IP ranges in infrastructure-as-code. It doesn't happen in every repository, but it happens often enough that automated tools exist to scan public repositories for exactly this kind of exposure, and attackers use them routinely.
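A toy version of such a scanner fits in a page. This sketch is my own and far cruder than real tools. It assumes three patterns: the widely documented `AKIA`/`ASIA` prefix format for AWS access key IDs, database URLs with an inline password, and IPv4 addresses that Python's `ipaddress` module classifies as private. Expect false positives and misses.

```python
import ipaddress
import re

# AWS access key IDs: "AKIA" or "ASIA" followed by 16 uppercase letters/digits.
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
# Database URL with inline credentials, e.g. scheme://user:password@host
DB_URL_WITH_PASSWORD = re.compile(
    r"\b(?:postgres(?:ql)?|mysql|mongodb)://[^\s:/@]+:[^\s@]+@[^\s/]+"
)
# Anything shaped like a dotted quad; validated with ipaddress below.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, finding type) pairs for suspicious lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if AWS_KEY_ID.search(line):
            findings.append((lineno, "aws-access-key-id"))
        if DB_URL_WITH_PASSWORD.search(line):
            findings.append((lineno, "db-connection-string"))
        for candidate in IPV4.findall(line):
            try:
                # is_private also covers loopback and other reserved ranges.
                if ipaddress.ip_address(candidate).is_private:
                    findings.append((lineno, "private-ip"))
                    break  # one private-ip finding per line is enough
            except ValueError:
                continue  # not a valid address, e.g. 999.1.1.1
    return findings
```

Running something like this over your own repositories before making them public is the defensive use of the same idea. It only sees the text it is given, so a secret removed in a later commit but still present in git history needs a history-aware tool.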

The more subtle version is what the code itself reveals. A public repository for an open-source component of a product might contain comments that reference internal service names, error messages that reveal the underlying technology stack, or test fixtures that show the shape of internal data structures. None of this is a credential, but all of it is useful context for someone building a picture of the system.

I've started thinking about this when reviewing what we open-source or make public. What does this code reveal about our internal architecture? What do the error messages in this library tell someone about how our backend is structured? These aren't reasons not to open-source things - they're questions worth asking before you do.

How this changed how I write specs

The practical impact of reading about OSINT is that I think about information exposure differently when I'm writing product specs. What does the error message reveal? If a login form returns "user not found" versus "incorrect password," you've just told an attacker which usernames are valid. That's a small thing, but it's the kind of thing that comes from thinking about what information you're exposing, not just what functionality you're building.
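In code, the fix is mostly a matter of collapsing both failure cases into one response. The sketch below is illustrative only: the in-memory user store, the messages, and the PBKDF2 parameters are assumptions, not a recommendation for production password storage. It also hashes a dummy password for unknown usernames so that response time does not give away which usernames exist.

```python
import hashlib
import hmac
import os

GENERIC_ERROR = "Invalid username or password."


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash (parameters chosen for illustration only)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


# Hypothetical in-memory user store: username -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"alice": (_salt, hash_password("correct horse", _salt))}

# Dummy record so unknown usernames cost the same hashing work as known ones.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("placeholder", _DUMMY_SALT)


def login(username: str, password: str) -> tuple[bool, str]:
    """Return (success, message); both failure modes share one message."""
    salt, expected = USERS.get(username, (_DUMMY_SALT, _DUMMY_HASH))
    candidate = hash_password(password, salt)
    # Constant-time comparison, and never succeed for an unknown username.
    ok = hmac.compare_digest(candidate, expected) and username in USERS
    return (True, "Welcome back.") if ok else (False, GENERIC_ERROR)
```

With this shape, `login("alice", "wrong")` and `login("nobody", "wrong")` return exactly the same tuple, which is the property a spec should ask for. The same reasoning applies to signup and password-reset flows.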

What do our API responses reveal about our internal data model? If an API returns a numeric ID that increments sequentially, you've told someone roughly how many records you have and made it trivial to enumerate them. If error responses include stack traces in production, you've given away your technology stack and potentially your internal file structure.
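Both problems have small, boring fixes. Here is a minimal sketch, assuming a hypothetical API layer: hand out random identifiers instead of sequential ones, and log the exception server-side while returning only a generic message plus a correlation ID. The function names and response fields are made up for illustration.

```python
import logging
import secrets
import uuid

logger = logging.getLogger("api")


def new_public_id() -> str:
    """Random identifier: reveals no record count and is not guessable in order."""
    return str(uuid.uuid4())


def error_response(exc: Exception) -> dict:
    """Log full details internally; return only a generic body to the client."""
    incident_id = secrets.token_hex(8)
    # The traceback goes to server logs, never into the response body.
    logger.error("incident %s", incident_id, exc_info=exc)
    return {"error": "Internal server error", "incident_id": incident_id}


if __name__ == "__main__":
    print(new_public_id())
    try:
        raise ValueError("relation orders_v2 does not exist")
    except ValueError as exc:
        print(error_response(exc))
```

The incident ID gives support something to search the logs for without exposing the stack trace. Random IDs are not access control, though: each request still needs an authorization check.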

None of this is exotic security knowledge. It's the kind of thing that becomes obvious once you've spent time thinking about what an attacker learns from passive observation. The lesson I keep taking from security reading is that some of the most effective reconnaissance requires very little technical sophistication - just patience, systematic thinking, and knowing where to look. That's a useful frame for thinking about what your product is telling the world before anyone tries to break in.