Section - Content Discovery

Notes (by task):

  • Task 1 - What is content discovery?

    • It’s really broad: basically any content you can find that is relevant to whatever you are doing
    • Could be:
      • Old pages left online but removed from navigation, config files, backups, screenshots someone posted publicly by accident, etc.
    • Three main ways of discovering said content
      • Manually
      • Automated
      • OSINT (Open-Source Intelligence)
  • Task 2 - Manual Discovery - Robots.txt

    • Common file that tells search engine crawlers which pages to avoid. We aren’t robots, so we can read it and see exactly what the crawlers are told not to look at.
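
A quick sketch of pulling the interesting part out of a robots.txt file: the `Disallow` paths. The function name, the sample file, and the `/staff-portal` path are my own placeholders for illustration, not something from a real site.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the paths named in Disallow rules, in order of appearance."""
    paths = []
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        # Field names are case-insensitive; an empty Disallow blocks nothing.
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content, made up for this example.
SAMPLE = """\
User-agent: *
Allow: /
Disallow: /staff-portal   # placeholder path
Sitemap: http://example.com/sitemap.xml
"""

if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE):
        print(path)
```

Each path it returns is a candidate to open by hand in the browser.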
  • Task 3 - Manual Discovery - Favicon

    • This is the little icon that displays in the browser tab.
    • What I never thought of until this lab is that frameworks sometimes ship a default icon that gets left over
      • This icon can help identify the framework being used… and I can go from there
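    • Neat URL to look up favicons by MD5 hash (hash the icon, then grep the OWASP favicon database page for that hash)
      curl -s https://wiki.owasp.org/index.php/OWASP_favicon_database | grep "$(curl -s https://static-labs.tryhackme.cloud/sites/favicon/images/favicon.ico | md5sum | cut -d' ' -f1)"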
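
The same favicon idea as a rough Python sketch: hash the icon bytes with MD5 and look the hash up in a table. The `KNOWN_FAVICONS` table is a stand-in I made up (a real lookup would use the OWASP favicon database), and `fetch_favicon` is just one plausible way to download the bytes.

```python
import hashlib
import urllib.request


def favicon_md5(icon_bytes: bytes) -> str:
    """Return the hex MD5 digest of the raw favicon bytes."""
    return hashlib.md5(icon_bytes).hexdigest()


def fetch_favicon(url: str) -> bytes:
    """Download the icon. Not called here, so the sketch runs offline."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


# Hypothetical lookup table: the key is the hash of some demo bytes, not a
# real framework icon.
KNOWN_FAVICONS = {favicon_md5(b"demo-icon"): "ExampleFramework (made up)"}


def identify(icon_bytes: bytes) -> str:
    """Map the icon to a framework name, or 'unknown' if the hash is new."""
    return KNOWN_FAVICONS.get(favicon_md5(icon_bytes), "unknown")


if __name__ == "__main__":
    print(identify(b"demo-icon"))
    print(identify(b"something else"))
```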
  • Task 4 - Manual Discovery - Sitemap.xml

    • The opposite of Robots.txt! This is what search engines should index.
    • Can sometimes contain pages that are still live but not necessarily meant to be.
    • Makes it easy to see a list of (some) URLs/content
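
To get that list of URLs out of a sitemap without squinting at XML, something like this should work. It assumes the file uses the standard sitemaps.org namespace; the sample XML and its URLs are placeholders I wrote for the example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in the sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, made up for this example.
SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/old-page</loc></url>
</urlset>
"""

if __name__ == "__main__":
    for url in sitemap_urls(SAMPLE):
        print(url)
```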
  • Task 5 - Manual Discovery - HTTP Headers

    • Often the headers will include the web server software, or other identifying information, if it has not been hidden.
    • Good for finding which versions are running!
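
A small sketch of filtering response headers down to the ones that tend to give away software and versions. The header list is my own short selection rather than a complete one, and the sample values are illustrative.

```python
# Headers that commonly reveal server software or framework versions.
# This selection is an assumption; extend it as needed.
REVEALING = ("server", "x-powered-by", "x-aspnet-version", "x-generator")


def revealing_headers(headers: dict[str, str]) -> dict[str, str]:
    """Keep only headers whose (case-insensitive) name is in REVEALING."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in REVEALING
    }


# Illustrative response headers, not captured from a real server.
SAMPLE = {
    "Server": "nginx/1.18.0 (Ubuntu)",
    "X-Powered-By": "PHP/7.4.3",
    "Content-Type": "text/html; charset=UTF-8",
}

if __name__ == "__main__":
    for name, value in revealing_headers(SAMPLE).items():
        print(f"{name}: {value}")
```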
  • Task 6 - Framework Stack

    • If a framework is identified, perhaps it has a default admin page? Go from there!
  • Task 7 - OSINT - Google Hacking / Dorking

    • Google Dorking is just a fun name for using Google’s more advanced search features to find content without scanning/tooling.
      • You can search within specific sites, for words in URLs, and a lot more
    • Placeholder - I’ll be making a personal cheatsheet for this!
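
Until the cheatsheet exists, here is a tiny helper for composing dork queries. It only covers four operators (`site:`, `inurl:`, `intitle:`, `filetype:`), and the function name and argument layout are my own invention.

```python
# Operators this sketch knows about, in the order they are emitted.
SUPPORTED = ("site", "inurl", "intitle", "filetype")


def build_dork(terms: str = "", **operators: str) -> str:
    """Compose a search string such as 'site:example.com inurl:admin'."""
    unknown = set(operators) - set(SUPPORTED)
    if unknown:
        raise ValueError(f"unsupported operators: {sorted(unknown)}")
    parts = [f"{name}:{operators[name]}" for name in SUPPORTED if name in operators]
    if terms:
        parts.append(terms)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_dork("admin", site="tryhackme.com"))
    print(build_dork(site="example.com", filetype="pdf"))
```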
  • Task 8 - OSINT - Wappalyzer

    • I’ve actually used this tool for a while, check it out here
  • Task 9 - OSINT - Wayback Machine

    • I never thought about using the Wayback Machine for OSINT, but after reading about it being used this way, it makes perfect sense
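
For scripting lookups instead of clicking around, a sketch that only builds the URLs: one for the Internet Archive availability endpoint and one for a specific snapshot. It makes no requests, and the function names are mine.

```python
from urllib.parse import urlencode


def availability_query(url: str, timestamp: str = "") -> str:
    """Build a query asking for the archived snapshot closest to timestamp."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp  # YYYYMMDD or longer
    return "https://archive.org/wayback/available?" + urlencode(params)


def snapshot_url(url: str, timestamp: str) -> str:
    """Build the browsable address of a snapshot at the given timestamp."""
    return f"https://web.archive.org/web/{timestamp}/{url}"


if __name__ == "__main__":
    print(availability_query("example.com", "20150101"))
    print(snapshot_url("http://example.com/", "20150101000000"))
```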
  • Task 10 - Github

    • People leave secrets in public repos all the time. Version history keeps a record and snitches on them, even if they later remove or change the secret, unless they are careful to scrub the history too.
      • Hilariously, when I was first starting to code (7 years ago!) I left a Discord bot token in a public repo. Never did that again ;)
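
A rough sketch of scanning text (for example, the output of `git log -p`) for things that look like secrets. The patterns are illustrative and far from complete: the AWS access key ID shape is widely documented, while the generic `token = "..."` pattern is just my guess at a common mistake. The sample diff is made up.

```python
import re

PATTERNS = {
    # "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted value of 8+ characters assigned to a suspicious name.
    "hardcoded_token": re.compile(
        r"(?i)\b(?:token|secret|api_key)\b\s*[:=]\s*['\"]([^'\"]{8,})['\"]"
    ),
}


def find_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, pattern_name) for every suspicious line."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, name))
    return hits


# Made-up diff text; the AWS value is the placeholder key from AWS docs.
SAMPLE_DIFF = """\
+BOT_PREFIX = "!"
+token = "not-a-real-token-123"
+aws_key = "AKIAIOSFODNN7EXAMPLE"
"""

if __name__ == "__main__":
    for number, name in find_secrets(SAMPLE_DIFF):
        print(f"line {number}: {name}")
```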
  • Task 11 - Amazon S3 Buckets

    • S3 is a storage service from Amazon AWS; files are kept in “buckets”.
    • Files are given permissions, and if those are set incorrectly, you could find a lot of information you aren’t meant to see.
      • The format for bucket URLs is http(s)://{name}.s3.amazonaws.com where {name} is whatever the org chose.
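
Since {name} is just a guess away, a sketch for generating candidate bucket URLs from an org name. The suffix list is an assumption on my part (common-sounding guesses, not an official convention), and `acme` is a placeholder org.

```python
# Guessable suffixes; this list is an assumption, adjust to taste.
DEFAULT_SUFFIXES = ("", "-assets", "-www", "-public", "-private")


def bucket_urls(org: str, suffixes: tuple[str, ...] = DEFAULT_SUFFIXES) -> list[str]:
    """Return candidate https://{name}.s3.amazonaws.com URLs for an org."""
    name = org.strip().lower()  # keep guesses lowercase
    return [f"https://{name}{suffix}.s3.amazonaws.com" for suffix in suffixes]


if __name__ == "__main__":
    for url in bucket_urls("acme"):
        print(url)
```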
  • Task 12 - Automated Discovery

    • This is the process of using tools to automatically search for content, rather than doing it yourself. Never would have guessed that… ;)
      • Wordlists
      • Tools
        • There are a ton! Do some googling here, it’s what I did :)
          • ffuf
          • dirb
          • gobuster
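
To make the wordlist idea concrete, here is a bare-bones sketch of the general approach: join each word from a list onto a base URL and check the response status. It is not how ffuf, dirb, or gobuster are actually implemented, and the function names, the sample words, and `example.com` are placeholders.

```python
import urllib.error
import urllib.request
from collections.abc import Iterable, Iterator
from urllib.parse import urljoin


def candidate_urls(
    base_url: str, words: Iterable[str], extensions: tuple[str, ...] = ("",)
) -> Iterator[str]:
    """Yield base_url joined with each word and each extension."""
    base = base_url.rstrip("/") + "/"
    for word in words:
        word = word.strip()
        if not word or word.startswith("#"):  # skip blanks and comments
            continue
        for extension in extensions:
            yield urljoin(base, word.lstrip("/") + extension)


def probe(url: str) -> int:
    """Return the HTTP status code for url. Not called in this sketch."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 403, 404, and so on still tell us something


if __name__ == "__main__":
    wordlist = ["admin", "# comment", "backup", ""]
    for url in candidate_urls("http://example.com", wordlist, ("", ".php")):
        print(url)
```

The real tools add the parts that matter at scale: concurrency, response filtering, and much bigger wordlists.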