AJ Nicoloff

Section - Content Discovery

Notes (by task):

Task 1 - What is content discovery?
- It’s a very broad term, but basically any content you can find that is relevant to whatever you are doing, especially content that isn’t shown to regular visitors
- Could be:
  - Old pages left online but removed from navigation, config files, backups, screenshots someone posted publicly by accident, etc.
- Three main ways of discovering said content
  - Manually
  - Automated
  - OSINT (Open-Source Intelligence)
Task 2 - Manual Discovery - Robots.txt
- Common file that tells search engine crawlers which pages to avoid. It’s publicly readable and we aren’t robots, so we can see exactly what the crawlers are told not to look at.
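A minimal sketch of pulling the "don't look here" paths out of a robots.txt file. The sample file and its paths are made up for illustration; on a real target you would fetch `/robots.txt` first and pass the text in.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the paths listed in Disallow rules, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # Keep only non-empty Disallow rules (an empty one blocks nothing).
        if sep and key.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical robots.txt content, not taken from a real site.
sample = """\
User-agent: *
Allow: /
Disallow: /staff-portal
Disallow: /backup/  # old exports
"""
print(disallowed_paths(sample))
```

Each path it returns is worth opening in a browser by hand.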
Task 3 - Manual Discovery - Favicon
- This is the little icon that displays in the browser tab.
- What I never thought of until this lab is that frameworks sometimes ship a default icon that gets left over
  - This icon can help identify the framework being used… and I can go from there
- Neat trick: take the md5 hash of the favicon and look it up in the OWASP favicon database
```
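# Hash the favicon, then grep the OWASP favicon database page for that hash
curl -s https://wiki.owasp.org/index.php/OWASP_favicon_database | grep "$(curl -s https://static-labs.tryhackme.cloud/sites/favicon/images/favicon.ico | md5sum | cut -d' ' -f1)"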
```
Task 4 - Manual Discovery - Sitemap.xml
- The opposite of robots.txt! This lists what search engines should index.
- Can sometimes contain pages that are still live but not necessarily meant to be.
- Makes it easy to see a list of (some) URLs/content
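A rough sketch of listing the URLs in a sitemap using only the standard library. The sample XML and its URLs are invented for illustration; a real sitemap would be fetched from `/sitemap.xml`.

```python
import xml.etree.ElementTree as ET


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    urls = []
    for el in root.iter():
        # Tags look like "{namespace}loc"; compare only the local name.
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text:
            urls.append(el.text.strip())
    return urls


# Hypothetical sitemap content, not taken from a real site.
sample = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/</loc></url>
  <url><loc>http://example.com/old-secret-area</loc></url>
</urlset>
"""
for url in sitemap_urls(sample):
    print(url)
```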
Task 5 - Manual Discovery - HTTP Headers
- Often the headers will include the webserver software, or other identifying information if not hidden.
- Good for finding versions running!
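A small sketch of picking the identifying headers out of a response. The header values below are made-up examples, and the choice of `Server` and `X-Powered-By` as the "interesting" ones is my assumption; other headers can leak information too.

```python
# Header names (lowercased) that commonly reveal software and versions.
# This short list is an assumption for illustration, not a complete set.
INTERESTING = ("server", "x-powered-by")


def fingerprint_headers(headers: dict[str, str]) -> dict[str, str]:
    """Keep only the headers that tend to identify the server stack."""
    return {
        name: value
        for name, value in headers.items()
        if name.lower() in INTERESTING
    }


# Hypothetical response headers, not captured from a real server.
sample = {
    "Server": "nginx/1.18.0 (Ubuntu)",
    "X-Powered-By": "PHP/7.4.3",
    "Content-Type": "text/html",
}
print(fingerprint_headers(sample))
```

Against a live target, something like `dict(urllib.request.urlopen(url).headers.items())` should give a suitable input dictionary.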
Task 6 - Framework Stack
- If a framework is identified, perhaps it has a default admin page? Go from there!
Task 7 - OSINT - Google Hacking / Dorking
- Google Dorking is just a fun name for using Google’s more advanced search operators to find content without scanning/tooling.
  - You can restrict a search to a specific site, to words in URLs, and a lot more
- Placeholder - I’ll be making a personal cheatsheet for this!
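As a starting point for that cheatsheet, here is a tiny sketch that assembles a dork from operator/value pairs. `build_dork` is a helper name I made up, and `example.com` is a stand-in target; `site:` and `inurl:` are standard Google search operators.

```python
from urllib.parse import quote_plus


def build_dork(terms: str = "", **operators: str) -> str:
    """Join operator:value pairs (e.g. site, inurl) and free-text terms."""
    parts = [f"{op}:{value}" for op, value in operators.items()]
    if terms:
        parts.append(terms)
    return " ".join(parts)


query = build_dork(site="example.com", inurl="admin")
print(query)
# URL-encode the query so it can be pasted into a browser address bar.
print("https://www.google.com/search?q=" + quote_plus(query))
```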
Task 8 - OSINT - Wappalyzer
- I’ve actually used this tool for a while, check it out here
Task 9 - OSINT - Wayback Machine
- Never thought about using the Wayback Machine for OSINT, but after reading about it being used like this, it makes perfect sense: archived snapshots can reveal old pages that may still be live on the current site.
Task 10 - GitHub
- People leave secrets in public repos all the time. Version history keeps a record and snitches on them, even if they remove or change the secret in a later commit.
  - Hilariously, when I was first starting to code (7 years ago!) I left a Discord bot token in a public repo. Never did that again ;)
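A rough sketch of why deleting a secret in a later commit doesn't help: the removed line still shows up in the patch history (for example, the output of `git log -p`). The keyword list and the sample patch are my own assumptions for illustration, and a simple keyword match like this will miss plenty and flag false positives.

```python
import re

# Words that often sit next to a credential; an illustrative guess only.
KEYWORDS = re.compile(r"token|secret|passw(or)?d|api[_-]?key", re.IGNORECASE)


def suspicious_removed_lines(patch: str) -> list[str]:
    """Return removed ('-') patch lines that mention a credential keyword."""
    hits = []
    for line in patch.splitlines():
        # "-" marks a removed line; "---" is the old-file header, not content.
        if (
            line.startswith("-")
            and not line.startswith("---")
            and KEYWORDS.search(line)
        ):
            hits.append(line[1:].strip())
    return hits


# Hypothetical patch text in unified diff format, with a fake token.
sample = """\
--- a/bot.py
+++ b/bot.py
-TOKEN = "not-a-real-token"
+TOKEN = os.environ["BOT_TOKEN"]
"""
print(suspicious_removed_lines(sample))
```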
Task 11 - Amazon S3 Buckets
- S3 is a storage service from Amazon AWS.
- Files are given permissions, and if they are set incorrectly, you could find a lot of information you aren’t meant to see.
  - The format for the bucket URLs is http(s)://{name}.s3.amazonaws.com, where {name} is whatever the org chose.
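The URL format above is easy to turn into a list of guesses. In this sketch the org name `examplecorp` and the suffixes are assumptions of mine (plausible naming habits, not a definitive list); only the URL pattern comes from the notes.

```python
def bucket_url(name: str, scheme: str = "https") -> str:
    """Build a bucket URL in the {name}.s3.amazonaws.com format."""
    return f"{scheme}://{name}.s3.amazonaws.com"


def candidate_buckets(
    org: str,
    suffixes: tuple[str, ...] = ("assets", "www", "public", "private"),
) -> list[str]:
    """Guess bucket URLs as {org}-{suffix}; the suffixes are illustrative."""
    return [bucket_url(f"{org}-{suffix}") for suffix in suffixes]


# "examplecorp" is a hypothetical organisation name.
for url in candidate_buckets("examplecorp"):
    print(url)
```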
Task 12 - Automated Discovery
- This is the process of using tools to automatically search for content, rather than doing it yourself. Never would have guessed that… ;)
  - Wordlists
    - Lists of common words, sometimes passwords, etc
      - Popular lists (SecLists)
  - Tools
    - There are a ton! Do some googling here, it’s what I did :)
      - ffuf
      - dirb
      - gobuster
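To make the idea concrete, here is a bare-bones sketch of what these tools do at their core: take a wordlist, request each candidate path, and keep whatever doesn't come back as 404. This is not how ffuf, dirb, or gobuster are implemented; `discover`, `http_status`, and the fake responses are names and data I made up so the example runs without touching the network.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable


def http_status(url: str) -> int:
    """Fetch a URL and return its HTTP status code (redirects are followed)."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is still useful.
        return err.code


def discover(
    base_url: str,
    words: Iterable[str],
    get_status: Callable[[str], int] = http_status,
) -> list[tuple[str, int]]:
    """Try base_url/word for each word and keep everything that isn't a 404."""
    found = []
    for word in words:
        word = word.strip()
        # Skip blank lines and comment lines in the wordlist.
        if not word or word.startswith("#"):
            continue
        url = f"{base_url.rstrip('/')}/{word}"
        status = get_status(url)
        if status != 404:
            found.append((url, status))
    return found


# Offline demo: a fake status lookup stands in for real HTTP requests.
fake_site = {"http://example.com/admin": 200, "http://example.com/backup": 403}
words = ["admin", "# comment", "", "nope", "backup"]
results = discover("http://example.com/", words, lambda u: fake_site.get(u, 404))
print(results)
```

Leaving out the third argument makes `discover` use real requests via `http_status`, with the words read from a wordlist file such as one from SecLists.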