After hunting for security bugs, I’ve realized that the clients I work with are not familiar enough (or at all) with basic “hacking” techniques. API keys, passwords, SSH keys, and certificates are all great mechanisms of protection, as long as they are kept secret. Once they’re out in the wild, it doesn’t matter how complex the password is or what algorithm was used to hash it somewhere else. In this post, I’m going to share concepts, methods, and tools used by researchers both for finding secrets and exploiting them. I’ll also list mitigation action items that are simple to implement.
It’s important to mention that the attack & defend “game” is not an even one; an attacker only needs one successful attempt to get in, whereas the defender has to succeed 100% of the time. The hard part is knowing where to look. Once you can list the virtual “gates” through which hackers can find their way in, you can protect them with rather simple mechanisms. I believe their simplicity sometimes overshadows their importance, which is why many teams overlook them.
So here’s a quick and simple TL;DR that shouldn’t be overlooked:
- Enforce MFA everywhere: Google, GitHub, cloud providers, VPNs, anywhere possible. If a system doesn’t offer MFA, reconsider using it
- Rotate keys and passwords constantly, employ and enforce rotation policies
- Scan your code regularly. Preferably as part of the release process
- Delegate login profiles and access management to one central system that you control and monitor
These are the 20% actions for 80% effect to prevent leaks and access-control holes.
So, what do hackers do and use to find passwords and application secrets?
API keys are exposed all over the internet. This is a fact. Often they are there for no good reason. Developers leave them lying around:
- For debug purposes
- For local development
- For future maintainers as comments
Blocks such as this one are all over the internet:
// DEBUG ONLY // TODO: remove --> API_KEY=t0psecr3tkey00237948
How do they do that? After fetching pages with a scanner like “meg”, they search the results for strings that match known templates. Another great tool by the same author that does exactly that is gf, which is just a better grep. In this instance, truffleHog, or the trufflehog option in gf, can find the high-entropy strings that most API keys look like. The same goes for simply searching for the string API_KEY, which yields results (too) many times.
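To make the “high-entropy string” idea concrete, here’s a minimal Python sketch of the technique. It is not truffleHog’s actual implementation; the token pattern, the 20-character minimum, and the 3.5 bits-per-character threshold are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Candidate tokens: runs of 20+ characters that commonly appear in keys.
# Both the character set and the minimum length are illustrative assumptions.
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def find_high_entropy_strings(text: str, threshold: float = 3.5) -> list:
    """Return tokens whose entropy meets the (assumed) threshold."""
    return [t for t in TOKEN_RE.findall(text) if shannon_entropy(t) >= threshold]
```

Running `find_high_entropy_strings` over the debug comment shown earlier flags the key, while a long run of a single repeated character scores zero and is ignored. Expect false positives (hashes, minified identifiers), so treat the output as leads to review, not verdicts.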
Often, keys have a good reason to appear where they are, but they’re not protected from being used externally. One example is a client I’ve been working with lately who, like many other platforms, uses maps as a third-party service. To fetch maps and manipulate them, they call an API with a key and get the relevant map back. What they forgot to do is configure their map provider to limit the origins from which requests with that specific key can come. It’s not hard to think of a simple attack that drains their license quota, effectively costing them a lot of money or, “better” yet (in terms of the attack), bringing their map-oriented service down.
Hackers don’t use JS files only to find secrets. This is your application code, open to any prying eyes. An intelligent hacker might read the code thoroughly to understand naming conventions and API paths, and to find informational comments. These are later compiled into lists of words and paths and loaded into automated scanners. This is what’s referred to as an intelligent automated scan: one where the attacker combines automated processes with gathered organization-specific information.
A real comment left on a target’s front page, revealing a set of unprotected API endpoints leaking data.
/* Debug -> domain.com/api/v3 not yet in production and therefore not using auth guards yet use only for debugging purposes until approved */
What should you do then?
- Minify / Uglify - adds a layer of obfuscation and size optimization. While usually reversible, it can help you fly under the radar of many automatic scanners, reducing the attack surface
- Keep only the bare minimum keys and permissions - while some are essential, most are not. Leave only the keys that must be part of the code
- Reduce the key permissions to the bare minimum necessary - as with the maps service example, make sure the key can only do what it’s intended to, and where it’s intended to operate from. Make sure you leave no room for exploitation
- Use the same tools attackers would use to automatically scan code on CI builds, especially string pattern matching tools that are quick to run. Use a simple tool like gf to scan for patterns. Much like tests, these can help ensure developers don’t leave holes that can be exploited or used to breach the system
- Practice code review to have another eye on the code - all the scanners in the world cannot scan and detect 100% of the use cases. Another human eye is a great practice, both for quality and security
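As a rough illustration of the CI scanning item above, here’s a small Python sketch of a pattern-based check. It is not how gf works internally; the two patterns and the `main` entry point are assumptions made for the example, and a real pipeline would use a broader, maintained rule set.

```python
import re
from pathlib import Path

# Illustrative patterns only (assumptions); real rule sets are much larger.
SECRET_PATTERNS = {
    "hardcoded credential": re.compile(
        r"(?i)(api_?key|secret|password)\w*\s*[=:]\s*['\"]?[^\s'\"]{8,}"
    ),
    "private key header": re.compile(r"-----BEGIN (?:RSA )?PRIVATE KEY-----"),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


def main(paths):
    """Scan the given files; return 1 if anything was found, else 0."""
    failed = False
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for lineno, name in scan_text(text):
            print(f"{path}:{lineno}: possible {name}")
            failed = True
    return 1 if failed else 0
```

In a build step you could call `sys.exit(main(sys.argv[1:]))` so that a finding fails the build, the same way a failing test would. Simple patterns like these will also flag harmless lines, which is one more reason to pair them with code review.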
They take a look back at the Wayback machine
The Internet Archive, also known as the “Wayback Machine”, holds periodic snapshots of websites all over the internet, going years and years back. This is a gold mine for hackers with a target. With tools like waybackurls (based on waybackurls.py) one can scan any target for old files. This means that even if you’ve found and removed a key but did not rotate it, a hacker might still find it in an old version of your website and use it against you.
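To check what the archive holds for your own domain, a sketch along these lines may help. It assumes the Internet Archive’s public CDX search endpoint and its `url`, `output`, `fl`, and `collapse` parameters as I understand them; the function names are hypothetical and this is not the code of any existing tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Internet Archive's CDX search API.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain: str) -> str:
    """Build a query asking for every archived URL under the domain."""
    params = {
        "url": f"{domain}/*",   # everything below the domain
        "output": "json",       # rows as JSON arrays
        "fl": "original",       # only the originally captured URL
        "collapse": "urlkey",   # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(rows):
    """Drop the header row and return the archived URLs."""
    return [row[0] for row in rows[1:]]


def archived_urls(domain: str, timeout: int = 30):
    """Fetch the list of archived URLs for a domain (network call)."""
    with urlopen(build_cdx_url(domain), timeout=timeout) as resp:
        return parse_cdx_rows(json.load(resp))
```

Feeding the returned URLs (old JS bundles, config files, forgotten paths) into the same secret scanners you run on current code shows you what an attacker would see in your past.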
Found a key lying around where it’s not supposed to be?
- Create a replacement key
- Release a version that uses the new key and removes the cleartext occurrence
- Deactivate or delete the old key
The Wayback Machine is not only good for finding keys
Old code reveals all kinds of interesting information to exploiters:
- Secret API paths - unprotected API endpoints that you thought would never be found. While the ones that are found may be unexploitable, they still help attackers map the API structure and conventions in the system. Once your code is out in the wild there’s no control over it; this is key to remember, and every developer should keep it in the back of their mind
- Web administration panels, much like API endpoints, are left around for different purposes and serve as one of the common attack vectors hackers find and exploit. These are mostly found in large organizations and installed by IT teams. A good idea is to periodically review all administration panels in use and their access management. A recent breach at a major automotive manufacturer happened through such a panel, which was bypassed by removing the https prefix of the address. Yes: 🤦.
They use GitHub
GitHub is a goldmine for hackers. If you know where to look, a simple search can yield interesting results. If your account is not enforcing MFA, each and every user in the organization is a walking security hole. It’s not far-fetched to assume that one of the collaborators in the organization is not using a unique password and that their password was once leaked through another system. A hacker who targets the organization can easily automate such a scan or even go through it manually. The list of employees can be generated with OSINT, such as searching for employees on LinkedIn or in the GitHub public users list.
For example, here’s a good starting point if you’re trying to probe Tesla:
Even if the company doesn’t use GitHub as its git provider, leaks can still end up there. It’s enough to have one employee who uses GitHub for personal projects and has a small leak in one of them (or in their git history) to turn it into a breach.
Git’s nature is to track the entire history of changes in every project. In the security context, this fact becomes significant. In other words, every line of code ever written (or removed) by any user with current access to any organizational system can jeopardize the company.
Why does it happen?
- Companies don’t scan themselves for leaks
- Those that do usually don’t consider going through their employees’ personal (yet publicly available) accounts
- Those that do scan employees (by my guesstimate, less than 1%) often fail by over-relying on automation and skipping the commit history (scanning only the surface, which is the current snapshot of the code, rather than the entire git tree)
- Lastly, companies don’t rotate keys or use 2FA often enough. Those two can eliminate most of the holes above
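To show why scanning only the current snapshot is not enough, here’s a minimal Python sketch that looks at every line ever added in a repository’s history. It relies on the standard `git log -p --all` output; the regular expression and function names are assumptions for the example, and dedicated tools do this far more thoroughly.

```python
import re
import subprocess

# Illustrative pattern (an assumption), not a complete rule set.
SECRET_RE = re.compile(r"(?i)(api_?key|secret|password)\w*\s*[=:]\s*\S{8,}")


def added_lines(patch_text):
    """Yield (commit, line) for each line added anywhere in `git log -p` output."""
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("+") and not line.startswith("+++"):
            yield commit, line[1:]


def scan_history(patch_text):
    """Return (commit, line) pairs whose added line looks like a secret."""
    return [(c, l.strip()) for c, l in added_lines(patch_text) if SECRET_RE.search(l)]


def git_history_patch(repo_path):
    """Return the full patch history of all branches (runs the git CLI)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, encoding="utf-8", errors="replace", check=True,
    )
    return result.stdout
```

Calling `scan_history(git_history_patch("."))` reports a key in the commit that introduced it, even when a later commit deleted the line. That is exactly the case a surface-only scan misses.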
“Dorks” are search lines that combine a search engine’s different features with targeted search strings to pinpoint results. Here’s a fun list of Google searches from Exploit-DB.
Before giving the gist of it, if you want to go deep here, and I personally recommend that you do, here’s an invaluable lesson from a talented researcher. He discusses how to scan, how to use dorks, and what to look for and where when going through a manual process.
GitHub dorks are less complex than Google’s, simply because GitHub lacks the range of search features Google offers. Still, searching for the right strings in the right places can do wonders. Just go ahead and search GitHub for any string from the following list; you’re in for a treat:
password dbpassword dbuser access_key secret_access_key bucket_password redis_password root_password
If you target the search at interesting files, like filename:.npmrc _auth or filename:.htpasswd, you can filter for the type of leak you’re looking for. For further reading, see SecurityTrails’ great post.
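You can point the same dorks at your own organization. The sketch below only builds the search strings and URLs; it uses GitHub’s `org:` search qualifier together with the keywords listed above, and the organization name `acme` is a hypothetical placeholder.

```python
from urllib.parse import quote_plus

# Keywords and file dorks taken from the lists above.
KEYWORDS = [
    "password", "dbpassword", "dbuser", "access_key",
    "secret_access_key", "bucket_password", "redis_password", "root_password",
]
FILE_DORKS = ["filename:.npmrc _auth", "filename:.htpasswd"]


def build_queries(org: str) -> list:
    """Return one search string per dork, scoped to a single organization."""
    return [f"org:{org} {term}" for term in KEYWORDS + FILE_DORKS]


def search_url(query: str) -> str:
    """Return a GitHub code-search URL for the query."""
    return "https://github.com/search?type=code&q=" + quote_plus(query)


# Example: print the URLs to review manually for a hypothetical org.
for q in build_queries("acme"):
    print(search_url(q))
```

Going through the resulting pages by hand is slow, but it mirrors the manual process an attacker would follow, and it covers repositories an automated scanner was never pointed at.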
- Scan for leaks as part of any CI process; Gitrob is a great tool
- Scan employees accounts; Gitrob does that for you unless disabled with
- Go deep into the history, Gitrob’s default is 500 commits, you can go further with
- Enforce GitHub two-factor authentication!
- Rotate the access keys, secrets, and passwords of each and every system. A good practice is to use federated access through one system, like GSuite or Active Directory, and make sure it enforces policies for password rotation and complexity
They use Google
Now that we’re generally familiar with dorks, taking them to Google reveals an entirely new field of features. Being the powerful search engine it is, Google offers inclusion & exclusion of strings, file format, domains, URL paths, etc. Consider this search line:
"MySQL_ROOT_PASSWORD:" "docker-compose" ext:yml
This targets a specific file format (yml) and a vulnerable file (docker-compose) where developers tend to store their not-so-unique passwords. Go ahead and run this search line; you’d be surprised to see what comes up.
Other interesting lines may include RSA keys or AWS credentials. Here’s another example:
"-----BEGIN RSA PRIVATE KEY-----" ext:key
The options are endless, and your level of creativity and breadth of familiarity with different systems will determine the quality of the findings. Here’s a large list of dorks if you want to play a little.
They get to know your system
When a researcher (or a motivated hacker) gets “involved” with a system, he goes deep. He gets to know it: API endpoints, naming conventions, interactions, and different versions of systems if they’re exposed.
A not-very-good approach to securing systems is introducing complexity and randomness to their access paths instead of real security mechanisms. Security researchers trying to come up with vulnerable paths and endpoints use “fuzzing” tools. These tools take lists of words, combine them into system paths, and probe them to see if valid answers are returned. These scanners will never find a completely random set of characters, but they are superb at identifying patterns and extracting endpoints you either forgot about or did not know existed.
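The core loop of such a fuzzer can be sketched in a few lines of Python. This is a simplified illustration, not how any particular tool is implemented; the `probe` callable (which returns an HTTP status code for a URL) and the 404-means-missing rule are assumptions made for the example.

```python
from itertools import product


def candidate_urls(base_url, words, depth=2):
    """Yield base_url joined with every word combination up to `depth` segments."""
    base = base_url.rstrip("/")
    words = list(words)  # product() needs to iterate the words repeatedly
    for n in range(1, depth + 1):
        for combo in product(words, repeat=n):
            yield f"{base}/{'/'.join(combo)}"


def fuzz(base_url, words, probe, depth=2):
    """Return the candidates for which probe(url) reports anything but 404."""
    return [u for u in candidate_urls(base_url, words, depth) if probe(u) != 404]


# Example with a stand-in probe instead of real HTTP requests.
known = {"https://api.example.test/v1/payments": 200}
hits = fuzz(
    "https://api.example.test/",
    ["v1", "payments", "admin"],
    probe=lambda url: known.get(url, 404),
)
print(hits)
```

Three words at depth two already produce twelve candidates, and real wordlists hold thousands of entries, which is why predictable naming is found quickly while a truly random path is not. Only run a real probe against systems you are authorized to test.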
Remember, security through obscurity is not a good practice (although don’t ignore it completely)
That’s where the GitHub dorks we discussed earlier come in; knowing a system’s endpoint naming convention, e.g. api.mydomain.com/v1/payments/..., can be very helpful. Searching the company’s GitHub repos (and its employees’) for the basic API string can often turn up those random endpoint names.
However, random strings still have a place when building systems; they are always a better option than incremental IDs for resources like users or orders.
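As a small illustration of the difference, here’s a Python sketch that uses the standard library’s `secrets` module. The function names and the 16-byte length are assumptions for the example, and a random ID complements access checks rather than replacing them.

```python
import secrets


def next_incremental_ids(seen_id: int, count: int = 3) -> list:
    """What an attacker can guess after seeing a single incremental ID."""
    return [seen_id + i for i in range(1, count + 1)]


def new_resource_id() -> str:
    """Return an unguessable, URL-safe ID built from 16 random bytes."""
    return secrets.token_urlsafe(16)


print(next_incremental_ids(1041))  # neighbours of order 1041 are trivial to enumerate
print(new_resource_id())           # a different 22-character string on every call
```

With incremental IDs, one valid order number reveals its neighbours; with random IDs, seeing one value tells the attacker nothing about the next.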
Here’s an incredible repo of string lists called “SecLists”. It’s used by almost everyone in the industry, often with a personal twist in the context of the target, and it’s a massive source. Another powerful tool that leverages string lists is ffuf, an ultra-fast fuzzing tool written in Go.
Wrapping it up
Security is often taken lightly in startups. Developers and managers tend to prioritize speed and delivery times over quality and security. Pushing cleartext secrets to code repos, using the same keys over and over across systems, and using access keys when other options are available can sometimes seem faster, but they can be detrimental down the road. I’ve tried to show how those strings you think are protected by being in a private repo can easily find their way to a public gist, or to an employee’s Git clone that was unintentionally made public. If you lay the groundwork for secure work, such as password-sharing tools, a central secret store, password policies, and multi-factor authentication, you’ll be able to keep making fast progress without sacrificing security completely.
“Move fast and break things” is not the best mantra in the context of information protection
Knowing how hackers work is usually a very good first step in understanding security and applying it to systems as a protector. Consider the approaches above, and keep in mind that this is a very limited list of the paths hackers take when penetrating systems. A good habit is to consider the security aspects of anything being deployed to a system, regardless of whether it is customer-facing or internal.
Managing security can sometimes be a pain in the ass, but rest assured: taking care of just the very basic elements will spare you a lot of mayhem and keep you safe and sane.
Thank you for reading this far! I hope that I’ve helped open some minds to risks that are out there and that we all miss or overlook.
Feel free to reach out with any feedback or questions. Any form or shape of discussion is most welcome!