This article is an overview of Google hacking.
Google Hacking is a term that encapsulates a wide range of techniques for querying Google to uncover vulnerable web applications and sometimes to pinpoint vulnerabilities in specific web applications. Beyond exposing flaws in web applications, Google Hacking lets you find sensitive data that is useful in the reconnaissance phase of an attack. Examples include email addresses published on the web, database dumps or other files containing usernames and passwords, unprotected directories with sensitive files, and login portals. It can also surface various kinds of system logs (such as firewall and access logs) and unprotected internet-connected devices such as printers or cameras, along with information about their usage, status, location, and so on.
Advanced operators for querying Google
Advanced operators let you get more specific results from your queries, which usually makes the results more relevant and useful. For example, you can use advanced operators to retrieve only files of a certain type, or to limit your search results to a specific site. If you search for plain terms, Google shows you all the results that match those terms. Advanced operators, in contrast, retrieve the subset of those results that match certain characteristics. This is easy to see by searching Google for a domain name and comparing the results with a search that applies the site: operator to that domain. The first query returns results from all kinds of external websites that mention the domain, while the second narrows the results down to pages from the domain itself.
Advanced operators usually take the form operator:search_term and are written directly in the query string. There should be no space between the operator and the search term, and the search term itself cannot contain unquoted spaces. Otherwise, only the first word is treated as the operator's argument and the remaining words become ordinary search terms. To include spaces, enclose the phrase in quotation marks, which also tells Google to look for an exact match. To test this, search Google for a phrase like there are plenty of fish in the sea, then repeat the search with the same phrase enclosed in quotes: "there are plenty of fish in the sea".
For example, when querying Google for site:infosecinstitute.com filetype:pdf, we use two advanced operators – the site operator, which limits the results to only those coming from that site, and the filetype operator, which returns results limited to a specific file type (in this case pdf).
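The Python sketch below illustrates these syntax rules. It assembles a query from operator/term pairs and quotes any term that contains spaces, so the whole phrase stays bound to its operator. It then URL-encodes the query into a search URL. The helper names (term, build_query, search_url) are hypothetical. The sketch also assumes Google's standard q query parameter.

```python
from urllib.parse import urlencode


def term(value, operator=None):
    """Format a search term, optionally prefixed by an operator such as site or filetype.

    Terms containing spaces are wrapped in quotation marks; unquoted, only the
    first word would be bound to the operator.
    """
    if " " in value and not value.startswith('"'):
        value = f'"{value}"'
    return f"{operator}:{value}" if operator else value


def build_query(*terms):
    """Join already formatted terms with single spaces, as typed in the search box."""
    return " ".join(terms)


def search_url(query):
    """URL-encode the query into Google's q parameter (assumed URL format)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


query = build_query(term("infosecinstitute.com", "site"), term("pdf", "filetype"))
print(query)              # site:infosecinstitute.com filetype:pdf
print(search_url(query))  # https://www.google.com/search?q=site%3Ainfosecinstitute.com+filetype%3Apdf
print(term("there are plenty of fish in the sea"))  # "there are plenty of fish in the sea"
```

Keeping the quoting rule in one helper means phrases like intext:"Apache Server Status" come out correctly without hand-placing the quotes.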
Below is a table containing some commonly used Google operators and symbols for Google hacking:
| Operator | Description | Example(s) |
| --- | --- | --- |
| intitle: | Searches in the title of the page (the <title> HTML element inside the <head> element of the page's markup) | intitle:admin |
| inurl: | Searches within the URL of the crawled web pages | inurl:wp-content/uploads filetype:sql; inurl:.ssh intitle:index.of authorized_keys |
| intext: | Searches within the text of the web pages (the text a regular user would see when browsing them) | intext:"powered by webcamXP 5"; intext:"Powered by net2ftp" inurl:ftp; inurl:"server-status" intext:"Apache Server Status" |
| allintext: / allinurl: / allintitle: | Work like the operators above, except that they cannot be combined with other operators and they look for all of the words that follow them in the text, URL, or title of the page | allintext: "Please login to continue…" "ZTE Corporation. All rights reserved."; allintitle:Welcome to Windows XP Server Internet Services |
| filetype: | Limits the results to web resources of the desired file type (the detection is not always accurate) | filetype:xls intext:email intext:password |
| site: | Limits the results to web resources within a given website | filetype:xls site:apple.com |
| info: | Shows additional links and actions for a page, such as Google's cached copy, similar pages, and pages that link to it | info:apple.com |
| - | Excludes the term or operator from the results | inurl:citrix inurl:login.asp -site:citrix.com |
| "search-term" | Returns only results that exactly match the quoted phrase | inurl:"server-status" intext:"Apache Server Status" |
| * | Wildcard for any unknown or arbitrary word. It does not complete a partial word (as in foot*); it marks that any word can appear at that position | a * saved is a * earned |
| + | The term that follows + must appear in the results. It can be used to force the inclusion of a very common word that Google would otherwise ignore | "Machine gun" +uzi |
| . | Single-character wildcard: any single character can appear in that place | inurl:.ssh intitle:index.of authorized_keys |
There are many cheat sheets detailing most of the advanced operators available for use in queries, such as the one published by Google Guide.
Google also provides a web page with an interface to perform some advanced queries at https://www.google.com/advanced_search
Google Hacking Database
The Google Hacking Database (GHDB) contains user-submitted queries divided into various categories. These include vulnerable files, files containing passwords, information about servers and the software running on them, online devices, and so on. A dork is simply a previously discovered Google query that is known to return useful results, such as exploitable pages or sensitive data. When browsing the dorks in the GHDB, check the date each one was submitted, as some dorks are old and may no longer be useful. Posts about exploits, vulnerabilities, and other flaws in specific software versions can quickly become irrelevant. However, some contributors have found information-harvesting queries that keep working regardless of when they were submitted. Examples include queries that find database dumps, download pages, or, to some extent, unprotected directory listings.
Basic penetration testing through Google hacking
As mentioned above, Google can be used for passive information gathering. It is a great tool for footprinting, and it allows for mobility and anonymity during the footprinting process. The information that Google Hacking surfaces is generally publicly available and could be found manually by anyone with the time and resources to search for it. With Google Hacking, you do not actively engage the target system, yet you can easily collect information that is usually needed in the reconnaissance phase of an attack. This includes error messages, passwords, usernames, sensitive directories, devices and hardware that are online, web servers and the vulnerabilities in them, pages with login forms, and sensitive information related to online banking and e-commerce. You can find usernames and passwords that could be exploited directly to gain access, identify devices and software that could be targeted, and so on, which makes Google an invaluable tool. Google Hacking is also a concept you need to be familiar with if you plan to take an exam such as the Certified Ethical Hacker (CEH) exam.
There are many ways to search for usernames and passwords through Google queries. For example, you can search for .sql files containing dumps of various websites' databases. These databases usually hold most of the data related to a website, such as its users, passwords, user data, and so on. One such query is filetype:sql inurl:backup inurl:wp-content. It searches for database dumps on sites whose URLs contain the words backup and wp-content. The wp-content folder is where WordPress, the popular CMS that many websites are built on, stores uploads and plugin files. The word backup narrows the results to sites whose owners decided to keep a copy of their database online in case something happens.
Figure 2: Querying Google for database listings
The query returned many results, most of which were actual database dumps of WordPress installations. These dumps contained information about WordPress administrative users, such as their username, email, hashed password, and other potentially useful details. WordPress stores its users, including administrators, in the wp_users table. The prefix may differ from wp_, since it is chosen when WordPress is first installed.
Figure 3: Finding an administrative user and their associated data in one of the database statements
Figure 4: Administrative user, his/her email, names and hashed password in another database dump
Many kinds of software use files that contain lists of usernames and passwords. For example, websites can use .htpasswd to perform basic authentication. With basic authentication, the browser displays a login prompt, and the credentials the user enters are checked against the entries in the .htpasswd file on the server.
Figure 5: Basic web authentication. The browser displays a login prompt whose input is checked against the .htpasswd file
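The Python sketch below shows both sides of basic authentication. The first function builds the Authorization header the browser sends once the user fills in the prompt. The second performs a server-side check against one .htpasswd line. The check assumes the {SHA} entry format written by Apache's htpasswd -s, which stores an unsalted, Base64-encoded SHA-1 digest; this weak format is part of why a leaked .htpasswd file is valuable to an attacker. Other formats such as bcrypt or Apache MD5 are not handled. The function names and the alice/secret credentials are hypothetical.

```python
import base64
import hashlib
import hmac


def basic_auth_header(username, password):
    """Value of the Authorization header a browser sends for HTTP Basic auth."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def check_sha_entry(htpasswd_line, username, password):
    """Check credentials against one .htpasswd line in the {SHA} format.

    Assumes 'user:{SHA}<base64 of SHA-1(password)>', as written by `htpasswd -s`.
    Other formats (bcrypt, Apache MD5, crypt) are not handled in this sketch.
    """
    user, _, stored = htpasswd_line.strip().partition(":")
    if user != username or not stored.startswith("{SHA}"):
        return False
    digest = base64.b64encode(hashlib.sha1(password.encode("utf-8")).digest())
    # Constant-time comparison of the stored and computed digests.
    return hmac.compare_digest(stored[len("{SHA}"):].encode("ascii", "replace"), digest)


# Hypothetical credentials, for illustration only.
entry = "alice:{SHA}" + base64.b64encode(hashlib.sha1(b"secret").digest()).decode("ascii")
print(basic_auth_header("alice", "secret"))
print(check_sha_entry(entry, "alice", "secret"))  # True
print(check_sha_entry(entry, "alice", "guess"))   # False
```

The credentials in the header are only Base64-encoded, not encrypted. The {SHA} digest carries no salt, so anyone holding the file can test password guesses offline at high speed.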
There are many ways to find this particular file. The Google Hacking Database suggests simply searching for htpasswd, but you can also search for htpasswd.bak, filetype:htpasswd, and so on. As seen here, searching for one type of information can often reveal additional data that can be used in the penetration testing process.
Figure 6: A username and password file found online
Figure 7: Another username and password file found online
Identification of system version information
As we saw in the table of operators, we can find directory listings by including "index.of" in our search. Queries such as intitle:index.of server.at find directory listings that also display the server signature that web servers such as Apache show by default.
You can add the site: operator to this query to search for directory listings that leak server information on specific sites. For example, searching for intitle:index.of server.at site:somewebsite.edu revealed the specific server software (Apache), its version, and the operating system of the computer it’s on, as seen in the image below.
Figure 8: Querying for server information
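The server.at query works because Apache appends a signature line, such as Apache/2.4.41 (Ubuntu) Server at www.example.com Port 80, to its directory listings. The Python sketch below extracts the server name, version, OS, host, and port from such a line. The regular expression is an assumption based on that default format. It will not match customized or disabled signatures, and the sample values are made up.

```python
import re

# Assumed default Apache signature format, e.g.
# "Apache/2.4.41 (Ubuntu) Server at www.example.com Port 80"
SIGNATURE = re.compile(
    r"(?P<server>[A-Za-z-]+)/(?P<version>[\d.]+)\s*"
    r"(?:\((?P<os>[^)]+)\))?"  # optional OS in parentheses
    r".*?\bServer at (?P<host>\S+) Port (?P<port>\d+)"  # skips module tokens like DAV/2
)


def parse_signature(text):
    """Return the fields of the first server signature in text, or None."""
    match = SIGNATURE.search(text)
    return match.groupdict() if match else None


footer = "Apache/2.4.41 (Ubuntu) Server at www.example.com Port 80"
print(parse_signature(footer))
# {'server': 'Apache', 'version': '2.4.41', 'os': 'Ubuntu', 'host': 'www.example.com', 'port': '80'}
```

Apache's ServerTokens and ServerSignature directives control how much of this line is shown. Trimming or disabling them removes what the query relies on.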
Finding websites that use vulnerable software
Another use of Google hacking is to identify systems running a known vulnerable version of a piece of software.
Many web applications add a "Powered by" line somewhere on the page, sometimes including the software version. This means that if you find a vulnerability in, say, a particular version of vBulletin, you can search for other sites running that version, which are likely to be vulnerable as well.
Figure 9: Example of a “Powered By” field that indicates the software version.
The image above shows a vBulletin installation whose footer states the version in use. If this version of vBulletin had a known vulnerability, other sites running it could easily be found by searching for the same footer text.
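Checking whether such a footer points to a vulnerable release comes down to two steps: extract the version number, then compare it numerically with the first fixed version. The Python sketch below does that. The regular expression and the fixed version 3.8.10 in the example are hypothetical placeholders, not real vBulletin advisory data.

```python
import re

# Hypothetical pattern for footers like "Powered by vBulletin Version 3.8.7".
POWERED_BY = re.compile(
    r"Powered by\s+(?P<product>[A-Za-z][\w ]*?)\s+(?:Version\s+)?(?P<version>\d+(?:\.\d+)+)",
    re.IGNORECASE,
)


def version_tuple(version):
    """'3.8.10' -> (3, 8, 10), so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def is_older_than(footer, fixed_version):
    """True if the footer's version predates fixed_version; None if no version found."""
    match = POWERED_BY.search(footer)
    if match is None:
        return None
    return version_tuple(match.group("version")) < version_tuple(fixed_version)


# "3.8.10" is a made-up "first fixed" version, not real advisory data.
print(is_older_than("Powered by vBulletin Version 3.8.7", "3.8.10"))  # True
```

Comparing integer tuples avoids the string-comparison trap, where "3.8.10" would sort before "3.8.7".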
Queries to start testing with
site:targetsite.com intitle:index.of – When you start researching a website, it's a good idea to look at any directory listings first. These can sometimes reveal information about the server and will certainly show files that may reveal more. This query mostly returns results from Apache-based servers rather than from other servers, such as Node.js applications. Even so, it covers a lot of ground, since Apache has long been one of the most widely used web servers.
site:targetsite.com intext:error|warning: – Languages such as PHP can display errors and warnings directly on the page where they occur, which is useful during development. However, many production websites do not hide these messages. The actual error or warning is usually preceded by error: or warning:, so you can search for these strings on a specific website. Depending on the website and its subject, false positives may occur.
Figure 14: Finding errors and warnings on a specific web page
Figure 15: MySQL database user exposed from PHP warning found via Google
As you can see above, a simple search for errors and warnings on the website revealed a database error. It showed that the database user is artshis2, that the machine is running MySQL, and that the website uses an older PHP MySQL extension, a hint that the code may be vulnerable to SQL injection.
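The same pattern can be checked directly in the HTML when auditing pages you are responsible for. The Python sketch below strips tags and looks for messages in PHP's default display format: a level, the message, then "in FILE on line N". The regular expression is an assumption based on that format. The sample message is made up.

```python
import re

# PHP's default display format: "<level>: <message> in <file> on line <n>".
PHP_MESSAGE = re.compile(
    r"\b(?P<level>Fatal error|Parse error|Warning|Notice|Deprecated):\s*"
    r"(?P<message>.+?) in (?P<file>\S+) on line (?P<line>\d+)"
)
TAG = re.compile(r"<[^>]+>")


def find_php_messages(html):
    """List PHP errors/warnings shown in a page's HTML."""
    # With html_errors on, PHP wraps parts of the message in <b> tags; drop all tags first.
    text = TAG.sub("", html)
    return [match.groupdict() for match in PHP_MESSAGE.finditer(text)]


# Made-up sample in the shape PHP prints when display_errors is enabled.
sample = ("<br />\n<b>Warning</b>:  mysql_connect(): Access denied for user "
          "'dbuser'@'localhost' (using password: YES) in "
          "<b>/var/www/html/db.php</b> on line <b>12</b><br />")
for found in find_php_messages(sample):
    print(found["level"], found["file"], found["line"])  # Warning /var/www/html/db.php 12
```

A single message like this leaks the database user name, the server-side file path, and the extension in use. That is why display_errors should be off in production.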
inurl:temp | inurl:tmp | inurl:backup | inurl:bak – Searching for temporary or backup files can be quite fruitful. This search will pick up files, directories and file extensions on the server containing one of the most common backup/temp names. You can add more parameters to your query to get more specific results. For example, adding inurl:wp-content to the query would show the backup files and directories that are in the public assets folder of the WordPress installation. You can also combine this with other searches like the filetype:sql we mentioned earlier.
Figure 16: Searching for backup or temporary files and folders within WordPress installations
The figure above shows that searching for temporary files and backups in WordPress installations can reveal quite a lot, in this case public backup copies of databases and entire WordPress installations.
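A defensive counterpart to this query is to scan your own web root for names that such a search would match, before a crawler finds them. The Python sketch below walks a directory tree and flags files and folders whose names contain a common backup or temp word, or end in a typical backup extension. The word and extension lists and the /var/www/html path are hypothetical starting points to adapt. Matching whole name tokens avoids flagging names such as templates.

```python
import os
import re

# Hypothetical starting lists; adapt them to your own naming conventions.
BACKUP_WORDS = {"temp", "tmp", "backup", "backups", "bak", "old"}
BACKUP_EXTENSIONS = {".sql", ".bak", ".old", ".swp", ".zip", ".tar", ".gz"}


def looks_like_backup(name):
    """Flag names containing a backup/temp word as a whole token, or a backup extension."""
    lowered = name.lower()
    tokens = set(re.split(r"[^a-z0-9]+", lowered))
    extension = os.path.splitext(lowered)[1]
    return bool(tokens & BACKUP_WORDS) or extension in BACKUP_EXTENSIONS or lowered.endswith("~")


def find_exposed_backups(web_root):
    """Walk web_root and return the relative paths of suspicious files and folders."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(web_root):
        for name in dirnames + filenames:
            if looks_like_backup(name):
                hits.append(os.path.relpath(os.path.join(dirpath, name), web_root))
    return sorted(hits)


for path in find_exposed_backups("/var/www/html"):  # assumed web root location
    print(path)
```

If the directory does not exist, os.walk yields nothing, so the scan simply prints nothing rather than failing.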
Targeting web software with Google queries
Once you have information about a target and the software running on it, you can use additional Google queries to find leaks caused by that software. For example, if you know that the website is built with PHP, you can use the error and warning searches mentioned above. Since PHP can write errors to .log files that sometimes become publicly accessible, you can also try queries targeting these logs, such as filetype:log "PHP Parse error" | "PHP Warning".
The Google Hacking Database contains dorks for quite a lot of software that can be exploited in various ways. For example, the BackupBuddy WordPress plugin used to store copies of the entire site in the public uploads directory. As a result, any attacker could download the site's backup archive and possibly take control of the site. A dork for finding such potential backups can be found at https://www.exploit-db.com/ghdb/4306/.
Google hacking tools
In the past, there were many programs that could help you automate Google Hacking. Unfortunately, most of them are outdated and no longer work, such as Metagoofil. Metagoofil lets you choose a domain, downloads a given number of files that Google has indexed for that domain, and immediately extracts useful data from them, such as email addresses, machine usernames, servers, and so on.
The current official release of Metagoofil may no longer work. However, a modified version that should work properly is linked below. Be aware that some servers may deny Metagoofil access to the files it tries to download, resulting in an error for that particular file.
To install Metagoofil, download or clone the repository https://github.com/DimoffX/Metagoofil2016, open a command line/Terminal, cd to the root directory of the repository, and type python metagoofil.py to display its help. To run it, you need Python 2 installed on your computer, which you can get from https://www.python.org/downloads/.
A sample command you can try is:
python metagoofil.py -d example.com -t pdf -l 5 -n 5 -o bgsites -f "example.html"
This searches Google for PDF files on the given domain (looking at up to five search results), downloads up to five of the files found, saves them in the bgsites folder, and creates an HTML report called example.html.
Another useful tool currently available is https://www.shodan.io/, itself a search engine that lets you find specific types of machines connected to the Internet (webcams, routers, servers), along with metadata about them, such as the software running on them.
The Google cache is a great way to see websites that have changed or no longer exist. You can also use it to visit websites anonymously without establishing a connection to the website’s server because you’re just making an HTTP request to Google.
To do this, visit the cached copy of the desired web page with &strip=1 appended to the Google cache URL, which shows the text-only version of the page. Without the strip parameter, your browser will still request external resources, such as images, from the original site.
Figure 10: An anonymous visit to a Wikipedia page
Anonymous browsing through Google's cache is especially useful in combination with a proxy.
Here is a simple way to open the text-only version of a website from Google search results. Click the arrow next to the site's URL, right-click Cached in the menu that appears, and copy the link address. Paste it into the browser's address bar and append &strip=1 (or %26strip%3D1, its URL-encoded form) to the webcache.googleusercontent.com URL.
Below are some explanatory images.
Figure 11: Copying a cached web page URL
Figure 12: Paste the URL into the address bar and add &strip=1 to the cached web page
Figure 13: You should end up on the text-only version of the web page
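The URL manipulation described above can be sketched in Python as follows. The sketch assumes the webcache.googleusercontent.com cache URL takes the form search?q=cache:<page URL>. It also shows where %26strip%3D1 comes from: it is simply &strip=1 percent-encoded. Google has since removed cached links from its search results, so treat this as an illustration of the URL format rather than a guaranteed working recipe. The function name is hypothetical.

```python
from urllib.parse import quote

# Assumed cache URL format; Google has since removed cached links from its results.
CACHE_PREFIX = "https://webcache.googleusercontent.com/search?q=cache:"
STRIP = "&strip=1"


def text_only_cache_url(page_url):
    """Build the text-only cache URL for page_url by appending &strip=1."""
    # Escape characters such as ? and & in the page URL so they are not
    # confused with the cache URL's own parameters.
    return CACHE_PREFIX + quote(page_url, safe=":/") + STRIP


print(text_only_cache_url("en.wikipedia.org/wiki/Google_hacking"))
# https://webcache.googleusercontent.com/search?q=cache:en.wikipedia.org/wiki/Google_hacking&strip=1
print(quote(STRIP, safe=""))  # %26strip%3D1
```

The encoded form is needed when the parameter is itself embedded in another URL. In that case a literal & would otherwise end the outer parameter.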
Be aware that if you follow any link on the cached page, you will end up on the live website, with neither the cache nor anonymity.
Another way to ensure anonymity is to view a web page through Google Translate, where the web page request would be made by Google’s servers instead of your browser.
Google Hacking is a great way to discover and browse websites without exposing yourself to the targeted systems. It is also an effective way to uncover information during the information-gathering phase of an attack. It is a must-know for most information security exams and can bear great fruit when applied correctly. Many queries are shared publicly in the GHDB for discovery and exploration, while site-specific, customized tests can be built using advanced operators.