A Python crawler for extensions from the Chrome Web Store.


Michael Herzberg c7a808db3f Don't process replies if there are none.		2017-08-30 15:15:12 +01:00
ExtensionCrawler	Don't process replies if there are none.	2017-08-30 15:15:12 +01:00
queries	Added sql query.	2017-08-16 21:32:09 +01:00
resources	Renamed hash to md5 in JSON file and added support for sha1 hashes.	2017-08-30 00:38:30 +01:00
scripts	Cleaned up scripts.	2017-08-23 20:38:27 +01:00
sge	Exclude archive dir when pushing to sharc.	2017-08-30 10:11:25 +01:00
.gitignore	Ignore vscode workspace configuration.	2017-08-30 00:52:47 +01:00
LICENSE	initial commit	2016-09-08 20:43:35 +02:00
README.md	Added new tool: crx-jsdecompose.	2017-08-30 00:04:32 +01:00
crawler	Improved logging.	2017-08-30 15:12:54 +01:00
create-db	Improved logging.	2017-08-30 15:12:54 +01:00
crx-extract	Renamed extract-crx to crx-extract.	2017-08-29 22:06:18 +01:00
crx-jsdecompose	Bug fix: printing of detection method.	2017-08-30 09:56:35 +01:00
crx-tool	Refactoring.	2017-07-29 09:05:16 +01:00
grepper	Improved grepper.	2017-08-23 16:52:18 +01:00
requirements.txt	Added new tool: crx-jsdecompose.	2017-08-30 00:04:32 +01:00
setup.py	Added new tool: crx-jsdecompose.	2017-08-30 00:04:32 +01:00

README.md

ExtensionCrawler

A collection of utilities for downloading and analyzing browser extensions from the Chrome Web Store.

crawler: A crawler for extensions from the Chrome Web Store.
crx-tool: A tool for analyzing and extracting *.crx files (i.e., Chrome extensions). Calling crx-tool <extension>.crx checks the integrity of the extension.
crx-extract: A simple tool for extracting *.crx files from the tar-based archive hierarchy.
crx-jsdecompose: A tool for building a JavaScript inventory of a *.crx file.
create-db: A tool for creating/initializing the database files from already existing extension archives.
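The code of crx-tool and crx-jsdecompose is not reproduced here. As a rough, hypothetical illustration of what checking a *.crx file and inventorying its JavaScript involves, the sketch below parses the CRX header (CRX2: the "Cr24" magic, a version, the public-key length and the signature length; CRX3: the magic, a version and the length of a protobuf header), CRC-checks the embedded ZIP archive with zipfile's testzip(), and lists every .js file with its MD5 and SHA-1 hash. The function names crx_zip_payload and js_inventory are invented for this example, and the sketch does not verify the extension's signature, so it only checks structural integrity.

```python
import hashlib
import io
import struct
import zipfile
from typing import List, Tuple

CRX_MAGIC = b"Cr24"


def crx_zip_payload(data: bytes) -> bytes:
    """Return the ZIP archive embedded in a CRX2 or CRX3 file (hypothetical helper)."""
    if len(data) < 12 or data[:4] != CRX_MAGIC:
        raise ValueError("not a CRX file: bad magic number")
    (version,) = struct.unpack("<I", data[4:8])
    if version == 2:
        # CRX2: magic, version, public-key length, signature length, key, signature.
        key_len, sig_len = struct.unpack("<II", data[8:16])
        offset = 16 + key_len + sig_len
    elif version == 3:
        # CRX3: magic, version, length of a protobuf header, header bytes.
        (header_len,) = struct.unpack("<I", data[8:12])
        offset = 12 + header_len
    else:
        raise ValueError(f"unsupported CRX version {version}")
    return data[offset:]


def js_inventory(data: bytes) -> List[Tuple[str, str, str]]:
    """Return (path, md5, sha1) for each .js file in the CRX (hypothetical helper)."""
    inventory = []
    with zipfile.ZipFile(io.BytesIO(crx_zip_payload(data))) as zf:
        # testzip() CRC-checks every member and returns the first bad name, or None.
        bad = zf.testzip()
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        for name in sorted(zf.namelist()):
            if name.endswith(".js"):
                content = zf.read(name)
                inventory.append((name,
                                  hashlib.md5(content).hexdigest(),
                                  hashlib.sha1(content).hexdigest()))
    return inventory
```

Slicing off the header before opening the archive keeps the ZIP handling independent of the CRX version; a real integrity check would additionally verify the signature against the embedded public key.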

The utilities store the extensions in the following directory hierarchy:

   archive
   ├── conf
   │   └── forums.conf
   ├── data
   │   └── ...
   └── log
       └── ...

The crawler downloads the most recent version of each extension (i.e., the *.crx file as well as the overview page). In addition, the conf directory may contain a file called forums.conf that lists the IDs of extensions whose forums and support pages should be downloaded as well. The data directory contains the downloaded extensions as well as SQLite files holding the extracted metadata. These SQLite files can be regenerated from the archives using the create-db tool.
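As a hypothetical sketch of how a script might consume this layout: read_forum_ids assumes that forums.conf holds one extension ID per line (the exact file format is not documented here) and checks each entry against the shape of Chrome extension IDs, which are 32 characters drawn from the letters a to p. list_sqlite_tables recognizes SQLite databases below the data directory by their 16-byte file header, because their file names are not documented, and lists the tables each one contains without assuming anything about the schema create-db produces. Both function names are invented for this example.

```python
import re
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Dict, List

# Chrome extension IDs are 32 characters drawn from the letters a-p.
EXTENSION_ID = re.compile(r"[a-p]{32}")
# Every SQLite 3 database file starts with this 16-byte header.
SQLITE_HEADER = b"SQLite format 3\x00"


def read_forum_ids(archive_dir: str) -> List[str]:
    """Return the IDs listed in <archive_dir>/conf/forums.conf.

    Hypothetical helper: assumes one extension ID per line and skips
    blank lines. A missing file means no forums are to be crawled.
    """
    conf = Path(archive_dir) / "conf" / "forums.conf"
    if not conf.is_file():
        return []
    ids = []
    for line in conf.read_text().splitlines():
        ext_id = line.strip()
        if not ext_id:
            continue
        if not EXTENSION_ID.fullmatch(ext_id):
            raise ValueError(f"invalid extension ID in {conf}: {ext_id!r}")
        ids.append(ext_id)
    return ids


def list_sqlite_tables(data_dir: str) -> Dict[str, List[str]]:
    """Map every SQLite file below data_dir to its sorted table names.

    Hypothetical helper: files are recognized by their header rather
    than by name, so no file-naming convention is assumed.
    """
    tables = {}
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        with path.open("rb") as f:
            if f.read(16) != SQLITE_HEADER:
                continue
        # closing() ensures the connection is closed, not merely committed.
        with closing(sqlite3.connect(str(path))) as conn:
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
        tables[str(path)] = [name for (name,) in rows]
    return tables
```

For example, read_forum_ids("archive") would return an empty list when no forums.conf exists, matching the README's note that the file is optional.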

All utilities are written in Python 3.x. The required modules are listed in the file requirements.txt.

Installation

Clone the repository and use pip to install it as a package:

git clone git@logicalhacking.com:BrowserSecurity/ExtensionCrawler.git
pip install -e ExtensionCrawler
