ExtensionCrawler

A collection of utilities for downloading and analyzing browser extensions from the Chrome Web Store.

  • crawler: A crawler for extensions from the Chrome Web Store.
  • crx-tool: A tool for analyzing and extracting *.crx files (i.e., Chrome extensions). Calling crx-tool.py <extension>.crx will check the integrity of the extension; a minimal sketch of such a check appears after this list.
  • extract-crx: A simple tool for extracting *.crx files from the tar-based archive hierarchy.
  • create-db: A tool for creating/initializing the database files from already existing extension archives.

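For illustration, the following is a minimal sketch of such an integrity check, not the actual crx-tool implementation: check_crx is a hypothetical helper that validates the Cr24 magic number, skips over the version-specific header (the CRX version 2 and version 3 layouts are assumed), and verifies that the embedded payload is a readable ZIP archive.

    import io
    import struct
    import sys
    import zipfile

    def check_crx(path):
        """Check the basic structural integrity of a *.crx file."""
        with open(path, 'rb') as f:
            if f.read(4) != b'Cr24':
                raise ValueError('not a CRX file (bad magic number)')
            version = struct.unpack('<I', f.read(4))[0]
            if version == 2:
                # CRX2: lengths of the public key and the signature,
                # followed by both blobs.
                pubkey_len, sig_len = struct.unpack('<II', f.read(8))
                f.seek(pubkey_len + sig_len, io.SEEK_CUR)
            elif version == 3:
                # CRX3: a single length-prefixed protobuf header.
                header_len = struct.unpack('<I', f.read(4))[0]
                f.seek(header_len, io.SEEK_CUR)
            else:
                raise ValueError('unsupported CRX version: %d' % version)
            # The remainder of the file is the extension itself,
            # an ordinary ZIP archive.
            with zipfile.ZipFile(io.BytesIO(f.read())) as zf:
                bad = zf.testzip()
                if bad is not None:
                    raise ValueError('corrupt ZIP member: %s' % bad)
        return version

    if __name__ == '__main__':
        print('CRX version:', check_crx(sys.argv[1]))
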
The utilities store the extensions in the following directory hierarchy:

   archive
   ├── conf
   │   └── forums.conf
   ├── data
   │   └── ...
   └── log
       └── ...

The crawler downloads the most recent version of each extension (i.e., the *.crx file) as well as its overview page. In addition, the conf directory may contain a file called forums.conf that lists the ids of extensions for which the forum and support pages should be downloaded as well. The data directory will contain the downloaded extensions as well as SQLite files containing the extracted metadata. The SQLite files can easily be re-generated using the create-db tool.
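
For reference, a forums.conf file might look as follows; that it contains one extension id per line is an assumption based on the description above, and the ids shown are placeholders rather than real extensions:

    aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
    bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb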

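Since the metadata databases are ordinary SQLite files, they can also be inspected directly with Python's standard sqlite3 module. The sketch below lists the tables of one such database without assuming anything about its schema; the file name is a placeholder:

    import sqlite3

    # Placeholder path; the SQLite files live under archive/data.
    db = sqlite3.connect('archive/data/example.sqlite')
    for (name,) in db.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"):
        print(name)
    db.close()
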
All utilities are written in Python 3.x. The required modules are listed in the file requirements.txt.

Installation

Clone the repository and use pip to install it as a package:

git clone git@logicalhacking.com:BrowserSecurity/ExtensionCrawler.git
pip install -e ExtensionCrawler
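
Should the editable install not pull in all required modules, they can also be installed directly from the requirements file:

    pip install -r ExtensionCrawler/requirements.txt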

Team

License

This project is licensed under the GPL 3.0 (or any later version).