A Python crawler for extensions from the Chrome Web Store.
Achim D. Brucker 1c0e687069 Use /bin/bash instead of /usr/bin/bash. 2017-06-27 13:03:01 +01:00
ExtensionCrawler Make db path configurable. 2017-06-22 17:46:18 +01:00
scripts Use /bin/bash instead of /usr/bin/bash. 2017-06-27 13:03:01 +01:00
.gitignore Added .swp file to gitignore. 2017-06-20 08:09:40 +01:00
LICENSE initial commit 2016-09-08 20:43:35 +02:00
README.md Fixed typo. 2017-06-25 09:42:54 +01:00
crawler Added missing parallel parameter to second call of update_extensions. 2017-06-21 15:26:48 +01:00
create_db Make db path configurable. 2017-06-22 17:46:18 +01:00
crx-tool Renaming. 2017-03-19 16:34:45 +00:00

ExtensionCrawler

A collection of utilities for downloading and analyzing browser extensions from the Chrome Web Store.

  • crawler: A crawler for extensions from the Chrome Web Store.
  • crx-tool: A tool for analyzing and extracting *.crx files (i.e., Chrome extensions). Calling crx-tool <extension>.crx will check the integrity of the extension.
  • create_db: A tool for creating/initializing the database files from already existing extension archives.
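
To give a rough idea of what a structural integrity check on a *.crx file can involve, here is a minimal sketch. It is not the code used by crx-tool. It assumes the CRX version 2 container layout (the magic bytes `Cr24`, followed by three little-endian 32-bit integers for the version, public key length, and signature length, then the key, the signature, and a ZIP archive). The function names `split_crx2` and `zip_payload_is_intact` are hypothetical, and the sketch does not verify the cryptographic signature.

```python
import io
import struct
import zipfile

CRX_MAGIC = b"Cr24"


def split_crx2(data: bytes):
    """Split a CRX version 2 container into (public_key, signature, zip_bytes)."""
    if len(data) < 16 or data[:4] != CRX_MAGIC:
        raise ValueError("not a CRX file: missing 'Cr24' magic")
    # Header: version, public key length, signature length (little-endian uint32).
    version, key_len, sig_len = struct.unpack("<III", data[4:16])
    if version != 2:
        raise ValueError(f"unsupported CRX version: {version}")
    payload_start = 16 + key_len + sig_len
    if payload_start > len(data):
        raise ValueError("truncated CRX header")
    public_key = data[16:16 + key_len]
    signature = data[16 + key_len:payload_start]
    return public_key, signature, data[payload_start:]


def zip_payload_is_intact(zip_bytes: bytes) -> bool:
    """Return True if the ZIP payload opens and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False
```

A caller would read the file in binary mode, pass the bytes to `split_crx2`, and then hand the returned ZIP bytes to `zip_payload_is_intact`.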

The utilities store the extensions in the following directory hierarchy:

   archive
   ├── conf
   │   └── forums.conf
   ├── data
   │   └── ...
   └── log
       └── ...

The crawler downloads the most recent version of each extension (i.e., the *.crx file as well as the overview page). In addition, the conf directory may contain one file, called forums.conf, that lists the ids of extensions for which the forums and support pages should be downloaded as well. The data directory will contain the downloaded extensions as well as SQLite files containing the extracted metadata. The SQLite files can easily be regenerated using the create_db tool.
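
As an illustration of how the ids in forums.conf could be read, here is a small sketch. The exact file format is not documented above, so this assumes one extension id per line and silently skips anything that does not look like an id (32 characters from `a` to `p`). The helper names `parse_forum_ids` and `read_forum_ids` are hypothetical and not part of ExtensionCrawler.

```python
import re
from pathlib import Path

# Chrome extension ids are 32 characters drawn from the letters a to p.
EXTENSION_ID = re.compile(r"^[a-p]{32}$")


def parse_forum_ids(text):
    """Return the extension ids found in text, assuming one id per line."""
    ids = []
    for line in text.splitlines():
        candidate = line.strip()
        if EXTENSION_ID.match(candidate):
            ids.append(candidate)
    return ids


def read_forum_ids(archive_dir):
    """Read <archive_dir>/conf/forums.conf; return an empty list if it is absent."""
    conf_file = Path(archive_dir) / "conf" / "forums.conf"
    if not conf_file.is_file():
        return []
    return parse_forum_ids(conf_file.read_text(encoding="utf-8"))
```

Returning an empty list for a missing file matches the statement that forums.conf is optional.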

All utilities are written in Python 3.x. The following non-standard modules might be required:

  • requests (apt-get install python3-requests)
  • dateutil (apt-get install python3-dateutil)
  • jsmin (apt-get install python3-jsmin)
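
To check which of these modules are missing before running the utilities, something like the following sketch can help. It is not part of ExtensionCrawler, and the `missing_modules` helper is a hypothetical name. The mapping repeats the Debian/Ubuntu package names listed above.

```python
import importlib.util

# Importable module name -> Debian/Ubuntu package that provides it.
REQUIRED_MODULES = {
    "requests": "python3-requests",
    "dateutil": "python3-dateutil",
    "jsmin": "python3-jsmin",
}


def missing_modules(required=REQUIRED_MODULES):
    """Return the entries of required whose module cannot be found."""
    return {
        name: package
        for name, package in required.items()
        if importlib.util.find_spec(name) is None
    }


if __name__ == "__main__":
    for name, package in missing_modules().items():
        print(f"missing module {name!r}: try apt-get install {package}")
```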

Team

License

This project is licensed under the GPL 3.0 (or any later version).