# ExtensionCrawler

A collection of utilities for downloading and analyzing browser
extensions from the Chrome Web Store.

* `crawler`: A crawler for extensions from the Chrome Web Store.
* `crx-tool`: A tool for analyzing and extracting `*.crx` files
  (i.e., Chrome extensions). Calling `crx-tool.py <extension>.crx`
  will check the integrity of the extension.
* `crx-extract`: A simple tool for extracting `*.crx` files from the
  tar-based archive hierarchy.
* `crx-jsinventory`: Build a JavaScript inventory of a `*.crx` file using a
  JavaScript decomposition analysis.
* `create-db`: A tool for creating/initializing the database files
  from already existing extension archives.
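
To give a feel for what a basic structural check of a `*.crx` file can look
like, here is a minimal sketch. It is not the implementation used by
`crx-tool`; the function name `read_crx_header` is hypothetical, and the
sketch only inspects the container header (the `Cr24` magic number, the
format version, and the position of the embedded ZIP payload). It does not
verify the signature.

```python
import struct

CRX_MAGIC = b"Cr24"


def read_crx_header(data: bytes) -> dict:
    """Parse the fixed-size prefix of a CRX container (illustrative only).

    Returns the format version and the offset at which the ZIP payload
    starts. Raises ValueError if the data does not look like a CRX file.
    """
    if len(data) < 12 or data[:4] != CRX_MAGIC:
        raise ValueError("not a CRX file: missing 'Cr24' magic number")
    (version,) = struct.unpack_from("<I", data, 4)
    if version == 2:
        if len(data) < 16:
            raise ValueError("truncated CRX2 header")
        # CRX2: public key length and signature length follow the version.
        key_len, sig_len = struct.unpack_from("<II", data, 8)
        zip_offset = 16 + key_len + sig_len
    elif version == 3:
        # CRX3: a single length field for the protobuf-encoded header.
        (header_len,) = struct.unpack_from("<I", data, 8)
        zip_offset = 12 + header_len
    else:
        raise ValueError(f"unsupported CRX version: {version}")
    # The remainder of the file is an ordinary ZIP archive.
    if data[zip_offset:zip_offset + 2] != b"PK":
        raise ValueError("ZIP payload not found at expected offset")
    return {"version": version, "zip_offset": zip_offset}
```

Given the raw bytes of a file, e.g. `read_crx_header(open(path, "rb").read())`,
the returned `zip_offset` tells you where to start reading with a ZIP library.
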

The utilities store the extensions in the following directory
hierarchy:

```shell
archive
   ├── conf
   │   └── forums.conf
   ├── data
   │   └── ...
   └── log
       └── ...
```

The crawler downloads the most recent version of each extension (i.e., the
`*.crx` file) as well as its overview page. In addition, the `conf` directory
may contain a file called `forums.conf` that lists the IDs of
extensions for which the forums and support pages should be downloaded
as well. The `data` directory contains the downloaded extensions
as well as SQLite files containing the extracted metadata. The SQLite
files can easily be regenerated using the `create-db` tool.
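
As an illustration of this layout, the following sketch creates an empty
archive and reads the optional `forums.conf`. It is not part of
ExtensionCrawler: the helper names `init_archive` and `read_forum_ids` are
hypothetical, and the file format (one extension ID per line) is an
assumption, since it is not specified here. Lines that do not look like an
extension ID (32 characters from `a` to `p`) are skipped.

```python
import re
from pathlib import Path
from typing import List

# Chrome extension IDs consist of 32 characters from the range a-p.
EXTENSION_ID = re.compile(r"^[a-p]{32}$")


def init_archive(root: Path) -> None:
    """Create the conf/, data/ and log/ directories below `root`."""
    for name in ("conf", "data", "log"):
        (root / name).mkdir(parents=True, exist_ok=True)


def read_forum_ids(root: Path) -> List[str]:
    """Return the IDs listed in conf/forums.conf.

    Assumption: the file holds one extension ID per line.
    """
    conf = root / "conf" / "forums.conf"
    if not conf.is_file():
        return []  # the file is optional
    ids = []
    for line in conf.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if EXTENSION_ID.match(line):
            ids.append(line)
    return ids
```

For example, `init_archive(Path("archive"))` followed by
`read_forum_ids(Path("archive"))` returns an empty list until a
`forums.conf` with valid IDs is placed in `archive/conf`.
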

All utilities are written in Python 3.x. The required modules are listed
in the file `requirements.txt`.

## Installation

Clone the repository and use pip to install it as a package:

```shell
git clone git@logicalhacking.com:BrowserSecurity/ExtensionCrawler.git
pip install -e ExtensionCrawler
```

## Team

* [Achim D. Brucker](http://www.brucker.ch/)
* [Michael Herzberg](http://www.dcs.shef.ac.uk/cgi-bin/makeperson?M.Herzberg)

### Contributors

* Mehmet Balande

## License

This project is licensed under the GPL 3.0 (or any later version).