# ExtensionCrawler
A collection of utilities for downloading and analyzing browser
extensions from the Chrome Web Store.
* `crawler`: A crawler for extensions from the Chrome Web Store.
* `crx-tool`: A tool for analyzing and extracting `*.crx` files
  (i.e., Chrome extensions). Calling `crx-tool.py <extension>.crx`
  will check the integrity of the extension (see the example after this list).
* `crx-extract`: A simple tool for extracting `*.crx` files from the
tar-based archive hierarchy.
* `crx-jsinventory`: Build a JavaScript inventory of a `*.crx` file using a
JavaScript decomposition analysis.
* `crx-jsstrings`: A tool for extracting code blocks, comment blocks, and
string literals from JavaScript.
* `create-db`: A tool for updating a remote MariaDB from already
existing extension archives.
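For example, the integrity check provided by `crx-tool` can be run on a
previously downloaded extension as follows (the file name is a placeholder):
```shell
# Verify the integrity of a downloaded extension package
crx-tool.py extension.crx
```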
The utilities store the extensions in the following directory
hierarchy:
```shell
archive
├── conf
│   └── forums.conf
├── data
│   └── ...
└── log
    └── ...
```
The crawler downloads the most recent extension (i.e., the `*.crx`
file) as well as the overview page. In addition, the `conf` directory
may contain a file called `forums.conf` that lists the IDs of
extensions for which the forums and support pages should be downloaded
as well. The `data` directory will contain the downloaded extensions.
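A minimal sketch of such a `forums.conf`, assuming it simply lists one
extension ID per line (the IDs below are placeholders, not real extensions):
```shell
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
```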
The `crawler` and `create-db` scripts will access and update a MariaDB.
They use the host, database, and credentials found in `~/.my.cnf`.
Since they make use of various JSON features, it is recommended to use at
least version 10.2.8 of MariaDB.
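A minimal sketch of such a `~/.my.cnf`, assuming the standard MariaDB
option-file format (host name, database name, and credentials below are
placeholders; the exact option group and option names may differ depending
on your setup):
```shell
# ~/.my.cnf (placeholder values)
[client]
host     = db.example.com
user     = crawler
password = <your-password>
database = extensions
```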
All utilities are written in Python 3.7. The required modules are listed
in the file `requirements.txt`.
## Installation
Clone and use pip3 to install as a package.
```shell
git clone git@logicalhacking.com:BrowserSecurity/ExtensionCrawler.git
pip3 install --user -e ExtensionCrawler
```
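If needed, the dependencies listed in `requirements.txt` (see above) can also
be installed explicitly, e.g.:
```shell
# Install the Python dependencies of the crawler and its utilities
pip3 install --user -r ExtensionCrawler/requirements.txt
```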
## Team
* [Achim D. Brucker](http://www.brucker.ch/)
* [Michael Herzberg](http://www.dcs.shef.ac.uk/cgi-bin/makeperson?M.Herzberg)
### Contributors
* Mehmet Balande
## License
This project is licensed under the GPL 3.0 (or any later version).
SPDX-License-Identifier: GPL-3.0-or-later
## Master Repository
The master git repository for this project is hosted by the [Software
Assurance & Security Research Team](https://logicalhacking.com) at
<https://git.logicalhacking.com/BrowserSecurity/ExtensionCrawler>.