# ExtensionCrawler
A collection of utilities for downloading and analyzing browser
extensions from the Chrome Web Store.

* `crawler`: A crawler for extensions from the Chrome Web Store.
* `crx-tool`: A tool for analyzing and extracting `*.crx` files
  (i.e., Chrome extensions). Calling `crx-tool.py <extension>.crx`
  will check the integrity of the extension.
* `crx-extract`: A simple tool for extracting `*.crx` files from the
  tar-based archive hierarchy.
* `crx-jsinventory`: Build a JavaScript inventory of a `*.crx` file using a
  JavaScript decomposition analysis.
* `crx-jsstrings`: A tool for extracting code blocks, comment blocks, and
  string literals from JavaScript.
* `create-db`: A tool for updating a remote MariaDB from already
  existing extension archives.

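As an illustration of what a basic integrity check on a `*.crx` file can involve, the following Python sketch locates the ZIP payload behind the CRX header and runs the CRC test from the standard `zipfile` module. It is not the implementation used by `crx-tool`: the function names `crx_zip_offset` and `check_crx` are hypothetical, and the sketch does not verify the cryptographic signature stored in the header.

```python
import io
import struct
import zipfile

CRX_MAGIC = b"Cr24"  # every CRX file starts with these four bytes


def crx_zip_offset(data: bytes) -> int:
    """Return the byte offset at which the ZIP payload of a CRX file starts."""
    if len(data) < 12 or data[:4] != CRX_MAGIC:
        raise ValueError("not a CRX file: missing 'Cr24' magic")
    (version,) = struct.unpack_from("<I", data, 4)
    if version == 2:
        # CRX2 header: magic, version, public key length, signature length
        if len(data) < 16:
            raise ValueError("truncated CRX2 header")
        key_len, sig_len = struct.unpack_from("<II", data, 8)
        return 16 + key_len + sig_len
    if version == 3:
        # CRX3 header: magic, version, length of the serialized header block
        (header_len,) = struct.unpack_from("<I", data, 8)
        return 12 + header_len
    raise ValueError(f"unsupported CRX version: {version}")


def check_crx(data: bytes) -> bool:
    """Structural check only: the header parses and all ZIP members pass their CRC test."""
    offset = crx_zip_offset(data)
    with zipfile.ZipFile(io.BytesIO(data[offset:])) as archive:
        # testzip() returns the name of the first corrupt member, or None
        return archive.testzip() is None
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `check_crx` is enough to try it out. A corrupt or truncated payload typically surfaces as a `zipfile.BadZipFile` exception rather than a `False` return value.
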
The utilities store the extensions in the following directory
hierarchy:
```shell
archive
   ├── conf
   │   └── forums.conf
   ├── data
   │   └── ...
   └── log
       └── ...
```

The crawler downloads the most recent version of an extension (i.e., the
`*.crx` file) as well as the overview page. In addition, the `conf` directory
may contain one file, called `forums.conf`, that lists the IDs of
extensions for which the forums and support pages should be downloaded
as well. The `data` directory will contain the downloaded extensions.

The `crawler` and `create-db` scripts will access and update a MariaDB
database. They will use the host, database, and credentials found in `~/.my.cnf`.
Since they make use of various JSON features, it is recommended to use at
least version 10.2.8 of MariaDB.

All utilities are written in Python 3.7. The required modules are listed
in the file `requirements.txt`.

## Installation
Clone the repository and use `pip3` to install it as a package:

```shell
git clone git@logicalhacking.com:BrowserSecurity/ExtensionCrawler.git
pip3 install --user -e ExtensionCrawler
```
## Team
* [Achim D. Brucker](http://www.brucker.ch/)
* [Michael Herzberg](http://www.dcs.shef.ac.uk/cgi-bin/makeperson?M.Herzberg)
### Contributors
* Mehmet Balande
## License
This project is licensed under the GPL 3.0 (or any later version).

SPDX-License-Identifier: GPL-3.0-or-later

## Master Repository
The master git repository for this project is hosted by the [Software
Assurance & Security Research Team](https://logicalhacking.com) at
<https://git.logicalhacking.com/BrowserSecurity/ExtensionCrawler>.