Michael Herzberg | 778736e2d3 | Fixed logging of if-modified-since. | 2018-04-10 10:55:03 +01:00
Michael Herzberg | f677258f83 | Added use of garbage collector. | 2018-04-10 10:51:33 +01:00
Michael Herzberg | d27106d7a9 | Added creation of separate .etag files outside the .tar file. | 2018-04-09 19:42:41 +01:00
Michael Herzberg | 50b598993f | Bugfix: actually download forums on sequential run. | 2018-04-09 18:38:51 +01:00
Michael Herzberg | f4c0ff56ff | Use magic for mimetypes and don't attempt text-based analyses on binary resources. | 2018-04-09 14:25:47 +01:00
Michael Herzberg | fcfa58fb3d | Wheel needs to be installed before ExtensionCrawler. | 2018-04-09 00:14:07 +01:00
Achim D. Brucker | 0c70b2e20b | Increase number of parallel downloads. | 2018-04-08 22:45:56 +01:00
Michael Herzberg | 3d136daae3 | Various small bug fixes. | 2018-04-08 17:44:59 +01:00
Michael Herzberg | faa2214af4 | Timeout must be an integer. | 2018-04-08 13:10:26 +01:00
Achim D. Brucker | 33898a4cf3 | Updated help text. | 2018-04-08 10:10:30 +01:00
Achim D. Brucker | e1ef0758f7 | Made the choice of Pool vs. ProcessPool a configuration option. | 2018-04-08 10:06:26 +01:00
Achim D. Brucker | 70b64616e1 | Ensure the use of /usr/bin/mail. | 2018-04-08 09:59:03 +01:00
Achim D. Brucker | 7f71a40ff4 | Configured number of parallel processes. | 2018-04-07 21:14:36 +01:00
Achim D. Brucker | 66023b6b72 | Reverted test of ThreadPools. | 2018-04-07 21:13:32 +01:00
Achim D. Brucker | a75380b0c5 | Merge branch 'production' of logicalhacking.com:BrowserSecurity/ExtensionCrawler into production | 2018-04-07 19:49:03 +01:00
Achim D. Brucker | 987236958e | Testing ThreadPools. | 2018-04-07 19:48:45 +01:00
Achim D. Brucker | c3d8de9b81 | Testing ThreadPools. | 2018-04-07 19:37:55 +01:00
Achim D. Brucker | a3c60c0ae8 | Ensure that mail recipient is defined. | 2018-04-07 17:54:15 +01:00
Achim D. Brucker | a7f0b26ead | Log memory usage. | 2018-04-07 16:26:00 +01:00
Achim D. Brucker | 2fc154d643 | Use UTC-based time/dates for logging. | 2018-04-07 15:54:39 +01:00
Achim D. Brucker | 91a76091e3 | Use UTC-based time/dates for logging. | 2018-04-07 15:42:29 +01:00
Achim D. Brucker | c7d28d2c9e | Merge branch 'production' of logicalhacking.com:BrowserSecurity/ExtensionCrawler into production | 2018-04-07 13:29:48 +01:00
Achim D. Brucker | f6a9d49da1 | Reverted processing in chunks back into processing only one large list. | 2018-04-07 13:17:33 +01:00
Michael Herzberg | 558bff402a | Removed --writable flag from read-only ExtensionCrawler image. | 2018-04-07 00:42:39 +01:00
Michael Herzberg | 0c3423dcd8 | Fitted db connection log messages into our logging framework. | 2018-04-07 00:42:39 +01:00
Michael Herzberg | 9c1d48fcbe | Added 'wheel' to dependencies to fix build error with simhash. | 2018-04-07 00:42:39 +01:00
Achim D. Brucker | 7756ad2963 | Bug fix: actually use max_workers. | 2018-04-06 23:04:01 +01:00
Achim D. Brucker | 6a86b37e7c | Increase number of parallel downloads. | 2018-04-06 21:36:11 +01:00
Achim D. Brucker | 14a30a570d | Process extensions in chunks. | 2018-04-06 21:34:09 +01:00
Achim D. Brucker | d5df43c5c3 | Moved heuristic for parallel download into separate method. | 2018-04-06 20:32:24 +01:00
Achim D. Brucker | 9434df1b28 | Set max task to 100. | 2018-04-06 16:37:49 +01:00
Achim D. Brucker | 69f1618db2 | Reduced number of parallel downloads, as pebble seems to be much more memory hungry ... | 2018-04-06 13:34:36 +01:00
Achim D. Brucker | d3fe5e758a | Set new default download timeout to 2 hours. | 2018-04-06 12:08:02 +01:00
Achim D. Brucker | d9fc65a089 | Reformatting. | 2018-04-06 07:27:57 +01:00
Achim D. Brucker | 8c9aab8216 | Converted timeout into a proper configuration parameter. | 2018-04-06 07:25:21 +01:00
Achim D. Brucker | 9586eed280 | Added documentation. | 2018-04-06 07:18:15 +01:00
Achim D. Brucker | fd9cc1855a | Improved command line interface for selecting which type of extensions should be crawled. | 2018-04-06 07:17:20 +01:00
Achim D. Brucker | 47f4af5d1b | Fixed spelling of constant 'False'. | 2018-04-06 06:46:25 +01:00
Achim D. Brucker | 054fdb62cb | Prefix top-level exception logs for workers with WorkerException. | 2018-04-05 23:16:27 +01:00
Achim D. Brucker | 5d70bf1831 | Switched to pebble.ProcessPool() for concurrency. | 2018-04-05 22:51:27 +01:00
Achim D. Brucker | fee88ed0fe | Implemented sequential download mode. | 2018-04-05 17:32:11 +01:00
Achim D. Brucker | bf6269c600 | Bug fix: warning mail for stalled download. | 2018-04-05 17:05:51 +01:00
Achim D. Brucker | d0c185fa69 | Log If-Modified-Since request for timing analysis. | 2018-04-04 23:14:24 +01:00
Achim D. Brucker | 2d33f3bebe | Added pdf output. | 2018-04-04 09:35:35 +01:00
Achim D. Brucker | de2519130a | Improved derivative (downloads per eight hours) and png output. | 2018-04-04 09:20:29 +01:00
Achim D. Brucker | 423d3c35fa | Improved line types for png output. | 2018-04-04 08:27:02 +01:00
Achim D. Brucker | 0f1c53a011 | More aggressive download heuristics. | 2018-04-03 22:20:09 +01:00
Achim D. Brucker | 7c688afee8 | Configured 32 parallel downloads. | 2018-04-03 22:14:02 +01:00
Achim D. Brucker | ca42e4026f | Added plot of an approximation of the first derivative. | 2018-04-03 16:17:14 +01:00
Achim D. Brucker | 1102fc102d | Kill running downloads more aggressively. | 2018-04-02 16:41:07 +01:00