Achim D. Brucker
|
ec1c47625a
|
Added support for parallel update of database.
|
2017-09-13 09:13:35 +01:00 |
Achim D. Brucker
|
c386bd01dd
|
Added missing string conversion.
|
2017-09-13 08:29:23 +01:00 |
Achim D. Brucker
|
42e685ee32
|
Added missing string conversion.
|
2017-09-13 08:01:02 +01:00 |
Achim D. Brucker
|
18fb23d3dc
|
Use glob instead of os.walk() to avoid memory leak in the latter.
|
2017-09-13 04:04:38 +01:00 |
Achim D. Brucker
|
76d5993794
|
Added logging output.
|
2017-09-13 03:02:39 +01:00 |
Achim D. Brucker
|
c30f7fdd7c
|
Implemented skeleton of main routine.
|
2017-09-13 02:56:13 +01:00 |
Achim D. Brucker
|
a8a5534be1
|
Renamed module.
|
2017-09-13 01:13:17 +01:00 |
Achim D. Brucker
|
bdb84c2120
|
Renamed module.
|
2017-09-13 01:09:30 +01:00 |
Achim D. Brucker
|
4e5b52617f
|
Catch exception during decompression and increase max. allowed size of decompressed data to 100 times the compressed size.
|
2017-09-13 00:23:17 +01:00 |
Achim D. Brucker
|
88efe2b8a4
|
Reformatting.
|
2017-09-13 00:02:20 +01:00 |
Achim D. Brucker
|
ea9339bc53
|
Compute data identifiers for uncompressed content of gzip compressed files.
|
2017-09-13 00:01:15 +01:00 |
Achim D. Brucker
|
f9cf7bd35f
|
Refactoring: moved computation of data related identifiers into own method.
|
2017-09-12 23:52:52 +01:00 |
Achim D. Brucker
|
8243664974
|
Use StringIO representation for normalizing js/css files (avoid re-reading the file content from disk).
|
2017-09-12 23:43:09 +01:00 |
Achim D. Brucker
|
933c4d4d11
|
Determine file description from buffer instead of from file (avoid reading file twice).
|
2017-09-12 23:23:22 +01:00 |
Michael Herzberg
|
5ce3f2a148
|
Added until-date option.
|
2017-09-12 11:01:44 +01:00 |
Achim D. Brucker
|
6353202ee8
|
Renaming: fileinfo -> filedb.
|
2017-09-10 22:59:07 +01:00 |
Achim D. Brucker
|
0426d7d3d1
|
Reformatting.
|
2017-09-10 22:39:47 +01:00 |
Achim D. Brucker
|
e5da9abaea
|
Added get_file_libinfo(...).
|
2017-09-10 22:38:49 +01:00 |
Achim D. Brucker
|
8d9f6e4fa1
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-10 17:40:45 +01:00 |
Achim D. Brucker
|
ad2af517a3
|
Aggressively try to normalize as many filetypes as possible.
|
2017-09-10 17:40:30 +01:00 |
Achim D. Brucker
|
06ff5f3057
|
Method for computing basic file identifiers.
|
2017-09-10 15:57:07 +01:00 |
Achim D. Brucker
|
a6e90794bc
|
Extended const_basedir to check environment variable EXTENSION_ARCHIVE and modified main scripts to actually use const_basedir.
|
2017-09-10 15:55:22 +01:00 |
Achim D. Brucker
|
4b31097975
|
Added function for computing a list of normalized code blocks for a JavaScript file.
|
2017-09-10 15:02:57 +01:00 |
Michael Herzberg
|
fbef566466
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-10 12:20:33 +01:00 |
Michael Herzberg
|
e09cb16083
|
Updated path to archive.
|
2017-09-10 12:20:23 +01:00 |
Achim D. Brucker
|
52b42dfaef
|
Changed pull method to return list of changed files.
|
2017-09-10 11:01:29 +01:00 |
Achim D. Brucker
|
c3053427c0
|
Added method for obtaining initial commit date and pulling git repos.
|
2017-09-09 23:13:26 +01:00 |
Achim D. Brucker
|
08b70ed63a
|
Updated archive dir to reflect new file hierarchy by default.
|
2017-09-08 21:10:40 +01:00 |
Achim D. Brucker
|
a519495096
|
Removed outdated sync script (only useful for old sqlite-based setup).
|
2017-09-08 20:58:36 +01:00 |
Achim D. Brucker
|
b93c84f948
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-08 11:42:42 +01:00 |
Achim D. Brucker
|
de314c1112
|
Added GitPython dependency.
|
2017-09-07 20:25:05 +01:00 |
Achim D. Brucker
|
8c33558934
|
Reformatting.
|
2017-09-07 20:09:29 +01:00 |
Michael Herzberg
|
69a04c0a7b
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-07 12:44:19 +01:00 |
Michael Herzberg
|
66adacccad
|
Adjusted parameters in grepper sge script.
|
2017-09-07 12:44:08 +01:00 |
Achim D. Brucker
|
2b63192bc2
|
Initial commit.
|
2017-09-06 23:32:03 +01:00 |
Achim D. Brucker
|
3b2913616b
|
Skip first_seen if not defined.
|
2017-09-05 10:15:48 +01:00 |
Michael Herzberg
|
a9173345e8
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-04 15:54:38 +01:00 |
Michael Herzberg
|
36d36facfe
|
Relaxed mysql retries.
|
2017-09-04 15:54:28 +01:00 |
Achim D. Brucker
|
6395d98443
|
Relaxed handling of network errors.
|
2017-09-04 09:11:27 +01:00 |
Achim D. Brucker
|
cfeb29d95f
|
Clean-up of logging infrastructure.
|
2017-09-03 15:56:27 +01:00 |
Achim D. Brucker
|
f42f8e3d03
|
Improved error handling for request failures.
|
2017-09-03 15:43:33 +01:00 |
Achim D. Brucker
|
872346fa61
|
Add timeout parameter to HTTP GET requests.
|
2017-09-03 12:03:51 +01:00 |
Achim D. Brucker
|
0b0268e320
|
Copy outphased date to hash map of files archive.
|
2017-09-03 11:13:27 +01:00 |
Achim D. Brucker
|
0f716e98da
|
Bug fix: only try to preserve outphased library information if there is any stored locally.
|
2017-09-03 11:09:39 +01:00 |
Achim D. Brucker
|
80c8e7caa0
|
Preserve outphased library versions.
|
2017-09-03 11:00:05 +01:00 |
Achim D. Brucker
|
03504ff81a
|
Improved error handling.
|
2017-09-03 10:45:56 +01:00 |
Achim D. Brucker
|
13191f1ce0
|
Renaming: date -> first_seen.
|
2017-09-03 10:32:45 +01:00 |
Achim D. Brucker
|
59f9b47a81
|
Switched to Logging framework.
|
2017-09-03 10:29:57 +01:00 |
Achim D. Brucker
|
074447064c
|
Enabled parallel download.
|
2017-09-03 10:06:55 +01:00 |
Achim D. Brucker
|
e3aa92f1b8
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-02 22:15:36 +01:00 |