831 Commits

Author SHA1 Message Date
Achim D. Brucker bdb84c2120 Renamed module. 2017-09-13 01:09:30 +01:00
Achim D. Brucker 4e5b52617f Catch exception during decompression and increase max. allowed size of decompressed data to 100 times of compressed size. 2017-09-13 00:23:17 +01:00
Achim D. Brucker 88efe2b8a4 Reformatting. 2017-09-13 00:02:20 +01:00
Achim D. Brucker ea9339bc53 Compute data identifiers for uncompressed content of gzip compressed files. 2017-09-13 00:01:15 +01:00
Achim D. Brucker f9cf7bd35f Refactoring: moved computation of data related identifiers into own method. 2017-09-12 23:52:52 +01:00
Achim D. Brucker 8243664974 Use StringIO representation for normalizing js/css files (avoid re-reading the file content from disk). 2017-09-12 23:43:09 +01:00
Achim D. Brucker 933c4d4d11 Determine file description from buffer instead of from file (avoid reading file twice). 2017-09-12 23:23:22 +01:00
Michael Herzberg 5ce3f2a148 Added until-date option. 2017-09-12 11:01:44 +01:00
Achim D. Brucker 6353202ee8 Renaming: fileinfo -> filedb. 2017-09-10 22:59:07 +01:00
Achim D. Brucker 0426d7d3d1 Reformatting. 2017-09-10 22:39:47 +01:00
Achim D. Brucker e5da9abaea Added get_file_libinfo(...). 2017-09-10 22:38:49 +01:00
Achim D. Brucker 8d9f6e4fa1 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-10 17:40:45 +01:00
Achim D. Brucker ad2af517a3 Aggressively try to normalize as many filetypes as possible. 2017-09-10 17:40:30 +01:00
Achim D. Brucker 06ff5f3057 Method for computing basic file identifiers. 2017-09-10 15:57:07 +01:00
Achim D. Brucker a6e90794bc Extended const_basedir to check environment variable EXTENSION_ARCHIVE and modified main scripts to actually use const_basedir. 2017-09-10 15:55:22 +01:00
Achim D. Brucker 4b31097975 Added function for computing a list of normalized code blocks for a JavaScript file. 2017-09-10 15:02:57 +01:00
Michael Herzberg fbef566466 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-10 12:20:33 +01:00
Michael Herzberg e09cb16083 Updated path to archive. 2017-09-10 12:20:23 +01:00
Achim D. Brucker 52b42dfaef Changed pull method to return list of changed files. 2017-09-10 11:01:29 +01:00
Achim D. Brucker c3053427c0 Added method for obtaining initial commit date and pulling git repos. 2017-09-09 23:13:26 +01:00
Achim D. Brucker 08b70ed63a Updated archive dir to reflect new file hierarchy by default. 2017-09-08 21:10:40 +01:00
Achim D. Brucker a519495096 Removed outdated sync script (only useful for old sqlite-based setup). 2017-09-08 20:58:36 +01:00
Achim D. Brucker b93c84f948 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-08 11:42:42 +01:00
Achim D. Brucker de314c1112 Added GitPython dependency. 2017-09-07 20:25:05 +01:00
Achim D. Brucker 8c33558934 Reformatting. 2017-09-07 20:09:29 +01:00
Michael Herzberg 69a04c0a7b Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-07 12:44:19 +01:00
Michael Herzberg 66adacccad Adjusted parameters in grepper sge script. 2017-09-07 12:44:08 +01:00
Achim D. Brucker 2b63192bc2 Initial commit. 2017-09-06 23:32:03 +01:00
Achim D. Brucker 3b2913616b Skip first_seen if not defined. 2017-09-05 10:15:48 +01:00
Michael Herzberg a9173345e8 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-04 15:54:38 +01:00
Michael Herzberg 36d36facfe Relaxed mysql retries. 2017-09-04 15:54:28 +01:00
Achim D. Brucker 6395d98443 Relaxed handling of network errors. 2017-09-04 09:11:27 +01:00
Achim D. Brucker cfeb29d95f Clean-up of logging infrastructure. 2017-09-03 15:56:27 +01:00
Achim D. Brucker f42f8e3d03 Improved error handling for request failures. 2017-09-03 15:43:33 +01:00
Achim D. Brucker 872346fa61 Add timeout parameter to http get requests. 2017-09-03 12:03:51 +01:00
Achim D. Brucker 0b0268e320 Copy outphased date to hash map of files archive. 2017-09-03 11:13:27 +01:00
Achim D. Brucker 0f716e98da Bug fix: only try to preserve outphased library information if there is any stored locally. 2017-09-03 11:09:39 +01:00
Achim D. Brucker 80c8e7caa0 Preserve outphased library versions. 2017-09-03 11:00:05 +01:00
Achim D. Brucker 03504ff81a Improved error handling. 2017-09-03 10:45:56 +01:00
Achim D. Brucker 13191f1ce0 Renaming: date -> first_seen. 2017-09-03 10:32:45 +01:00
Achim D. Brucker 59f9b47a81 Switched to Logging framework. 2017-09-03 10:29:57 +01:00
Achim D. Brucker 074447064c Enabled parallel download. 2017-09-03 10:06:55 +01:00
Achim D. Brucker e3aa92f1b8 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-02 22:15:36 +01:00
Achim D. Brucker 515a462938 Added methods for generating/updating index files based on the file hash. 2017-09-02 22:10:43 +01:00
Achim D. Brucker 9ae5905973 Generalized hash map builders. 2017-09-02 21:53:58 +01:00
Achim D. Brucker 22c3a7581d Reformatting. 2017-09-02 21:44:20 +01:00
Achim D. Brucker 3097db3790 Added methods for generating sha1 indexed dictionary. 2017-09-02 21:40:44 +01:00
Achim D. Brucker e5c2372222 Improved log output (verbose mode). 2017-09-02 20:57:01 +01:00
Achim D. Brucker c32ab6bc94 print URL of downloaded library files in verbose mode. 2017-09-02 20:44:47 +01:00
Achim D. Brucker ea8460f1b8 Updated local update. 2017-09-02 20:41:16 +01:00