Commit Graph

602 Commits

Author SHA1 Message Date
Michael Herzberg a1781b9ff9 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-15 21:32:25 +01:00
Michael Herzberg 1814b1738a Added email notifications on abort. 2017-09-15 21:32:12 +01:00
Achim D. Brucker 400e74ae3f Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-15 20:21:45 +01:00
Achim D. Brucker 26678636eb Ignore commits where blobs are None. 2017-09-15 20:21:05 +01:00
Michael Herzberg 85680d360b Automatically reopen database connection on failure. 2017-09-15 18:23:25 +01:00
Michael Herzberg ddbbc2672d Try to insert also other data if some inserts fail. Use autocommit to prevent data loss on retries. 2017-09-15 18:15:03 +01:00
Michael Herzberg c57bce2491 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-15 17:42:05 +01:00
Achim D. Brucker 936f2d3189 Log git info before starting pull (update). 2017-09-14 22:54:37 +01:00
Achim D. Brucker 2ff30f7382 Parallel execution of git date queries. 2017-09-14 15:11:53 +01:00
Achim D. Brucker 12a1e282aa The method pull_get_updated_lib_files(...) now also returns unique library/version information. 2017-09-14 10:44:30 +01:00
Achim D. Brucker e3f1202e44 Use version dictionary. 2017-09-14 10:33:00 +01:00
Achim D. Brucker f54f29c9ba Added build_release_date_dic(...). 2017-09-14 09:50:09 +01:00
Achim D. Brucker 3b217922c5 Added line count. 2017-09-13 16:41:01 +01:00
Achim D. Brucker 420eec7462 Minor memory optimizations. 2017-09-13 11:12:33 +01:00
Achim D. Brucker ec1c47625a Added support for parallel update of database. 2017-09-13 09:13:35 +01:00
Achim D. Brucker c386bd01dd Added missing string conversion. 2017-09-13 08:29:23 +01:00
Achim D. Brucker 42e685ee32 Added missing string conversion. 2017-09-13 08:01:02 +01:00
Achim D. Brucker 18fb23d3dc Use glob instead of os.walk() to avoid memory leak in the latter. 2017-09-13 04:04:38 +01:00
Achim D. Brucker 76d5993794 Added logging output. 2017-09-13 03:02:39 +01:00
Achim D. Brucker c30f7fdd7c Implemented skeleton of main routine. 2017-09-13 02:56:13 +01:00
Achim D. Brucker a8a5534be1 Renamed module. 2017-09-13 01:13:17 +01:00
Achim D. Brucker bdb84c2120 Renamed module. 2017-09-13 01:09:30 +01:00
Achim D. Brucker 4e5b52617f Catch exception during decompression and increase max. allowed size of decompressed data to 100 times of compressed size. 2017-09-13 00:23:17 +01:00
Achim D. Brucker 88efe2b8a4 Reformatting. 2017-09-13 00:02:20 +01:00
Achim D. Brucker ea9339bc53 Compute data identifiers for uncompressed content of gzip compressed files. 2017-09-13 00:01:15 +01:00
Achim D. Brucker f9cf7bd35f Refactoring: moved computation of data related identifiers into own method. 2017-09-12 23:52:52 +01:00
Achim D. Brucker 8243664974 Use StringIO representation for normalizing js/css files (avoid re-reading the file content from disk). 2017-09-12 23:43:09 +01:00
Achim D. Brucker 933c4d4d11 Determine file description from buffer instead from file (avoid reading file twice). 2017-09-12 23:23:22 +01:00
Michael Herzberg 5ce3f2a148 Added until-date option. 2017-09-12 11:01:44 +01:00
Achim D. Brucker 6353202ee8 Renaming: fileinfo -> filedb. 2017-09-10 22:59:07 +01:00
Achim D. Brucker 0426d7d3d1 Reformatting. 2017-09-10 22:39:47 +01:00
Achim D. Brucker e5da9abaea Added get_file_libinfo(...). 2017-09-10 22:38:49 +01:00
Achim D. Brucker 8d9f6e4fa1 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-10 17:40:45 +01:00
Achim D. Brucker ad2af517a3 Aggressively try to normalize as many filetypes as possible. 2017-09-10 17:40:30 +01:00
Achim D. Brucker 06ff5f3057 Method for computing basic file identifiers. 2017-09-10 15:57:07 +01:00
Achim D. Brucker a6e90794bc Extended const_basedir to check environment variable EXTENSION_ARCHIVE and modified main scripts to actually use const_basedir. 2017-09-10 15:55:22 +01:00
Achim D. Brucker 4b31097975 Added function for computing a list of normalized code blocks for a JavaScript file. 2017-09-10 15:02:57 +01:00
Michael Herzberg fbef566466 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-10 12:20:33 +01:00
Michael Herzberg e09cb16083 Updated path to archive. 2017-09-10 12:20:23 +01:00
Achim D. Brucker 52b42dfaef Changed pull method to return list of changed files. 2017-09-10 11:01:29 +01:00
Achim D. Brucker c3053427c0 Added method for obtaining initial commit date and pulling git repos. 2017-09-09 23:13:26 +01:00
Achim D. Brucker 08b70ed63a Updated archive dir to reflect new file hierarchy by default. 2017-09-08 21:10:40 +01:00
Achim D. Brucker a519495096 Removed outdated sync script (only useful for old sqlite-based setup). 2017-09-08 20:58:36 +01:00
Achim D. Brucker b93c84f948 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-08 11:42:42 +01:00
Achim D. Brucker de314c1112 Added GitPython dependency. 2017-09-07 20:25:05 +01:00
Achim D. Brucker 8c33558934 Reformatting. 2017-09-07 20:09:29 +01:00
Michael Herzberg 69a04c0a7b Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-07 12:44:19 +01:00
Michael Herzberg 66adacccad Adjusted parameters in grepper sge script. 2017-09-07 12:44:08 +01:00
Achim D. Brucker 2b63192bc2 Initial commit. 2017-09-06 23:32:03 +01:00
Achim D. Brucker 3b2913616b Skip first_seen if not defined. 2017-09-05 10:15:48 +01:00