Commit Graph

621 Commits

Author SHA1 Message Date
Michael Herzberg 7277e6f76e Fixed log msg bug. 2017-09-17 17:45:01 +01:00
Michael Herzberg 0cb7d6e792 Fixed error in exception handling. 2017-09-17 17:40:48 +01:00
Michael Herzberg 1ddb9c1c10 Suppress HTTPS connection log messages. 2017-09-17 12:26:51 +01:00
Michael Herzberg 1fab393e56 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-16 17:23:16 +01:00
Michael Herzberg c3e295267b Log loglevel and only print stacktrace on first mysql exception. 2017-09-16 17:22:57 +01:00
Achim D. Brucker 205c8836e9 Bug fix: do not catch exceptions too aggressively and fix libvers computation for updates. 2017-09-16 17:20:23 +01:00
Achim D. Brucker 4cf41e2e4f Refactoring: moved generic file identifiers into own module. 2017-09-16 17:19:36 +01:00
Achim D. Brucker e98f58fff8 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-16 13:41:56 +01:00
Achim D. Brucker 24c65daecf Bug fix: check for dirty missed actual function application. 2017-09-16 13:41:47 +01:00
Achim D. Brucker c274b96f66 Added csv output for debugging. 2017-09-16 13:21:49 +01:00
Michael Herzberg 69e95fdf13 Catch json parse exceptions for reviews etc. more nicely. 2017-09-16 12:53:35 +01:00
Achim D. Brucker de6dde5269 Updated help text to include taskid/maxtaskid. 2017-09-16 12:41:18 +01:00
Michael Herzberg 58aacef3ff Reopen connection after every exception. 2017-09-16 12:31:00 +01:00
Michael Herzberg a514c0001e Added check for empty crx files. 2017-09-16 12:14:41 +01:00
Michael Herzberg b51de8577f Added compression for mysql. 2017-09-16 12:04:35 +01:00
Achim D. Brucker 92e1c4c2e5 Skip deleted files. 2017-09-16 11:41:21 +01:00
Achim D. Brucker 082cd2fc65 Added hackish pull method that uses the regular git binary. While this method will not work well with filenames containing spaces and there might be other glitches, it allows pulling an update of the cdnjs git repository (> 100GB) within a couple of minutes, compared to the couple of days that the non-hackish solution needs. 2017-09-16 11:36:40 +01:00
Achim D. Brucker 5d3343acf1 Refactoring: moved git_repo creation into pull_get_list_changed_files(...). 2017-09-16 10:33:11 +01:00
Achim D. Brucker 7b0e63da10 Implemented n/N options for external parallelisation (only for fresh initialization). 2017-09-15 22:40:46 +01:00
Michael Herzberg a1781b9ff9 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-15 21:32:25 +01:00
Michael Herzberg 1814b1738a Added email notifications on abort. 2017-09-15 21:32:12 +01:00
Achim D. Brucker 400e74ae3f Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-15 20:21:45 +01:00
Achim D. Brucker 26678636eb Ignore commits where blobs are None. 2017-09-15 20:21:05 +01:00
Michael Herzberg 85680d360b Automatically reopen database connection on failure. 2017-09-15 18:23:25 +01:00
Michael Herzberg ddbbc2672d Also try to insert other data if some inserts fail. Use autocommit to prevent data loss on retries. 2017-09-15 18:15:03 +01:00
Michael Herzberg c57bce2491 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-09-15 17:42:05 +01:00
Achim D. Brucker 936f2d3189 Log git info before starting pull (update). 2017-09-14 22:54:37 +01:00
Achim D. Brucker 2ff30f7382 Parallel execution of git date queries. 2017-09-14 15:11:53 +01:00
Achim D. Brucker 12a1e282aa The method pull_get_updated_lib_files(...) now also returns unique library/version information. 2017-09-14 10:44:30 +01:00
Achim D. Brucker e3f1202e44 Use version dictionary. 2017-09-14 10:33:00 +01:00
Achim D. Brucker f54f29c9ba Added build_release_date_dic(...). 2017-09-14 09:50:09 +01:00
Achim D. Brucker 3b217922c5 Added line count. 2017-09-13 16:41:01 +01:00
Achim D. Brucker 420eec7462 Minor memory optimizations. 2017-09-13 11:12:33 +01:00
Achim D. Brucker ec1c47625a Added support for parallel update of database. 2017-09-13 09:13:35 +01:00
Achim D. Brucker c386bd01dd Added missing string conversion. 2017-09-13 08:29:23 +01:00
Achim D. Brucker 42e685ee32 Added missing string conversion. 2017-09-13 08:01:02 +01:00
Achim D. Brucker 18fb23d3dc Use glob instead of os.walk() to avoid memory leak in the latter. 2017-09-13 04:04:38 +01:00
Achim D. Brucker 76d5993794 Added logging output. 2017-09-13 03:02:39 +01:00
Achim D. Brucker c30f7fdd7c Implemented skeleton of main routine. 2017-09-13 02:56:13 +01:00
Achim D. Brucker a8a5534be1 Renamed module. 2017-09-13 01:13:17 +01:00
Achim D. Brucker bdb84c2120 Renamed module. 2017-09-13 01:09:30 +01:00
Achim D. Brucker 4e5b52617f Catch exception during decompression and increase max. allowed size of decompressed data to 100 times the compressed size. 2017-09-13 00:23:17 +01:00
Achim D. Brucker 88efe2b8a4 Reformatting. 2017-09-13 00:02:20 +01:00
Achim D. Brucker ea9339bc53 Compute data identifiers for uncompressed content of gzip compressed files. 2017-09-13 00:01:15 +01:00
Achim D. Brucker f9cf7bd35f Refactoring: moved computation of data related identifiers into own method. 2017-09-12 23:52:52 +01:00
Achim D. Brucker 8243664974 Use StringIO representation for normalizing js/css files (avoid re-reading the file content from disk). 2017-09-12 23:43:09 +01:00
Achim D. Brucker 933c4d4d11 Determine file description from buffer instead of from file (avoid reading file twice). 2017-09-12 23:23:22 +01:00
Michael Herzberg 5ce3f2a148 Added until-date option. 2017-09-12 11:01:44 +01:00
Achim D. Brucker 6353202ee8 Renaming: fileinfo -> filedb. 2017-09-10 22:59:07 +01:00
Achim D. Brucker 0426d7d3d1 Reformatting. 2017-09-10 22:39:47 +01:00