963 Commits

Author SHA1 Message Date
Achim D. Brucker bad8334df1 Report parallel downloads instead of total downloads as fourth column. 2018-04-27 16:29:32 +01:00
Achim D. Brucker fd4ed697a7 Added default value for ext_id in const_log_format() to ensure backwards compatibility. 2018-04-22 22:50:27 +01:00
Michael Herzberg 9eb164bb81 Fixed refactor bug. 2018-04-22 21:47:30 +01:00
Michael Herzberg 49ea3bb496 Make sure semaphore is released if an exception occurs during http request. 2018-04-22 13:59:15 +01:00
Michael Herzberg 756dcb3ed1 Increased wait time again... 2018-04-21 21:33:36 +01:00
Michael Herzberg 1dab51d3f5 Reduced bot detection timeout. 2018-04-21 20:50:08 +01:00
Michael Herzberg 5b0f49b35a Deleted annoying Creating DB Connection message. 2018-04-21 20:35:23 +01:00
Michael Herzberg 13e4ee050c Reset mysqlclient version. 2018-04-21 20:17:18 +01:00
Michael Herzberg 738d7e9b4f Adjusted monitor script for new log line. 2018-04-21 20:13:08 +01:00
Michael Herzberg d8d49b1b80 Moved ext_id into logger formatter to make logger output more uniform. 2018-04-21 19:59:02 +01:00
Michael Herzberg dd011aaad1 Removed -P option. 2018-04-21 19:28:47 +01:00
Michael Herzberg ecb00f6009 Merge branch 'master' into mixed_forums 2018-04-21 19:19:07 +01:00
Michael Herzberg a789fe505f Fixed style errors and warnings. 2018-04-21 19:00:07 +01:00
Michael Herzberg ac3c1c7f20 Removed plain multiprocessing option. 2018-04-21 17:25:22 +01:00
Michael Herzberg 0613ac1ac1 Removed explicitly calling the garbage collector. 2018-04-21 16:52:58 +01:00
Michael Herzberg 2715e95665 Only try to add review and support pages if HTTP return code is 200. 2018-04-21 16:50:33 +01:00
Michael Herzberg dbeba9e9bf Use a lock to mix forum downloads into the parallel mode. 2018-04-21 13:59:33 +01:00
Michael Herzberg aee916a629 Moved setting of forkserver further outwards... 2018-04-15 16:26:26 +01:00
Michael Herzberg ff78f8e7d8 Fixed missing parameter. 2018-04-12 23:25:31 +01:00
Michael Herzberg a758134c97 Readded mimetype from mimetypes. TODO: add mysql columns 2018-04-11 16:52:22 +01:00
Michael Herzberg 87b2847c6e Make ProcessPool and pystuck the default (for now). 2018-04-11 15:39:23 +01:00
Michael Herzberg cd09e2509d Removed retry of worker exceptions; instead, properly log them similarly to tar and sql exceptions. 2018-04-11 15:38:32 +01:00
Michael Herzberg 22dc8f8263 Added --pystuck option to start pystuck servers for all processes. 2018-04-11 15:15:52 +01:00
Michael Herzberg 46494ec18b Re-setup logging in new processes. 2018-04-10 18:19:12 +01:00
Michael Herzberg 410fa3cf1c Moved setting of forkserver to prevent multiple invocations. 2018-04-10 17:24:10 +01:00
Michael Herzberg 12bdc1b00f Don't crash if something is wrong with the etag file. 2018-04-10 16:32:12 +01:00
Michael Herzberg 385003771a Set chunksize, maxtasksperchild, and max_tasks to 100. 2018-04-10 16:23:22 +01:00
Michael Herzberg bbe575d07b Pebble: start processing results right away. 2018-04-10 16:15:33 +01:00
Michael Herzberg 6bee81b711 Use forkserver. 2018-04-10 16:13:31 +01:00
Michael Herzberg 778736e2d3 Fixed logging of if-modified-since. 2018-04-10 10:55:03 +01:00
Michael Herzberg f677258f83 Added use of garbage collector. 2018-04-10 10:51:33 +01:00
Michael Herzberg d27106d7a9 Added creation of separate .etag files outside the .tar file. 2018-04-09 19:42:41 +01:00
Michael Herzberg 50b598993f Bugfix: actually download forums on sequential run. 2018-04-09 18:38:51 +01:00
Michael Herzberg f4c0ff56ff Use magic for mimetypes and don't attempt text-based analyses on binary resources. 2018-04-09 14:25:47 +01:00
Michael Herzberg fcfa58fb3d Wheel needs to be installed before ExtensionCrawler. 2018-04-09 00:14:07 +01:00
Achim D. Brucker 0c70b2e20b Increase number of parallel downloads. 2018-04-08 22:45:56 +01:00
Michael Herzberg 3d136daae3 Various small bug fixes. 2018-04-08 17:44:59 +01:00
Michael Herzberg faa2214af4 Timeout must be an integer. 2018-04-08 13:10:26 +01:00
Achim D. Brucker 33898a4cf3 Updated help text. 2018-04-08 10:10:30 +01:00
Achim D. Brucker e1ef0758f7 Made the choice of Pool vs. ProcessPool a configuration option. 2018-04-08 10:06:26 +01:00
Achim D. Brucker 70b64616e1 Ensure the use of /usr/bin/mail. 2018-04-08 09:59:03 +01:00
Achim D. Brucker 7f71a40ff4 Configured number of parallel processes. 2018-04-07 21:14:36 +01:00
Achim D. Brucker 66023b6b72 Reverted test of ThreadPools. 2018-04-07 21:13:32 +01:00
Achim D. Brucker a75380b0c5 Merge branch 'production' of logicalhacking.com:BrowserSecurity/ExtensionCrawler into production 2018-04-07 19:49:03 +01:00
Achim D. Brucker 987236958e Testing ThreadPools. 2018-04-07 19:48:45 +01:00
Achim D. Brucker c3d8de9b81 Testing ThreadPools. 2018-04-07 19:37:55 +01:00
Achim D. Brucker a3c60c0ae8 Ensure that mail recipient is defined. 2018-04-07 17:54:15 +01:00
Achim D. Brucker a7f0b26ead Log memory usage. 2018-04-07 16:26:00 +01:00
Achim D. Brucker 2fc154d643 Use UTC-based time/dates for logging. 2018-04-07 15:54:39 +01:00
Achim D. Brucker 91a76091e3 Use UTC-based time/dates for logging. 2018-04-07 15:42:29 +01:00