Commit Graph

1136 Commits

Author SHA1 Message Date
Michael Herzberg 67ce14b3db Fixed. 2018-07-08 17:18:58 +01:00
Michael Herzberg 6f080652b6 Fixed. 2018-07-08 16:31:08 +01:00
Michael Herzberg bfee27cf11 Fixed join. 2018-07-06 21:03:49 +01:00
Michael Herzberg 578d533e0e Use sqlite now. 2018-07-06 19:19:29 +01:00
Michael Herzberg f6b4f2bd1b Trying different queries. 2018-07-03 10:43:49 +01:00
Michael Herzberg a6942d1a07 Simply print matches now. 2018-07-02 16:04:13 +02:00
Michael Herzberg aafefc4634 Make the bucket global. 2018-07-02 10:01:11 +01:00
Michael Herzberg e1325c711e Don't load the whole mysql result into memory. 2018-07-02 08:44:28 +01:00
Michael Herzberg e98a50832f Added missing shebang. 2018-07-01 19:59:56 +01:00
Michael Herzberg 17fb832433 Renamed simhashbucket. 2018-07-01 19:50:06 +01:00
Michael Herzberg 92d05ba229 Added simhashbucket. 2018-07-01 19:49:02 +01:00
Achim D. Brucker 4356335659 Removed -m options not supported by ts on our server. 2018-06-19 20:12:20 +01:00
Achim D. Brucker 25d6e04d6e Create log file for output to stderr. 2018-06-15 23:05:28 +01:00
Achim D. Brucker d04086b7ad Fixed mail notification for stalled download. 2018-06-14 08:15:35 +01:00
Michael Herzberg 651506bd0c Sort db inserts to prevent deadlocks. 2018-06-13 09:33:55 +01:00
Michael Herzberg 5b9971ecec Fixed path. 2018-06-10 11:21:23 +01:00
Michael Herzberg d88a7b1e3c Optimized tar file discovery when reading ids from file. 2018-06-10 01:44:53 +01:00
Michael Herzberg 630fcba1df Actually write simhash into db... 2018-06-10 01:22:12 +01:00
Michael Herzberg 71178a2a7b Added two similarity analyses. 2018-06-10 01:20:52 +01:00
Michael Herzberg 4d32497ad7 Force ExtensionCrawler.img overwrite. 2018-05-15 00:50:10 +01:00
Michael Herzberg 78bd8a376d Bringing create-db up-to-date. 2018-05-15 00:33:37 +01:00
Achim D. Brucker 437c1c4727 Merge branch 'production' of logicalhacking.com:BrowserSecurity/ExtensionCrawler into production 2018-04-27 16:29:49 +01:00
Achim D. Brucker bad8334df1 Report parallel downloads instead of total downloads as fourth column. 2018-04-27 16:29:32 +01:00
Michael Herzberg 2e1769a853 Fixed no attribute 'id' error. 2018-04-23 15:50:31 +01:00
Achim D. Brucker fd4ed697a7 Added default value for ext_id in const_log_format() to ensure backwards compatibility. 2018-04-22 22:50:27 +01:00
Michael Herzberg 9eb164bb81 Fixed refactor bug. 2018-04-22 21:47:30 +01:00
Michael Herzberg 49ea3bb496 Make sure semaphore is released if an exception occurs during http request. 2018-04-22 13:59:15 +01:00
Michael Herzberg 756dcb3ed1 Increased wait time again... 2018-04-21 21:33:36 +01:00
Michael Herzberg 1dab51d3f5 Reduced bot detection timeout. 2018-04-21 20:50:08 +01:00
Michael Herzberg 5b0f49b35a Deleted annoying Creating DB Connection message. 2018-04-21 20:35:23 +01:00
Michael Herzberg 13e4ee050c Reset mysqlclient version. 2018-04-21 20:17:18 +01:00
Michael Herzberg 738d7e9b4f Adjusted monitor script for new log line. 2018-04-21 20:13:08 +01:00
Michael Herzberg d8d49b1b80 Moved ext_id into logger formatter to make logger output more uniform. 2018-04-21 19:59:02 +01:00
Michael Herzberg dd011aaad1 Removed -P option. 2018-04-21 19:28:47 +01:00
Michael Herzberg ecb00f6009 Merge branch 'master' into mixed_forums 2018-04-21 19:19:07 +01:00
Michael Herzberg a789fe505f Fixed style errors and warnings. 2018-04-21 19:00:07 +01:00
Michael Herzberg ac3c1c7f20 Removed plain multiprocessing option. 2018-04-21 17:25:22 +01:00
Michael Herzberg 0613ac1ac1 Removed explicitly calling the garbage collector. 2018-04-21 16:52:58 +01:00
Michael Herzberg 2715e95665 Only try to add review and support pages if HTTP return code is 200. 2018-04-21 16:50:33 +01:00
Michael Herzberg dbeba9e9bf Use a lock to mix forum downloads into the parallel mode. 2018-04-21 13:59:33 +01:00
Michael Herzberg aee916a629 Moved setting of forkserver further outwards... 2018-04-15 16:26:26 +01:00
Michael Herzberg ff78f8e7d8 Fixed missing parameter. 2018-04-12 23:25:31 +01:00
Michael Herzberg a758134c97 Readded mimetype from mimetypes. TODO: add mysql columns 2018-04-11 16:52:22 +01:00
Michael Herzberg 87b2847c6e Make ProcessPool and pystuck the default (for now). 2018-04-11 15:39:23 +01:00
Michael Herzberg cd09e2509d Removed retry of worker exceptions; instead, properly log them similarly to tar and sql exceptions. 2018-04-11 15:38:32 +01:00
Michael Herzberg 22dc8f8263 Added --pystuck option to start pystuck servers for all processes. 2018-04-11 15:15:52 +01:00
Michael Herzberg 46494ec18b Re-setup logging in new processes. 2018-04-10 18:19:12 +01:00
Michael Herzberg 410fa3cf1c Moved setting of forkserver to prevent multiple invocations. 2018-04-10 17:24:10 +01:00
Michael Herzberg 12bdc1b00f Don't crash if something is wrong with the etag file. 2018-04-10 16:32:12 +01:00
Michael Herzberg 385003771a Set chunksize, maxtasksperchild, and max_tasks to 100. 2018-04-10 16:23:22 +01:00