Commit Graph

503 Commits

Author SHA1 Message Date
Achim D. Brucker ae3bbd7339 Using values of enumeration to obtain nice and short human readable representations. 2017-08-30 00:12:57 +01:00
Michael Herzberg 47f424cf2f Added more logging. 2017-08-29 23:10:46 +01:00
Michael Herzberg 080f00f17c Added new columns for jsfile table. 2017-08-29 22:40:01 +01:00
Michael Herzberg 95d71a9edc Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-08-29 22:29:49 +01:00
Michael Herzberg 3e24d1f08c Changed logging to use logging library. 2017-08-29 22:29:38 +01:00
Achim D. Brucker 39cd03dccc Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-08-29 18:01:42 +01:00
Achim D. Brucker 97f5b14158 Compute sha1 for JavaScript files. 2017-08-29 18:01:28 +01:00
Michael Herzberg bddd80c138 Made removal of manifest.json comments stricter. 2017-08-29 15:43:04 +01:00
Michael Herzberg 7ffdf30545 Push manifest into table crx column manifest. 2017-08-29 15:41:13 +01:00
Michael Herzberg 2b11117b6f Always process crx, regardless of whether crx_etag is already in db. 2017-08-29 15:24:59 +01:00
Michael Herzberg 8b91957372 Reduced default MySQL timeout. 2017-08-29 15:20:58 +01:00
Michael Herzberg 6a99d41471 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-08-29 15:11:37 +01:00
Achim D. Brucker d4ad5f96f8 Report empty files as own category/type. 2017-08-28 22:38:06 +01:00
Michael Herzberg f81aac7c61 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-08-28 22:38:05 +01:00
Achim D. Brucker 2ace19f453 Compute js_info (including md5 hash and character set detection) only once per file. 2017-08-28 21:05:15 +01:00
Achim D. Brucker 91dfe67513 Auto-detect character encoding of JavaScript files using cchardet. 2017-08-28 20:53:55 +01:00
Michael Herzberg c30f0c4147 Removed database and host setting. To be set in ~/.my.cnf file now. 2017-08-28 20:17:11 +01:00
Achim D. Brucker 5cff2bc1b7 New check based on file hash (md5). 2017-08-28 20:09:34 +01:00
Achim D. Brucker 030adb6adc Minor refactoring and cleanup. 2017-08-28 19:20:50 +01:00
Michael Herzberg 5175d28edc Convert some stuff to string for db insert. 2017-08-28 17:12:32 +01:00
Michael Herzberg 0a4e8839a1 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-08-28 11:50:49 +01:00
Michael Herzberg 81077b807c Added mysql retry logic and use time.time() now. 2017-08-28 11:50:41 +01:00
Achim D. Brucker 9bf0b47f98 Minor improvement of string conversion for JsBlock. 2017-08-28 10:50:52 +01:00
Achim D. Brucker c721e6fdbf Merge with upstream. 2017-08-28 10:49:01 +01:00
Achim D. Brucker f10923af03 Integrated js_mincer into decomposition analysis to allow, in the future, checking comments, code, and string literals explicitly. 2017-08-28 10:40:37 +01:00
Achim D. Brucker 9ef27f9ac9 Added missing return statements. 2017-08-28 10:28:21 +01:00
Achim D. Brucker 90b1db4a25 Added additional comment checks. 2017-08-28 01:26:13 +01:00
Achim D. Brucker 9b272c9302 Added option to merge subsequent single line comments into a single line comment block. 2017-08-28 01:17:00 +01:00
Achim D. Brucker 111777c821 Improved position counting. 2017-08-28 00:57:58 +01:00
Achim D. Brucker d4de20efc1 Bug fix: start position of blocks and omit empty code blocks. 2017-08-28 00:19:28 +01:00
Achim D. Brucker e2e92594ce Bug fix: catch also last block of file. 2017-08-27 23:34:33 +01:00
Michael Herzberg 257afe92f0 Use selective insert. 2017-08-27 23:00:28 +01:00
Achim D. Brucker 629f492fa7 Added tests for code blocks and comments. 2017-08-27 22:58:09 +01:00
Achim D. Brucker 7ff1623bc6 Introduced JavaScript mincer working on file objects. 2017-08-27 22:51:55 +01:00
Michael Herzberg b98b7bc0f7 Fixed column typo. 2017-08-27 22:49:07 +01:00
Achim D. Brucker e324ab9483 Re-formatted and added documentation. 2017-08-27 22:41:04 +01:00
Achim D. Brucker 9376b4056f Collect string literals in code blocks. 2017-08-27 22:27:35 +01:00
Achim D. Brucker 41ca506b9f Return iterator that iterates over JavaScript blocks. 2017-08-27 22:17:04 +01:00
Achim D. Brucker 5add586da3 Initial commit. 2017-08-27 20:47:24 +01:00
Achim D. Brucker f6f0bc0394 Renamed jsdecompose.py to js_decomposer.py. 2017-08-27 20:45:56 +01:00
Michael Herzberg 9521240d90 Make stuff configurable. 2017-08-27 18:28:19 +01:00
Michael Herzberg 0cff600861 Fixed etag keys. 2017-08-27 17:35:58 +01:00
Michael Herzberg d4b0a6535b Fixed some things. 2017-08-27 16:57:23 +01:00
Michael Herzberg f075192b44 Made sqlite default again. 2017-08-27 03:26:29 +01:00
Michael Herzberg 22c90dcb4f Truncate timezone from timestamps for mysql, make mysql default. 2017-08-27 03:14:43 +01:00
Michael Herzberg 585c8faf0e Added mysql, but still commented out. 2017-08-27 02:53:15 +01:00
Michael Herzberg c5c04cd1ed Refactored sqlite-specifics into own class. 2017-08-27 00:22:19 +01:00
Achim D. Brucker 0bd6a55adb Added documentation for analyse_filename. 2017-08-26 22:45:14 +01:00
Achim D. Brucker df472fbbe8 Refactored filename check. 2017-08-26 22:43:57 +01:00
Achim D. Brucker b2c862ede1 Added fields for storing evidence information for detected library/version information. 2017-08-25 07:07:34 +01:00
Achim D. Brucker 807af6f32d Refactoring: proper use of enumerations. 2017-08-24 21:37:35 +01:00
Achim D. Brucker 45d2c7ad44 Fundamental refactoring. 2017-08-24 19:43:48 +01:00
Achim D. Brucker 676cc5ac9d Renamed detectLibraries to decompose_js. 2017-08-24 00:47:35 +01:00
Achim D. Brucker 486b967d2d Refactoring. 2017-08-24 00:44:34 +01:00
Achim D. Brucker 9ced7ea3b5 Refactoring and bug fix in library classification. 2017-08-24 00:29:44 +01:00
Achim D. Brucker 94bd0f9a95 Refactoring. 2017-08-23 23:37:15 +01:00
Achim D. Brucker 2bbd6281f7 Reformatting. 2017-08-23 20:09:02 +01:00
Achim D. Brucker 4c5f8889d2 Refactoring. 2017-08-23 20:04:52 +01:00
Achim D. Brucker cd217f57a6 Integrated JavaScript decomposition analysis. 2017-08-23 19:42:00 +01:00
Achim D. Brucker 5d89e28486 Cleanup. 2017-08-23 19:17:35 +01:00
Achim D. Brucker 123623b111 Minor code cleanup. 2017-08-23 17:36:41 +01:00
Achim D. Brucker 3208a6e58a Initial import of JavaScript decomposition framework. 2017-08-23 17:22:58 +01:00
Michael Herzberg 68e7e72e93 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-08-09 13:06:42 +01:00
Michael Herzberg 40f800b4de Check if 'annotations' exists in search results. 2017-08-09 13:06:22 +01:00
Achim D. Brucker d3da686e16 Changed formula for computing download delay. 2017-08-05 10:57:45 +01:00
Michael Herzberg c61f19e860 Use INSERT OR IGNORE. 2017-07-31 23:55:21 +01:00
Michael Herzberg b8f57196c7 Changed fts table structure. 2017-07-31 23:23:57 +01:00
Michael Herzberg b34d45c4dc Added md5sum to sqlite. 2017-07-31 20:38:21 +01:00
Achim D. Brucker 35c133e395 Slightly more aggressive implementation of google_dos_protection. 2017-07-30 14:41:20 +01:00
Achim D. Brucker 5268f2a732 Refactoring: clean-up of imports and a few other minor improvements. 2017-07-29 16:13:39 +01:00
Achim D. Brucker eb0054b47d Refactoring: Moved default configuration to config module. 2017-07-29 12:36:20 +01:00
Achim D. Brucker 0b24fb15fe Refactoring. 2017-07-29 11:32:06 +01:00
Achim D. Brucker 0ca3476b09 Slightly more aggressive implementation of google_dos_protection. 2017-07-29 11:21:15 +01:00
Achim D. Brucker d05ca9678e Refactoring. 2017-07-29 10:57:35 +01:00
Achim D. Brucker ac663299b3 Refactoring. 2017-07-29 10:17:16 +01:00
Achim D. Brucker 10cce2859d Renamed variable/attribute pk to public_key. 2017-07-29 09:15:22 +01:00
Achim D. Brucker 333bcaa62d Strip path from crx file. 2017-07-29 09:12:01 +01:00
Achim D. Brucker e5d671c7c4 Refactoring. 2017-07-29 09:05:16 +01:00
Michael Herzberg 11604c0fa5 Collect jsfilesize instead of jsloc. 2017-07-26 12:05:54 +01:00
Achim D. Brucker 73eedab07d Log time delta for each extension update. 2017-07-26 07:29:48 +01:00
Michael Herzberg b3d1ab912e Wait a maximum of 10min before stopping jsbeautifier. 2017-07-25 22:57:55 +01:00
Michael Herzberg 072e008fe2 Run the garbage collector manually after using jsbeautify. 2017-07-19 17:25:21 +01:00
Michael Herzberg 186f6162af Fixed NoneType str conversion exception. 2017-07-17 15:29:00 +01:00
Michael Herzberg 9b1e5db96f Check for attributes key first and use traceback module instead of printing str(e). 2017-07-17 14:00:39 +01:00
Michael Herzberg eded1ca893 Only attempt to search for replies when we actually have search parameters. 2017-07-16 20:14:50 +01:00
Michael Herzberg 26bddde328 Removed primary keys from fts tables as that had no effect. 2017-07-12 18:30:37 +01:00
Michael Herzberg 6a6a12c88a Added parsing of support to sqlite. 2017-07-12 18:11:31 +01:00
Michael Herzberg 16a44cf499 Added parsing of review replies to sqlite. 2017-07-12 17:56:40 +01:00
Michael Herzberg 0ed8c15a2d Made review a fts table. 2017-07-12 17:04:56 +01:00
Michael Herzberg 51bdcb4f16 Also download replies for support forum. 2017-07-12 16:57:16 +01:00
Michael Herzberg 11b0ccee4a Added download of review replies. 2017-07-12 16:10:47 +01:00
Michael Herzberg d6ae9d28b8 Fixed bug that led to downloading the first review page twice instead of the first and second review page. 2017-07-12 14:09:01 +01:00
Michael Herzberg 60dd98e60e Fixed parsing of developer from overview page. 2017-07-12 13:54:44 +01:00
Michael Herzberg e60265975f Renamed etag to crx_etag. 2017-07-10 12:46:41 +01:00
Achim D. Brucker c4e13daae5 Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler 2017-07-09 18:55:24 +01:00
Michael Herzberg 77023e001e Do not treat js file decoding strictly. 2017-07-07 22:17:56 +01:00
Michael Herzberg 38c88d7461 Added parsing of content_script_urls. 2017-07-07 20:09:22 +01:00
Michael Herzberg fbc0a7c87c Added crx size and jsloc. 2017-07-07 19:47:14 +01:00
Michael Herzberg 62dc61826a Added parsing of itemcategory. 2017-07-07 19:29:51 +01:00
Michael Herzberg dbe8a26a6b Fixed download parsing. 2017-07-05 16:20:52 +01:00
Michael Herzberg cface0128c Changed download number extraction to also work with Google Docs extensions (and potentially others). 2017-07-05 16:08:15 +01:00
Michael Herzberg 4c01b95f69 Added ratingValue and ratingCount to db. 2017-07-05 14:23:45 +01:00
Achim D. Brucker d5d0a44b69 Reformatting. 2017-07-05 08:21:40 +01:00
Achim D. Brucker 600ec933f4 Introduced optional argument to last_crx - return latest crx that is not newer than the passed date/time. 2017-07-05 08:21:00 +01:00
Achim D. Brucker 30c0b92979 Ignore empty crx files in calculating last crx file date. 2017-07-04 09:30:33 +01:00
Achim D. Brucker 939b29f55a Use getmembers instead of getnames in last_crx(). 2017-07-03 07:04:03 +01:00
Michael Herzberg 6d5221c5d7 Make db path configurable. 2017-06-22 17:46:18 +01:00
Michael Herzberg 6833ba6683 Fixed sqlite creation, added missing commit 2017-06-20 23:47:31 +01:00
Michael Herzberg 4220d48d34 Close db when an exception is thrown. 2017-06-20 23:15:15 +01:00
Achim D. Brucker 8dbd535e3e Merge branch 'master' into production 2017-06-20 20:03:10 +01:00
Achim D. Brucker d9ebe265ae Re-formatting 2017-06-20 18:17:44 +01:00
Achim D. Brucker 05227494d6 Re-formatting 2017-06-20 18:17:36 +01:00
Michael Herzberg dae5d4caa9 Fixed creation of empty .crx files. 2017-06-20 18:05:22 +01:00
Michael Herzberg 7ef8ecf3b1 Relax json parsing of manifest. 2017-06-20 17:45:13 +01:00
Michael Herzberg 1cfe1bdab9 Also ignore /* style comments in manifests. 2017-06-20 15:22:09 +01:00
Michael Herzberg 437a00d256 Don't print warning when crx status is 404. 2017-06-20 15:07:44 +01:00
Michael Herzberg 69cdcd7174 Remove JavaScript-style comments from manifest before parsing. 2017-06-20 11:22:54 +01:00
Michael Herzberg b6bf280d1e Fixed error. 2017-06-20 08:49:01 +01:00
Michael Herzberg 3496e89460 Fixed error. 2017-06-20 08:43:43 +01:00
Michael Herzberg aa259807e2 Catch exceptions due to empty crx header file. 2017-06-20 08:42:30 +01:00
Michael Herzberg c47ba57c97 Changed handling of manifest parsing exceptions. 2017-06-20 08:28:50 +01:00
Michael Herzberg d2dd2aaf81 Moved db path into config file. 2017-06-20 08:10:28 +01:00
Michael Herzberg 39d7bf0330 Deal with missing annotation block in reviews. 2017-06-20 08:03:15 +01:00
Michael Herzberg fa82129c2b Deal with a possibly missing overview.html.status file. 2017-06-19 21:34:42 +01:00
Michael Herzberg c1cd41c2e1 Improved logging. 2017-06-19 18:41:29 +01:00
Michael Herzberg 282e2c4e8c Worked on sqlite stuff. 2017-06-19 16:42:35 +01:00
Achim D. Brucker 456bf292c8 Merge branch 'master' into production 2017-06-19 05:57:29 +01:00
Achim D. Brucker be293b1ba8 Fixed import. 2017-06-19 05:57:17 +01:00
Achim D. Brucker f95619670c Merge branch 'master' into production 2017-06-18 15:38:13 +01:00
Achim D. Brucker d9195c8174 Max. number of concurrent downloads can now be configured via command line. 2017-06-18 15:36:21 +01:00
Achim D. Brucker 66eff6780d Fixed passing is_new. 2017-06-17 18:26:04 +01:00
Achim D. Brucker 85c8f6a546 Pass is_new flag to sqlite update. 2017-06-17 18:19:44 +01:00
Achim D. Brucker 2e6323c8c5 Report number of extensions for which the SQL database was updated. 2017-06-17 18:15:08 +01:00
Michael Herzberg 7f24a9da7a Split db creation into incremental part and separate full regeneration script. 2017-06-17 17:10:18 +01:00
Achim D. Brucker ea71c5b6e3 Removed debugging code raising exceptions. 2017-06-17 15:38:17 +01:00
Achim D. Brucker 8fcc7ab99f Fixed logging. 2017-06-17 00:48:34 +01:00
Achim D. Brucker 97460c498f Basic support for logging of errors related to SQL import/update. 2017-06-17 00:43:40 +01:00
Achim D. Brucker c4a5c5a231 Ignore non-extension ids in forums.conf. 2017-06-16 23:32:52 +01:00
Achim D. Brucker 86a608c6a1 Re-formatting. 2017-06-16 23:19:13 +01:00
Achim D. Brucker 1c8d68d495 Moved path utility functions into config module. 2017-06-16 23:09:23 +01:00
Achim D. Brucker 9f174f6785 Downport to python 3.5. 2017-06-16 22:38:48 +01:00
Michael Herzberg 6e2772711f Next version of sqlite generator. 2017-06-16 20:40:48 +01:00
Michael Herzberg c08124fa17 First version of sqlite generator. 2017-06-16 14:56:23 +01:00
Achim D. Brucker ab4c0ad002 Fixed logging. 2017-06-16 12:07:51 +01:00
Achim D. Brucker b9e5ca6f82 Stub for updating sqlite. 2017-06-16 11:06:04 +01:00
Achim D. Brucker 73baef61b2 Re-formatting. 2017-06-16 10:29:07 +01:00
Achim D. Brucker 64778c783e Check etags in addition to modified-since (basic implementation). 2017-06-16 10:28:47 +01:00
Achim D. Brucker 763ac137b2 Re-introduced parallel download (they are not causing the If-Modified-Since problem). 2017-06-15 23:14:07 +01:00
Achim D. Brucker 3ff67bddc8 Disabled parallel download (IF-Modified Bug Hunting). 2017-06-14 16:56:05 +01:00
Achim D. Brucker fd717d1516 Reformatting. 2017-05-27 20:39:00 +01:00
Achim D. Brucker e164b7fe72 Reformatting. 2017-05-27 20:38:56 +01:00
Achim D. Brucker b4e680ccef Disabled local backup. 2017-05-20 21:09:03 +01:00
Achim D. Brucker 02eed607b4 Re-added wait before downloading first forum/support page - old setup was too aggressive. 2017-04-14 19:56:39 +01:00
Achim D. Brucker a547e840cb Normalized formatting. 2017-04-13 09:34:33 +01:00
Achim D. Brucker 0ee655f018 Interleaving forum and support download with crx download and removed initial wait before downloading first forum/support page. 2017-04-12 08:59:08 +01:00
Achim D. Brucker 0f88b5d8c5 Reduced pool size to reduce load. 2017-04-11 07:44:19 +01:00
Achim D. Brucker 63bf936e1e Avoid use of ReadTarFS (seems to be very slow compared to the tarfile module). 2017-04-10 07:08:52 +01:00
Achim D. Brucker 23fd7731bc Reduced waiting time for DOS protection. 2017-04-09 08:20:52 +01:00
Achim D. Brucker e65c21a3b1 Increased pool size to 48. 2017-04-08 00:12:40 +01:00
Achim D. Brucker c03fa789bd Fixed reporting of new extensions. 2017-04-07 07:07:49 +01:00
Achim D. Brucker c75cdc9d09 Switched to proper append mode for tar archive. 2017-04-06 21:42:59 +01:00
Achim D. Brucker a06d218073 Re-Formatting 2017-04-05 21:42:00 +01:00
Achim D. Brucker 86ed713b9d Updated last_crx call in update_extension to match new archive layout. 2017-04-04 22:51:45 +01:00
Achim D. Brucker 8cfd0d3bb2 Fixed folder confusion during tar file creation. 2017-04-03 21:32:53 +01:00
Achim D. Brucker 90d58705c4 Re-formatting. 2017-04-02 09:23:13 +01:00
Achim D. Brucker 308c0d8721 Improved log output. 2017-04-01 06:57:39 +01:00
Achim D. Brucker 5a8b76ce14 Move tar creation to (local) tmp dir. 2017-03-31 17:04:51 +01:00
Achim D. Brucker e72da5b3ff Catching even more file i/o-exceptions. 2017-03-28 06:38:12 +01:00
Achim D. Brucker 660ebcf34d Sync before executing large file operations. 2017-03-27 22:00:06 +01:00
Achim D. Brucker 00b59d8429 Ignore exceptions during backup creation. 2017-03-26 21:30:07 +01:00
Achim D. Brucker fe49f56959 Re-formatting. 2017-03-25 06:31:48 +00:00
Achim D. Brucker e6e8914004 Fixed call of write_text (missing argument due to interface changes). 2017-03-24 06:19:40 +00:00
Achim D. Brucker 581f2ef10d Catch exception in case of corrupt tar file and rename tar file as mitigation. 2017-03-23 20:32:51 +00:00
Achim D. Brucker 26a9c7e8f1 Increased pool size to 24 to mitigate longer downloads due to tar/untar-operations. 2017-03-22 05:44:09 +00:00
Achim D. Brucker 7007ddebb6 Ensure that dir is defined even if an exception is thrown. 2017-03-21 20:47:28 +00:00
Achim D. Brucker ad47feae61 Moved crx analysis in own module. 2017-03-18 09:21:18 +00:00
Achim D. Brucker 74992388e0 Re-formatting. 2017-03-16 08:30:33 +00:00
Achim D. Brucker 5007e60c71 First working version that creates tar archives. 2017-03-13 07:02:17 +00:00
Achim D. Brucker 6892a614a6 Converted get_existing_ids to new tar-based file layout. 2017-03-01 12:48:29 +00:00
Achim D. Brucker ad53411d55 Code reformatting. 2017-02-05 14:11:09 +00:00
Achim D. Brucker b1666aade0 Fixed datetime imports. 2017-02-03 09:45:51 +00:00
Achim D. Brucker f6a7a6466c Catch exception during discovery. 2017-02-01 07:32:52 +00:00
Achim D. Brucker 58f0e2cb1f Fixed uninitialised variable in exception handling. 2017-01-31 22:35:05 +00:00
Achim D. Brucker 1802350dbb Added timeout for requests. 2017-01-31 09:52:21 +00:00
Achim D. Brucker c73b1e5ac2 Compute new extensions by checking if directory already exists. 2017-01-29 13:29:46 +00:00
Achim D. Brucker 828c6c95da Reformatting. 2017-01-28 17:22:28 +00:00
Achim D. Brucker 111c5394c3 Increased DDOS protection delay. 2017-01-28 17:21:25 +00:00
Achim D. Brucker f41a2f67b1 Only try to download crx if response status is 200. 2017-01-28 17:20:17 +00:00
Achim D. Brucker 18a3519bc5 Log messages are now (kind of) thread safe. 2017-01-28 17:08:18 +00:00
Achim D. Brucker 0083fd8aab Basic multithreading support. 2017-01-28 16:37:44 +00:00
Achim D. Brucker a301796a9b Ensure that new extensions are indeed unknown/new. 2017-01-28 15:27:31 +00:00
Achim D. Brucker 644419ed5d Fixed crx download validation. 2017-01-28 15:05:56 +00:00
Achim D. Brucker 2dbf90adc5 Flush output. 2017-01-28 14:49:29 +00:00
Achim D. Brucker 3cdeba20b4 Reformatting. 2017-01-28 13:15:05 +00:00
Achim D. Brucker 3ed43f036d Refactoring. 2017-01-28 13:12:47 +00:00
Achim D. Brucker 23e1147370 Refactoring. 2017-01-28 13:03:40 +00:00
Achim D. Brucker e8f01eae55 Refactoring. 2017-01-28 12:56:29 +00:00
Achim D. Brucker 9c4ba39558 Refactoring. 2017-01-28 12:52:18 +00:00
Achim D. Brucker bd53b117e5 Refactoring. 2017-01-27 22:40:07 +00:00
Achim D. Brucker 3dc7a5d663 Added comment on skipping language specific extension URLs. 2017-01-22 15:53:00 +00:00
Achim D. Brucker 6b38e2c738 Reformatting. 2017-01-21 18:47:44 +00:00
Achim D. Brucker b4756b3350 Modularized parse_sitemap. 2017-01-21 12:51:55 +00:00
Achim D. Brucker fac837d016 Module setup. 2017-01-20 23:02:56 +00:00