Achim D. Brucker
|
74da2e9c08
|
Initial simhash integration.
|
2017-11-19 00:36:15 +00:00 |
Achim D. Brucker
|
acfdb9ee50
|
Removed unused function analyse_comment_blocks.
|
2017-11-18 23:21:19 +00:00 |
Achim D. Brucker
|
e3519f012d
|
Reformatting.
|
2017-11-17 16:58:48 +00:00 |
Achim D. Brucker
|
32c08672d9
|
Added log output for failed data decoding.
|
2017-11-16 07:13:55 +00:00 |
Achim D. Brucker
|
3db3435c07
|
Refactoring of heuristic detection stubs.
|
2017-11-15 08:05:40 +00:00 |
Achim D. Brucker
|
c5dce7bcd0
|
Fixed decoding of content (str_data).
|
2017-11-15 07:12:41 +00:00 |
Achim D. Brucker
|
91e6014c6c
|
Moved to single-threaded mode.
|
2017-11-12 14:07:25 +00:00 |
Achim D. Brucker
|
4cb49f2281
|
Merge branch 'production'
|
2017-11-11 21:56:33 +00:00 |
Achim D. Brucker
|
9bd283f35a
|
Fixed use of append.
|
2017-11-10 00:13:06 +00:00 |
Achim D. Brucker
|
7dfbdac670
|
Disabled parallel updates (for debugging a deadlock situation).
|
2017-11-09 23:38:05 +00:00 |
Achim D. Brucker
|
5cc7a92f90
|
Fixed typo.
|
2017-11-09 00:17:09 +00:00 |
Achim D. Brucker
|
ac910bf819
|
Updated python version to 3.6.
|
2017-11-07 20:58:24 +00:00 |
Achim D. Brucker
|
631f461d1f
|
Removed unsupported connection_timeout parameter.
|
2017-11-06 06:11:14 +00:00 |
Achim D. Brucker
|
6279bd9909
|
Fixed syntax error.
|
2017-11-05 20:14:12 +00:00 |
Achim D. Brucker
|
15079496cc
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-11-05 00:07:23 +00:00 |
Achim D. Brucker
|
fcab770233
|
Reformatting.
|
2017-11-05 00:07:04 +00:00 |
Achim D. Brucker
|
07a7b346c7
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-11-04 23:14:27 +00:00 |
Achim D. Brucker
|
7ba829c90f
|
Made python 3.6 the default.
|
2017-11-02 18:46:20 +00:00 |
Achim D. Brucker
|
cfc26e62d7
|
Free git object as early as possible.
|
2017-10-22 21:20:49 +01:00 |
Achim D. Brucker
|
0963ea59d3
|
Fixed typo.
|
2017-10-21 20:12:19 +01:00 |
Achim D. Brucker
|
d88e73167d
|
Explicitly free git_obj.
|
2017-10-20 18:59:53 +01:00 |
Achim D. Brucker
|
e4a8075da9
|
Configure timeout and retries for database connection.
|
2017-10-18 20:19:43 +01:00 |
Achim D. Brucker
|
9f5d8f9b9e
|
Added logs during creation of db connection.
|
2017-10-18 08:35:27 +01:00 |
Achim D. Brucker
|
14da483046
|
Even more logging.
|
2017-10-17 15:17:29 +01:00 |
Achim D. Brucker
|
37ebd510c9
|
Reformatting.
|
2017-10-16 09:47:14 +01:00 |
Achim D. Brucker
|
4ee9c51ef7
|
Reformatting.
|
2017-10-16 09:42:43 +01:00 |
Achim D. Brucker
|
fc33abb7a6
|
Fixed logging.
|
2017-10-16 09:35:53 +01:00 |
Achim D. Brucker
|
8780eb8f2f
|
Added further logging output (info).
|
2017-10-16 05:36:59 +01:00 |
Achim D. Brucker
|
bbfbbed35a
|
Identify resource/media files using the file library.
|
2017-10-15 15:34:45 +01:00 |
Michael Herzberg
|
afe137ba36
|
Integrated last_crx_etag into last_crx.
|
2017-10-14 19:59:46 +01:00 |
Achim D. Brucker
|
64bc9bd90d
|
Make use of database with md5 sums optional.
|
2017-10-14 19:17:37 +01:00 |
Michael Herzberg
|
ea800da613
|
Create new thread after 100 extensions.
|
2017-10-13 15:54:29 +01:00 |
Achim D. Brucker
|
03b08db905
|
Bug fix: download all extensions in parallel mode.
|
2017-10-13 10:35:51 +01:00 |
Michael Herzberg
|
f51bcfbf46
|
Use con object from db.py.
|
2017-10-12 16:01:45 +01:00 |
Achim D. Brucker
|
d3b7dea4d8
|
Added detection based on file sizes after stripping whitespace.
|
2017-10-11 20:18:15 +01:00 |
Achim D. Brucker
|
10a80e2861
|
Compute size after stripping.
|
2017-10-11 20:16:33 +01:00 |
Achim D. Brucker
|
39490ca490
|
Enforce block type to be code if it is not a comment.
|
2017-10-11 10:20:09 +01:00 |
Achim D. Brucker
|
91e0180151
|
Fixed indentation.
|
2017-10-11 09:46:39 +01:00 |
Achim D. Brucker
|
a077d7e8b2
|
Fixed typo.
|
2017-10-11 09:43:54 +01:00 |
Achim D. Brucker
|
dbdaa772dc
|
Fixed typo.
|
2017-10-11 09:41:51 +01:00 |
Achim D. Brucker
|
a4926aed19
|
Only store relative path for library files.
|
2017-10-11 09:22:27 +01:00 |
Achim D. Brucker
|
8dd745f826
|
Classify normalized detection as 'very likely library'.
|
2017-10-11 09:14:22 +01:00 |
Achim D. Brucker
|
ee7ce8b446
|
Report stored library filename of detected libraries.
|
2017-10-11 08:48:20 +01:00 |
Achim D. Brucker
|
8c43fadfdb
|
Basic implementation: check_md5_normalized(...).
|
2017-10-11 00:48:04 +01:00 |
Achim D. Brucker
|
154118cf50
|
Basic implementation: check_md5_decompressed(...).
|
2017-10-11 00:44:15 +01:00 |
Achim D. Brucker
|
c6e5cb8511
|
Basic implementation: md5 checksum based library detection.
|
2017-10-11 00:40:06 +01:00 |
Achim D. Brucker
|
518372c6f2
|
Fixed library/version computation for sub-tasks.
|
2017-10-10 23:02:21 +01:00 |
Achim D. Brucker
|
61010a6a01
|
Bug fix: library identification for multi-task jobs.
|
2017-10-10 22:16:46 +01:00 |
Michael Herzberg
|
63ae8ac4a7
|
Added missing fields for cdnjs and introduced new crxfile and libdet tables.
|
2017-10-10 18:55:28 +01:00 |
Michael Herzberg
|
6632cd0ded
|
Added database update for cdnjs.
|
2017-10-10 15:35:02 +01:00 |
Michael Herzberg
|
048990e8f8
|
Turned dbbackend into a package.
|
2017-10-10 15:10:41 +01:00 |
Michael Herzberg
|
301ad23d4c
|
Use new review etc. table structure.
|
2017-10-09 17:18:01 +01:00 |
Michael Herzberg
|
2b1e55c7ec
|
Fixed import.
|
2017-10-09 13:56:22 +01:00 |
Michael Herzberg
|
300a8c905a
|
Only log last mysql exception as error, rest as warning.
|
2017-10-08 20:57:25 +01:00 |
Achim D. Brucker
|
25c37d83c1
|
Silently correct 'name use count' exception from libmagic (caused by a bug in the magic Python module).
|
2017-10-08 15:18:58 +01:00 |
Achim D. Brucker
|
1963a20b69
|
Report starting positions of string literals.
|
2017-10-08 12:03:50 +01:00 |
Michael Herzberg
|
615b8f46a3
|
Fixed mysql caching.
|
2017-10-07 21:01:14 +01:00 |
Michael Herzberg
|
2abc386f48
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-10-06 20:13:16 +01:00 |
Michael Herzberg
|
6372c62336
|
Removed sorting again.
|
2017-10-06 20:13:08 +01:00 |
Achim D. Brucker
|
1ee76d9817
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-10-06 19:36:12 +01:00 |
Achim D. Brucker
|
c1750838f1
|
Added support for tar files.
|
2017-10-06 18:33:35 +01:00 |
Michael Herzberg
|
d6869455a8
|
Sort extension ids before processing.
|
2017-10-06 12:12:49 +01:00 |
Michael Herzberg
|
d05194b9bb
|
Group cached commits for efficiency.
|
2017-10-06 12:08:21 +01:00 |
Michael Herzberg
|
2cb56edd9b
|
Adjusted retries for create-db.
|
2017-10-05 11:14:59 +01:00 |
Michael Herzberg
|
6ba73c2ed9
|
Changed autocommit behaviour.
|
2017-10-04 20:56:47 +01:00 |
Achim D. Brucker
|
e63a13ae09
|
Bug fix: decompression.
|
2017-09-22 08:42:02 +01:00 |
Achim D. Brucker
|
e4245ed1dd
|
Reformatting.
|
2017-09-20 10:03:14 +01:00 |
Achim D. Brucker
|
a63dd53e45
|
Refactoring.
|
2017-09-20 10:02:02 +01:00 |
Achim D. Brucker
|
0cb0a4226d
|
Added option for passing a list with libs to update.
|
2017-09-20 07:57:14 +01:00 |
Michael Herzberg
|
4712e15249
|
Fixed autocommit bug.
|
2017-09-19 17:09:35 +01:00 |
Achim D. Brucker
|
50a7ba8a91
|
Minor refactoring.
|
2017-09-19 10:02:46 +01:00 |
Achim D. Brucker
|
4f84c5626d
|
Minor refactoring.
|
2017-09-19 09:16:32 +01:00 |
Achim D. Brucker
|
061622f588
|
Refactoring: stub of new main analysis method.
|
2017-09-18 09:09:00 +01:00 |
Achim D. Brucker
|
aadbc5aa0c
|
Refactoring: removed unused variables.
|
2017-09-18 00:35:35 +01:00 |
Achim D. Brucker
|
50b91d3a35
|
Renaming jsFilename -> filename.
|
2017-09-18 00:30:55 +01:00 |
Michael Herzberg
|
175ebd53b7
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-17 17:45:17 +01:00 |
Michael Herzberg
|
7277e6f76e
|
Fixed log msg bug.
|
2017-09-17 17:45:01 +01:00 |
Michael Herzberg
|
0cb7d6e792
|
Fixed error in exception handling.
|
2017-09-17 17:40:48 +01:00 |
Achim D. Brucker
|
3626b9fb76
|
Ordered and extended enumeration DetectionType. Order reflects reliability of checks.
|
2017-09-17 13:40:38 +01:00 |
Achim D. Brucker
|
a3346cb95e
|
Use file_identifiers module to compute file identifiers.
|
2017-09-17 13:18:49 +01:00 |
Achim D. Brucker
|
6d69377f28
|
Introduced optional parameter data to compute identifiers without opening a file handle.
|
2017-09-17 13:18:20 +01:00 |
Michael Herzberg
|
1fab393e56
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-16 17:23:16 +01:00 |
Michael Herzberg
|
c3e295267b
|
Log loglevel and only print stacktrace on first mysql exception.
|
2017-09-16 17:22:57 +01:00 |
Achim D. Brucker
|
205c8836e9
|
Bug fix: do not catch exceptions too aggressively and fix libvers computation for updates.
|
2017-09-16 17:20:23 +01:00 |
Achim D. Brucker
|
4cf41e2e4f
|
Refactoring: moved generic file identifiers into own module.
|
2017-09-16 17:19:36 +01:00 |
Achim D. Brucker
|
e98f58fff8
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-16 13:41:56 +01:00 |
Achim D. Brucker
|
24c65daecf
|
Bug fix: check for dirty missed actual function application.
|
2017-09-16 13:41:47 +01:00 |
Achim D. Brucker
|
c274b96f66
|
Added csv output for debugging.
|
2017-09-16 13:21:49 +01:00 |
Michael Herzberg
|
69e95fdf13
|
Catch json parse exceptions for reviews etc. more nicely.
|
2017-09-16 12:53:35 +01:00 |
Michael Herzberg
|
58aacef3ff
|
Reopen connection after every exception.
|
2017-09-16 12:31:00 +01:00 |
Michael Herzberg
|
a514c0001e
|
Added check for empty crx files.
|
2017-09-16 12:14:41 +01:00 |
Michael Herzberg
|
b51de8577f
|
Added compression for mysql.
|
2017-09-16 12:04:35 +01:00 |
Achim D. Brucker
|
92e1c4c2e5
|
Skip deleted files.
|
2017-09-16 11:41:21 +01:00 |
Achim D. Brucker
|
082cd2fc65
|
Added hackish pull method that uses the regular git binary. While this method will not work well with filenames containing spaces and there might be other glitches, it allows pulling an update of the cdnjs git repository (> 100GB) within a couple of minutes, compared to the couple of days that the non-hackish solution needs.
|
2017-09-16 11:36:40 +01:00 |
Achim D. Brucker
|
5d3343acf1
|
Refactoring: moved git_repo creation into pull_get_list_changed_files(...).
|
2017-09-16 10:33:11 +01:00 |
Achim D. Brucker
|
7b0e63da10
|
Implemented n/N options for external parallelisation (only for fresh initialization).
|
2017-09-15 22:40:46 +01:00 |
Achim D. Brucker
|
400e74ae3f
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-15 20:21:45 +01:00 |
Achim D. Brucker
|
26678636eb
|
Ignore commits where blobs are None.
|
2017-09-15 20:21:05 +01:00 |
Michael Herzberg
|
85680d360b
|
Automatically reopen database connection on failure.
|
2017-09-15 18:23:25 +01:00 |
Michael Herzberg
|
ddbbc2672d
|
Also try to insert other data if some inserts fail. Use autocommit to prevent data loss on retries.
|
2017-09-15 18:15:03 +01:00 |
Achim D. Brucker
|
936f2d3189
|
Log git info before starting pull (update).
|
2017-09-14 22:54:37 +01:00 |
Achim D. Brucker
|
2ff30f7382
|
Parallel execution of git date queries.
|
2017-09-14 15:11:53 +01:00 |
Achim D. Brucker
|
12a1e282aa
|
The method pull_get_updated_lib_files(...) now also returns unique library/version information.
|
2017-09-14 10:44:30 +01:00 |
Achim D. Brucker
|
e3f1202e44
|
Use version dictionary.
|
2017-09-14 10:33:00 +01:00 |
Achim D. Brucker
|
f54f29c9ba
|
Added build_release_date_dic(...).
|
2017-09-14 09:50:09 +01:00 |
Achim D. Brucker
|
3b217922c5
|
Added line count.
|
2017-09-13 16:41:01 +01:00 |
Achim D. Brucker
|
420eec7462
|
Minor memory optimizations.
|
2017-09-13 11:12:33 +01:00 |
Achim D. Brucker
|
ec1c47625a
|
Added support for parallel update of database.
|
2017-09-13 09:13:35 +01:00 |
Achim D. Brucker
|
c386bd01dd
|
Added missing string conversion.
|
2017-09-13 08:29:23 +01:00 |
Achim D. Brucker
|
42e685ee32
|
Added missing string conversion.
|
2017-09-13 08:01:02 +01:00 |
Achim D. Brucker
|
18fb23d3dc
|
Use glob instead of os.walk() to avoid memory leak in the latter.
|
2017-09-13 04:04:38 +01:00 |
Achim D. Brucker
|
76d5993794
|
Added logging output.
|
2017-09-13 03:02:39 +01:00 |
Achim D. Brucker
|
c30f7fdd7c
|
Implemented skeleton of main routine.
|
2017-09-13 02:56:13 +01:00 |
Achim D. Brucker
|
a8a5534be1
|
Renamed module.
|
2017-09-13 01:13:17 +01:00 |
Achim D. Brucker
|
bdb84c2120
|
Renamed module.
|
2017-09-13 01:09:30 +01:00 |
Achim D. Brucker
|
4e5b52617f
|
Catch exception during decompression and increase max. allowed size of decompressed data to 100 times the compressed size.
|
2017-09-13 00:23:17 +01:00 |
Achim D. Brucker
|
88efe2b8a4
|
Reformatting.
|
2017-09-13 00:02:20 +01:00 |
Achim D. Brucker
|
ea9339bc53
|
Compute data identifiers for uncompressed content of gzip compressed files.
|
2017-09-13 00:01:15 +01:00 |
Achim D. Brucker
|
f9cf7bd35f
|
Refactoring: moved computation of data related identifiers into own method.
|
2017-09-12 23:52:52 +01:00 |
Achim D. Brucker
|
8243664974
|
Use StringIO representation for normalizing js/css files (avoid re-reading the file content from disk).
|
2017-09-12 23:43:09 +01:00 |
Achim D. Brucker
|
933c4d4d11
|
Determine file description from buffer instead of from file (avoid reading file twice).
|
2017-09-12 23:23:22 +01:00 |
Achim D. Brucker
|
6353202ee8
|
Renaming: fileinfo -> filedb.
|
2017-09-10 22:59:07 +01:00 |
Achim D. Brucker
|
0426d7d3d1
|
Reformatting.
|
2017-09-10 22:39:47 +01:00 |
Achim D. Brucker
|
e5da9abaea
|
Added get_file_libinfo(...).
|
2017-09-10 22:38:49 +01:00 |
Achim D. Brucker
|
ad2af517a3
|
Aggressively try to normalize as many filetypes as possible.
|
2017-09-10 17:40:30 +01:00 |
Achim D. Brucker
|
06ff5f3057
|
Method for computing basic file identifiers.
|
2017-09-10 15:57:07 +01:00 |
Achim D. Brucker
|
a6e90794bc
|
Extended const_basedir to check environment variable EXTENSION_ARCHIVE and modified main scripts to actually use const_basedir.
|
2017-09-10 15:55:22 +01:00 |
Achim D. Brucker
|
4b31097975
|
Added function for computing a list of normalized code blocks for a JavaScript file.
|
2017-09-10 15:02:57 +01:00 |
Achim D. Brucker
|
52b42dfaef
|
Changed pull method to return list of changed files.
|
2017-09-10 11:01:29 +01:00 |
Achim D. Brucker
|
c3053427c0
|
Added method for obtaining initial commit date and pulling git repos.
|
2017-09-09 23:13:26 +01:00 |
Achim D. Brucker
|
8c33558934
|
Reformatting.
|
2017-09-07 20:09:29 +01:00 |
Achim D. Brucker
|
3b2913616b
|
Skip first_seen if not defined.
|
2017-09-05 10:15:48 +01:00 |
Michael Herzberg
|
a9173345e8
|
Merge branch 'master' of logicalhacking.com:BrowserSecurity/ExtensionCrawler
|
2017-09-04 15:54:38 +01:00 |
Michael Herzberg
|
36d36facfe
|
Relaxed mysql retries.
|
2017-09-04 15:54:28 +01:00 |
Achim D. Brucker
|
6395d98443
|
Relaxed handling of network errors.
|
2017-09-04 09:11:27 +01:00 |
Achim D. Brucker
|
cfeb29d95f
|
Clean-up of logging infrastructure.
|
2017-09-03 15:56:27 +01:00 |
Achim D. Brucker
|
f42f8e3d03
|
Improved error handling for request failures.
|
2017-09-03 15:43:33 +01:00 |
Achim D. Brucker
|
872346fa61
|
Add timeout parameter to http get requests.
|
2017-09-03 12:03:51 +01:00 |
Achim D. Brucker
|
0b0268e320
|
Copy outphased date to hash map of files archive.
|
2017-09-03 11:13:27 +01:00 |
Achim D. Brucker
|
0f716e98da
|
Bug fix: only try to preserve outphased library information if there is any stored locally.
|
2017-09-03 11:09:39 +01:00 |
Achim D. Brucker
|
80c8e7caa0
|
Preserve outphased library versions.
|
2017-09-03 11:00:05 +01:00 |
Achim D. Brucker
|
03504ff81a
|
Improved error handling.
|
2017-09-03 10:45:56 +01:00 |
Achim D. Brucker
|
13191f1ce0
|
Renaming: date -> first_seen.
|
2017-09-03 10:32:45 +01:00 |
Achim D. Brucker
|
59f9b47a81
|
Switched to Logging framework.
|
2017-09-03 10:29:57 +01:00 |
Achim D. Brucker
|
074447064c
|
Enabled parallel download.
|
2017-09-03 10:06:55 +01:00 |
Achim D. Brucker
|
515a462938
|
Added methods for generating/updating index files based on the file hash.
|
2017-09-02 22:10:43 +01:00 |
Achim D. Brucker
|
9ae5905973
|
Generalized hash map builders.
|
2017-09-02 21:53:58 +01:00 |
Achim D. Brucker
|
22c3a7581d
|
Reformatting.
|
2017-09-02 21:44:20 +01:00 |
Achim D. Brucker
|
3097db3790
|
Added methods for generating sha1 indexed dictionary.
|
2017-09-02 21:40:44 +01:00 |
Achim D. Brucker
|
e5c2372222
|
Improved log output (verbose mode).
|
2017-09-02 20:57:01 +01:00 |