Auto Docs
dupedetector: Detects duplicate files, hopefully quickly
-
dupedetector.cli()
    Hook for the setuptools entry point. Parses the command-line arguments and passes them to main.
-
dupedetector.filter_lol(lol, cb)
    Filters a list of lists (lol). The callback cb is applied to each element of every inner list to generate a key, and the elements are regrouped into new lists by that key. Only lists that contain more than one entry are returned.
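
The grouping described for filter_lol can be sketched as follows. This is a minimal illustration, not the module's actual source. It assumes keys are compared only within each inner list, so two elements from different inner lists never end up in the same group.

```python
from collections import defaultdict


def filter_lol(lol, cb):
    """Regroup each inner list by cb(item); keep groups with more than one entry."""
    filtered = []
    for candidates in lol:
        # Assumption: grouping is scoped to one inner list at a time.
        groups = defaultdict(list)
        for item in candidates:
            groups[cb(item)].append(item)
        filtered.extend(group for group in groups.values() if len(group) > 1)
    return filtered


# Group file names by extension; singletons are dropped.
names = [["a.txt", "b.txt", "c.log"], ["d.log"]]
print(filter_lol(names, lambda name: name.rsplit(".", 1)[-1]))
# → [['a.txt', 'b.txt']]
```

Applying such a filter repeatedly with progressively more expensive callbacks (size, then a sample hash, then a full hash) narrows the candidate set before any costly work is done.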
-
dupedetector.get_parser()
    Generates the CLI parser.
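
The relationship between cli, get_parser and main could look roughly like the sketch below. It assumes the parser is built with argparse. The `paths` positional argument, the `argv` parameter on cli, and the body of main are all hypothetical, since the real options are not documented on this page.

```python
import argparse
import sys


def get_parser():
    """Build the command-line parser."""
    parser = argparse.ArgumentParser(
        prog="dupedetector",
        description="Detects duplicate files, hopefully quickly",
    )
    # Hypothetical argument: the real CLI options are not listed in these docs.
    parser.add_argument("paths", nargs="+", help="files or directories to scan")
    return parser


def main(args):
    """Placeholder for the real work; here it only echoes the parsed paths."""
    return list(args.paths)


def cli(argv=None):
    """Parse the arguments and feed them to main.

    The argv parameter is an addition for testability; the documented
    cli() takes no arguments.
    """
    args = get_parser().parse_args(sys.argv[1:] if argv is None else argv)
    return main(args)
```

Keeping the parser in its own function lets tests and documentation tools inspect the options without running the program.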
-
dupedetector.hash_end_sample(fp, samplesize=1000000, chunksize=256)
    Hashes the last $samplesize bytes of a file.
-
dupedetector.hash_first_sample(fp, samplesize=1000000, chunksize=256)
    Hashes $samplesize bytes from the start of a file.
-
dupedetector.hash_middle_sample(fp, samplesize=1000000, chunksize=256)
    Hashes $samplesize bytes from around a file's middle.
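
The three sampling functions differ only in where the sample starts. The sketch below shows one plausible way to compute those offsets. It is an assumption about the behaviour, not the module's code: the `_hash_range` helper is hypothetical, the chunksize parameter is left out for brevity, and the middle sample is assumed to be centred on the file's midpoint.

```python
import hashlib
import os


def _hash_range(fp, offset, samplesize):
    """Hypothetical helper: md5 hexdigest of up to samplesize bytes from offset."""
    with open(fp, "rb") as fh:
        fh.seek(offset)
        return hashlib.md5(fh.read(samplesize)).hexdigest()


def hash_first_sample(fp, samplesize=1000000):
    # The sample starts at the beginning of the file.
    return _hash_range(fp, 0, samplesize)


def hash_middle_sample(fp, samplesize=1000000):
    # Assumption: the sample is centred on the file's midpoint.
    size = os.path.getsize(fp)
    return _hash_range(fp, max(0, (size - samplesize) // 2), samplesize)


def hash_end_sample(fp, samplesize=1000000):
    # The sample ends at the last byte of the file.
    size = os.path.getsize(fp)
    return _hash_range(fp, max(0, size - samplesize), samplesize)
```

For files smaller than the sample size, all three offsets clamp to zero, so each function hashes the whole file.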
-
dupedetector.main(args)
    Main routine. Takes the parsed arguments from cli and runs the duplicate detection.
-
dupedetector.md5(fp, offset=0, samplesize=None, chunksize=256)
    Specialized md5 function that returns only a hexdigest. Reads the file at $fp in chunks of $chunksize bytes. If $samplesize is not None, reading stops after that many bytes, even if the end of the file has not been reached.
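
A chunked, bounded read of this kind can be sketched with hashlib as shown below. This is an illustration under assumptions, not the module's source. In particular, it assumes that $offset is a byte position to seek to before reading, which the description above does not state.

```python
import hashlib


def md5(fp, offset=0, samplesize=None, chunksize=256):
    """Return the md5 hexdigest of the file at fp, read chunk by chunk."""
    digest = hashlib.md5()
    remaining = samplesize  # None means "read to the end of the file"
    with open(fp, "rb") as fh:
        # Assumption: offset is the byte position where reading starts.
        fh.seek(offset)
        while remaining is None or remaining > 0:
            # Never read past the requested sample size.
            want = chunksize if remaining is None else min(chunksize, remaining)
            chunk = fh.read(want)
            if not chunk:
                break  # end of file reached before the sample was filled
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()
```

Reading in chunks keeps memory use constant regardless of file size, and the digest is the same as hashing the selected byte range in one call.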
-
dupedetector.rscan(path)
    Given a path, returns all files from there down, inclusive.
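
A recursive scan like this is commonly written with os.walk. The sketch below is one possible reading, not the module's code. It assumes "inclusive" means that a path pointing at a single file yields that file, and that the result is a list of path strings.

```python
import os


def rscan(path):
    """Return every file at or below path."""
    # Assumption: a path that is itself a file is returned as the only result.
    if os.path.isfile(path):
        return [path]
    found = []
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found
```

Note that os.walk does not follow symbolic links to directories by default, which avoids loops on cyclic links.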