Auto Docs

dupedetector: Detects duplicate files, hopefully quickly

dupedetector.cli()

Hook for setuptools entrypoint. Parses the args and feeds them to main.

dupedetector.filter_lol(lol, cb)

Filters a list of lists (lol) by applying a callback to each element of the inner lists to generate a key.

That key is then used to assign each candidate to a filtered list of lists, and only lists that contain more than one entry are returned.
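The grouping described above might look roughly like the sketch below. It assumes that candidates are regrouped within each inner list and never merged across inner lists, which the description does not state; the name filter_lol_sketch is hypothetical and this is not the module's actual source.

```python
from collections import defaultdict


def filter_lol_sketch(lol, cb):
    """Regroup every inner list by cb(item); keep groups with more than one entry."""
    result = []
    for group in lol:
        # Assumption: keys are only compared within one inner list.
        buckets = defaultdict(list)
        for item in group:
            buckets[cb(item)].append(item)
        # Groups with a single entry cannot be duplicates, so drop them.
        result.extend(b for b in buckets.values() if len(b) > 1)
    return result


# Example: use the file extension as the key.
groups = filter_lol_sketch(
    [["a.txt", "b.txt", "c.log"], ["d.log"]],
    lambda name: name.rsplit(".", 1)[-1],
)
```

Here the two .txt names survive as one group, while c.log and d.log are dropped because each is alone in its bucket.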

dupedetector.get_parser()

Generates the CLI argument parser.

dupedetector.hash_end_sample(fp, samplesize=1000000, chunksize=256)

Hash the last $samplesize bytes of a file

dupedetector.hash_first_sample(fp, samplesize=1000000, chunksize=256)

Hash $samplesize bytes from the start of a file

dupedetector.hash_middle_sample(fp, samplesize=1000000, chunksize=256)

Hash $samplesize bytes from around a file's middle
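The three sampling helpers above differ mainly in where they start reading. One plausible way to compute those start offsets is sketched here. The centering of the middle sample and the clamping for files smaller than $samplesize are assumptions, and sample_offsets is a hypothetical helper, not part of dupedetector.

```python
import os


def sample_offsets(path, samplesize=1000000):
    """Return assumed start offsets for the first, middle and end samples."""
    size = os.path.getsize(path)
    # Assumption: a file smaller than samplesize is sampled in full.
    span = min(samplesize, size)
    return {
        "first": 0,
        # Assumption: the middle sample is centered on the file's midpoint.
        "middle": (size - span) // 2,
        "end": size - span,
    }
```

Each offset could then be handed to an offset-aware hashing routine together with $samplesize.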

dupedetector.main(args)

Runs the duplicate detection using the parsed arguments.
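How cli(), get_parser() and main(args) fit together can be illustrated with the wiring below. The paths argument, the argv parameter and the placeholder body of main_sketch are all assumptions made for illustration; the real parser options and the real work done by main are not documented here.

```python
import argparse


def get_parser_sketch():
    parser = argparse.ArgumentParser(
        prog="dupedetector",
        description="Detects duplicate files, hopefully quickly",
    )
    # Hypothetical argument: the real option names are not documented above.
    parser.add_argument("paths", nargs="+", help="paths to scan")
    return parser


def main_sketch(args):
    # Placeholder: the real main() performs the duplicate detection.
    return list(args.paths)


def cli_sketch(argv=None):
    # argv=None lets argparse fall back to sys.argv[1:], as an entry point needs.
    args = get_parser_sketch().parse_args(argv)
    return main_sketch(args)
```

Keeping the parsing in cli() and the work in main(args) means main can be called directly from tests or other code with a prepared namespace.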

dupedetector.md5(fp, offset=0, samplesize=None, chunksize=256)

Specialized md5 function. Just returns a hexdigest.

Reads $chunksize bytes from the file at $fp at a time.

If $samplesize is not None, only read that many bytes then stop, even if we haven’t hit the end of the file.
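A minimal sketch of chunked hashing with this signature, built on the standard hashlib module, is shown below. It assumes $fp is a filesystem path and that $offset is the byte position where reading starts; neither is spelled out above, and md5_sketch is a hypothetical name rather than the module's code.

```python
import hashlib


def md5_sketch(fp, offset=0, samplesize=None, chunksize=256):
    """Return the MD5 hexdigest of (part of) the file at path fp."""
    digest = hashlib.md5()
    remaining = samplesize  # None means "read until end of file"
    with open(fp, "rb") as handle:
        handle.seek(offset)  # assumption: offset is where reading begins
        while remaining is None or remaining > 0:
            # Never read past the requested sample size.
            want = chunksize if remaining is None else min(chunksize, remaining)
            chunk = handle.read(want)
            if not chunk:
                break  # end of file reached before the sample was complete
            digest.update(chunk)
            if remaining is not None:
                remaining -= len(chunk)
    return digest.hexdigest()
```

Reading in small chunks keeps memory use constant regardless of file size, and stopping after $samplesize bytes is what makes the sampled hashes cheap on large files.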

dupedetector.rscan(path)

Given a path, return every file from there down, including the starting path itself.
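A recursive scan of this kind can be sketched with os.walk. The handling of a starting path that is itself a file is one reading of "inclusive" and is an assumption, as is the return type (a list); rscan_sketch is a hypothetical name.

```python
import os


def rscan_sketch(path):
    """Return all files at and below path as a list of paths."""
    # Assumption: a starting path that is a regular file is returned as-is.
    if os.path.isfile(path):
        return [path]
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            found.append(os.path.join(root, name))
    return found
```

os.walk does not follow symbolic links to directories by default, so this sketch will not loop on link cycles.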