Do analysis on dependencies

-- depth, breadth, think of other stuff

Assigned to
4 years ago
4 years ago
AREA-Analysis TYPE-Feature

~icefox 4 years ago

  • Direct dep count
  • Transitive dep count
  • Dependency depth -- for each node of dep tree, max/avg/median/whatever depth
  • Dependency breadth -- for each node of tree, number of direct deps
  • Transitive dep LOC count -- Find total (tokei) LoC for each language for all deps
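
A rough sketch of how the count and depth metrics above could be computed for one crate. This is an illustration only: the `DEPS` graph and crate names are made up, "depth" is assumed to mean the shortest distance from the root crate, and the LOC metric is left out since it needs the actual sources.

```python
from collections import deque
from statistics import mean, median

# Hypothetical toy graph: crate name -> list of direct dependencies.
DEPS = {
    "app": ["log", "http"],
    "http": ["bytes", "log"],
    "bytes": [],
    "log": [],
}

def direct_count(graph, crate):
    """Dependency breadth of one node: how many deps it names directly."""
    return len(graph[crate])

def depths(graph, root):
    """Shortest distance from root to every reachable dep (BFS)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph[node]:
            if dep not in dist:
                dist[dep] = dist[node] + 1
                queue.append(dep)
    del dist[root]  # the root is not its own dependency
    return dist

def summarize(graph, root):
    """Direct/transitive counts plus depth statistics for one crate."""
    values = list(depths(graph, root).values())
    return {
        "direct": direct_count(graph, root),
        "transitive": len(values),
        "max_depth": max(values, default=0),
        "avg_depth": mean(values) if values else 0,
        "median_depth": median(values) if values else 0,
    }
```

Calling `summarize(DEPS, "app")` reports 2 direct and 3 transitive deps; `log` is reachable two ways but counted once.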

~icefox 4 years ago

Okay... The thing is that transitive dependencies are a lot harder than they look. Here's the full story:

The SIMPLE view is that each crate has a tree of dependencies. Except that it doesn't, 'cause crates may have dependencies in common, so it's a DAG. So considering all of crates.io we have a REALLY BIG DAG, and for each node we have to count the number of nodes reachable from it. This might not be doable in less than O(m*n), over m nodes and n edges, but brute forcing it (one traversal per node) isn't too hard.
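
To make the tree-vs-DAG point concrete, here is a minimal brute-force sketch. The `DIAMOND` graph is hypothetical, and the function names are just for illustration; it does one traversal per node and compares that against the naive tree count.

```python
def reachable(graph, start):
    """Set of nodes reachable from start, excluding start (iterative DFS)."""
    seen = set()
    stack = list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

def transitive_counts(graph):
    """Brute force: one full traversal per node of the graph."""
    return {node: len(reachable(graph, node)) for node in graph}

def tree_count(graph, node):
    """Naive tree view: counts a shared dep once per path leading to it."""
    return sum(1 + tree_count(graph, dep) for dep in graph[node])

# Hypothetical diamond: both b and c depend on d.
DIAMOND = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
```

On `DIAMOND`, `tree_count(DIAMOND, "a")` gives 4 because `d` is counted twice, while `transitive_counts(DIAMOND)["a"]` gives 3.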

But we have a problem. Deps don't specify a crate version, they specify a version constraint, that is, a set of crate versions. And this set of crate versions may grow as new versions get published. So there is not ONE DAG of dependencies for each crate version, there are MANY potential DAGs. Rummaging through these many potential DAGs and trying to find one that Works is essentially what Cargo is for.
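
A small sketch of "a constraint is a set of versions, and the set can grow". It hand-rolls Cargo's caret rule for full three-part requirements only (no pre-release tags, no `~`, `*` or comparison operators); the helper names and version lists are made up for illustration, and the real resolver does far more than this.

```python
def parse(version):
    """'1.2.3' -> (1, 2, 3). Pre-release and build tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

def caret_upper_bound(base):
    """Exclusive upper bound for ^base: bump the leftmost non-zero part."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)

def matching(req, published):
    """Versions in `published` that satisfy a caret requirement like '^1.2.3'."""
    base = parse(req.lstrip("^"))
    upper = caret_upper_bound(base)
    return [v for v in published if base <= parse(v) < upper]

# Hypothetical publish history for one crate.
published = ["1.1.0", "1.2.3", "1.3.0", "2.0.0"]
before = matching("^1.2.3", published)
after = matching("^1.2.3", published + ["1.4.1"])  # a new release appears
```

Here `before` holds two versions and `after` holds three, so the same manifest line can point at a different concrete dependency graph depending on when it is resolved.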

SO, we should use cargo for this -- or rather, the cargo_metadata crate. It appears that this in fact just calls cargo metadata directly and parses the output, so, great. It also appears to be heavily used and well-maintained. Also great.
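
For reference, the "call cargo metadata and parse the output" approach looks roughly like this when done by hand. This is a Python sketch of the idea, not the cargo_metadata crate's actual code; the function names are hypothetical, while `--format-version` and `--manifest-path` are real `cargo metadata` flags.

```python
import json
import subprocess

def metadata_command(manifest_path):
    """Argument list for `cargo metadata` on one crate's Cargo.toml."""
    return [
        "cargo", "metadata",
        "--format-version", "1",  # pin the JSON output format
        "--manifest-path", manifest_path,
    ]

def load_metadata(manifest_path):
    """Run cargo and parse its stdout; raises if cargo exits non-zero."""
    result = subprocess.run(
        metadata_command(manifest_path),
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)
```

`load_metadata("path/to/Cargo.toml")` would return the parsed JSON as a dict, assuming cargo is installed and the manifest resolves.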

~icefox 4 years ago

Basic transitive and direct dep counts should now work. LOC isn't there yet but all the data to produce it is.

Reverse deps are harder but the framework is there.

We also have the framework for #20 as of 133:cae9a787a977; all the metadata is generated and parsed, but most of it is then thrown away. Just need to store it and present it in a useful fashion.

Getting the metadata is heckin' slow. Perhaps part of it is 'cause it invokes cargo to do it, and that ends up blocking on the global package index a lot 'cause multiple instances of cargo are trying to share it? Possibly; it's definitely not CPU bound.

It also appears to have issues with some crates, such as rustlex, rocket-contrib and rmp-serde. I think they're either part of workspaces or have an otherwise-unorthodox crate setup which confuses the paths.

~icefox 4 years ago

It seems the location of cargo's package index can be changed with the env var CARGO_HOME. Need to figure out how to make rayon set a different value for it in each thread and see if that helps.
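
One way this could work is to set the variable on each spawned cargo process rather than on the thread, since environment variables are per-process. A minimal sketch, assuming a hypothetical `cargo-home-N` directory layout and worker numbering:

```python
import os
import subprocess

def worker_env(worker_id, base_dir):
    """Copy of the current environment with a per-worker CARGO_HOME."""
    env = dict(os.environ)
    env["CARGO_HOME"] = os.path.join(base_dir, f"cargo-home-{worker_id}")
    return env

def run_metadata(manifest_path, worker_id, base_dir):
    """Run `cargo metadata` against this worker's private cargo home."""
    return subprocess.run(
        ["cargo", "metadata", "--format-version", "1",
         "--manifest-path", manifest_path],
        env=worker_env(worker_id, base_dir),  # affects only the child process
        capture_output=True, text=True,
    )
```

In Rust the same effect comes from `std::process::Command::env`, which sets a variable for the child only. The likely cost is that every separate cargo home needs its own copy of the index.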

~icefox 4 years ago

Previous note is resolved: there's a -Z option to cargo that makes it not update the index, which makes it go about 10x faster. Our index is always updated at the start of a run anyway, so skipping the per-crate update loses nothing.

New things to ponder: The cargo_metadata dependency analysis is more complete than that provided by cargo_index. Do we just want to use that for everything?

~icefox 4 years ago

Discovery: I think the dependencies section of the cargo metadata output gives data on what deps are SPECIFIED, and the resolve section gives data on what it actually FINDS for them. Abstract dep tree vs. concrete one.
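
A sketch of that distinction against a hand-written, heavily trimmed sample shaped like `cargo metadata` JSON. The crate names, versions and package ids are made up (real ids are longer and should be treated as opaque strings), and the two helper functions are hypothetical.

```python
import json

# Trimmed sample: only the fields used below are present.
SAMPLE = json.loads("""
{
  "packages": [
    {"id": "app 0.1.0", "name": "app", "version": "0.1.0",
     "dependencies": [{"name": "log", "req": "^0.4"}]},
    {"id": "log 0.4.14", "name": "log", "version": "0.4.14",
     "dependencies": []}
  ],
  "resolve": {
    "nodes": [
      {"id": "app 0.1.0", "dependencies": ["log 0.4.14"]},
      {"id": "log 0.4.14", "dependencies": []}
    ]
  }
}
""")

def specified(metadata, package_id):
    """Abstract deps: (name, version requirement) pairs from the manifest."""
    pkg = next(p for p in metadata["packages"] if p["id"] == package_id)
    return [(d["name"], d["req"]) for d in pkg["dependencies"]]

def resolved(metadata, package_id):
    """Concrete deps: (name, exact version) pairs picked by the resolver."""
    by_id = {p["id"]: p for p in metadata["packages"]}
    node = next(n for n in metadata["resolve"]["nodes"] if n["id"] == package_id)
    return [(by_id[i]["name"], by_id[i]["version"]) for i in node["dependencies"]]
```

For the sample, `specified(SAMPLE, "app 0.1.0")` yields the requirement `^0.4`, while `resolved(SAMPLE, "app 0.1.0")` yields the exact version `0.4.14`.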
