-- depth, breadth, think of other stuff
- Direct dep count
- Transitive dep count
- Dependency depth -- for each node of dep tree, max/avg/median/whatever depth
- Dependency breadth -- for each node of tree, number of direct deps
- Transitive dep LOC count -- Find total (tokei) LoC for each language for all deps
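
A rough sketch of the depth and breadth metrics from the list above, on a hand-built graph. The `Graph` alias, the `sample` graph and the crate names in it are made up for illustration; real data would come from whatever the resolved dependency graph turns out to look like.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory dependency graph: crate name -> direct deps.
type Graph = HashMap<&'static str, Vec<&'static str>>;

/// Breadth of a node: its number of direct dependencies.
fn breadth(g: &Graph, node: &str) -> usize {
    g.get(node).map_or(0, |deps| deps.len())
}

/// Max depth below a node: length of the longest dependency chain, 0 for a leaf.
/// Memoized so shared subgraphs are only walked once. Assumes the graph has no cycles.
fn max_depth(g: &Graph, node: &'static str, memo: &mut HashMap<&'static str, usize>) -> usize {
    if let Some(&d) = memo.get(node) {
        return d;
    }
    let d = g
        .get(node)
        .into_iter()
        .flatten()
        .map(|&dep| 1 + max_depth(g, dep, memo))
        .max()
        .unwrap_or(0);
    memo.insert(node, d);
    d
}

/// Made-up example: app -> a, c; a -> b; b -> c.
fn sample() -> Graph {
    let mut g = Graph::new();
    g.insert("app", vec!["a", "c"]);
    g.insert("a", vec!["b"]);
    g.insert("b", vec!["c"]);
    g.insert("c", vec![]);
    g
}

fn main() {
    let g = sample();
    let mut memo = HashMap::new();
    for name in ["app", "a", "b", "c"] {
        println!(
            "{name}: breadth {}, max depth {}",
            breadth(&g, name),
            max_depth(&g, name, &mut memo)
        );
    }
}
```

Avg/median depth would need every path length rather than just the longest, so they don't memoize as neatly.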
Okay... The thing is that transitive dependencies are a lot harder than they look. Here's the full story:
The SIMPLE view is that each crate has a tree of dependencies. Except that it doesn't, 'cause there may be dependencies in common, so it's a DAG. So considering all of crates.io we have a REALLY BIG DAG, and for each node we have to count the number of reachable nodes from it. This might not be doable in less than `O(m*n)`, over m nodes and n edges, but brute forcing it isn't too hard.
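
The brute-force version can be sketched as one traversal per node, collecting what it reaches into a set so that shared deps are only counted once. The graph representation and the crate names here are assumptions for illustration, not what the tool actually stores.

```rust
use std::collections::{HashMap, HashSet};

/// Everything reachable from `root`, not counting `root` itself.
/// Iterative DFS; the `seen` set makes sure a shared dep is counted once.
fn transitive_deps<'a>(g: &HashMap<&'a str, Vec<&'a str>>, root: &str) -> HashSet<&'a str> {
    let mut seen: HashSet<&'a str> = HashSet::new();
    // Start from the direct deps so the root is not included in the result.
    let mut stack: Vec<&'a str> = g.get(root).cloned().unwrap_or_default();
    while let Some(node) = stack.pop() {
        if seen.insert(node) {
            if let Some(deps) = g.get(node) {
                stack.extend(deps.iter().copied());
            }
        }
    }
    seen
}

/// Made-up diamond: app -> a, b; a -> c; b -> c.
fn sample() -> HashMap<&'static str, Vec<&'static str>> {
    let mut g = HashMap::new();
    g.insert("app", vec!["a", "b"]);
    g.insert("a", vec!["c"]);
    g.insert("b", vec!["c"]);
    g.insert("c", vec![]);
    g
}

fn main() {
    let g = sample();
    for name in ["app", "a", "b", "c"] {
        let direct = g[name].len();
        let transitive = transitive_deps(&g, name).len();
        println!("{name}: {direct} direct, {transitive} transitive");
    }
}
```

In the diamond, `c` is reachable from `app` along two paths but only shows up once in the set, which is the tree-vs-DAG difference.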
But we have a problem. Deps don't specify a crate, they specify a version constraint, that is, a set of crate versions. And this set of crate versions may grow as new versions get published. So there is not ONE DAG of dependencies for each crate version, there are MANY potential DAGs. Rummaging through these many potential DAGs and trying to find one that Works is essentially what Cargo is for.
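
To make the "set of versions that may grow" point concrete, here is a toy sketch. It implements only a simplified caret rule for major versions >= 1 (same major, not older than the base); real Cargo matching has more cases (0.x versions, pre-releases, other operators) and this is not its implementation. The version numbers are made up.

```rust
/// (major, minor, patch); pre-release and build metadata are ignored here.
type Version = (u64, u64, u64);

/// Simplified caret rule, only meant for base versions with major >= 1:
/// same major version, and not older than the base.
fn caret_matches(base: Version, v: Version) -> bool {
    v.0 == base.0 && v >= base
}

/// The set of published versions that satisfy the constraint.
fn candidates(base: Version, published: &[Version]) -> Vec<Version> {
    published
        .iter()
        .copied()
        .filter(|&v| caret_matches(base, v))
        .collect()
}

fn main() {
    let base = (1, 2, 3);
    let mut published = vec![(1, 2, 0), (1, 2, 3), (1, 3, 0), (2, 0, 0)];
    println!("before: {:?}", candidates(base, &published));

    // A new release appears upstream; the same constraint now matches more versions.
    published.push((1, 4, 1));
    println!("after:  {:?}", candidates(base, &published));
}
```

Every dep edge is one of these sets, so the number of possible concrete DAGs multiplies out quickly.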
SO, we should use cargo for this -- or rather, the `cargo_metadata` crate. It appears that this in fact just calls `cargo metadata` directly and parses the output, so, great. It also appears to be heavily used and well-maintained. Also great.
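
For reference, the shell-out approach looks roughly like this when done by hand with only the standard library. This is a sketch of the idea, not the `cargo_metadata` crate's actual code; `metadata_command` and `run_metadata` are hypothetical helper names, and the JSON parsing step is left out.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) `cargo metadata --format-version 1 --manifest-path <manifest>`.
fn metadata_command(manifest: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("metadata")
        .arg("--format-version")
        .arg("1")
        .arg("--manifest-path")
        .arg(manifest);
    cmd
}

/// Run the command and return its stdout (the JSON document) as a string.
fn run_metadata(manifest: &Path) -> Result<String, String> {
    let out = metadata_command(manifest)
        .output()
        .map_err(|e| format!("could not start cargo: {e}"))?;
    if !out.status.success() {
        return Err(String::from_utf8_lossy(&out.stderr).into_owned());
    }
    String::from_utf8(out.stdout).map_err(|e| e.to_string())
}

fn main() {
    // Only actually invokes cargo when a manifest path is passed on the command line.
    match std::env::args().nth(1) {
        Some(path) => match run_metadata(Path::new(&path)) {
            Ok(json) => println!("got {} bytes of metadata JSON", json.len()),
            Err(e) => eprintln!("cargo metadata failed: {e}"),
        },
        None => println!("would run: {:?}", metadata_command(Path::new("Cargo.toml"))),
    }
}
```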
Basic transitive and direct dep counts should now work. LOC isn't there yet but all the data to produce it is.
Reverse deps are harder but the framework is there.
We also have the framework for #20 as of 133:cae9a787a977: all the metadata is generated and parsed, but most of it is then thrown away. Just need to store it and present it in a useful fashion.
Getting the metadata is heckin' slow. Perhaps part of it is 'cause it invokes cargo to do it, and that ends up blocking on the global package index a lot 'cause multiple instances of cargo are trying to share it? Possibly; it's definitely not CPU bound.
It also appears to have issues with some crates, such as rustlex, rocket-contrib and rmp-serde. I think they're either part of workspaces or have an otherwise-unorthodox crate setup which confuses the paths.
The package index for cargo can be set with the env var `CARGO_HOME`, it seems. Need to figure out how to make rayon set a different value for it in each thread and see if that helps.
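
One wrinkle: environment variables belong to the whole process, not to a thread, so the per-thread value would have to be set on each child `cargo` process instead. A minimal sketch of that, using plain `std::thread` rather than rayon to stay self-contained; `command_for_worker` and the `/tmp/cargo-homes` layout are made up for illustration, and whether separate homes actually remove the contention is untested.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;
use std::thread;

/// Build a `cargo metadata` command whose child process gets its own CARGO_HOME.
/// The variable is set on the Command only, so the parent's environment is untouched.
fn command_for_worker(worker: usize, base: &Path) -> Command {
    let home: PathBuf = base.join(format!("worker-{worker}"));
    let mut cmd = Command::new("cargo");
    cmd.arg("metadata")
        .arg("--format-version")
        .arg("1")
        .env("CARGO_HOME", home);
    cmd
}

fn main() {
    let base = PathBuf::from("/tmp/cargo-homes");
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let base = base.clone();
            // Each thread builds its own command; nothing is executed in this sketch.
            thread::spawn(move || format!("{:?}", command_for_worker(i, &base)))
        })
        .collect();
    for h in handles {
        println!("{}", h.join().unwrap());
    }
}
```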
Previous note was fixed, there's a `-Z` option to cargo that makes it not update the index, which makes it go about 10x faster. Our index is always updated at the start of a run anyway, so.
New things to ponder: The `cargo_metadata` dependency analysis is more complete than that provided by `cargo_index`. Do we just want to use that for everything?
Discovery: I think the `dependency` section gives data on what deps are SPECIFIED, and the `resolve` section gives data on what it actually FINDS for them. Abstract dep tree vs. concrete one.