Measuring dependency age as a risk metric (libyears and/or average libdays)

David A. Wheeler

Below is an email response about the “libyears” metric, which *might* be useful for measuring the age of dependencies. I thought it was worth passing along.

I’m adding the Best Practices WG, because this discussion is increasingly including that WG’s interests. Sorry for the cross-post, but I don’t see how else to include all relevant parties.

--- David A. Wheeler


On Nov 6, 2020, at 2:58 PM, Dan Lorenc <dlorenc@...> wrote:

This is really interesting! Thanks for forwarding. A couple thoughts inline.

I’ll note that the “libyear” site abandons the “version number distance” of the 2015 paper and simply measures time instead. Time is much simpler to describe & less arbitrary.

Nice, this does seem simpler. To play devil's advocate here though, what if a dependency doesn't release a new version for a very long time? I guess I could see a combination approach where you would check to see if there is a newer version available, but then use the time between the two releases as the metric. If you're on version foo, and there is no version foo+1, it's not very useful to continue to increment a libyear counter until there's something actionable the project could do.

Hmm, I see I didn’t explain it clearly enough. Let me try to clarify. The libyears metric totals up all the “dependency freshness” of all dependencies. The 2015 paper says:
“We define ‘dependency freshness’ as the difference between the currently used version of a dependency, and the version of a dependency the system would ideally use.”

So libyear doesn’t measure for each dependency “how old the dependency you’re using is”. Instead, it totals “how much older is the dependency you’re using compared to the version you SHOULD be using?”. Let’s say that you depend on package ‘foo’ version 4.0.0, released on 2017-01-01, but the current version of package ‘foo’ is version 5.0.0, released on 2018-01-01. We’ll further assume you should “always use the latest” (obviously there are exceptions). That means that your libyear value caused by this one dependency is exactly 1 year. It will stay 1 year as time marches on until (1) package ‘foo’ releases a new version (which will make the number worse) or (2) you update (which will make the number better).
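Spelled out in code, the per-dependency arithmetic is just a date difference (a minimal sketch; the function name is mine, not from any libyear tool):

```python
from datetime import date

def libyear_for_dependency(used_release: date, latest_release: date) -> float:
    """Dependency freshness in years: how much newer the latest release
    is than the release actually in use (0.0 if already current)."""
    days_behind = (latest_release - used_release).days
    return max(days_behind, 0) / 365.25

# The 'foo' example above: using 4.0.0 (2017-01-01), latest is 5.0.0 (2018-01-01).
print(libyear_for_dependency(date(2017, 1, 1), date(2018, 1, 1)))  # roughly 1 year
```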

There is a limitation to this metric if a dependency stops being maintained. If you depend on a current version, but it will never be updated because it’s unmaintained, you’re still using the latest version & thus there’s no libyear increase. As far as libyear is concerned, a 10-year-old dependency that is the current version is just fine.  In reality maybe it is, and maybe it isn’t.  I think different metric(s) will be needed to capture that case, especially since some projects really *don’t* need to change much or ever (leftpad, ASCIItable, isNaN, etc.).

I think that the idea of measuring obsolescence over time has value, though this libyear metric is by no means perfect:

1. Libyear penalizes projects which use more dependencies, even if they generally keep up, because they simply have more numbers to add up. I think reporting the average would be better, or at least a useful additional metric. For an average, all dependencies that are current would be considered “0 days out of date”. With an average, using “average number of days obsolete” instead of “average number of years obsolete” would probably be better. So I’ll call that alternative “average libdays”.

I don't think this is necessarily a problem. Penalizing something for having lots of dependencies might be a good idea. See Russ Cox's post on this topic. But depending on how the metric is used, it might not even be a penalty. As a maintainer of many OSS projects, I'd love to be able to see the libyear number, even if it's non-normalized. Maintainers could agree on a target (let's try to stay under 20 libyears), and upgrade dependencies whenever we get close to that. Comparing libyears across projects is going to be challenging, but comparing libyears across releases of the same project would be useful even without normalization.

I agree that it’d be useful to compare on the same project, since that would eliminate many normalization issues.
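To make the total-versus-average distinction concrete, here is a sketch (the package names and lag figures are invented for illustration):

```python
# Days each dependency lags behind its latest release (invented numbers;
# dependencies that are current count as 0 days out of date).
deps_days_behind = {
    "alpha": 255,
    "beta": 1753,
    "gamma": 0,
    "delta": 30,
}

total_days = sum(deps_days_behind.values())
total_libyears = total_days / 365.25                  # the libyear-style total
average_libdays = total_days / len(deps_days_behind)  # the proposed average

print(f"total libyears:  {total_libyears:.1f}")   # roughly 5.6
print(f"average libdays: {average_libdays:.1f}")  # 509.5
```

The total grows with every extra dependency; the average only grows when dependencies actually fall further behind.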

I can share some experiences from using this today on the CII Best Practices badge software, aka the BadgeApp. So this is news, hot off the press :-) .

I used the tool “libyear-bundler” which measures libyears, and more *importantly*, it reports every package with a nonzero value. It reported that the BadgeApp had a libyears value of 40.7 years. When I sorted the results by age, it quickly became obvious that the oldest out-of-date dependency was “crack”; this is not a direct dependency, but a transitive one that was 4.8 years out-of-date (!). This library is used because it’s a dependency of “webmock”.  Webmock itself is only 0.7 years out-of-date, but because it forces a much older library to be used transitively, it increased the overall age. I updated the library webmock (3.8.3->3.9.4), and that single update reduced the libyears to 33.0 years. This update did add one new library, which also required a license approval (we check licenses & the new library didn’t provide that data in a standard way).

We normally calculate the number of dependencies anyway; the updated version has 83 direct dependencies & 193 total dependencies (192 before this update added one). That means we reduced the average libdays from 77 ((40.7/192)*365.25) to 62 ((33.0/193)*365.25). Or if you prefer: average libyears went from 0.21 to 0.17.
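For anyone who wants to check that conversion, it's just this (using the libyear totals and dependency counts reported above):

```python
DAYS_PER_YEAR = 365.25

# 40.7 libyears over 192 total dependencies before the webmock update;
# 33.0 libyears over 193 total dependencies after (the update added one library).
avg_libdays_before = 40.7 / 192 * DAYS_PER_YEAR
avg_libdays_after = 33.0 / 193 * DAYS_PER_YEAR

print(round(avg_libdays_before), round(avg_libdays_after))  # 77 62
```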

Well why isn’t it zero, you ask? Obviously it’s all hideously out of date, right? No, those are unusually *good* numbers:
1. Most language-level library ecosystems only allow *ONE* version of a given package to be in a particular application. In a vast number of cases, updates cannot be performed because of an incompatibility with something else. JavaScript does support multiple versions, as do some system-level package management systems, but many others do not. Of course, supporting multiple versions has its own problems - all too often the older versions NEVER get upgraded because there’s less pressure to do so.
2. This measure doesn’t consider branches. The tools assume that the current version MUST be used, and if there are multiple supported branches, it penalizes you. This is especially painful when there is a *set* of libraries (e.g., we use the current version of the Ruby on Rails 5.2.* branch, which is supported, but the latest version is 6.*, and Rails is a set of libraries not just one).
3. In some cases we intentionally do *NOT* upgrade. E.g., vcr’s latest version is not open source software, and since it’s just for recording test results we have no need to upgrade to a non-OSS program.
4. We routinely update, though the way we use our tools primarily emphasizes the *direct* dependencies; clearly the age of indirect dependencies also matters.

I would expect the same to be true for any nontrivial application.

2. It’s not entirely clear what a “good” value is (for libyears or average libdays). Zero is ideal, but probably unrealistic in non-trivial projects. We could use industry averages as an estimator (similar to how Cox 2015 does it but in a less complex & easier-to-explain fashion). 

This seems like something the best practices group might be interested in. Once it's measured, how do we know if the score is good? What do we recommend people try to stay under?

That’s one reason the average might be better. A project with 900 dependencies, which averages 2 months behind, would be 150 libyears behind. A project with 10 dependencies, each one 10 years behind, would have 100 libyears and appear to be “better”. I’m willing to believe that libyears might be a better quick estimate of effort to update (more dependencies increases the problems of updates), but I’m more skeptical of the raw total as a measure of “goodness” or risk.
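A quick check of that comparison (a sketch; the two projects are hypothetical):

```python
# Project A: 900 dependencies, each an average of 2 months (2/12 year) behind.
total_a = 900 * (2 / 12)      # 150 libyears
avg_a = total_a / 900         # ~0.17 years, i.e. 2 months

# Project B: 10 dependencies, each 10 years behind.
total_b = 10 * 10             # 100 libyears
avg_b = total_b / 10          # 10 years

# By raw total, B "looks better" (100 < 150); the averages tell the real story.
print(total_a, total_b, avg_a, avg_b)
```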

On the other hand, “on average your dependencies should be no more than 6 months behind” or “12 months behind” is an entirely reasonable measure that can apply to projects big & small.

There are complications, of course. Sometimes you do *not* want to upgrade (e.g., due to license changes), or you’re using a supported branch so the upgrade argument carries less weight. I suspect that there’s no way to make the number perfect no matter what :-). But if it gives enough insight to help make decisions, that’s a good thing.

--- David A. Wheeler