
About Mixed Results: Samples of Regression Analyses On CPAN Testers Reports

What this is about

The Perl programming language has a huge repository of reusable code known as CPAN. CPAN has a strong tradition of shipping self-testing facilities with every software package it contains. Dedicated server farms run around the clock to produce fresh test results on many different platforms in many different environments. These results get delivered to CPAN Testers.

On this page we watch the results of all these tests and calculate statistical regressions to help users of CPAN spot patterns where things are starting to fail.

How to use

#   | distro               | links  | pass | fail | uploaded         | third fail       | RT/comment | high correlations
... | ...                  | ...    | ...  | ...  | ...              | ...              | ...        | ...
16  | TONYC/Imager-0.71_02 | matrix | 64   | 7    | 2009-12-01 09:23 | 2009-12-02 18:06 |            | qr:(Can't locate \S+pm)
... | ...                  | ...    | ...  | ...  | ...              | ...              | ...        | ...

The main page consists of a large table of CPAN distributions that have at least 3 PASSes and at least 3 FAILs and have been uploaded within the last 16 years. This limitation is an entirely arbitrary compromise between the stream of incoming tests, the number of distros with mixed results, and the computing power needed to calculate all the regressions. It may change at any time when any of these parameters changes.
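The selection criteria above amount to a simple filter. A minimal sketch, with a hypothetical record layout and invented distributions; only the thresholds (3 PASSes, 3 FAILs, 16 years) come from the text:

```python
from datetime import datetime, timedelta

# Hypothetical records: (distro, pass_count, fail_count, upload_date).
reports = [
    ("AUTHOR/Good-1.00",     64,  7, datetime(2009, 12, 1)),
    ("AUTHOR/FewFails-1.00", 50,  2, datetime(2009, 11, 1)),  # too few FAILs
    ("AUTHOR/Ancient-1.00",  10, 10, datetime(1990,  1, 1)),  # too old
]

def mixed_results(records, now=datetime(2010, 1, 1)):
    """Keep distros with >= 3 PASSes, >= 3 FAILs, uploaded in the last 16 years."""
    cutoff = now - timedelta(days=16 * 365)
    return [d for (d, p, f, up) in records
            if p >= 3 and f >= 3 and up >= cutoff]

print(mixed_results(reports))  # only AUTHOR/Good-1.00 qualifies
```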

The list is sorted by upload date, descending. For any distribution, only the latest upload can be included in the list. If JavaScript is enabled in your browser, you can sort this table by clicking on the header of the column you want to sort by.

These are the columns on the main page:

#
    Just a counter.
distro
    Path to the distro, consisting of the author's id, possibly some path below their directory, and the filename of the distro. This is usually a link to a page with details about the calculated results. If it isn't a link, the statistical regression has not been calculated yet but will be within the next few hours.
links
    Links to the CPAN Testers matrix (which provides a comprehensive descriptive table of passes and fails by perl version and OS).
pass and fail
    The number of pass reports and fail reports respectively. Other report types are ignored for the sake of this investigation. Note that only distributions with at least three passes and three fails are listed.
uploaded
    Upload date and time in UTC.
third fail
    Date and time in UTC when the third tester reported a fail. Sorting by this column is one of the most interesting views on the table because it gathers all recently introduced problems at the top.
RT/comment
    This column is maintained manually and is for this reason often outdated: it contains a comment or a link to an RT (or other bugtracker) ticket that has something to do with the diagnosed problem.
high correlations
    This is the interesting part: a comma-separated list of parameters whose values correlate with the test result. The CPAN Testers reporting tools provide a lot of independent parameters that might influence the result. Whenever there is a correlation between one of these parameters and the result, the parameter is listed in this column. Of course, as always with statistical findings, such correlations may be trivial artefacts (e.g. heteroskedasticity) that provide little to no information, so be careful not to draw wrong conclusions. See below for hints about which insights have been gained so far.
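The "third fail" column can be derived from the FAIL reports of a distribution alone. A minimal sketch with made-up timestamps:

```python
# Hypothetical FAIL report timestamps for one distribution (UTC).
fail_times = ["2009-12-02 18:06", "2009-12-01 12:00",
              "2009-12-02 09:30", "2009-12-03 08:00"]

def third_fail(times):
    """Time of the third FAIL ever reported, or None while fewer than three exist."""
    ordered = sorted(times)  # ISO-style strings sort chronologically
    return ordered[2] if len(ordered) >= 3 else None

print(third_fail(fail_times))  # '2009-12-02 18:06'
```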

The distro column contains links to separate pages per distribution providing the results of the regression analyses. These are sorted by R-squared in descending order. R-squared is a measure of the quality of the statistical correlation; it is always between zero and one. The higher the R-squared, the stronger the impact of the referenced variable on the outcome.
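How R-squared summarizes fit quality can be sketched in a few lines; coding PASS as 1 and FAIL as 0 here is an assumption for illustration, not taken from the site's implementation:

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot: 1 means the model explains the outcome perfectly."""
    mean = sum(y) / len(y)
    ss_tot = sum((v - mean) ** 2 for v in y)
    ss_res = sum((v - h) ** 2 for v, h in zip(y, y_hat))
    return 1 - ss_res / ss_tot

# Outcomes coded as 1 = PASS, 0 = FAIL; a perfect predictor gives R^2 = 1.
y     = [1, 1, 1, 0, 0, 0]
y_hat = [1, 1, 1, 0, 0, 0]
print(r_squared(y, y_hat))  # 1.0
```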

Collection of Regressions

Imager-0.71_02 (matrix, searchtools, parsed reports, metacpan.org)

Pass/Fail at calctime: 64/7
Top regressions: 1.000 qr:(Can't locate \S+pm)

Regression#1 qr:(Can't locate \S+pm)
... ... ... ...
Regression#2 conf:bincompat5005
... ... ... ...
Regression#3 conf:ccdlflags
... ... ... ...

Each individual CPAN distribution's page of regression results starts with a small listing of metadata, such as who the author is and what the exact filename is, plus links to other sites dealing with CPAN.

Below that follows a rather long list of individual results of regression calculations. These are sorted by their goodness of fit, i.e. the first few results are likely to be the only interesting ones. But since statistics need to be interpreted to be useful, we present them all.

Every regression has a header denoting the name of the independent variable. These names are documented in the CPAN::Testers::ParseReport module and should in most cases be self-explanatory. So the variable with the name conf:bincompat5005 denotes the config variable bincompat5005 of the perl interpreters that ran these tests. And qr:(Can't locate \S+pm) denotes a string found in the test output when matching it against the regular expression (Can't locate \S+pm).
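The derivation of such variables from a single report can be sketched as follows. The report structure and field names here are hypothetical simplifications, not the actual data model of CPAN::Testers::ParseReport:

```python
import re

# Hypothetical report: test output plus the tester's perl configuration.
report = {
    "output": "... Can't locate Kwiki/Plugin.pm in @INC ...",
    "conf":   {"bincompat5005": "undef"},
    "mod":    {"Test::More": "0.86"},
}

def variables(rep):
    """Derive named variables in the style of conf:*, mod:*, and qr:* columns."""
    v = {}
    m = re.search(r"(Can't locate \S+pm)", rep["output"])
    if m:
        v["qr:(Can't locate \\S+pm)"] = m.group(1)
    for key, val in rep["conf"].items():
        v["conf:" + key] = val
    for key, val in rep["mod"].items():
        v["mod:" + key] = val
    return v

print(variables(report))
```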

When you click on any such header you are taken to the page described below under Links to input data.

Interpretation of correlations

Regression#1 qr:(Can't locate \S+pm)
variable                                 Theta    StdErr   T-stat
[0='const']                             1.0000   0.0000   37707732907022898844.00
[1='eq_Can't locate Kwiki/Plugin.pm']  -1.0000   0.0000  -32758385024668659982.00
[2='eq_Can't locate Spoon/Plugin.pm']  -1.0000   0.0000  -7256856581466726773.00
R^2= 1.000, N= 107, K= 3

You can read more about statistical regression analysis elsewhere, but you are allowed to bypass the theory and focus on two things: the term R^2 (pronounced R squared) is the overall measure of the goodness of fit of the estimates. It lies between 0 and 1, where 1 is a good fit. The T-stat value measures, for each estimate, whether its slope differs significantly from 0. It lies between −infinity and +infinity.

In the detailed page of all regression tables for a particular distribution you may find a table like the one above.

First a few notes about what you see here: the title of the table denotes the name of the variable inspected by this regression test, in this case qr:(Can't locate \S+pm). At the bottom you find the R-squared (R^2) described above; N denotes the number of observations we had, and K the number of lines (actual values) in the middle part. The middle part lists the influencing factors in this test: the first line is a constant part [0='const'], and below it come the actually observed strings. Each line has the Theta value, which can be interpreted as the direction and slope of the influence; the StdErr, which gives us a measure of the variance; and the T-stat, which provides the significance of this Theta (being different from zero).
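For a single indicator variable, the Theta, StdErr, and T-stat columns can be reproduced with textbook one-regressor OLS. The data below is invented (an error message seen in 5 of 10 reports), not taken from any real distribution:

```python
import math

def simple_ols(x, y):
    """One-regressor OLS: slope Theta, its StdErr and T-stat, plus R^2, N, K."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    theta = sxy / sxx
    const = my - theta * mx
    resid = [b - (const + theta * a) for a, b in zip(x, y)]
    ss_res = sum(r * r for r in resid)
    ss_tot = sum((b - my) ** 2 for b in y)
    stderr = math.sqrt(ss_res / (n - 2) / sxx)
    t_stat = theta / stderr if stderr else math.copysign(math.inf, theta)
    r2 = 1 - ss_res / ss_tot
    return theta, stderr, t_stat, r2, n, 2  # K = 2: constant + one regressor

# x = 1 when the error message was seen, y = 1 for PASS, 0 for FAIL.
x = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
theta, stderr, t, r2, n, k = simple_ols(x, y)
print(theta, stderr, t, r2)  # negative Theta: the message pushes towards FAIL
```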

We have chosen to color-code the Theta column: negative values are reddish and positive values greenish, signifying that positive values indicate an influence towards a PASS and negative ones towards a FAIL. Both green and red get paler the lower the R-squared is, to indicate that interpreting values with a low R-squared should be avoided. A weak influence is only a weak influence when the goodness of fit confirms it; otherwise you just can't tell, because something makes this influence insignificant.

The example regression above is the most trivial case: whenever the message Can't locate Kwiki/Plugin.pm was seen in the test output, the test result was a FAIL. The same holds for Spoon/Plugin.pm. The author has most probably forgotten to declare a dependency on the two modules Kwiki::Plugin and Spoon::Plugin. But not necessarily: it may be that the author has forgotten only the dependency on Spoon::Plugin and that the dependency on Kwiki::Plugin is an indirect one. Or there may be some completely remote connection between some other relevant variable and this kind of observed error message.

Usually the Can't locate \S+\.pm messages are the most reliable indicators of the real reason for a fail. Other good candidates are mod:* variables like the following:

Regression#2 mod:Test::More
variable         Theta    StdErr   T-stat
[0='const']      1.0000   0.0000   67664785050021653200.00
[1='eq_0.72']   -1.0000   0.0000  -14765666636887643371.00
[2='eq_0.80']   -1.0000   0.0000  -10567464028645793135.00
[3='eq_0.86']   -1.0000   0.0000  -47228831801881750440.00
[4='eq_0.88']    0.0000   0.0000   6.01
[5='eq_0.89_01'] 0.0000   0.0000   2.54
[6='eq_0.92']   -0.0000   0.0000  -5.29
[7='eq_0.94']   -0.0000   0.0000  -4.58
R^2= 1.000, N= 178, K= 8

Here we have a case where the author seems to be using a feature of Test::More that was introduced in version 0.88. Older versions have only FAILs; 0.88 and up have only PASSes.
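Such a version threshold can be checked by grouping reports by module version and computing the pass rate per version; the versions and counts below are invented for illustration:

```python
# Hypothetical reports: (mod:Test::More version, test result).
reports = [
    ("0.86", "FAIL"), ("0.86", "FAIL"),
    ("0.88", "PASS"), ("0.88", "PASS"),
    ("0.92", "PASS"),
]

def pass_rate_by_version(reps):
    """Fraction of PASSes per observed module version."""
    by_ver = {}
    for ver, state in reps:
        p, n = by_ver.get(ver, (0, 0))
        by_ver[ver] = (p + (state == "PASS"), n + 1)
    return {v: p / n for v, (p, n) in by_ver.items()}

print(pass_rate_by_version(reports))
# a jump from 0.0 to 1.0 between versions suggests a threshold
```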

But such mod:* correlations can be misleading and at times completely bogus when the number of test results is low and the number of releases of a represented module is relatively high. Then (and not only then) it may happen that only one tester farm has this version installed, that this farm is broken for some reason, and that the real reason for the fail has nothing to do with this version of that module. This is why you are required to reproduce FAILs and PASSes on your own hardware, compare with the other correlations, and draw your conclusions carefully before writing a ticket on RT.

One annoyance of regression analysis is the fact that every calculation presented is always relative to a reference group. See for example this result:

Regression#3 mod:XML::ExtOn
variable        Theta    StdErr   T-stat
[0='const']     0.0000   0.0000   17.27
[1='eq_0.11']   1.0000   0.0000   143147779654612836944.00
R^2= 1.000, N= 26, K= 2

We can recognize that having version 0.11 of XML::ExtOn installed helps to get a PASS. But we cannot see compared to which other version. This is something you have to test yourself before drawing conclusions, and why you will sometimes want to click on the header of a table to zoom in on the actual test inputs, as described in the next section.
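The reference-group issue comes from how categorical variables are dummy-coded: one level must be dropped, and every estimated Theta is measured against that dropped level. A sketch under that assumption (version strings invented):

```python
def dummy_code(values):
    """One-hot encode a categorical variable, dropping the first level.
    The dropped level is the reference group all thetas are relative to."""
    levels = sorted(set(values))
    reference, coded = levels[0], levels[1:]
    rows = [[1 if v == lvl else 0 for lvl in coded] for v in values]
    return reference, coded, rows

versions = ["0.09", "0.09", "0.11", "0.11", "0.11"]
ref, cols, x = dummy_code(versions)
print(ref, cols, x)
# ref == '0.09': the theta for 'eq_0.11' is interpreted *relative to* 0.09
```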

Links to input data


id       state    mod:XML::ExtOn
6231877  UNKNOWN  0.11
6172961  UNKNOWN  0.11
6161798  PASS     0.11
6155043  PASS     0.11
6095573  PASS     0.11
...      ...      ...
4955988  FAIL     0.09
4942282  FAIL     0.09
4931627  FAIL     0.09
...      ...      ...

If you click on the header of an individual regression result, you follow a link to this table. It shows a summary of the input data collected, and each line in turn links to the original CPAN Testers report containing the referenced value. You can choose from the drop-down menu which columns you want to have summarized; note that you can select multiple columns simultaneously. If JavaScript is enabled in your browser, you can sort this table by clicking on the header of the column you want to sort by.
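Summarizing over the selected columns behaves like a group-by count. A minimal sketch with invented report rows (the field names mirror the sample table above, but the data model is hypothetical):

```python
reports = [
    {"id": 6161798, "state": "PASS", "mod:XML::ExtOn": "0.11"},
    {"id": 6155043, "state": "PASS", "mod:XML::ExtOn": "0.11"},
    {"id": 4955988, "state": "FAIL", "mod:XML::ExtOn": "0.09"},
]

def summarize(reps, columns):
    """Count how often each combination of the selected columns occurs."""
    counts = {}
    for r in reps:
        key = tuple(r[c] for c in columns)
        counts[key] = counts.get(key, 0) + 1
    return counts

print(summarize(reports, ["state", "mod:XML::ExtOn"]))
# {('PASS', '0.11'): 2, ('FAIL', '0.09'): 1}
```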

If you want to collect statistics you're missing here, you may want to check out CPAN::Testers::ParseReport.

How can I help?

Write me your bug reports at the CPAN-Testers-ParseReport tracker. Become a CPAN tester. Provide more and diverse data. When you find a pass-to-fail ratio of 9:5, you should not expect too much and rather try to improve the data than draw conclusions. Be aware that the regression analysis often exposes covariances that explain nothing. Healthy results typically appear with higher numbers of PASSes and FAILs. When the column high correlations shows many different candidate fields, it is also very likely that many testers had identical configurations and the results are not really telling much.

Watch your own perl installations and see where you can diversify the data and provide tests with them. If the data suggest that perl 5.8.4 is failing, try to find a 5.8.4 on your disk and see whether you can get it working. If you find your own results in the sample, try to reproduce them; too often we have seen random test results. And try to produce the other result by varying parameters you can influence. Anything that makes the data less uniform automatically helps the statistical methods bring the most interesting factors to the top.

And when you identify the culprit, don't hesitate to report the bug to RT (or whichever bug tracker has been chosen for the distro) so that others do not waste their time with the same distro.

Source code

The sources of all the programs involved here are in my git repo of CPAN and related tools, git://repo.or.cz/andk-cpan-tools.git. The main script is cnntp-solver.pl; it maintains the databases and produces the regressions. The Catalyst app CPAN-Blame produces the pages.

Andreas König, 2015-03-31

This site is gratefully hosted on a Dedicated Server, sponsored by www.webfusion.co.uk.