ferroopia.blogg.se - Scipy weighted standard deviation

#Scipy weighted standard deviation full

IMO both defaults can be justified in stats. I wouldn't change it in statsmodels to something that satisfies a "pure" definition by default. But I'm indifferent to the default in scipy.stats. The default is just defined for the most common use cases in statistics. It's still a mad Python function that allows us to compute scaled and unscaled mad. It is easy to specify 1 for unscaled, but I wouldn't want to have to find the number for "normal" each time I use it. In statsmodels and R you have to specify a number. I like the iqr way of specifying "normal". "Although practicality beats purity." (Zen 9) Stata doesn't seem to have it; user-provided mad functions use the scaled version. I cannot figure out from the SAS documentation what they are doing; the basic mad function is "raw". Edit (new SAS docs): MAD computes both, with raw as the default.

If you want to make them consistent, I wonder if it wouldn't be better to just deprecate the entire function in favor of a differently-named function with a better API. Making them consistent at this point, even with a warning cycle, seems just too confusing.

So the change would be to make the scale argument in median_absolute_deviation() consistent with iqr(). Apart from changing the default value of scale to 'raw', which I am in favor of, making scale in median_absolute_deviation consistent with iqr is even more of a breaking change since, currently, iqr is divided by its scale value and median_absolute_deviation is multiplied by its scale value.

"Making this change would have different defaults from statsmodels and might confuse other users who are using the method as a robust scale estimate." Or who are familiar with the statsmodels implementation. I disagree with this justification for the current scipy API for the following reasons:

- Scipy has a much larger user base than statsmodels, and users unfamiliar with statsmodels are not going to expect the MAD to be multiplied by a magic number by default.
- A poor API choice in statsmodels is not a good reason for scipy to replicate that bad design.
- The current scipy implementation of median_absolute_deviation is multiplied by its scale parameter, making it inconsistent with the statsmodels mad function, which is divided by its c parameter.
- Scipy's default scale=1.4826 value is a poor approximation to the exact value 1/c = 1/norm.ppf(3/4.) that statsmodels uses, again making it inconsistent with statsmodels.

As for the name, I'd rather have something like med_abs_dev().
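To make the two conventions in this discussion concrete, here is a minimal sketch using only the Python standard library rather than scipy or statsmodels. The helper names (raw_mad, mad_multiplied, mad_divided) are hypothetical and are not part of either package; they only illustrate that multiplying by roughly 1.4826 and dividing by the 0.75 normal quantile are the same scaling, and that a scale of 1 gives the unscaled value.

```python
from statistics import NormalDist, median


def raw_mad(values):
    """Unscaled median absolute deviation: median(|x - median(x)|)."""
    center = median(values)
    return median(abs(v - center) for v in values)


# 0.75 quantile of the standard normal distribution (about 0.6745).
C = NormalDist().inv_cdf(0.75)
# Its reciprocal is the normal-consistency factor (about 1.4826).
K = 1.0 / C


def mad_multiplied(values, scale=K):
    """Hypothetical 'multiply' convention: raw MAD times a factor."""
    return raw_mad(values) * scale


def mad_divided(values, c=C):
    """Hypothetical 'divide' convention: raw MAD over a constant."""
    return raw_mad(values) / c


data = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0, 17.0]
print("raw MAD:", raw_mad(data))
print("scaled (multiply by K):", mad_multiplied(data))
print("scaled (divide by C):", mad_divided(data))
print("exact factor vs rounded 1.4826:", K, 1.4826)
```

Both scaled results agree up to floating-point rounding, so the disagreement in the thread is about which constant the user passes and what the default should be, not about the estimate itself.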

The original proposal for the function was for consistency with an existing function in the statsmodels package which is used to provide a robust standard error estimate. One of the statsmodels maintainers noted in the issue that there were other packages using statsmodels as a dependency solely for the mad function, the abbreviation used in that package.

The name is overloaded in statistics (as are many names) to describe the loss function as well as the robust measure of scale, which explains the confusion. The implementation here uses a scaling factor for the robust measure of scale for a Gaussian as the default. Using the value 1, as you both propose, would be for the loss function calculation, or perhaps for a robust measure of scale for some other probability distribution where the constant 1 provides a consistent estimator. The corresponding function in the statsmodels package provides similar functionality.

1. Making this change would have different defaults from statsmodels and might confuse other users who are using the method as a robust scale estimate.
2. The function is already out in the 1.3.X release, so systems which took a dependency on that default would receive a breaking change if there was a switch. I think the usual approach is to issue a warning for 2 or more releases and then make the change. At least that is what we usually do for deprecations.

Given the consideration of 1 and 2, perhaps an alternative is to make the scale argument a required rather than an optional value and not have a default? This would remove the ambiguity and confusion on the part of the end user. This would also give us a chance to change the name; I regret the full spelling as it is a lot of keystrokes.
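The two options raised here, a warning cycle on the existing default and a renamed function with a required scale argument, could look roughly like the sketch below. This is an illustration under stated assumptions, not scipy's actual code: both functions are hypothetical stand-ins written with the standard library, the warning text is invented, and the new function is assumed to keep the "multiply by scale" convention.

```python
import warnings
from statistics import median

# Rounded normal-consistency factor quoted in the discussion.
_NORMAL_FACTOR = 1.4826


def _raw_mad(values):
    """Unscaled median absolute deviation."""
    center = median(values)
    return median(abs(v - center) for v in values)


def median_absolute_deviation(values, scale=None):
    """Hypothetical stand-in for the existing function.

    Keeps the old default for now, but warns whenever the caller
    relies on it, so the default can change in a later release.
    """
    if scale is None:
        warnings.warn(
            "the default scale may change in a future release; "
            "pass scale explicitly",
            DeprecationWarning,
            stacklevel=2,
        )
        scale = _NORMAL_FACTOR
    return _raw_mad(values) * scale


def med_abs_dev(values, *, scale):
    """Hypothetical renamed function: scale is a required keyword."""
    return _raw_mad(values) * scale


# With a required argument the caller always states the intent.
print(med_abs_dev([1.0, 2.0, 4.0], scale=1.0))
```

Making scale keyword-only and required means a call without it fails immediately with a TypeError, which is the "remove the ambiguity" effect described above, at the cost of a slightly longer call.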