
Conversation

@Marco-Congedo

No description provided.

@tlienart
Collaborator

Thanks! This is nice but would benefit from a bit of cleaning up:

  • the use of emojis in the code makes it less readable
  • there should be tests (ideally against a reference implementation)
  • the benchmarks should not be in the code
  • it should be integrated with the rest of the code

Did you want us to take over the PR, or do you want to clean it up yourself?

@codecov-io

codecov-io commented Feb 26, 2020

Codecov Report

Merging #68 into master will decrease coverage by 19.78%.
The diff coverage is 0%.


@@             Coverage Diff             @@
##           master      #68       +/-   ##
===========================================
- Coverage   96.84%   77.06%   -19.79%     
===========================================
  Files           4        5        +1     
  Lines         222      279       +57     
===========================================
  Hits          215      215               
- Misses          7       64       +57
Impacted Files       Coverage Δ
src/mEstimators.jl   0% <0%> (ø)


Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@Marco-Congedo
Author

Hi

  1. emojis: it depends; the glad face to indicate that the algorithm converged is very readable (and nicer) to me, but I am aware many people don't like them. Whatever works for you is fine with me.

  2. tests: I am not aware of any reference implementation in Julia, and I don't use Python or Matlab...
    It should be all right, as the test I included at the end of the unit with Student-t data suggests, but yes, tests are necessary.

  3. benchmarks: I sent you the examples, tests, and benchmarks in the same file only so that you would have everything needed to check the code.

  4. Integration: I guess it would be much faster for you to do.

One thing: Julia's cholesky function is not very efficient for matrices of dimension up to about 400 when BLAS is instructed to use several threads. See here. However, several BLAS threads may be handy later on in the algorithms. One way around may be to call BLAS.set_num_threads(1) right before the Cholesky decomposition and BLAS.set_num_threads(Sys.CPU_THREADS) right after. I didn't do it, as I wanted to discuss this with you.
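
For concreteness, a minimal sketch of that workaround (the wrapper function name is mine, not from the PR):

    using LinearAlgebra

    # Restrict BLAS to one thread for the small Cholesky factorization,
    # then restore multithreading for the rest of the algorithm.
    function singlethreaded_cholesky(A::Hermitian)
        BLAS.set_num_threads(1)
        C = cholesky(A)
        BLAS.set_num_threads(Sys.CPU_THREADS)
        return C
    end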

@Marco-Congedo
Author

Marco-Congedo commented Feb 26, 2020

I have tried the BLAS.set_num_threads idea and the nrtme estimator is almost twice as fast for 20x20 scatter matrices. I have updated the code.

Line 95 was `scm = (X' * X) .* inv(n)` instead of `scm = (X * X') .* inv(n)`.

Hello, I cleaned up the code some more.
Please let me know how you want to proceed.
@Marco-Congedo
Author

Hello,

Does the latest code I uploaded suit you?
How would you like to proceed?

Cheers,
Marco

@mateuszbaran
Owner

Hi!

Thanks for the code 👍 . I'll try to find some time this weekend to integrate it with the rest of the library.

@Marco-Congedo
Author

OK, great, thanks.
I will commit the latest version right now; it is just a little bit cleaner.

@tlienart
Collaborator

tlienart commented Feb 29, 2020

I'm a bit swamped at the moment but will hopefully also have a look soon, once @mateuszbaran has had a chance too. In any case, thanks a lot for the contribution!

@Marco-Congedo
Author

With pleasure, and thank you for taking it up and doing the integration.
By the way, I have tested the two estimators on a lot of real data and they give meaningful results, so most probably they are correct. The classical Tyler's estimator is simple to code, and the testing on Student-t distributed data is very convincing; I am 99% confident it is correct. I am not that confident about the regularized normalized estimator, though, as the code is more complex. Also, please keep in mind that I did not test the case of complex data input at all. Please don't support that for the time being.
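
As a rough sketch of the kind of Student-t sanity check described above (the `tme(X)` function name and call signature for the PR's Tyler M-estimator are my assumptions):

    using LinearAlgebra, Distributions, Random

    # Tyler's estimator recovers the true scatter matrix only up to a positive
    # scale, so we compare trace-normalized shape matrices.
    Random.seed!(1)
    p, n = 10, 1000
    A = randn(p, p)
    Σ = A * A' + p * I                        # true scatter (positive definite)
    X = rand(MvTDist(3.0, zeros(p), Σ), n)    # p×n heavy-tailed samples

    shape(M) = M / tr(M)
    M = tme(X)                                # assumed API of the PR's estimator
    @show norm(shape(M) - shape(Σ)) / norm(shape(Σ))   # should be small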

@mateuszbaran
Owner

I've started updating your code to our style. How do you interpret these mean distances expressed in dB? Is there a threshold for considering these values "good enough"?

@Marco-Congedo
Author

I have done this with a colleague who is very familiar with Tyler's estimator. I will ask him what a reasonable threshold is and get back to you. I will also ask for some Matlab code for formal testing. I would not include the t-distribution comparison as a test, but as an example for demonstration purposes.

@Marco-Congedo
Author

Hello Mateusz,

sorry for the late reply, but my colleague took some time to answer. What they do is check that the estimator attains the Cramér-Rao bound, as shown in this paper. The good thing is that the bound is very easy to compute, as it depends only on the matrix size and the number of samples (Eq. 17 therein), but again, this does not make a formal test. Alternatively, I can send you some Matlab code confidentially (it is not mine) for testing my implementation against that one on a fixed matrix. If this works for you, please send me a mail at marco dot congedo at gmail dot com, as I don't have yours. Cheers!

@mateuszbaran
Owner

Thanks for the reply, and sorry for not finishing my changes. I'd prefer to avoid looking at non-public code in this case. I would appreciate it, though, if you could send me computed covariance matrices for a few sample matrices; my e-mail is in the Project.toml file: mateuszbaran89 at gmail dot com.

@Marco-Congedo
Author

Hi Mateusz,

I uploaded three .mat files and a script for performing the tests to the "test" directory of my branch.
Each .mat file holds two variables: a data matrix X and the estimator M derived with the benchmark Matlab code. For the three files, the data matrix has been filled with random Gaussian entries, with dimensions 3×100, 5×100, and 10×100. All tests pass. In order to read the files and use the script you need the MAT.jl package. Let me know if there is anything that is not clear. Cheers!
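
For reference, a minimal sketch of such a check (the file name `tyler3x100.mat` and the `tme` estimator function are my assumptions, not the actual names in the branch; the variables X and M are as described above):

    using MAT, LinearAlgebra

    # Load a reference file holding a data matrix X and the scatter estimate M
    # computed by the benchmark Matlab code.
    vars = matread("test/tyler3x100.mat")
    X, M_ref = vars["X"], vars["M"]

    # Recompute with the PR's estimator and compare up to numerical tolerance.
    M = tme(X)
    @assert norm(M - M_ref) / norm(M_ref) < 1e-6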

@mateuszbaran
Owner

Thanks again, I'll integrate it later.

@Marco-Congedo
Author

Hello, hope all is good with you. Any news on the integration of the M-estimators?

@mateuszbaran
Owner

Hi! Sorry, I constantly get distracted with something else. I'll push it a bit forward today.

@mateuszbaran
Owner

Weird, when I subtract the mean from the matrix X in the function for the Tyler M-estimator, I get an error from PosDefManifold:

M-estimators: Error During Test at /home/mateusz/.julia/dev/CovarianceEstimation/test/test_m-estimators.jl:11
  Got exception outside of a @test
  DomainError with -4.440892098500626e-16:
  log will only return a complex result if called with a complex argument. Try log(Complex(x)).
  Stacktrace:
   [1] throw_complex_domainerror(::Symbol, ::Float64) at ./math.jl:33
   [2] log(::Float64) at ./special/log.jl:285
   [3] _broadcast_getindex_evalf at ./broadcast.jl:631 [inlined]
   [4] _broadcast_getindex at ./broadcast.jl:604 [inlined]
   [5] _getindex at ./broadcast.jl:627 [inlined] (repeats 2 times)
   [6] _broadcast_getindex at ./broadcast.jl:603 [inlined]
   [7] getindex at ./broadcast.jl:564 [inlined]
   [8] macro expansion at ./broadcast.jl:910 [inlined]
   [9] macro expansion at ./simdloop.jl:77 [inlined]
   [10] copyto! at ./broadcast.jl:909 [inlined]
   [11] copyto! at ./broadcast.jl:864 [inlined]
   [12] copy at ./broadcast.jl:840 [inlined]
   [13] materialize at ./broadcast.jl:820 [inlined]
   [14] distanceSqr(::Metric, ::Hermitian{Float64,Array{Float64,2}}, ::Hermitian{Float64,Array{Float64,2}}) at /home/mateusz/.julia/packages/PosDefManifold/6TlZk/src/riemannianGeometry.jl:390
   [15] distance(::Metric, ::Hermitian{Float64,Array{Float64,2}}, ::Hermitian{Float64,Array{Float64,2}}) at /home/mateusz/.julia/packages/PosDefManifold/6TlZk/src/riemannianGeometry.jl:468
   [16] macro expansion at /home/mateusz/.julia/dev/CovarianceEstimation/test/test_m-estimators.jl:46 [inlined]
   [17] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/Test/src/Test.jl:1113 [inlined]
   [18] top-level scope at /home/mateusz/.julia/dev/CovarianceEstimation/test/test_m-estimators.jl:15
   [19] include_string(::Module, ::String, ::String) at ./loading.jl:1080
   [20] (::Atom.var"#200#204"{String,String})() at /home/mateusz/.julia/packages/Atom/wlPiw/src/eval.jl:164
   [21] withpath(::Atom.var"#200#204"{String,String}, ::String) at /home/mateusz/.julia/packages/CodeTools/kosGY/src/utils.jl:30
   [22] withpath(::Function, ::String) at /home/mateusz/.julia/packages/Atom/wlPiw/src/eval.jl:9
   [23] #199 at /home/mateusz/.julia/packages/Atom/wlPiw/src/eval.jl:161 [inlined]
   [24] with_logstate(::Atom.var"#199#203"{String,String}, ::Base.CoreLogging.LogState) at ./logging.jl:398
   [25] with_logger at ./logging.jl:505 [inlined]
   [26] #198 at /home/mateusz/.julia/packages/Atom/wlPiw/src/eval.jl:160 [inlined]
   [27] hideprompt(::Atom.var"#198#202"{String,String}) at /home/mateusz/.julia/packages/Atom/wlPiw/src/repl.jl:140
   [28] macro expansion at /home/mateusz/.julia/packages/Media/ItEPc/src/dynamic.jl:24 [inlined]
   [29] evalall(::String, ::Nothing, ::String) at /home/mateusz/.julia/packages/Atom/wlPiw/src/eval.jl:150
   [30] macro expansion at /home/mateusz/.julia/packages/Atom/wlPiw/src/eval.jl:39 [inlined]
   [31] (::Atom.var"#172#173")() at ./task.jl:358

Looks like some problem with numerical accuracy of eigenvalue calculation. Could you check that?

@Marco-Congedo
Author

Hi,
it looks like the error is thrown in test_m-estimators.jl by the distance function of PosDefManifold.jl, which computes the square root of the sum of the squared logarithms of some eigenvalues, where the sum is computed by this expression:

    if size(P, 1) <= 80 && T <: Real
        return max(0., sum(log.(eigvals(inv(P) * Q)).^2))
    else
        return max(0., sum(log.(eigvals(P, Q)).^2))
    end

The error occurs in one of those lines of test_m-estimators.jl:

        dscm[i] = distance(Fisher, Hermitian(trueC), Hermitian(scm))
        dtme[i] = distance(Fisher, Hermitian(trueC), Hermitian(M))
        dnrtme[i] = distance(Fisher, Hermitian(trueC), Hermitian(nrM))

Since just above you are checking the positive-definiteness of M and nrM, I guess either trueC or scm is not positive-definite when you subtract the mean.
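
To illustrate the suspected mechanism (a sketch of mine, not code from the PR or the tests): centering can make a sample scatter matrix rank-deficient, and an eigenvalue that is zero in exact arithmetic may come out as a tiny negative float, which is exactly what makes log throw:

    using LinearAlgebra, Statistics

    X = randn(3, 3)
    Xc = X .- mean(X, dims=2)     # centering removes one degree of freedom
    S = Symmetric(Xc * Xc' / 3)   # rank-deficient "covariance"
    λ = eigvals(S)
    @show minimum(λ)              # theoretically 0; may print e.g. -4.4e-16
    minimum(λ) < 0 && println("log.(λ) would throw a DomainError")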
To circumvent the problem in this case, you can use a metric that does not require positive-definiteness of the matrices, e.g.,

dscm[i] = distance(Euclidean, Hermitian(trueC), Hermitian(scm))

or

dscm[i] = distance(Wasserstein, Hermitian(trueC), Hermitian(scm))

However, the formal test I sent in TestTeyler.jl is all we need, I think, as it compares the result against well-established Matlab code. Also, by the way, although I don't have a formal test for the other Tyler's estimator, I have been trying the code a lot with real data and I get meaningful results, so I'm pretty confident that it is correct as well.

@mateuszbaran
Owner

I've taken a look at that Tyler's paper: https://projecteuclid.org/download/pdf_1/euclid.aos/1176350263 and it looks like the main algorithm assumes zero-mean input data, and there are some caveats when the mean needs to be estimated (Section 4). I'll have to read it more carefully.
