Statistical learning and optimization
for functional MRI data mining
Alexandre Gramfort
alexandre.gramfort@telecom-paristech.fr
Assistant Professor
LTCI, Télécom ParisTech, Université Paris-Saclay
Macaron Workshop - INRIA Grenoble - 2017
http://www.youtube.com/watch?v=h1Gu1YSoDaY
http://www.youtube.com/watch?v=nsjDnYxJ0bo
Outline
• Background
• Estimating the hemodynamic response function [Pedregosa et al. Neuroimage 2015]
• Mapping the visual pathways with computational models and fMRI [Eickenberg et al. Neuroimage 2016]
• Optimal transport barycenter for group studies [Gramfort et al. IPMI 2015]
A. Gramfort
Statistical Learning and optimization for functional MRI data mining
4
Functional MRI
[Figure: neural activity drives changes in oxy-/deoxy-hemoglobin over time, measured by a magnetic resonance imaging scanner; ≈ 1 image / 2 s]
http://www.youtube.com/watch?v=uhCF-zlk0jY
courtesy of Gael Varoquaux
fMRI supervised learning (decoding)
[Figure: stimuli (image, sound, task) → scanning → fMRI volumes X; decoding maps X to any variable y, e.g. healthy?]
Challenge: Predict a behavioral variable from the fMRI data
Objective: Predict y given X, i.e. learn a function f : X -> y
Classification example with fMRI
Classification principles, in the space of the brain features (multivariate):
• Given a training data set of (features, label) pairs, learn what is characteristic of each category
• Predict the label of an unseen (test) image
• Compare the predicted label with the true target
• Repeat for all samples and average
Examples of targets, given an fMRI activation map: Patient vs. Controls, Faces vs. Houses, ... i.e. y ∈ {-1, 1}
Objective: Predict y ∈ {-1, 1} given x ∈ R^p
(courtesy of Gaël Varoquaux, NeuroSpin)
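The decoding principle above (learn on training pairs of features and labels, predict held-out images, repeat and average) can be sketched with scikit-learn. The data here are a synthetic stand-in, not the fMRI datasets of the talk, and the classifier choice is illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

# synthetic stand-in for an fMRI dataset: n volumes, p voxel features
rng = np.random.RandomState(0)
n, p = 100, 500
X = rng.randn(n, p)
y = np.sign(rng.randn(n))                 # labels in {-1, 1}
X[y == 1, :10] += 1.0                     # make the first 10 "voxels" informative

clf = LinearSVC(C=1.0, max_iter=5000)
scores = cross_val_score(clf, X, y, cv=5)  # accuracy on left-out folds
print(scores.mean())
```

Each fold plays the role of the unseen (test) images; the mean score is the averaged comparison of predicted and true labels.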
fMRI supervised learning (encoding)
[Figure: stimuli (image, sound, task) → scanning → fMRI volumes; encoding maps stimulus descriptors X to the BOLD response y]
Challenge: Predict the BOLD response from the stimuli descriptors
Objective: Predict y given X, i.e. learn a function f : X -> y
[Thirion et al. 06, Kay et al. 08, Naselaris et al. 11, Nishimoto et al. 2011,
Schoenmakers et al. 13 ...]
Learning the hemodynamic response
function (HRF) for encoding and
decoding models
thanks to
Fabian Pedregosa
Michael Eickenberg
Data-driven HRF estimation for encoding and decoding models, Fabian Pedregosa, Michael
Eickenberg, Philippe Ciuciu, Bertrand Thirion and Alexandre Gramfort, Neuroimage 2015
PDF: https://hal.inria.fr/hal-00952554/en
Code: https://pypi.python.org/pypi/hrf_estimation
fMRI paradigm and HRF
HRF: Hemodynamic
response function
General Linear Model (GLM)
Observed BOLD y = Design matrix × Activation coefficients + Noise
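The GLM above (observed BOLD equals design matrix times activation coefficients, plus noise) can be illustrated with a toy least-squares fit. The design matrix below is a random stand-in, not a real paradigm:

```python
import numpy as np

rng = np.random.RandomState(0)
n_scans, n_reg = 200, 3

# toy design matrix: real regressors would be stimulus onsets convolved with an HRF
X = np.column_stack([rng.randn(n_scans, n_reg - 1), np.ones(n_scans)])
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.1 * rng.randn(n_scans)   # observed BOLD = X beta + noise

# least-squares estimate of the activation coefficients
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)
```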
Basis constrained HRF
Hemodynamic response function (HRF) is known to vary
substantially across subjects, brain regions and age.
D. Handwerker et al., “Variation of BOLD hemodynamic responses across subjects and brain
regions and their effects on statistical analyses.,” Neuroimage 2004.
S. Badillo et al., “Group-level impacts of within- and between-subject hemodynamic variability in
fMRI,” Neuroimage 2013.
Two basis-constrained models of the HRF: FIR and 3HRF
Rank1-GLM
From 1 HRF per condition to 1 HRF shared between all conditions.
Assuming 1 HRF shared between all conditions and a different amplitude/scale per condition, this leads to:
Rank1-GLM
argmin_{h, β} ‖y − X vec(h βᵀ)‖²  subject to  ‖h‖∞ = 1 and ⟨h, h_ref⟩ > 0
⟹ solved locally using quasi-Newton methods
Challenge: the optimization problem is small, yet it must be solved tens of thousands of times (once per voxel, typically 30,000 to 50,000 voxels).
Remark: this worked better than alternated optimization or 1st-order methods.
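A minimal sketch of the rank-1 idea on synthetic data, ignoring the norm and sign constraints that the full method enforces: fit h and β jointly with a quasi-Newton method (L-BFGS) on the bilinear least-squares objective, warm-started near a reference HRF as the paper does, then fix the scale afterwards:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.RandomState(0)
n, d, k = 300, 10, 4                      # scans, HRF samples, conditions
X = rng.randn(n, d * k)                   # stand-in design (one block per condition)
h_true = np.sin(np.linspace(0.1, np.pi, d))   # toy "HRF" shape
beta_true = rng.randn(k)                  # per-condition amplitudes
y = X @ np.outer(h_true, beta_true).ravel(order="F") + 0.01 * rng.randn(n)

def loss_grad(params):
    h, b = params[:d], params[d:]
    r = X @ np.outer(h, b).ravel(order="F") - y      # residual
    G = (X.T @ r).reshape(d, k, order="F")           # per-condition gradient blocks
    # d/dh ||r||^2 = 2 G b ;  d/db ||r||^2 = 2 G^T h
    return r @ r, np.concatenate([2 * G @ b, 2 * G.T @ h])

# warm start near a reference HRF, as the full method initializes from h_ref
x0 = np.concatenate([h_true + 0.3 * rng.randn(d), np.ones(k)])
res = minimize(loss_grad, x0, jac=True, method="L-BFGS-B")
h_hat = res.x[:d] / np.abs(res.x[:d]).max()          # fix the scale afterwards
```

Looping this fit over tens of thousands of voxels is where the computational challenge of the slide comes from.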
Results
Cross-validation score in two different datasets
S.Tom et al., “The neural basis of loss aversion in decision-making under risk,” Science 2007.
K. N. Kay et al., “Identifying natural images from human brain activity.,” Nature 2008.
Results
Measure: the voxel-wise encoding score, i.e. the correlation with the BOLD signal at each voxel on left-out data.
R1-GLM (FIR basis) improves the voxel-wise encoding score on more than 98% of the voxels.
Results
Convolutional Networks Map
the Architecture of the
Human Visual System
work of Michael Eickenberg
joint work with Bertrand Thirion and Gaël Varoquaux
“Seeing it all: Convolutional network layers map the function of the human visual system”
Michael Eickenberg, Alexandre Gramfort, Gaël Varoquaux, Bertrand Thirion (submitted)
Convolutional Nets for Computer Vision
[Krizhevsky et al, 2012]
Relating biological and computer vision
Cat V1
orientation selectivity
ConvNet Layer 1
[Hubel & Wiesel, 1959]
[Sermanet 2013]
Low
Level
● V1 functionality comprises edge detection
● Convolutional nets learn edge detectors, color boundary detectors
and blob detectors
Can we use computer vision models and a large fMRI dataset to better understand human vision?
Approach [Kay et al, 2008]
Nonlinear feature extraction via convolutional net layers
+ voxel-wise prediction using a linear model (ridge regression)
Forward Model Setup:
● Encoding model [Naselaris et al., 2011]
● Make sure complexity resides in feature extraction
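The voxel-wise prediction step can be sketched as a ridge regression from stimulus features to BOLD responses, scored by correlation on held-out stimuli. Features, dimensions and the regularization strength below are illustrative stand-ins:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n_stim, n_feat, n_vox = 120, 50, 30
F = rng.randn(n_stim, n_feat)               # stand-in convnet-layer features per stimulus
W = rng.randn(n_feat, n_vox)
Y = F @ W + 0.5 * rng.randn(n_stim, n_vox)  # BOLD responses, one column per voxel

F_tr, F_te, Y_tr, Y_te = train_test_split(F, Y, random_state=0)
model = Ridge(alpha=10.0).fit(F_tr, Y_tr)   # one linear model per voxel, shared alpha
Y_pred = model.predict(F_te)

# voxel-wise encoding score: correlation with held-out BOLD at each voxel
scores = [np.corrcoef(Y_pred[:, v], Y_te[:, v])[0, 1] for v in range(n_vox)]
print(np.mean(scores))
```

Keeping the predictor linear, as in the slide, ensures the model complexity resides in the feature extraction.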
Convolutional Net Forward Models
Best Predicting Layers per Voxel
Score Fingerprints per Region of Interest
Fingerprints summary statistic
[Figure panels: Photos | Videos]
Synthesizing brain activation maps
If our model is strong enough, we can use it to reproduce known experiments: generate the BOLD response, then do a GLM analysis.
[Pipeline: new stimuli → convolutional net forward model → activation maps]
High-level Validation: Faces / Places
[Pipeline: convolutional net forward model → activation maps → GLM contrast maps]
Faces vs Places: Ground Truth
Contrast of stimuli from [Kay 2008]: close-up faces and scenes
Faces vs Places
Simulation on [Kay 2008] left-out stimuli vs. BOLD ground truth
Fast Optimal Transport Averaging of
Neuroimaging Data
Joint work with:
Gabriel Peyré
Marco Cuturi
[Fast Optimal Transport Averaging of Neuroimaging Data
Alexandre Gramfort, Gabriel Peyré, Marco Cuturi, Proc. IPMI 2015]
The overall goal
What is an “average activation”?
[Figure: functional neuroimaging experiment, 20 subjects]
with Magnetoencephalography (MEG)
From sensors to sources, at every ms, for each subject
[Figure: dipoles, dSPM, LCMV source estimates; labels V1, V2d]
Motivation
[Figure: 4 points in R²: x1, x2, x3, x4]
Imagine a 2D flat brain with 4 activations…
Motivation: Mean
[Figure: the 4 points and their mean]
Their mean is (x1 + x2 + x3 + x4) / 4.
Motivation: Computing Means
Consider for each point the function ‖· − x_i‖².
Motivation: Computing Means
The mean is the argmin of (1/4) Σ_{i=1}^{4} ‖· − x_i‖².
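A quick numerical check of this characterization: the arithmetic mean of 4 random points in R² attains a lower value of the sum of squared distances than any perturbation of it:

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.rand(4, 2)                          # 4 points in R^2
objective = lambda u: np.sum(np.linalg.norm(u - x, axis=1) ** 2)

mean = x.mean(axis=0)
# any perturbation of the mean increases the objective
for _ in range(100):
    u = mean + 0.1 * rng.randn(2)
    assert objective(u) >= objective(mean)
print(objective(mean))
```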
Motivation: Means in Metric Spaces
Now if the domain is not flat, you have a ground metric.
From points to probability measures
[Figure: the same 4 locations, each now carrying an empirical measure]
Assume that each datum is now an empirical measure.
What could be the mean of these 4 measures?
From points to probability measures
The mean should preserve the uncertainty & take into account the metric.
Problem formulation
Problem of interest: given a discrepancy function ∆ between probabilities, compute their mean: argmin_µ Σ_i ∆(µ, νi)
• The idea is useful, sometimes tractable & appears in:
  ◦ Bregman clustering for histograms [Banerjee ’05]
  ◦ Topic modeling [Blei et al. ’03]
  ◦ Clustering problems (k-means)
Remark: if the discrepancy is a squared Riemannian distance, it is a Fréchet mean.
Optimal Transport
[Figure: measures µ, ν on a domain (Ω, D); points x, y with ground distance D(x, y)]
Optimal Transport distances rely on 2 key concepts:
• A metric D : Ω × Ω → R+ ;
• Π(µ, ν): joint probabilities with marginals µ, ν.
Example of joint probabilities
Joint probabilities of (µ, ν) on the real line
[Figure: marginal densities µ(x), ν(y) and a joint density P(x, y)]
Π(µ, ν) = probability measures on Ω² with marginals µ and ν.
Optimal Transport Distance
The p-Wasserstein distance for p ≥ 1 is:
Wp(µ, ν) = ( inf_{P ∈ Π(µ, ν)} ∫_{Ω×Ω} D(x, y)^p dP(x, y) )^{1/p}
[Monge-Kantorovich, Kantorovich-Rubinstein, Wasserstein, Earth Mover’s Distance, Mallows ...]
Optimal Transport in dimension d
Discrete measures on (Ω, D): µ = Σ_{i=1}^n a_i δ_{x_i} and ν = Σ_{j=1}^m b_j δ_{y_j}.
W_p^p(µ, ν) can be cast as a linear program in R^{n×m}, built from:
1. M_XY = [D(x_i, y_j)^p]_{ij} ∈ R^{n×m} (metric information)
2. The transportation polytope (joint probabilities): U(a, b) = {T ∈ R_+^{n×m} | T 1_m = a, Tᵀ 1_n = b}
W_p(a, b) raised to the power p is then the optimum of a transportation problem [4, §7.2], a network flow problem on a bipartite graph:
W_p^p(a, b) = OT(a, b, M^p) = min_{T ∈ U(a, b)} ⟨T, M^p⟩    (2)
[Toy example with n = m = 3: a coupling matrix T ∈ U(a, b) with prescribed row sums a and column sums b]
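The discrete formulation above is an ordinary linear program, so for small n and m it can be solved directly, e.g. with scipy.optimize.linprog. The two measures below are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

# two discrete measures on the real line, squared-distance cost (p = 2)
x = np.array([0.0, 1.0, 2.0])
y = np.array([0.5, 1.5, 3.0])
a = np.array([0.3, 0.3, 0.4])            # masses, |a|_1 = |b|_1 = 1
b = np.array([0.2, 0.5, 0.3])
M = (x[:, None] - y[None, :]) ** 2       # M_ij = D(x_i, y_j)^p

n, m = M.shape
# equality constraints encode T 1_m = a and T^T 1_n = b, for T flattened row-wise
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0     # row sums
for j in range(m):
    A_eq[n + j, j::m] = 1.0              # column sums
res = linprog(M.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
W2_squared = res.fun                     # = W_2(a, b)^2
print(W2_squared)
```

In 1-D with a convex cost the optimal coupling is the monotone one, which gives a handy way to check the LP result by hand.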
Optimal Transport in dimension d
The optimal transport problem reads
W_p^p(a, b) = OT(a, b, M^p) = min_{T ∈ U(a, b)} ⟨T, M^p⟩ = min_{T ∈ U(a, b)} Σ_{i=1}^n Σ_{j=1}^m T_ij M_ij^p,
parameterized by the marginals a, b and the matrix M.
Problem: there is no solution if the total masses differ, i.e. if |a|_1 = Σ_i a_i ≠ |b|_1, since then U(a, b) = ∅.
⟹ Need to add and remove mass.
Optimal Transport Averaging of Neuroimaging Data: unnormalized measures
• Scale each observation so that its total mass is at most 1, i.e. work with vectors in S_d = {u ∈ R_+^d, |u|_1 ≤ 1}.
• Add a virtual point ω, used as a buffer when comparing two measures of different mass, with distance D(i, ω) = D(ω, i) = γ_i ≥ 0 to any point i of Ω.
• Map each a to [a, 1 − |a|_1] (a kind of feature map).
• Use as metric M̂ = [[M, γ], [γᵀ, 0]] ∈ R_+^{(d+1)×(d+1)}.    (3)
Given N non-negative measures {b^1, …, b^N} in S_d, the goal is to find a mean
a* = argmin_{u ∈ S_d} (1/N) Σ_{j=1}^N OT([u, 1 − |u|_1], [b^j, 1 − |b^j|_1], M̂^p)    (P1)
This is equivalent to a standard Wasserstein distance between probability vectors in R^{d+1}, so all metric properties of Wasserstein distances carry over … but it is a huge linear program.
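A small sketch of the virtual-point construction, with an illustrative choice of the distances γ to ω (how to set them is a modeling choice discussed in the paper):

```python
import numpy as np

d = 5
x = np.linspace(0.0, 1.0, d)
M = np.abs(x[:, None] - x[None, :])      # toy ground metric on d points

gamma = np.full(d, M.max())              # illustrative distances to the virtual point
M_hat = np.zeros((d + 1, d + 1))
M_hat[:d, :d] = M
M_hat[:d, d] = gamma                     # D(i, omega)
M_hat[d, :d] = gamma                     # D(omega, i)

# lift an unnormalized measure a (|a|_1 <= 1) to a probability on d + 1 points
a = np.array([0.1, 0.2, 0.0, 0.3, 0.1])
a_hat = np.append(a, 1.0 - a.sum())      # mass 1 - |a|_1 goes to omega
```

The lifted vectors a_hat all sum to 1, so standard optimal transport machinery applies to them unchanged.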
Smoothing to speed things up
Idea: regularize the cost with an entropic penalty [Cuturi NIPS 2013]:
OT_λ(a, b, M^p) = min_{T ∈ U(a, b)} ⟨T, M^p⟩ − (1/λ) H(T)
where H(T) = −Σ_{ij} t_ij log(t_ij) is the entropy of the matrix T. The regularized problem is strongly convex and admits a unique optimal solution T*; OT_λ is a differentiable function of a whose gradient can be recovered through the solution of the smoothed dual optimal transport. λ is set in our experiments to 100 / median(M), where median(M) is the median of all pairwise distances {M_ij}_ij.
Definition (p-Kantorovich means with constrained mass): a Kantorovich mean with target mass ρ ≤ 1 of N measures {b^1, …, b^N} in S_d is the unique vector
a* = argmin_{a ∈ S_d, |a|_1 = ρ} (1/N) Σ_j OT_λ(a, b^j, M̂^p)
In practice: solved with a fixed step-length exponentiated gradient descent with projection, an implementation of [7, Alg. 1], with computations in the dual (matrix-matrix products and element-wise multiplications, which are GPGPU friendly).
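The entropy-regularized problem can be solved with Sinkhorn's alternating scaling iterations [Cuturi NIPS 2013]. A minimal NumPy sketch on illustrative 1-D data (a single distance, no barycenter, no GPU):

```python
import numpy as np

def sinkhorn(a, b, M, lam, n_iter=2000):
    """Entropy-regularized OT: min_{T in U(a,b)} <T, M> - (1/lam) H(T)."""
    K = np.exp(-lam * M)                  # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):               # alternating marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    T = u[:, None] * K * v[None, :]       # regularized optimal plan
    return T, float(np.sum(T * M))

rng = np.random.RandomState(0)
x = np.linspace(0.0, 1.0, 20)
M = (x[:, None] - x[None, :]) ** 2        # squared-distance cost (p = 2)
a = rng.rand(20); a /= a.sum()
b = rng.rand(20); b /= b.sum()
T, cost = sinkhorn(a, b, M, lam=10.0)
```

The iterations are pure matrix-vector products and element-wise divisions, which is what makes the GPU implementation of the barycenter solver fast.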
[Figure: BA45, MT; n = 100, d = 10242]
Results fMRI
• 20 subjects
• Left hand button press [Pinel et al. 2007]
• Averaging of standardized effect sizes
Sharp activation foci & less amplitude reduction
Results MEG
• 16 subjects [Henson et al. 2011]
• Visual presentation of faces and scrambled faces
• Averaging of dSPM source estimates
[Figure: V1]
Results MEG
• Contrast between faces and scrambled faces
• With a Tesla K40 GPU card: less than a minute of computation
“Philosophical” Conclusion
• The world of neuroimaging is full of challenging maths and computer science problems ...
• … look at the data to find the relevant ones
• … but don’t be scared if they are not well posed
“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.” ~ John Tukey
Some refs:
Fabian Pedregosa, Michael Eickenberg, Philippe Ciuciu, Bertrand Thirion and Alexandre
Gramfort, Data-driven HRF estimation for encoding and decoding models, Neuroimage 2015
Michael Eickenberg, Alexandre Gramfort, Gaël Varoquaux, Bertrand Thirion, Seeing it all:
Convolutional network layers map the function of the human visual system, Neuroimage 2016
Alexandre Gramfort, Gabriel Peyré, Marco Cuturi, Fast Optimal Transport Averaging of
Neuroimaging Data, Proc. IPMI 2015
One position to work on scikit-learn is available!
Contact
http://alexandre.gramfort.net
GitHub : @agramfort
Twitter : @agramfort