Statistical learning and optimization for functional MRI data mining
Alexandre Gramfort (alexandre.gramfort@telecom-paristech.fr)
Assistant Professor, LTCI, Télécom ParisTech, Université Paris-Saclay
Macaron Workshop, INRIA Grenoble, 2017
http://www.youtube.com/watch?v=h1Gu1YSoDaY
http://www.youtube.com/watch?v=nsjDnYxJ0bo

Outline
• Background
• Estimating the hemodynamic response function [Pedregosa et al., Neuroimage 2015]
• Mapping the visual pathways with computational models and fMRI [Eickenberg et al., Neuroimage 2016]
• Optimal transport barycenters for group studies [Gramfort et al., IPMI 2015]

Functional MRI
• Neural activity changes the balance of oxygenated and deoxygenated hemoglobin over time, and the scanner measures this with magnetic resonance imaging.
• Roughly 1 image every 2 s.
http://www.youtube.com/watch?v=uhCF-zlk0jY
(courtesy of Gaël Varoquaux)

fMRI supervised learning (decoding)
• Stimuli (image, sound, task) are presented during scanning; decoding goes from the fMRI volumes X back to a variable y, which can be any variable (e.g. healthy or not).
• Challenge: predict a behavioral variable from the fMRI data.
• Objective: predict y given X, i.e. learn a function f : X -> y.

Classification example with fMRI
• Work in the space of brain features (multivariate).
• Given a training data set of (features, label) pairs, learn what characterizes each category in the feature space.
• Predict the label of an unseen (test) image, e.g. patients vs. controls or faces vs. houses, and compare the predicted label with the true target.
• Repeat for all samples, then average.
• The objective is to be able to predict y ∈ {-1, 1} given an fMRI activation map x ∈ R^p.
(example adapted from a NeuroSpin course, 2010)

fMRI supervised learning (Encoding)
• Encoding goes the other way: from descriptors of the stimuli X to the fMRI volumes y.
• Challenge: predict the BOLD response from the stimuli descriptors.
• Objective: predict y given X, i.e. learn a function f : X -> y.
[Thirion et al. 2006, Kay et al. 2008, Naselaris et al. 2011, Nishimoto et al. 2011, Schoenmakers et al. 2013, ...]

Learning the hemodynamic response function (HRF) for encoding and decoding models
Thanks to Fabian Pedregosa and Michael Eickenberg.
Data-driven HRF estimation for encoding and decoding models, Fabian Pedregosa, Michael Eickenberg, Philippe Ciuciu, Bertrand Thirion and Alexandre Gramfort, Neuroimage 2015
PDF: https://hal.inria.fr/hal-00952554/en
Code: https://pypi.python.org/pypi/hrf_estimation

fMRI paradigm and HRF
• HRF: hemodynamic response function, the stereotyped BOLD response that follows a brief neural event.

General Linear Model (GLM)
• Observed BOLD y = design matrix X × activation coefficients β + noise.
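The GLM above can be made concrete with a toy simulation. Everything here is illustrative and assumed: the gamma-shaped bump is only HRF-like (real analyses use a canonical double-gamma HRF), and the onsets, amplitudes and noise level are made up. The design matrix is built by convolving the event onsets with the HRF, and fitting the GLM is ordinary least squares.

```python
import numpy as np

tr = 2.0                                   # one volume every 2 s, as in the slides
frame_times = np.arange(0, 100, tr)        # 50 scans
onsets = np.zeros_like(frame_times)
onsets[[5, 20, 35]] = 1.0                  # three hypothetical events

# Toy HRF: a gamma-like bump peaking ~5 s after the event, normalized to peak 1.
t = np.arange(0, 32, tr)
hrf = t ** 5 * np.exp(-t)
hrf /= hrf.max()

# Design matrix: convolved regressor + intercept column.
regressor = np.convolve(onsets, hrf)[:len(frame_times)]
X = np.column_stack([regressor, np.ones_like(frame_times)])

# Synthesize "observed BOLD" with known activation coefficients, then fit.
rng = np.random.default_rng(0)
beta_true = np.array([2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(len(frame_times))

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # GLM fit = least squares
```

With little noise and non-overlapping events, `beta_hat` recovers the simulated activation coefficient closely.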
Basis constrained HRF
• The hemodynamic response function (HRF) is known to vary substantially across subjects, brain regions and age.
D. Handwerker et al., "Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses," Neuroimage 2004.
S. Badillo et al., "Group-level impacts of within- and between-subject hemodynamic variability in fMRI," Neuroimage 2013.
• Two basis-constrained models of the HRF: FIR and 3HRF.

Rank1-GLM
• Go from one HRF per condition to one HRF shared between all conditions.
• Assuming one HRF shared between all conditions and a different amplitude/scale per condition, this leads to:

  argmin_{h, β} ‖y − X vec(h βᵀ)‖²  subject to  ‖h‖∞ = 1 and ⟨h, h_ref⟩ > 0

• Solved locally using quasi-Newton methods.
• Challenge: this optimization problem is not big, yet it needs to be solved once per voxel, i.e. typically 30,000 to 50,000 times.
• Remark: this worked better than alternated optimization or first-order methods.

Results
• Cross-validation score on two different datasets:
S. Tom et al., "The neural basis of loss aversion in decision-making under risk," Science 2007.
K. N. Kay et al., "Identifying natural images from human brain activity," Nature 2008.
• Measure: voxel-wise encoding score, i.e. the correlation with the BOLD signal at each voxel on left-out data.
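The rank-1 model above can be sketched on synthetic data. Note the hedge: the talk reports that a joint quasi-Newton solver worked better than alternated optimization; this sketch nevertheless uses alternating least squares on the same rank-1 parameterization, purely to stay dependency-light. All designs, the bump-shaped "true" HRF and the noise level are invented; only the model structure (shared HRF h, per-condition amplitudes β, constraint ‖h‖∞ = 1) follows the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 12, 3                       # scans, HRF length, conditions (toy sizes)

def toeplitz_block(onsets, n, d):
    """Design block X_j so that X_j @ h is the convolution of onsets with h."""
    X = np.zeros((n, d))
    for t in np.flatnonzero(onsets):
        m = min(d, n - t)
        X[t:t + m, :m] += np.eye(m)
    return X

blocks = [toeplitz_block(rng.random(n) < 0.05, n, d) for _ in range(k)]

# Ground truth: one shared bump-shaped HRF, one amplitude per condition.
h_true = np.exp(-0.5 * ((np.arange(d) - 4.0) / 1.5) ** 2)
h_true /= np.abs(h_true).max()             # ||h||_inf = 1
beta_true = np.array([1.5, -0.7, 2.0])
y = sum(b * Xj @ h_true for b, Xj in zip(beta_true, blocks))
y = y + 0.05 * rng.standard_normal(n)

# Rank-1 GLM by alternating least squares over beta and h.
h = np.ones(d)
for _ in range(50):
    B = np.column_stack([Xj @ h for Xj in blocks])   # fix h, solve for beta
    beta, *_ = np.linalg.lstsq(B, y, rcond=None)
    A = sum(b * Xj for b, Xj in zip(beta, blocks))   # fix beta, solve for h
    h, *_ = np.linalg.lstsq(A, y, rcond=None)
    s = h[np.abs(h).argmax()]
    h = h / s                                        # peak of h is now +1
    beta = beta * s                                  # keeps the product h beta^T unchanged
```

The rescaling step resolves the scale ambiguity of the rank-1 factorization, mirroring the ‖h‖∞ = 1 constraint with a positive peak.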
• R1-GLM with an FIR basis improves the voxel-wise encoding score on more than 98% of the voxels.

Convolutional Networks Map the Architecture of the Human Visual System
Work of Michael Eickenberg, joint with Bertrand Thirion and Gaël Varoquaux.
"Seeing it all: Convolutional network layers map the function of the human visual system", Michael Eickenberg, Alexandre Gramfort, Gaël Varoquaux, Bertrand Thirion (submitted).

Convolutional Nets for Computer Vision
[Krizhevsky et al., 2012]

Relating biological and computer vision
• Cat V1 orientation selectivity [Hubel & Wiesel, 1959] vs. ConvNet layer 1 [Sermanet 2013].
• Low level: V1 functionality comprises edge detection, and convolutional nets learn edge detectors, color boundary detectors and blob detectors.
• Question: can we use computer vision models and a large fMRI dataset to better understand human vision?

Approach
• Data from [Kay et al., 2008]: nonlinear feature extraction via convolutional net layers, then voxel-wise prediction using a linear model (ridge regression).
• Forward model setup: an encoding model [Naselaris et al., 2011]; make sure the complexity resides in the feature extraction.

Convolutional Net Forward Models

Best Predicting Layers per Voxel
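The regression step of the forward model described above (convnet features in, one ridge fit per voxel) can be sketched as follows. Random matrices stand in for the real quantities: `Phi` would hold convnet-layer activations per stimulus and `Y` the measured BOLD responses; sizes, the ridge strength `alpha` and the synthetic weights are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_stim, n_feat, n_vox = 120, 50, 300       # toy sizes

# Stand-ins for convnet-layer features (Phi) and per-voxel BOLD responses (Y).
Phi = rng.standard_normal((n_stim, n_feat))
W_true = 0.3 * rng.standard_normal((n_feat, n_vox))
Y = Phi @ W_true + 0.5 * rng.standard_normal((n_stim, n_vox))

# Voxel-wise ridge regression, all voxels solved jointly in closed form:
#   W = (Phi^T Phi + alpha I)^{-1} Phi^T Y   (one column of W per voxel)
alpha = 10.0
W = np.linalg.solve(Phi.T @ Phi + alpha * np.eye(n_feat), Phi.T @ Y)

# Encoding score: correlation of predicted vs. observed BOLD per voxel
# (computed on the training set here; the talk scores on left-out data).
pred = Phi @ W
score = np.array([np.corrcoef(pred[:, v], Y[:, v])[0, 1] for v in range(n_vox)])
```

Because the ridge system shares the same Gram matrix across voxels, a single factorization serves all 300 fits, which is what makes whole-brain encoding tractable.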
Score Fingerprints per Region of Interest

Fingerprints summary statistic
• Computed for both photos and videos.

Synthesizing Brain activation maps
• If our model is strong enough, we can use it to reproduce known experiments: take new stimuli, generate the BOLD response with the convolutional net forward model, then run a GLM analysis to obtain activation maps.

High-level Validation: Faces / Places
• Convolutional net forward model -> activation maps -> GLM contrast maps.

Faces vs Places: Ground Truth
• Stimuli from [Kay 2008]: close-up faces and scenes, and the contrast between them.

Faces vs Places
• Simulation on [Kay 2008] left-out stimuli, compared against the BOLD ground truth.

Fast Optimal Transport Averaging of Neuroimaging Data
Joint work with Gabriel Peyré and Marco Cuturi.
[Fast Optimal Transport Averaging of Neuroimaging Data, Alexandre Gramfort, Gabriel Peyré, Marco Cuturi, Proc. IPMI 2015]

The overall goal
• What is an "average activation"?
Functional neuroimaging experiment
• 20 subjects.

With Magnetoencephalography (MEG)
• From sensors to sources, at every millisecond, for each subject (dipoles, dSPM, LCMV, V1, V2d).

Motivation
• Imagine a 2D flat brain with 4 activations: 4 points x1, x2, x3, x4 in R².
• Their mean is (x1 + x2 + x3 + x4) / 4.

Computing Means
• Consider for each point the function ‖· − x_i‖².
• The mean is argmin (1/4) Σ_{i=1}^{4} ‖· − x_i‖².

Means in Metric Spaces
• Now if the domain is not flat: you have a ground metric.

From points to probability measures
• Assume that each datum is now an empirical measure. What could be the mean of these 4 measures?
• The mean should preserve the uncertainty and take the metric into account.
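The "mean as argmin" characterization from the Computing Means slide can be checked numerically. The four points are hypothetical (the corners of the unit square); the check nudges the arithmetic mean in each axis direction and confirms the objective only increases.

```python
import numpy as np

# Four hypothetical points in R^2, as in the "flat brain" toy example.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
mean = pts.mean(axis=0)                    # (x1 + x2 + x3 + x4) / 4

def objective(x):
    """(1/4) sum_i ||x - x_i||^2, the function the mean minimizes."""
    return np.mean(np.sum((pts - x) ** 2, axis=1))

# The arithmetic mean attains the minimum: moving away in any axis
# direction strictly increases the objective.
for delta in np.eye(2).tolist() + (-np.eye(2)).tolist():
    assert objective(mean + 0.1 * np.asarray(delta)) > objective(mean)
```

This is exactly the property that breaks on a curved domain, which is why the next slides replace the squared Euclidean distance with a ground metric.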
Problem formulation
• Problem of interest: given a discrepancy function Δ between probability measures, compute their mean: argmin_ν Σ_i Δ(ν, ν_i).
• The idea is useful, and sometimes tractable; it appears in Bregman clustering for histograms [Banerjee et al. 2005], topic modeling [Blei et al. 2003] and clustering problems (k-means).
• Remark: if the discrepancy is a squared Riemannian distance, this is a Fréchet mean.

Optimal Transport
• Optimal transport distances rely on 2 key concepts: a metric D : Ω × Ω → R+, and Π(µ, ν), the joint probabilities with marginals µ and ν.
• Π(µ, ν) = probability measures on Ω² with marginals µ and ν.

Optimal Transport Distance
• The p-Wasserstein distance for p ≥ 1 is:

  Wp(µ, ν) = inf_{P ∈ Π(µ, ν)} ( ∫_{Ω×Ω} D(x, y)^p dP(x, y) )^{1/p}

[Monge-Kantorovich, Kantorovich-Rubinstein, Wasserstein, Earth Mover's Distance, Mallows, ...]

Optimal Transport in dimension d
• For discrete measures a and b on d points, let M be the pairwise distance matrix (the metric information) and M^p its element-wise p-th power.
• The transportation polytope (the joint probabilities) is

  U(a, b) = {T ∈ R_+^{d×d} | T 1_d = a, Tᵀ 1_d = b}

  It is empty when |a|₁ ≠ |b|₁ and non-empty when |a|₁ = |b|₁ (in which case the matrix a bᵀ / |a|₁ belongs to it).
• The optimal transport problem then reads:

  W_p^p(a, b) = OT(a, b, M^p) = min_{T ∈ U(a, b)} ⟨T, M^p⟩,  with ⟨T, M^p⟩ = Σ_{i=1}^{d} Σ_{j=1}^{d} T_ij M_ij^p

• This is a linear program, known as a transportation problem: a network flow problem on a bipartite graph, parameterized by the marginals a, b and the matrix M.
• Toy example with n = m = 3: a 3 × 3 nonnegative matrix whose row sums give a and whose column sums give b.

Optimal Transport for Unnormalized Measures
• Problem: there is no solution if |a|₁ ≠ |b|₁. Mass needs to be added and removed.
• Idea: add a virtual point ω that acts as a buffer, with distance D(i, ω) = D(ω, i) = γ to any point i in Ω, and use the augmented metric M̂ ∈ R_+^{(d+1)×(d+1)}.
• Scale each observation by a constant so that its total mass is at most 1, i.e. work on S_d = {u ∈ R_+^d, |u|₁ ≤ 1}, and map each a to [a, 1 − |a|₁] (a kind of feature map). This assumption alleviates the need to treat normalized and non-normalized data separately.
• Kantorovich means: given N non-negative vectors b¹, ..., bᴺ in S_d, find

  a ∈ argmin_{u ∈ S_d} (1/N) Σ_{j=1}^{N} OT(u, bʲ, M̂^p)   (P1)

• This keeps all the properties of Wasserstein distances between points in S_d ... but it is a huge linear program.
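To make the ⟨T, M^p⟩ objective concrete without a general LP solver, one can use a special case: on the real line with cost |x − y|^p (p ≥ 1) and sorted supports, the monotone "north-west corner" coupling is an optimal transport plan. The two discrete measures below are made up for illustration.

```python
import numpy as np

# Two discrete measures on the real line (supports sorted, total mass 1 each).
x = np.array([0.0, 1.0, 3.0]); a = np.array([0.5, 0.3, 0.2])
y = np.array([0.5, 2.0]);      b = np.array([0.4, 0.6])

M = np.abs(x[:, None] - y[None, :])        # ground-metric cost matrix (p = 1)

def northwest_corner(a, b):
    """Monotone coupling T with T @ 1 = a and T.T @ 1 = b."""
    a, b = a.copy(), b.copy()
    T = np.zeros((len(a), len(b)))
    i = j = 0
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])                # ship as much mass as possible
        T[i, j] = m
        a[i] -= m
        b[j] -= m
        if a[i] <= 1e-12:
            i += 1
        else:
            j += 1
    return T

T = northwest_corner(a, b)                 # a transport plan in U(a, b)
W1 = np.sum(T * M)                         # <T, M>: the 1-Wasserstein distance
```

In 1-D this matches the CDF formula W1 = ∫ |F_a − F_b| dx (here both give 0.9); in higher dimensions, or on a triangulated cortex, the full linear program (or its entropic smoothing below) is needed.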
Gramfort Statistical Learning anddistances optimization for functional MRI data mining n [7], which canEarth be easily modified to incorporate constraints on ponds to the Mover’s Distance [23], and use a high ty will prove useful in the next section. opic regularization of order 1/ , which has also the merit strategy of [7] is to regularize directly the optimal transport pro 1) strongly convex. is set in our experiments to 100/m [Cuturi NIPS 2013] opic penalty, whose weight is parameterized by a parameter > Idea: Regularize cost with entropy Smoothing to speed things up n(M ) is the median of all pairwise distances {Mij }ij . p def OT (a, b, M ) = p min hT, M i 1 H(T ), 2 (p-Kantorovich Means with Constrained Mass) T 2U (a,b) 0. A Kantorovich mean with a target mass ⇢ 1 of Strongly convex with unique T minimum 1 ) stands N for H(T the entropy of the matrix element of t b , · · · , b } def in SdPis the unique vectorseen a inasSan such that: d 2 sizeProblem d , H(T reads: ) = ij tij log(tij ). As shown by [7], the regularize 1 X oblem OT admits a unique optimal solution. j pAs such, OT (a a 2 argmin OT (a, b , M̂ ). ↵erentiable function of a N whose gradient can be recovered thro a2Sd j n of the corresponding smoothed dual optimal transport. Withou |a|1 =⇢ urther on this approach, we propose to simply replace all expressio In practice: solved with an exponentiated gradient with optimal transport problem OT in our formulations by theirth sm nanprojection Algorithm 1 an implementation of [7, Alg.1]. Unlike in the dual (matrix-matrix computations and part OT . ider a fixed step-length exponentiated gradient descent, element wise multiplications which are GPGPU friendly) alization step. We set the default mass of the barycente A. Gramfort Statistical Learning and optimization for functional MRI data mining 59 BA45 MT n = 100 d = 10242 Results fMRI • • • 20 subjects Left hand button press [Pinel et al. 
Results fMRI
• 20 subjects, left hand button press [Pinel et al. 2007].
• Averaging of standardized effect sizes.
• Sharp activation foci and less amplitude reduction.

Results MEG
• 16 subjects [Henson et al. 2011], left hand button press.
• Visual presentation of faces and scrambled faces.
• Averaging of dSPM source estimates (standardized effect size) in V1.
• Contrast between faces and scrambled faces.
• With a Tesla K40 GPU card: less than a minute of computation.

"Philosophical" Conclusion
• The world of neuroimaging is full of challenging maths and computer science problems ...
• ... look at the data to find the relevant ones ...
• ... but don't be scared if they are not well posed.
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." ~ John Tukey

Some refs:
Fabian Pedregosa, Michael Eickenberg, Philippe Ciuciu, Bertrand Thirion and Alexandre Gramfort, Data-driven HRF estimation for encoding and decoding models, Neuroimage 2015.
Michael Eickenberg, Alexandre Gramfort, Gaël Varoquaux, Bertrand Thirion, Seeing it all: Convolutional network layers map the function of the human visual system, Neuroimage 2016.
Alexandre Gramfort, Gabriel Peyré, Marco Cuturi, Fast Optimal Transport Averaging of Neuroimaging Data, Proc. IPMI 2015.

1 position to work on scikit-learn available!
Contact: http://alexandre.gramfort.net
GitHub: @agramfort
Twitter: @agramfort