Introduction
Classification via Sparse Representation
Distributed Pattern Recognition
Conclusion
Compressed Sensing Meets Machine Learning: Classification of Mixture Subspace Models via Sparse Representation
Allen Y. Yang <[email protected]>
Feb. 25, 2008. UC Berkeley
What is Sparsity?
Sparsity: A signal is sparse if most of its coefficients are (approximately) zero.
(a) Harmonic functions
(b) Magnitude spectrum
Figure: 2-D DCT transform.
Sparsity in the spatial domain: gene microarray data [Drmanac et al. 1993]
Sparsity in human visual cortex [Olshausen & Field 1997, Serre & Poggio 2006]
1. Feed-forward: no iterative feedback loop.
2. Redundancy: on average 80-200 neurons for each feature representation.
3. Recognition: information exchange between stages is not about individual neurons, but about how many neurons fire together as a group.
Sparsity and ℓ1-Minimization
“Black gold” age [Claerbout & Muir 1973, Taylor, Banks & McCoy 1979]
Figure: Deconvolution of spike train.
Sparse Support Estimators
Sparse support estimator [Donoho 1992, Meinshausen & Buhlmann 2006, Yu 2006, Wainwright 2006, Ramchandran 2007, Gastpar 2007]
Basis pursuit [Chen & Donoho 1999]: given y = Ax with x unknown,
    x* = arg min ‖x‖₁  subject to  y = Ax.
The Lasso (least absolute shrinkage and selection operator) [Tibshirani 1996]:
    x* = arg min ‖y − Ax‖₂  subject to  ‖x‖₁ ≤ k.
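Basis pursuit reduces to a linear program by splitting x into its positive and negative parts. The following is a minimal NumPy/SciPy sketch (not from the slides; dimensions and the `basis_pursuit` helper are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 s.t. A x = y via the standard LP reformulation:
    write x = xp - xn with xp, xn >= 0 and minimize 1^T (xp + xn)."""
    d, n = A.shape
    c = np.ones(2 * n)                    # objective: sum of xp + xn
    A_eq = np.hstack([A, -A])             # encodes A (xp - xn) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    xp, xn = res.x[:n], res.x[n:]
    return xp - xn

# Recover a 3-sparse vector from 40 random measurements of an 80-dim signal.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 80))
x0 = np.zeros(80); x0[[5, 23, 61]] = [1.0, -2.0, 0.5]
y = A @ x0
x_hat = basis_pursuit(A, y)
```

With a Gaussian A and a signal this sparse, the LP solution typically coincides with x0, illustrating the ℓ1/ℓ0 equivalence discussed later in the talk.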
Taking Advantage of Sparsity
What generates sparsity? (d'après Emmanuel Candès) Measure first, analyze later. Curse of dimensionality.
1. Numerical analysis: sparsity reduces the cost of storage and computation.
2. Regularization in classification:
(a) decision boundary
(b) maximal margin
Figure: Linear support vector machine (SVM)
Our Contributions
1. Classification via compressed sensing
2. Performance in face recognition
3. Extensions: outlier rejection, occlusion compensation
4. Distributed pattern recognition in sensor networks
Problem Formulation in Face Recognition
1. Notation. Training: for K classes, collect training samples {v_{1,1}, ..., v_{1,n_1}}, ..., {v_{K,1}, ..., v_{K,n_K}} ⊂ R^D. Test: given a new y ∈ R^D, solve for label(y) ∈ {1, 2, ..., K}.
2. Construct the R^D sample space via stacking.
Figure: For images, assume a 3-channel 640 × 480 image; D = 3 · 640 · 480 ≈ 1e6.
3. Assume y belongs to Class i [Belhumeur et al. 1997, Basri & Jacobs 2003]:
    y = α_{i,1} v_{i,1} + α_{i,2} v_{i,2} + ··· + α_{i,n_i} v_{i,n_i} = A_i α_i,
where A_i = [v_{i,1}, v_{i,2}, ..., v_{i,n_i}].
1. Nevertheless, i is the unknown variable we need to solve for. Global representation:
    y = [A₁, A₂, ..., A_K] [α₁; α₂; ...; α_K] = A x₀.
2. Over-determined system: A ∈ R^{D×n}, where D ≫ n = n₁ + ··· + n_K. x₀ encodes the membership of y: if y belongs to Subject i,
    x₀ = [0, ..., 0, α_i, 0, ..., 0]^T ∈ R^n.
Problems to face:
- Solving for x₀ in R^D is intractable.
- The true solution x₀ is sparse: on average only 1/K of its terms are non-zero.
Dimensionality Reduction
1. Construct a linear projection R ∈ R^{d×D}, where d is the feature dimension, d ≪ D:
    ỹ = R y = R A x₀ = Ã x₀ ∈ R^d,
where Ã = R A ∈ R^{d×n}, but x₀ is unchanged.
2. Holistic features: Eigenfaces [Turk 1991], Fisherfaces [Belhumeur 1997], Laplacianfaces [He 2005]
3. Partial features
4. Unconventional features: downsampled faces, random projections
ℓ0-Minimization
1. Solve for the sparsest solution via ℓ0-minimization:
    x₀ = arg min_x ‖x‖₀  s.t.  ỹ = Ã x,
where ‖·‖₀ simply counts the number of nonzero terms.
2. ℓ0-Ball: the ℓ0-ball is not convex; ℓ0-minimization is NP-hard.
ℓ1/ℓ0 Equivalence
1. Compressed sensing: if x₀ is sparse enough, ℓ0-minimization is equivalent to
    (P1)  min ‖x‖₁  s.t.  ỹ = Ã x,
where ‖x‖₁ = |x₁| + |x₂| + ··· + |x_n|.
2. ℓ1-Ball: ℓ1-minimization is convex, and its solution equals that of ℓ0-minimization.
3. ℓ1/ℓ0 equivalence [Donoho 2002, 2004; Candes et al. 2004; Baraniuk 2006]: given ỹ = Ã x₀, there exists an equivalence breakdown point (EBP) ρ(Ã) such that if ‖x₀‖₀ < ρ, the ℓ1-solution is unique and x₁ = x₀.
ℓ1-Minimization Routines
Matching pursuit [Mallat 1993]:
1. Find the column v_i of Ã most correlated with y: i = arg max_j ⟨y, v_j⟩.
2. Remove column i from Ã; set x_i ← ⟨y, v_i⟩, y ← y − x_i v_i.
3. Repeat until ‖y‖ < ε.
Basis pursuit [Chen 1998]:
1. Assume x₀ is m-sparse.
2. Select m linearly independent columns B_m of Ã as a basis: x_m = B_m† y.
3. Swap one basis vector in B_m with another column of Ã if it improves ‖y − B_m x_m‖.
4. If ‖y − B_m x_m‖₂ < ε, stop.
Quadratic solvers: ỹ = Ã x₀ + z ∈ R^d, where ‖z‖₂ < ε:
    x* = arg min { ‖x‖₁ + λ‖ỹ − Ã x‖₂ }
[Lasso, second-order cone programming]: more expensive.
Matlab toolboxes: ℓ1-Magic by Candès at Caltech; SparseLab by Donoho at Stanford; cvx by Boyd at Stanford.
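The matching pursuit steps above can be sketched in a few lines of NumPy. This is an illustrative variant (not from the slides): instead of deleting the selected column, it accumulates coefficients, which is the common textbook formulation:

```python
import numpy as np

def matching_pursuit(A, y, eps=1e-6, max_iter=100):
    """Greedy matching pursuit: repeatedly pick the column most correlated
    with the current residual and subtract its contribution."""
    A = A / np.linalg.norm(A, axis=0)     # work with unit-norm columns
    x = np.zeros(A.shape[1])
    r = y.astype(float).copy()
    for _ in range(max_iter):
        corr = A.T @ r
        i = np.argmax(np.abs(corr))       # most correlated column
        x[i] += corr[i]                   # accumulate its coefficient
        r -= corr[i] * A[:, i]            # update the residual
        if np.linalg.norm(r) < eps:
            break
    return x
```

For an orthonormal Ã the loop terminates after one pass per nonzero coefficient; for coherent dictionaries it is only an approximation to the ℓ1 solution.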
Classification
1. Project x₁ onto the face subspaces:
    δ₁(x₁) = [α₁; 0; ...; 0],  δ₂(x₁) = [0; α₂; ...; 0],  ···,  δ_K(x₁) = [0; 0; ...; α_K].
2. Define the residual r_i = ‖ỹ − Ã δ_i(x₁)‖₂ for Subject i:
    id(y) = arg min_{i=1,...,K} { r_i }.   (1)
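The classification rule (1) is a small computation once the sparse code x₁ is known. A minimal NumPy sketch (not from the slides; the toy dictionary, `labels` array, and `src_classify` name are illustrative):

```python
import numpy as np

def src_classify(A_tilde, y_tilde, x1, labels):
    """Residual-based rule (1): keep only the coefficients of class i,
    compute r_i = ||y - A delta_i(x1)||_2, and pick the smallest residual."""
    K = labels.max() + 1
    residuals = []
    for i in range(K):
        delta_i = np.where(labels == i, x1, 0.0)   # zero out other classes
        residuals.append(np.linalg.norm(y_tilde - A_tilde @ delta_i))
    return int(np.argmin(residuals))

# Toy example: two classes, two training vectors each (hypothetical data).
A = np.eye(4)                            # 4 training columns as dictionary
labels = np.array([0, 0, 1, 1])          # class of each column of A
x1 = np.array([0.9, 0.1, 0.0, 0.0])      # sparse code concentrated on class 0
y = A @ x1
print(src_classify(A, y, x1, labels))    # → 0
```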
AR Database: 100 subjects (illumination and expression variation)
Table I. Nearest Neighbor
Dimension      30    54    130   540
Eigen [%]      68.1  74.8  79.3  80.5
Laplacian [%]  73.1  77.1  83.8  89.7
Random [%]     56.7  63.7  71.4  75.0
Down [%]       51.7  60.9  69.2  73.7
Fisher [%]     83.4  86.8  N/A   N/A

Table II. Nearest Subspace
Dimension      30    54    130   540
Eigen [%]      64.1  77.1  82.0  85.1
Laplacian [%]  66.0  77.5  84.3  90.3
Random [%]     59.2  68.2  80.0  83.3
Down [%]       56.2  67.7  77.0  82.1
Fisher [%]     80.3  85.8  N/A   N/A

Table III. Linear SVM
Dimension      30    54    130   540
Eigen [%]      73.0  84.3  89.0  92.0
Laplacian [%]  73.4  85.8  90.8  95.7
Random [%]     54.1  70.8  81.6  88.8
Down [%]       51.4  73.0  83.4  90.3
Fisher [%]     86.3  93.3  N/A   N/A

Table IV. ℓ1-Minimization
Dimension      30    54    130   540
Eigen [%]      71.1  80.0  85.7  92.0
Laplacian [%]  73.7  84.7  91.0  94.3
Random [%]     57.8  75.5  87.6  94.7
Down [%]       46.8  67.0  84.6  93.9
Fisher [%]     87.0  92.3  N/A   N/A
Sparsity vs. non-sparsity: ℓ1 and SVM decisively outperform NN and NS.
1. Our framework seeks sparsity in the representation of y.
2. SVM seeks sparsity in the decision boundaries on A = [v₁, ..., v_n].
3. NN and NS do not enforce sparsity.
ℓ1-minimization vs. SVM: the performance of SVM depends on the choice of features.
1. Random projection performs poorly with SVMs.
2. ℓ1-minimization guarantees convergent performance across different features.
3. In lower-dimensional spaces, Fisher features outperform.

Table III. Linear SVM
Dimension      30    54    130   540
Eigen [%]      73.0  84.3  89.0  92.0
Laplacian [%]  73.4  85.8  90.8  95.7
Random [%]     54.1  70.8  81.6  88.8
Down [%]       51.4  73.0  83.4  90.3
Fisher [%]     86.3  93.3  N/A   N/A

Table IV. ℓ1-Minimization
Dimension      30    54    130   540
Eigen [%]      71.1  80.0  85.7  92.0
Laplacian [%]  73.7  84.7  91.0  94.3
Random [%]     57.8  75.5  87.6  94.7
Down [%]       46.8  67.0  84.6  93.9
Fisher [%]     87.0  92.3  N/A   N/A
Randomfaces
Blessing of dimensionality [Donoho 2000]: in a high-dimensional data space R^D, with overwhelming probability, ℓ1/ℓ0 equivalence holds for a random projection R.
Unconventional properties:
1. Domain independent!
2. Data independent!
3. Fast to generate and compute!
Reference: Yang et al. Feature selection in face recognition: A sparse representation perspective. Berkeley Tech Report, 2007.
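Generating Randomface features is just a Gaussian matrix multiply. A minimal sketch (not from the slides; the reduced image size and the 1/√d scaling are illustrative choices):

```python
import numpy as np

def randomfaces(D, d, seed=0):
    """Random Gaussian projection R in R^{d x D}: rows are drawn i.i.d.,
    so R needs no training data, unlike Eigenfaces or Fisherfaces."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((d, D)) / np.sqrt(d)

# Project a (vectorized) small 3-channel 80x60 image down to 120 features.
D = 3 * 80 * 60
R = randomfaces(D, 120)
image = np.random.default_rng(1).random(D)   # stand-in for a face image
features = R @ image                         # 120-dim Randomface feature
```

Because R is data independent, the same matrix can be fixed once and reused for every training and test image.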
Variation: Outlier Rejection
ℓ1-coefficients for invalid images.
Outlier rejection: when the ℓ1-solution is not sparse or not concentrated on one subspace, the test sample is invalid.
Sparsity concentration index:
    SCI(x) = (K · max_i ‖δ_i(x)‖₁ / ‖x‖₁ − 1) / (K − 1) ∈ [0, 1].
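The SCI formula is a one-liner given class labels for the coefficients. A minimal NumPy sketch (not from the slides; the `labels` layout is illustrative):

```python
import numpy as np

def sci(x, labels, K):
    """Sparsity concentration index: 1 when all l1 mass sits in one class,
    0 when it is spread uniformly over the K classes."""
    mass = np.array([np.abs(x[labels == i]).sum() for i in range(K)])
    return (K * mass.max() / np.abs(x).sum() - 1.0) / (K - 1)

labels = np.array([0, 0, 1, 1, 2, 2])        # class of each coefficient
print(sci(np.array([1., 1., 0., 0., 0., 0.]), labels, 3))   # concentrated → 1.0
print(sci(np.array([1., 1., 1., 1., 1., 1.]), labels, 3))   # uniform → 0.0
```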
Variation: Occlusion Compensation
1. Sparse representation + sparse error:
    y = A x + e.
2. Occlusion compensation:
    y = [A | I] [x; e] = B w.
Reference: Wright et al. Robust face recognition via sparse representation. UIUC Tech Report, 2007.
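The extended system y = [A | I] w concatenates the training dictionary with an identity block so a sparse per-pixel error can absorb occlusion. A minimal sketch with toy dimensions (not from the slides):

```python
import numpy as np

# Extended dictionary B = [A | I]: an occluded test image y = A x + e is
# explained by a sparse code x plus a sparse per-pixel error e.
rng = np.random.default_rng(0)
d, n = 6, 4
A = rng.standard_normal((d, n))
B = np.hstack([A, np.eye(d)])          # B in R^{d x (n + d)}

x = np.array([0., 1.5, 0., 0.])        # sparse representation
e = np.zeros(d); e[2] = 3.0            # occlusion corrupts one pixel
y = A @ x + e
w = np.concatenate([x, e])             # stacked unknown w = [x; e]
assert np.allclose(B @ w, y)           # y = B w, as on the slide
```

Running ℓ1-minimization on B then recovers both the identity coefficients x and the error support e, provided the occlusion itself is sparse.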
Distributed Pattern Recognition
Figure: d-Oracle: Distributed object recognition via camera wireless networks.
Key components:
1. Each sensor observes only a partial profile of the event: this demands a global classification framework.
2. Individual sensors have limited classification ability: sensors become active only when certain events are locally detected.
3. The network configuration is dynamic: the global classifier needs to adapt to changes in the set of active sensors.
Problem Formulation for Distributed Action Recognition
Architecture: 8 sensors distributed on the human body. The locations of the sensors are given and fixed. Each sensor carries a triaxial accelerometer and a biaxial gyroscope. Sampling frequency: 20 Hz.
Figure: Readings from 8 x-axis accelerometers and x-axis gyroscopes for a stand-kneel-stand sequence.
Challenges for Distributed Action Recognition
1. Simultaneous segmentation and classification.
2. Individual sensors are not sufficient to classify all human actions.
3. Simulate sensor failure and network congestion via different subsets of active sensors.
4. Identity independence: prior examples from the test subject are excluded from the training data.
Figure: Same actions performed by two subjects.
Mixture Subspace Model for Distributed Action Recognition
1. Training samples are segmented manually with correct labels.
2. On each sensor node i, normalize the vector form (via stacking):
    v_i = [x(1), ..., x(h), y(1), ..., y(h), z(1), ..., z(h), θ(1), ..., θ(h), ρ(1), ..., ρ(h)]^T ∈ R^{5h}.
3. Full-body motion. Training sample: v = [v₁; ...; v₈] ∈ R^{8·5h}. Test sample: y = [y₁; ...; y₈] ∈ R^{8·5h}.
4. Mixture subspace model:
    y = [y₁; ...; y₈] = [ [v₁; ...; v₈]₁, ···, [v₁; ...; v₈]_n ] x = A x.
Localized Classifiers
Distributed sparse representation:
    y = A x  ⇔  y_i = (v_{i,1}, ···, v_{i,n}) x  for each sensor node i = 1, ..., 8.
On each sensor node i:
1. Given a (long) test sequence at time t, apply multiple duration hypotheses: y_i ∈ R^{5h}.
2. Choose Fisher features R_i ∈ R^{10×5h}:
    ỹ_i = R_i y_i = R_i A_i x = Ã_i x ∈ R^{10}.
Localized Classifiers
1. Equivalently, define R′_i = (0 ··· R_i ··· 0):
    ỹ_i = (0 ··· R_i ··· 0) [y₁; ...; y₈] = (0 ··· R_i ··· 0) [ [v₁; ...; v₈]₁, ···, [v₁; ...; v₈]_n ] x = R′_i A x ∈ R^{10}.
2. For all segmentation hypotheses, apply a sparsity concentration index (SCI) threshold σ₁:
(a) valid segmentation
(b) invalid segmentation
Local sparsity threshold σ₁: if SCI(x) > σ₁, sensor i becomes active and transmits ỹ_i ∈ R^{10}; ỹ_i provides a segmentation hypothesis at time t and length h_i.
Adaptive Global Classifier
Adaptive classification for a subset of active sensors (suppose sensors 1, ..., L are active at time t with length h_i). Define the global feature matrix
    R′ = [R₁ ··· 0 ··· 0; ... ; 0 ··· R_L ··· 0]:
    [ỹ₁; ...; ỹ_L] = R′ [y₁; ...; y₈] = R′ [A₁; ...; A₈] x = R′ A x.
Global segmentation: regardless of which L sensors are active, given a global threshold σ₂, if SCI(x) > σ₂, accept y as a global segmentation with the label given by x.
Distributed classification via compressed sensing:
1. Reformulate adaptive classification via the feature matrix R′:
    Local: R′ = (0 ··· R_i ··· 0);  Global: R′ = [R₁ ··· 0 ··· 0; ... ; 0 ··· R_L ··· 0]  ⇔  R′ y = R′ A x.
2. The representation x and training matrix A remain invariant.
3. Segmentation, recognition, and outlier rejection are unified on x.
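The block-diagonal global feature matrix above can be sketched with SciPy's `block_diag`. All dimensions here are toy stand-ins (not from the slides) for R_i ∈ R^{10×5h} with sensors 1..L active:

```python
import numpy as np
from scipy.linalg import block_diag

# Global feature matrix for L active sensors out of S total: stack each
# sensor's Fisher projection R_i on the block diagonal, zeros elsewhere.
rng = np.random.default_rng(0)
h5, L, S = 12, 3, 8                    # per-sensor dim (5h), active, total
Rs = [rng.standard_normal((10, h5)) for _ in range(L)]
R_active = block_diag(*Rs)             # (10L) x (5h L) block for sensors 1..L
R_global = np.hstack([R_active, np.zeros((10 * L, h5 * (S - L)))])

y = rng.standard_normal(h5 * S)        # stacked full-body measurement
y_tilde = R_global @ y                 # concatenated [y1~; ...; yL~]
# Each 10-dim slice equals the corresponding local projection R_i y_i:
for i in range(L):
    yi = y[i * h5:(i + 1) * h5]
    assert np.allclose(y_tilde[10 * i:10 * (i + 1)], Rs[i] @ yi)
```

The check at the end mirrors the slide's point: the global projection is exactly the concatenation of the local ones, so x and A never change as sensors join or drop.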
Experiment
The algorithm depends on only two parameters: (σ₁, σ₂). Data communications between the sensors and the station are 10-D action features.
Precision vs. recall:
Sensors   2     7     2,7   1,2,7  1-3,7,8  1-8
Prec [%]  89.8  94.6  94.4  92.8   94.6     98.8
Rec [%]   65.0  61.5  82.5  80.6   89.5     94.2
Confusion Tables:
(a) Sensor 1-8
(b) Sensor 7
Reference: Yang et al. Distributed Segmentation and Classification of Human Actions Using a Wearable Motion Sensor Network. Berkeley Tech Report, 2007.
Conclusion
1. Sparsity is important for the classification of high-dimensional data.
2. A new recognition framework via compressed sensing.
3. In a high-dimensional feature space, the choice of an “optimal” feature becomes insignificant.
4. Randomfaces, outlier rejection, occlusion compensation.
5. Distributed pattern recognition in body sensor networks.
Future Directions
Distributed camera networks
Biosensor networks in health care
Acknowledgments
Collaborators: Berkeley: Shankar Sastry, Ruzena Bajcsy; UIUC: Yi Ma; UT-Dallas: Roozbeh Jafari.
MATLAB toolboxes: ℓ1-Magic by Candès at Caltech; SparseLab by Donoho at Stanford; cvx by Boyd at Stanford.
References:
- Robust face recognition via sparse representation. Submitted to PAMI, 2008.
- Distributed segmentation and classification of human actions using a wearable motion sensor network. Berkeley Tech Report, 2007.