# Group Equivariant Convolutional Networks in Medical Image Analysis

## Introduction

Artificial intelligence has attracted enormous attention in recent years and has changed many aspects of daily life. Convolutional neural networks (CNNs) have been widely applied in medical image analysis, in tasks such as classification and segmentation, and have advanced diagnosis and treatment in clinical practice.
The success of CNNs relies on well-designed network architectures, strong computational power, and large-scale datasets. In some settings, such as healthcare, large-scale data is hard to obtain, so extensive data augmentation is often needed to achieve good performance.
Unlike natural images, some medical images exhibit not only translational symmetry but also rotation and reflection symmetry. Standard CNNs, however, do not exploit these symmetries by design.
Group equivariant convolutional networks (G-CNNs) [1] were proposed in 2016 as a generalization of CNNs. They use G-convolutions to obtain a substantially higher degree of weight sharing, and for discrete groups generated by translations, reflections and rotations they can be implemented with negligible computational overhead. This architecture has since been used in medical image analysis, where it has been reported to improve performance on multiple tasks.
This paper gives a brief review of the applications of G-CNNs in medical image analysis and describes how they improve the models.

## Preliminaries

### Symmetries

Cohen and Welling [1] describe a symmetry of an object as a transformation that leaves the object invariant. Cohen's PhD thesis [2] applies this idea to a label function: a symmetry of a label function is a transformation of the input that leaves the label unchanged, so each class is a union of the equivalence classes induced by the symmetries.
Figure 1: Example of a label function
Figure 1 [2] is an example of a label function $L: \mathcal{X} \rightarrow \mathcal{C}$ (in green) from input space $\mathcal{X}$ to label space $\mathcal{C}$. A symmetry transformation $g: \mathcal{X} \rightarrow \mathcal{X}$ of $L$ is drawn in blue.
Since $g$ is a symmetry of $L$, each class is a union of orbits of $g$ (connected components of the blue graph). That is, $L^{-1}\left(c_{1}\right)=\left\{x_{1}, x_{2}\right\} \cup\left\{x_{3}, x_{4}\right\}$ and $L^{-1}\left(c_{2}\right)=\left\{x_{5}, x_{6}, x_{7}\right\}$. In this case, knowing that $g$ is a symmetry of $L$ reduces the minimum number of labels required from 7 to 3 in the noiseless setting, one label per orbit.
[1] also uses the sampling grid of an image as an example, which is easier to understand: for the sampling grid $\mathbb{Z}^{2}$, flipping gives $-\mathbb{Z}^{2}=\left\{(-n,-m) \mid (n, m) \in \mathbb{Z}^{2}\right\}=\mathbb{Z}^{2}$. The flipping operation is therefore a symmetry of the sampling grid.
The set of all symmetries of an object, together with composition, is called its symmetry group.
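The label-counting argument can be checked in a few lines of Python. This is a minimal sketch, assuming a hypothetical symmetry `g` stored as a permutation of the seven points. Only the orbits stated above are taken from the text; the direction of each arrow and the helper name `orbits` are illustrative assumptions.

```python
# Hypothetical symmetry g on the points x1..x7, encoded as a permutation.
# Only the orbits {x1,x2}, {x3,x4}, {x5,x6,x7} come from the text above;
# the direction of each arrow is an assumption.
g = {1: 2, 2: 1, 3: 4, 4: 3, 5: 6, 6: 7, 7: 5}

def orbits(perm):
    """Return the orbits (cycles) of a permutation given as a dict."""
    seen, result = set(), []
    for start in sorted(perm):
        if start in seen:
            continue
        orbit, x = [], start
        while x not in seen:
            seen.add(x)
            orbit.append(x)
            x = perm[x]
        result.append(orbit)
    return result

orbs = orbits(g)
# In the noiseless setting, one labeled point per orbit determines L everywhere.
print(orbs)
print("labels needed:", len(orbs))
```

Labeling one point in each orbit and propagating that label along `g` recovers the label of all seven points.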

### Representations

Having introduced symmetry, we turn to representations. When the symmetry transformations act on vectors, we have a representation of the group. In [2], a group representation plays the role of a group action on a vector space.
Put simply, a representation $\rho$ assigns to each group element $g \in G$ a linear map $\rho(g)$ that acts on $\mathcal{X}$.
The feature maps of a CNN, viewed as spaces of functions, can also be regarded as a kind of representation.

### Equivariance

Equivariance is an important kind of symmetry.
Let $\Phi$ be a map from one representation to another that preserves structure.
For $G$-spaces, $\Phi$ is equivariant if
$\Phi\left(T_{g} x\right)=T_{g}^{\prime} \Phi(x)$
where $T_{g}$ is the transformation of $g$ on the input $x$, applied before $\Phi$, while $T_{g}^{\prime}$ is the transformation of $g$ on the output $\Phi(x)$. Equivariance means that transforming the input and then applying $\Phi$ gives the same result as applying $\Phi$ and then transforming the output.
[2] also points out that equivariance is transitive: if $\Phi_{l}: \mathcal{X}_{l} \rightarrow \mathcal{X}_{l+1}$ is equivariant with respect to $\rho_{l}$ and $\rho_{l+1}$, and $\Phi_{l+1}: \mathcal{X}_{l+1} \rightarrow \mathcal{X}_{l+2}$ is equivariant with respect to $\rho_{l+1}$ and $\rho_{l+2}$, then their composition $\Phi_{l+1} \circ \Phi_{l}: \mathcal{X}_{l} \rightarrow \mathcal{X}_{l+2}$ is equivariant with respect to $\rho_{l}$ and $\rho_{l+2}$. For a CNN, this means that if every layer is equivariant, the whole network is equivariant.
In machine learning, equivariance is generally more useful than invariance for intermediate features, because spatial relationships can be judged from equivariant features but not from invariant ones.
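The equivariance condition can be tested numerically. The following sketch assumes a toy single-channel image, a random $3 \times 3$ filter and wrap-around (circular) boundaries; the helper names `circ_corr` and `shift` are illustrative. It checks $\Phi(T_g x) = T'_g \Phi(x)$ for a plain correlation, once with a translation and once with a 90-degree rotation.

```python
import numpy as np

def circ_corr(f, psi):
    """Circular cross-correlation: out[x] = sum_d psi[d] * f[x + d]."""
    out = np.zeros(f.shape, dtype=float)
    for i in range(psi.shape[0]):
        for j in range(psi.shape[1]):
            # rolling by -i, -j brings f[x + (i, j)] to position x
            out += psi[i, j] * np.roll(f, shift=(-i, -j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 8))     # toy image
psi = rng.normal(size=(3, 3))   # toy filter

def shift(a):
    """T_g: translate by (2, 3) with wrap-around."""
    return np.roll(a, shift=(2, 3), axis=(0, 1))

# Translation: correlate-then-shift equals shift-then-correlate.
translation_ok = np.allclose(circ_corr(shift(f), psi), shift(circ_corr(f, psi)))

# Rotation by 90 degrees: the same check with a generic filter.
rotation_ok = np.allclose(circ_corr(np.rot90(f), psi), np.rot90(circ_corr(f, psi)))

print("translation equivariant:", translation_ok)
print("rotation equivariant:", rotation_ok)
```

With a generic filter the translation check passes and the rotation check does not, which is the gap that G-convolutions are meant to close.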

### The groups $p4$ and $p4m$

[1] considers two groups, $p4$ and $p4m$. For the sampling grid $\mathbb{Z}^{2}$ of an image, the group $p4$ consists of all compositions of translations and rotations by 90 degrees about any center of rotation in a square grid. It can be parameterized in terms of three integers $r, u, v$ as
$g(r, u, v)=\left[\begin{array}{ccc} \cos (r \pi / 2) & -\sin (r \pi / 2) & u \\ \sin (r \pi / 2) & \cos (r \pi / 2) & v \\ 0 & 0 & 1 \end{array}\right]$
where $0 \leq r<4$ and $(u, v) \in \mathbb{Z}^{2}$. The group operation is given by matrix multiplication.
The group $p4m$ consists of all compositions of translations, mirror reflections, and rotations by 90 degrees about any center of rotation in the grid. It can be parameterized in terms of four integers $m, r, u, v$ as
$g(m, r, u, v)=\left[\begin{array}{ccc} (-1)^{m} \cos \left(\frac{r \pi}{2}\right) & -(-1)^{m} \sin \left(\frac{r \pi}{2}\right) & u \\ \sin \left(\frac{r \pi}{2}\right) & \cos \left(\frac{r \pi}{2}\right) & v \\ 0 & 0 & 1 \end{array}\right]$
where $m \in\{0,1\}$, $0 \leq r<4$ and $(u, v) \in \mathbb{Z}^{2}$.
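The two parameterizations translate directly into code. The sketch below builds the $3 \times 3$ matrices with exact integer entries and checks a few group properties; the function names `p4` and `p4m` are chosen here for illustration.

```python
import numpy as np

def p4(r, u, v):
    """Homogeneous 3x3 matrix of the p4 element g(r, u, v)."""
    c = [1, 0, -1, 0][r % 4]   # cos(r * pi / 2) as an exact integer
    s = [0, 1, 0, -1][r % 4]   # sin(r * pi / 2) as an exact integer
    return np.array([[c, -s, u], [s, c, v], [0, 0, 1]])

def p4m(m, r, u, v):
    """Homogeneous 3x3 matrix of the p4m element g(m, r, u, v)."""
    g = p4(r, u, v)
    g[0, :2] *= (-1) ** m      # the mirror flips the sign of the first row
    return g

# The group operation is matrix multiplication, and the product is again in p4.
product = p4(1, 2, 3) @ p4(3, -1, 4)
print(product)

# Four quarter turns give the identity; a mirror applied twice does too.
quarter_turns = np.linalg.matrix_power(p4(1, 0, 0), 4)
double_mirror = p4m(1, 0, 0, 0) @ p4m(1, 0, 0, 0)
print(quarter_turns)
print(double_mirror)
```

Here `p4(1, 2, 3) @ p4(3, -1, 4)` has rotation index $r = 0$ because the two rotations cancel, and its translation part is the rotated second translation plus the first.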

## Group Equivariant Convolutional Networks

### Structure of Symmetric Feature Maps

A stack of feature maps with $K$ channels in a CNN can be modeled as a function $f: \mathbb{Z}^{2} \rightarrow \mathbb{R}^{K}$ supported on a bounded domain. At every pixel coordinate $(x, y) \in \mathbb{Z}^{2}$, the stack of feature maps returns a $K$-dimensional vector $f(x, y)$.
A transformation $g$ acts on a set of feature maps as follows:
$\left[L_{g} f\right](x)=\left[f \circ g^{-1}\right](x)=f\left(g^{-1} x\right)$
Feature maps in a $G$-CNN are functions on the group $G$ rather than on $\mathbb{Z}^{2}$. The same definition applies when $x \in \mathbb{Z}^{2}$ is replaced by an element $h \in G$.
The following two figures [1] show feature maps on $p4$ and $p4m$ and their rotations by $r$:
Figure 2: A p4 feature map and its rotation by r
Figure 3: A p4m feature map and its rotation by r
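For $p4$, the action $[L_g f](h) = f(g^{-1} h)$ of a 90-degree rotation can be written out concretely. The sketch below assumes a feature map stored as an array of shape `(4, N, N)`, one spatial map per rotation index, and assumes that `np.rot90` matches the rotation direction of the group; `rotate_p4` is an illustrative name. Under these assumptions the rotation turns every spatial map and cyclically shifts the four orientation channels.

```python
import numpy as np

def rotate_p4(fmap):
    """Apply a 90-degree rotation to a p4 feature map of shape (4, N, N)."""
    turned = np.rot90(fmap, k=1, axes=(1, 2))   # rotate each orientation channel
    return np.roll(turned, shift=1, axis=0)     # channel s receives old channel s - 1

rng = np.random.default_rng(0)
fmap = rng.normal(size=(4, 5, 5))

once = rotate_p4(fmap)
four_times = rotate_p4(rotate_p4(rotate_p4(once)))

# Channel 1 of the rotated map is the rotated channel 0 of the original.
channel_moved = np.array_equal(once[1], np.rot90(fmap[0]))
# Four quarter turns act as the identity, as the group structure requires.
back_to_start = np.array_equal(four_times, fmap)
print(channel_moved, back_to_start)
```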

### The Structure of Group Equivariant Convolutional Networks

The first layer of a G-CNN lifts the input image (a function on the sampling grid $\mathbb{Z}^{2}$) to a feature map on $G$. It is defined as:
$[f \star \psi](g)=\sum_{y \in \mathbb{Z}^{2}} \sum_{k} f_{k}(y) \psi_{k}\left(g^{-1} y\right)$
All layers after the first are defined as:
$[f \star \psi](g)=\sum_{h \in G} \sum_{k} f_{k}(h) \psi_{k}\left(g^{-1} h\right)$
The difference is that in the first layer the input and the filter are functions on the plane $\mathbb{Z}^{2}$, whereas in later layers they are functions on the discrete group $G$.
Both operations are proved to be equivariant in [1].
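The equivariance of the first-layer operation can be verified numerically for $G = p4$. The sketch below is not the implementation from [1]; it assumes a single input channel, circular boundaries and an odd-sized filter centered on the origin, and the helper names are illustrative. The first layer is computed by correlating the image with the four rotated copies of one filter.

```python
import numpy as np

def circ_corr(f, psi):
    """Centered circular correlation: out[x] = sum_d psi[d] * f[x + d]."""
    k = psi.shape[0] // 2
    out = np.zeros(f.shape, dtype=float)
    for i in range(psi.shape[0]):
        for j in range(psi.shape[1]):
            # offset d = (i - k, j - k); rolling by -d brings f[x + d] to x
            out += psi[i, j] * np.roll(f, shift=(k - i, k - j), axis=(0, 1))
    return out

def lift_p4(f, psi):
    """Z^2 -> p4 correlation: one output channel per rotation of the filter."""
    return np.stack([circ_corr(f, np.rot90(psi, r)) for r in range(4)])

def rotate_p4(fmap):
    """Action of a 90-degree rotation on a p4 feature map of shape (4, N, N)."""
    return np.roll(np.rot90(fmap, k=1, axes=(1, 2)), shift=1, axis=0)

rng = np.random.default_rng(0)
f = rng.normal(size=(9, 9))
psi = rng.normal(size=(3, 3))

# Rotating the image first, or rotating the lifted feature map afterwards,
# should give the same result.
lhs = lift_p4(np.rot90(f), psi)
rhs = rotate_p4(lift_p4(f, psi))
equivariant = np.allclose(lhs, rhs)
print("lifting layer equivariant:", equivariant)
```

The output of the rotated image is not equal to the original output; it is the original output rotated and with its orientation channels shifted, which is exactly what equivariance (as opposed to invariance) means.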

### 3D Group Convolutions

[3] proposed 3D group convolutions for the analysis of 3D images such as CT scans. The 3D group convolution is implemented as
$\operatorname{GConv3d}(\Psi, f)=\operatorname{Conv3d}\left(\Psi^{+}, f\right)$
where $\Psi$ is the filter bank.
The map $\Psi \mapsto \Psi^{+}$ can be defined as an indexing operation on $\Psi$ using a precomputed array of indices. This review does not cover the mathematical details of 3D group convolution.
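The indexing idea is easiest to see in a lower-dimensional analog. The sketch below is not the 3D implementation from [3]; it assumes a 2D lifting layer for $p4$ with a $3 \times 3$ filter bank, and the names `inds` and `expand_filters` are illustrative. The index array is computed once, and the expanded bank $\Psi^{+}$ is then obtained by pure indexing.

```python
import numpy as np

k = 3
# Precomputed indices: inds[r] records, for every entry of the r-times rotated
# filter, which entry of the flattened original filter it is copied from.
base = np.arange(k * k).reshape(k, k)
inds = np.stack([np.rot90(base, r) for r in range(4)])    # shape (4, k, k)

def expand_filters(psi):
    """Map Psi of shape (C_out, C_in, k, k) to Psi^+ of shape (4 * C_out, C_in, k, k)."""
    c_out, c_in = psi.shape[:2]
    flat = psi.reshape(c_out, c_in, k * k)
    plus = flat[:, :, inds]                     # (C_out, C_in, 4, k, k), indexing only
    plus = plus.transpose(0, 2, 1, 3, 4)        # (C_out, 4, C_in, k, k)
    return plus.reshape(c_out * 4, c_in, k, k)

rng = np.random.default_rng(0)
psi = rng.normal(size=(2, 3, k, k))             # toy filter bank
psi_plus = expand_filters(psi)

# Entry [o * 4 + r, c] of the expanded bank is filter [o, c] rotated r times.
matches = np.array_equal(psi_plus[1 * 4 + 2, 1], np.rot90(psi[1, 1], 2))
print(psi_plus.shape, matches)
```

Because the expanded bank has the layout of an ordinary filter bank, it can be passed to a standard convolution routine, which is the sense in which the group convolution reduces to a planar one.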

## Applications in Medical Image Analysis

### Classification

With G-CNNs, classification generally consists of three steps:
1. Convert an image to feature maps
2. Convert feature maps to feature maps
3. Convert feature maps to a label.
In the final layer, a group-pooling layer ensures that the output is invariant as a function on the plane for classification tasks [4].
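The three steps and the final pooling can be illustrated in miniature. The sketch below assumes $G = p4$, circular boundaries, two random filters, and a pointwise ReLU standing in for the intermediate layers; it is a toy pipeline, not a model from the cited papers. Pooling over all rotations and positions yields features that do not change when the input is rotated by 90 degrees.

```python
import numpy as np

def circ_corr(f, psi):
    """Centered circular correlation: out[x] = sum_d psi[d] * f[x + d]."""
    k = psi.shape[0] // 2
    out = np.zeros(f.shape, dtype=float)
    for i in range(psi.shape[0]):
        for j in range(psi.shape[1]):
            out += psi[i, j] * np.roll(f, shift=(k - i, k - j), axis=(0, 1))
    return out

def invariant_features(f, filters):
    feats = []
    for psi in filters:
        # Step 1: image -> p4 feature map (one channel per filter rotation)
        fmap = np.stack([circ_corr(f, np.rot90(psi, r)) for r in range(4)])
        # Step 2 (reduced to a pointwise ReLU here): feature map -> feature map
        fmap = np.maximum(fmap, 0.0)
        # Step 3: pool over all rotations and positions -> one number per filter
        feats.append(fmap.max())
    return np.array(feats)

rng = np.random.default_rng(0)
f = rng.normal(size=(8, 8))
filters = rng.normal(size=(2, 3, 3))

feats = invariant_features(f, filters)
feats_rot = invariant_features(np.rot90(f), filters)
invariant = np.allclose(feats, feats_rot)
print("features unchanged by rotation:", invariant)
```

A classifier applied to `feats` would therefore give the same label for the image and its rotated copy without any rotation augmentation.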
Example 1: [4] proposed an equivariant DenseNet architecture for the $p4$ group, consisting of 5 Dense Blocks (D.B.) alternated with Transition Blocks (T.B.). The main steps are as follows:
1. Define the $\left(\mathbb{Z}^{2} \rightarrow G\right)$-convolution on the image:
$[f * \psi](g)=\sum_{y \in \mathbb{Z}^{2}} \sum_{k=1}^{K} f_{k}(y) \psi_{k}\left(g^{-1} y\right)$
where $g=(r, t)$ is a roto-translation (in case $G=p4$) or a roto-reflection-translation (in case $G=p4m$).
2. Define the $(G \rightarrow G)$-convolution:
$[f * \psi](g)=\sum_{h \in G} \sum_{k=1}^{K} f_{k}(h) \psi_{k}\left(g^{-1} h\right)$
3. Define a projection layer: $G \rightarrow \mathbb{Z}^{2}$
The final layer of the model is a $p4 \rightarrow \mathbb{Z}^{2}$ group pooling layer followed by a sigmoid activation. In Figure 4, the four orientations in $p4$ are shown in different colors, and a $\mathbb{Z}^{2} \rightarrow p4$ kernel (left), a $p4 \rightarrow p4$ kernel and a $p4 \rightarrow \mathbb{Z}^{2}$ kernel illustrate how equivariance arises in the model.
Figure 4: Equivariant DenseNet architecture, the figure is from [^5]
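The $(G \rightarrow G)$-convolution in this example can also be written out for $G = p4$. The sketch below assumes one input and one output channel, circular boundaries, and feature maps stored as `(4, N, N)` arrays; it is an illustration of the formula rather than the layer used in [4], and all helper names are assumptions. Each output orientation $r$ sums correlations of the input orientations with filters that are rotated $r$ times and whose orientation index is shifted by $r$.

```python
import numpy as np

def circ_corr(f, psi):
    """Centered circular correlation: out[x] = sum_d psi[d] * f[x + d]."""
    k = psi.shape[0] // 2
    out = np.zeros(f.shape, dtype=float)
    for i in range(psi.shape[0]):
        for j in range(psi.shape[1]):
            out += psi[i, j] * np.roll(f, shift=(k - i, k - j), axis=(0, 1))
    return out

def gconv_p4(fmap, psi):
    """p4 -> p4 correlation; fmap has shape (4, N, N), psi has shape (4, k, k)."""
    out = []
    for r in range(4):
        # psi(g^-1 h): orientation index s - r, spatial part rotated r times
        acc = sum(circ_corr(fmap[s], np.rot90(psi[(s - r) % 4], r)) for s in range(4))
        out.append(acc)
    return np.stack(out)

def rotate_p4(fmap):
    """Action of a 90-degree rotation on a p4 feature map."""
    return np.roll(np.rot90(fmap, k=1, axes=(1, 2)), shift=1, axis=0)

rng = np.random.default_rng(0)
fmap = rng.normal(size=(4, 7, 7))   # stands in for the output of a lifting layer
psi = rng.normal(size=(4, 3, 3))

lhs = gconv_p4(rotate_p4(fmap), psi)
rhs = rotate_p4(gconv_p4(fmap, psi))
equivariant = np.allclose(lhs, rhs)
print("p4 -> p4 layer equivariant:", equivariant)
```

Because every such layer commutes with the rotation action, stacking them inside dense and transition blocks keeps the whole network equivariant up to the final pooling layer.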
Example 2: [5] proposed a dynamic group convolution to improve breast tumor classification. The dynamic group convolution is parameterized as a dynamic, rather than fixed, combination of multiple kernels, and the operation is generalized to an equivariant one.
For $\mathcal{K}: \mathcal{X} \rightarrow \mathcal{Y}$ and $F: \mathcal{X} \rightarrow \mathcal{Y}$ defined on $G$, the dynamic group equivariant convolution is defined as follows:
$(\mathcal{K} F)(g)=\int_{G} \mathcal{A}(g, \tilde{g}) K\left(g^{-1} \tilde{g}\right)(\tilde{g}) F(\tilde{g}) \, d \mu_{\tilde{g}}$
where $\mathcal{A}$ is an attention map obtained from the dynamic attention operator $\mathcal{R}[F]$, and $K\left(g^{-1} \tilde{g}\right)$ is a stack of kernels $\left\{K_{i}\left(g^{-1} \tilde{g}\right)\right\}_{i \in k}$ concatenated in matrix form: $\left[K_{1}\left(g^{-1} \tilde{g}\right) K_{2}\left(g^{-1} \tilde{g}\right) \cdots K_{k}\left(g^{-1} \tilde{g}\right)\right]$.
The paper proved that the equivariance of the dynamic group convolution operator is maintained by imposing an equivariance constraint on $\mathcal{R}$.
The results and comparisons on the PCAM dataset are shown below; they indicate that dynamic group convolution can lead to improvements in breast tumor classification:

| Method | Group | # Params | AUC |
| --- | --- | --- | --- |
| DenseNet | $\mathbb{Z}^{2}$ | 128 K | 95.5 |
| DenseNet+ | $\mathbb{Z}^{2}$ | 128 K | 95.1 |
| P4-DenseNet | $p4$ | 125 K | 94.5 |
| P4M-DenseNet | $p4m$ | 119 K | **96.3** |
| Dy-DenseNet | $\mathbb{Z}^{2}$ | 221 K | 95.8 |
| Dy-DenseNet+ | $\mathbb{Z}^{2}$ | 221 K | 95.6 |
| Dy-P4-DenseNet | $p4$ | 202 K | 95.1 |
| Dy-P4M-DenseNet | $p4m$ | 198 K | **96.7** |

Example 3: [3] applied 3D G-CNNs to nodule classification and evaluated performance using the Free-Response Operating Characteristic (FROC)[]. $D_{4}, D_{4h}, O, O_{h}$ are different 3D symmetry groups.
The results and comparison are shown below; in every row the 3D G-CNNs score higher than the $\mathbb{Z}^{3}$ baseline:

| $N$ | $\mathbb{Z}^{3}$ | $D_{4}$ | $D_{4h}$ | $O$ | $O_{h}$ |
| --- | --- | --- | --- | --- | --- |
| 30 | 0.252 | 0.398 | 0.382 | 0.562 | 0.514 |
| 300 | 0.550 | 0.765 | 0.759 | 0.767 | 0.733 |
| 3,000 | 0.791 | 0.849 | 0.844 | 0.830 | 0.850 |
| 30,000 | 0.843 | 0.867 | 0.880 | 0.873 | 0.869 |


### Segmentation

With G-CNNs, segmentation generally consists of three steps:
1. Convert an image to feature maps,
2. Convert feature maps to feature maps
3. Convert feature maps to a segmentation.
In the final layer, a group-pooling layer ensures that the output is equivariant as a function on the plane (for segmentation tasks, the output is supposed to transform together with the input) [4].
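For segmentation, the pooling runs over the orientations only, so the spatial layout survives. The sketch below is a toy example, assuming $G = p4$, circular boundaries, integer-valued data (so that the arithmetic is exact) and a hypothetical threshold as the per-pixel decision. It checks that the mask of a rotated image is the rotated mask.

```python
import numpy as np

def circ_corr(f, psi):
    """Centered circular correlation: out[x] = sum_d psi[d] * f[x + d]."""
    k = psi.shape[0] // 2
    out = np.zeros(f.shape, dtype=float)
    for i in range(psi.shape[0]):
        for j in range(psi.shape[1]):
            out += psi[i, j] * np.roll(f, shift=(k - i, k - j), axis=(0, 1))
    return out

def segment(f, psi, threshold=4.0):
    """Lift to p4, pool over the four orientations, threshold each pixel."""
    fmap = np.stack([circ_corr(f, np.rot90(psi, r)) for r in range(4)])
    plane = fmap.max(axis=0)        # group pooling: p4 -> Z^2
    return plane > threshold        # toy per-pixel decision (assumed threshold)

rng = np.random.default_rng(0)
f = rng.integers(0, 5, size=(8, 8)).astype(float)      # toy image
psi = rng.integers(-2, 3, size=(3, 3)).astype(float)   # toy filter

mask = segment(f, psi)
mask_rot = segment(np.rot90(f), psi)
transforms_together = np.array_equal(mask_rot, np.rot90(mask))
print("mask rotates with the image:", transforms_together)
```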
Example 1: [6] proposed Rota-Net for simultaneous gland and lumen segmentation in colon histology images. It incorporates the inherent rotational symmetry of histology images into an encoder-decoder network by using G-CNNs, specifically with the symmetry group of rotations by multiples of 90°.
The main steps are as follows:
1. Define the $\left(\mathbb{Z}^{2} \rightarrow G\right)$-convolution on the image $f: \mathbb{Z}^{2} \rightarrow \mathbb{R}^{K}$:
$[f \star w](g)=\sum_{y \in \mathbb{Z}^{2}} \sum_{k=1}^{K} f_{k}(y) w_{k}\left(g^{-1} y\right)$
where $w_{k}$ denotes kernel $k$, with corresponding input channel $f_{k}$, and $g$ is a roto-translation.
2. Define the $(G \rightarrow G)$-convolution on feature maps $f: G \rightarrow \mathbb{R}^{K}$ as:
$[f \star w](g)=\sum_{h \in G} \sum_{k=1}^{K} f_{k}(h) w_{k}\left(g^{-1} h\right)$
3. Define the projection layer: $G \rightarrow \mathbb{Z}^{2}$
The architecture of Rota-Net is shown in the following figure [6]:
Figure 5: The architecture of Rota-Net: the yellow box within the input means the part of the image considered at the output. The number at the top of each operation means the number of feature maps produced.
Example 2: [7] proposed a 3D left-right-reflection equivariant network to segment the anatomical structures of the brain, exploiting the left-right symmetry of the brain. The segmentation model is a 3D U-Net [8], a classical segmentation model, which incorporates reflection-equivariant (RE) convolutions.
For a multi-channel function $\boldsymbol{f}$ of spatial location $\boldsymbol{x} \in \mathbb{Z}^{3}$, i.e., a 3D image or feature map, let $R$ be the left-right reflection, which acts as $[R \boldsymbol{f}](\boldsymbol{x})=\boldsymbol{f}\left(R^{-1} \boldsymbol{x}\right)=\boldsymbol{f}(R \boldsymbol{x})$, where $R^{-1}$ is the inverse of $R$ and $R^{-1}=R$.
Compared with Example 1, the differences in the G-CNN are that the feature maps are 3D and the transformation is a left-right reflection. A comparison of results is shown below [7]:
Figure 6: Example brain tissue segmentation. (A), (E): original and reflected testing images. (B), (F): manual delineations. (C), (G): results of the conventional U-Net trained with reflection augmentation. (D), (H): the results of the RE U-Net.
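The reflection action from Example 2 leads to a very small group with only two elements, which makes the equivariant layer easy to sketch. The code below is not the RE convolution from [7]; it assumes a single-channel 3D volume, circular boundaries, and that the last array axis is the left-right axis, and all names are illustrative. The lifting layer correlates the volume with a filter and with its mirror image.

```python
import numpy as np

def circ_corr3d(f, psi):
    """Centered circular 3D correlation: out[x] = sum_d psi[d] * f[x + d]."""
    k = psi.shape[0] // 2
    out = np.zeros(f.shape, dtype=float)
    for idx in np.ndindex(*psi.shape):
        shift = tuple(k - i for i in idx)
        out += psi[idx] * np.roll(f, shift=shift, axis=(0, 1, 2))
    return out

def reflect(a):
    """Left-right reflection R (the last axis is assumed to be left-right)."""
    return np.flip(a, axis=-1)

def re_lift(f, psi):
    """Correlate with the filter and with its mirror image: shape (2, D, H, W)."""
    return np.stack([circ_corr3d(f, psi), circ_corr3d(f, reflect(psi))])

def reflect_lifted(fmap):
    """Action of R on lifted features: mirror spatially and swap the two channels."""
    return reflect(fmap)[::-1]

rng = np.random.default_rng(0)
f = rng.normal(size=(4, 5, 6))       # toy volume
psi = rng.normal(size=(3, 3, 3))     # toy filter

lhs = re_lift(reflect(f), psi)
rhs = reflect_lifted(re_lift(f, psi))
equivariant = np.allclose(lhs, rhs)
print("reflection equivariant:", equivariant)
```

Taking the maximum over the two channels at the end would give an output that mirrors together with the input, which is the behavior wanted for a segmentation of the left and right hemispheres.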
Example 3: [9] proposed a two-pathway-group CNN architecture for brain tumor segmentation that uses local features and global contextual features at the same time. Equivariance is maintained to reduce instabilities and overfitting through parameter sharing, which could improve segmentation performance. The following figure is from [9]:
Figure 7: Two-Pathway-Group CNN architecture (2PG-CNN) showing that the input patch is processed by two different group CNNs. The four blocks in a feature map of both CNNs show a p4 group features map that inherits group CNN properties. The upper CNN represents a local feature map and the lower CNN shows a global feature map.

## Conclusions and Future Opportunities

This paper reviews the basic theory of G-CNNs and gives some typical examples of their application to medical images. Because of the symmetries of lesions, tissues and structures, medical images are well suited to G-CNNs. The examples show that, compared with standard CNNs, G-CNNs, as a more general model, can improve performance.
In future studies, G-CNNs could be combined with classification or segmentation models designed around the specific characteristics of lesions, tissues and structures, in particular given the more complex symmetries in 3D images. In addition, the latest state-of-the-art models could be analyzed to explore the possibility of combining them with G-CNNs.
Besides, the examples in this review focus on classification and segmentation, but medical image analysis also includes generation and reconstruction tasks. Equivariant neural networks could also be combined with methods for these tasks. Additionally, frame equivariance may be useful in the analysis of clinical videos.
Equivariance can be explored not only in CNNs but also in Transformers. Given the recent success of Transformers, there are also opportunities for research on equivariant Transformers and their potential applications in medical image analysis.

## References

1. Cohen, T., & Welling, M. (2016, June). Group equivariant convolutional networks. In International conference on machine learning (pp. 2990-2999). PMLR. (http://proceedings.mlr.press/v48/cohenc16.html)
2. Van Ginneken, B., Armato III, S. G., de Hoop, B., van Amelsvoort-van de Vorst, S., Duindam, T., Niemeijer, M., ... & Prokop, M. (2010). Comparing and combining algorithms for computer-aided detection of pulmonary nodules in computed tomography scans: the ANODE09 study. Medical image analysis, 14(6), 707-722. (https://www.sciencedirect.com/science/article/pii/S1361841510000587)
3. Veeling, B. S., Linmans, J., Winkens, J., Cohen, T., & Welling, M. (2018, September). Rotation equivariant CNNs for digital pathology. In International Conference on Medical image computing and computer-assisted intervention (pp. 210-218). Springer, Cham. (https://link.springer.com/chapter/10.1007/978-3-030-00934-2_24)
4. Li, Y., Cao, G., & Cao, W. (2020, December). A dynamic group equivariant convolutional networks for medical image analysis. In 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 1056-1062). IEEE. (https://ieeexplore.ieee.org/abstract/document/9313601) (https://www.sciencedirect.com/science/article/pii/S1361841510000587)
5. Graham, Simon, David Epstein, and Nasir Rajpoot. "Rota-net: Rotation equivariant network for simultaneous gland and lumen segmentation in colon histology images." European Congress on Digital Pathology. Springer, Cham, 2019. (https://link.springer.com/chapter/10.1007/978-3-030-23937-4_13)
6. Han, Shuo, Jerry L. Prince, and Aaron Carass. Reflection-equivariant convolutional neural networks improve segmentation over reflection augmentation. Medical Imaging 2020: Image Processing. Vol. 11313. International Society for Optics and Photonics, 2020. (https://www.spiedigitallibrary.org/conference-proceedings-of-spie/11313/1131337/Reflection-equivariant-convolutional-neural-networks-improve-segmentation-over-reflection-augmentation/10.1117/12.2549399.full)
7. Çiçek, Ö., Abdulkadir, A., Lienkamp, S. S., Brox, T., & Ronneberger, O. (2016, October). 3D U-Net: learning dense volumetric segmentation from sparse annotation. In International conference on medical image computing and computer-assisted intervention (pp. 424-432). Springer, Cham. (https://link.springer.com/chapter/10.1007/978-3-319-46723-8_49)
8. Razzak, M. I., Imran, M., & Xu, G. (2018). Efficient brain tumor segmentation with multiscale two-pathway-group conventional neural networks. IEEE journal of biomedical and health informatics, 23(5), 1911-1919. (https://ieeexplore.ieee.org/abstract/document/8481481)
