Authors: Olaf Ronneberger, Philipp Fischer, and Thomas Brox Computer Science Department and BIOSS Centre for Biological Signalling Studies, University of Freiburg, Germany
conditionally accepted at MICCAI 2015
Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net.
Larger patches require more max-pooling layers that reduce the localization accuracy, while small patches allow the network to see only little context. Good localization and the use of context are possible at the same time.
We modify and extend the "fully convolutional network" such that it works with few training images and yields more precise segmentations; see Figure 1. The main idea in the "fully convolutional network" is to supplement a usual contracting network by successive layers, where pooling operators are replaced by upsampling operators. In order to localize, high resolution features from the contracting path are combined with the upsampled output. A successive convolution layer can then learn to assemble a more precise output based on this information.
One important modification in our architecture is that in the upsampling part we also have a large number of feature channels, which allow the network to propagate context information to higher resolution layers. As a consequence, the expansive path is more or less symmetric to the contracting path, and yields a u-shaped architecture.
As very little training data is available for our tasks, we use excessive data augmentation by applying elastic deformations to the available training images. This allows the network to learn invariance to such deformations, without the need to see these transformations in the annotated image corpus. This is particularly important in biomedical segmentation, since deformation used to be the most common variation in tissue and realistic deformations can be simulated efficiently.
Another challenge in many cell segmentation tasks is the separation of touching objects of the same class; see Figure 3. To this end, we propose the use of a weighted loss, where the separating background labels between touching cells obtain a large weight in the loss function.
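The weighted loss can be sketched in a few lines of plain Python. The border term below follows the weight-map formula from the U-Net paper, w(x) = w_c(x) + w0 · exp(−(d1(x)+d2(x))² / (2σ²)), where d1 and d2 are the distances to the nearest and second-nearest cell; the defaults w0 = 10 and σ ≈ 5 pixels are the values reported there. The function names and the flattened per-pixel representation are illustrative choices, not part of the original implementation.

```python
import math

def border_weight(d1, d2, w0=10.0, sigma=5.0):
    """Extra weight for pixels in narrow gaps between touching cells.

    d1, d2: distances to the nearest and second-nearest cell border.
    The weight is largest (w0) where both distances are zero, i.e. in
    the thin background ridge separating two touching cells."""
    return w0 * math.exp(-((d1 + d2) ** 2) / (2.0 * sigma ** 2))

def weighted_cross_entropy(p_true, weights):
    """Pixel-wise cross-entropy with a per-pixel weight.

    p_true[i]:  predicted probability of the true class at pixel i.
    weights[i]: w(x) = class-balancing term + border_weight(d1, d2)."""
    return -sum(w * math.log(p) for p, w in zip(p_true, weights))
```

Pixels on the separating background between cells thus contribute far more to the loss than pixels deep inside a cell or far from any border.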
The network architecture is illustrated in Figure 1. It consists of a contracting path (left side) and an expansive path (right side). The contracting path follows the typical architecture of a convolutional network. It consists of the repeated application of two 3x3 convolutions (unpadded convolutions), each followed by a rectified linear unit (ReLU) and a 2x2 max pooling operation with stride 2 for downsampling. At each downsampling step we double the number of feature channels. Every step in the expansive path consists of an upsampling of the feature map followed by a 2x2 convolution ("up-convolution") that halves the number of feature channels, a concatenation with the correspondingly cropped feature map from the contracting path, and two 3x3 convolutions, each followed by a ReLU. The cropping is necessary due to the loss of border pixels in every convolution. At the final layer a 1x1 convolution is used to map each 64-component feature vector to the desired number of classes. In total the network has 23 convolutional layers.
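The size arithmetic implied by the unpadded convolutions can be checked with a small framework-free sketch. Each valid 3x3 convolution removes one pixel per side (two convolutions remove 4 pixels), each 2x2 max pool halves the size, and each 2x2 up-convolution doubles it; with the 4-level architecture of Figure 1, a 572x572 input tile yields a 388x388 output map. The function name is our own.

```python
def unet_tile_sizes(input_size, depth=4):
    """Trace the spatial size of a tile through the U-Net of Figure 1:
    two unpadded 3x3 convs per step, 2x2 max pool down, 2x2 up-conv up."""
    size = input_size
    skip_sizes = []
    for _ in range(depth):              # contracting path
        size -= 4                       # two valid 3x3 convolutions
        skip_sizes.append(size)         # feature map kept for the skip
        assert size % 2 == 0, "tile size must be even before pooling"
        size //= 2                      # 2x2 max pooling, stride 2
    size -= 4                           # two convolutions at the bottom
    for skip in reversed(skip_sizes):   # expansive path
        size *= 2                       # 2x2 up-convolution
        # the skip feature map (skip x skip) is cropped down to size x size
        size -= 4                       # two valid 3x3 convolutions
    return size
```

This also explains why output tiles are smaller than input tiles: every valid convolution sheds border pixels, so the input must carry a mirrored or overlapping margin around the region to be segmented.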
Data Augmentation

Data augmentation is essential to teach the network the desired invariance and robustness properties when only few training samples are available. In case of microscopical images we primarily need shift and rotation invariance as well as robustness to deformations and gray value variations. Especially random elastic deformations of the training samples seem to be the key concept to train a segmentation network with few annotated images. We generate smooth deformations using random displacement vectors on a coarse 3 by 3 grid. The displacements are sampled from a Gaussian distribution with 10 pixels standard deviation. Per-pixel displacements are then computed using bicubic interpolation. Drop-out layers at the end of the contracting path perform further implicit data augmentation.
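The deformation scheme above can be sketched in pure Python. The sketch keeps the paper's parameters (a 3x3 coarse grid, Gaussian displacements with 10 px standard deviation) but, for brevity, upsamples the coarse grid with bilinear rather than bicubic interpolation and warps the image with nearest-neighbour sampling; all function names are our own.

```python
import random

def coarse_displacement_field(h, w, grid=3, sigma=10.0, seed=0):
    """Smooth per-pixel displacement field from a coarse random grid."""
    rng = random.Random(seed)
    # random displacement vectors on a coarse grid (paper: 3x3, std 10 px)
    dy = [[rng.gauss(0, sigma) for _ in range(grid)] for _ in range(grid)]
    dx = [[rng.gauss(0, sigma) for _ in range(grid)] for _ in range(grid)]

    def upsample(g, y, x):
        # bilinear interpolation of the coarse grid at pixel (y, x)
        # (the paper uses bicubic; bilinear keeps this sketch short)
        gy = y / (h - 1) * (grid - 1)
        gx = x / (w - 1) * (grid - 1)
        y0, x0 = int(gy), int(gx)
        y1, x1 = min(y0 + 1, grid - 1), min(x0 + 1, grid - 1)
        fy, fx = gy - y0, gx - x0
        top = g[y0][x0] * (1 - fx) + g[y0][x1] * fx
        bot = g[y1][x0] * (1 - fx) + g[y1][x1] * fx
        return top * (1 - fy) + bot * fy

    return [[(upsample(dy, y, x), upsample(dx, y, x)) for x in range(w)]
            for y in range(h)]

def elastic_deform(img, field):
    """Warp a 2D image (list of rows) by the given displacement field."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = field[y][x]
            yy = min(max(int(round(y + sy)), 0), h - 1)  # clamp to image
            xx = min(max(int(round(x + sx)), 0), w - 1)
            out[y][x] = img[yy][xx]  # nearest-neighbour sampling
    return out
```

Because the same field can be applied to the image and to its label map, each annotated sample yields an unlimited stream of plausibly deformed training pairs.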
Question: how exactly do the drop-out layers at the end of the contracting path perform implicit data augmentation?