The development of segmentation algorithms for liver tumors in CT scans has attracted growing attention in recent years. The validation of these methods, however, is often treated as a secondary task. In this article, we review existing approaches and present first steps towards a new methodology that evaluates the quality of an algorithm in relation to the variability of manual delineations.

We obtained three manual segmentations for each of 50 liver lesions and computed the result of a segmentation algorithm for each lesion. We compared all four masks with each other and with different ground truth estimates, and calculated scores according to the validation framework of the MICCAI 2008 challenge. Our results include cases in which this more elaborate evaluation reflects the segmentation quality more adequately than traditional approaches. The concepts can also be extended to similar segmentation problems.
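The comparison described above can be illustrated with a minimal sketch. It assumes that each segmentation is available as a binary voxel mask of identical shape, uses the volumetric overlap error as the only metric, and takes a simple majority vote over the manual masks as one possible ground truth estimate. These choices, and the function names `overlap_error`, `majority_vote`, and `pairwise_errors`, are illustrative assumptions and not necessarily the estimates or the scoring used in this work.

```python
from itertools import combinations

import numpy as np


def overlap_error(a, b):
    """Volumetric overlap error in percent: 100 * (1 - |A and B| / |A or B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks are treated as identical (assumption).
        return 0.0
    intersection = np.logical_and(a, b).sum()
    return 100.0 * (1.0 - intersection / union)


def majority_vote(masks):
    """Hypothetical ground truth estimate: voxels marked by more than half of the masks."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    return stack.sum(axis=0) * 2 > len(masks)


def pairwise_errors(named_masks):
    """Overlap error for every unordered pair of masks, keyed by the pair of names."""
    return {
        (name_a, name_b): overlap_error(mask_a, mask_b)
        for (name_a, mask_a), (name_b, mask_b) in combinations(named_masks.items(), 2)
    }


if __name__ == "__main__":
    # Toy data standing in for three manual masks and one algorithm result.
    shape = (8, 8, 8)
    masks = {}
    for name, start in [("rater1", 1), ("rater2", 2), ("rater3", 2), ("algorithm", 3)]:
        mask = np.zeros(shape, dtype=bool)
        mask[start:start + 4, 2:6, 2:6] = True
        masks[name] = mask

    # Inter-observer and algorithm-vs-observer errors side by side.
    for pair, error in pairwise_errors(masks).items():
        print(pair, round(error, 1))

    # Algorithm compared with the majority vote of the manual masks only.
    reference = majority_vote([masks["rater1"], masks["rater2"], masks["rater3"]])
    print("algorithm vs. majority vote:", round(overlap_error(masks["algorithm"], reference), 1))
```

Reading the algorithm's error next to the errors between the manual masks indicates whether the algorithm deviates more from the raters than the raters deviate from each other, which is the relation the proposed evaluation is concerned with.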