Perception-based energy functions in seam-cutting

Nan Li, Tianli Liao, and Chao Wang

Abstract—Image stitching is challenging in consumer-level photography, due to alignment difficulties in unconstrained shooting environments. Recent studies show that seam-cutting approaches can effectively relieve artifacts generated by local misalignment. Normally, seam-cutting is described in terms of energy minimization; however, few existing methods consider human perception in their energy functions, which sometimes causes a seam with minimum energy to be not the most invisible one in the overlapping region. In this paper, we propose a novel perception-based energy function in the seam-cutting framework, which considers the nonlinearity and the nonuniformity of human perception in energy minimization. Our perception-based approach adopts a sigmoid metric to characterize the perception of color discrimination, and a saliency weight to simulate that human eyes incline to pay more attention to salient objects. In addition, our seam-cutting composition can be easily implemented into other stitching pipelines. Experiments show that our method outperforms the seam-cutting method of the normal energy function, and a user study demonstrates that our composed results are more consistent with human perception.

Index Terms—Image stitching, seam-cutting, energy function, human perception.

Fig. 1. A composed result comparison between different energy functions. (a) Overlapping region. (b) Composed result corresponding to the normal energy function. (c) Composed result corresponding to our perception-based energy function.

I. INTRODUCTION

Image stitching is a well-studied topic in computer vision [1], which mainly consists of alignment [2]–[5], composition [6]–[10] and blending [11]–[13]. In consumer-level photography, it is difficult to achieve perfect alignment due to unconstrained shooting environments, so image composition becomes the most crucial step to produce artifact-free results.

Seam-cutting [14]–[18] is a powerful composition method, which intends to find an invisible seam in the overlapping region of aligned images. Mainstream algorithms usually express the problem in terms of energy minimization and minimize it via graph-cut optimization [19]–[21]. Normally, for a given overlapping region of aligned images, different energy functions correspond to different seams, and certainly correspond to different composed results (see Fig. 1). Conversely, in order to obtain a plausible stitching result, we desire to define a perception-consistent energy function, such that the most invisible seam possesses the minimum energy.

Recently, many efforts have been devoted to seam-cutting by penalizing the photometric difference using various energy functions. A Euclidean-metric color difference is used in [14] to define the smoothness term in their energy function, and a gradient difference is taken into account in [15]. Eden et al. [16] proposed an energy function that allows for large motions and exposure differences, but the camera setting is required. Jia and Tang [17] associated the smoothness term with gradient smoothness and gradient similarity, to reduce structure complexity along the seam. Zhang et al. [18] combined alignment errors and a Gaussian-metric color difference in their energy function, to handle misaligned areas with similar colors. However, few existing methods consider human perception in their energy functions, which sometimes causes a seam with minimum energy to be not the most invisible one in the overlapping region.

Seam-cutting has also been applied in image alignment. Gao et al. [22] proposed a seam-driven image stitching framework, which finds a best homography warp from some candidates with minimal seam costs instead of minimal alignment errors. Zhang and Liu [23] combined homography and content-preserving warps to locally align images, where seam costs are used as a quality metric to predict how well a homography enables plausible stitching. Lin et al. [24] proposed a seam-guided local alignment, which iteratively improves warping by adaptive feature weighting according to their distances to current seams.
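The dependence of the minimal seam on the choice of energy function can be illustrated with a tiny, self-contained sketch. The dynamic-programming seam search below is only an illustrative stand-in for the graph-cut optimization used in seam-cutting, and the cost maps, the threshold, and all function names are hypothetical:

```python
import numpy as np

def min_vertical_seam(cost):
    """Minimum-cost vertical seam (one column index per row) by dynamic
    programming -- a toy stand-in for graph-cut seam selection.

    Cumulative costs are accumulated row by row over 8-connected moves,
    then the cheapest path is traced back from the bottom row.
    """
    h, w = cost.shape
    acc = cost.astype(float).copy()
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(0, x - 1), min(w, x + 2)
            acc[y, x] += acc[y - 1, lo:hi].min()
    seam = [int(np.argmin(acc[-1]))]
    for y in range(h - 2, -1, -1):
        x = seam[-1]
        lo, hi = max(0, x - 1), min(w, x + 2)
        seam.append(lo + int(np.argmin(acc[y, lo:hi])))
    return seam[::-1]

# Two cost maps over the same 3x3 overlap: raw photometric differences,
# and the same differences after a hard visibility threshold at 0.40
# (a crude stand-in for a perceptual remapping of the costs).
raw = np.array([[0.30, 0.90, 0.20],
                [0.30, 0.90, 0.45],
                [0.30, 0.90, 0.20]])
remapped = np.where(raw < 0.40, 0.0, 1.0)

print(min_vertical_seam(raw))       # -> [2, 2, 2]
print(min_vertical_seam(remapped))  # -> [0, 0, 0]
```

On the raw costs, the cheapest seam threads through the right-hand strip even though it crosses a noticeable difference (0.45); after thresholding, the minimal seam moves to the left side, whose differences are all below the visibility threshold. Same overlap, different energy, different seam.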
In this paper, we propose a novel seam-cutting method via a perception-based energy function, which takes the nonlinearity and the nonuniformity of human perception into account. Our proposed method consists of three stages (see Fig. 2). In the first stage, we calculate a sigmoid-metric color difference of the given overlapping region as the smoothness term, to characterize the perception of color discrimination. Then, we calculate an average pixel saliency of the given overlapping region as the saliency weight, to simulate that human eyes incline to pay more attention to salient objects. Finally, we minimize the perception-based energy function by graph-cut optimization, to obtain the seam and the corresponding composed result. Experiments show that our method outperforms the seam-cutting method of the normal energy function, and a user study demonstrates that our composed results are more consistent with human perception.

Major contributions of the paper are summarized as follows.
1) We propose a novel perception-based energy function in the seam-cutting framework.
2) Our composition method can be easily implemented into other stitching pipelines.

N. Li is with the Center for Applied Mathematics, Tianjin University, Tianjin 300072, China. E-mail: [email protected].
T. Liao is with the Center for Combinatorics, Nankai University, Tianjin 300071, China. E-mail: [email protected].
C. Wang is with the Department of Software, Nankai University, Tianjin 300071, China. E-mail: [email protected].

Fig. 2. A process comparison between the normal seam-cutting framework and our proposed seam-cutting framework. (a) Overlapping region. (b) Euclidean-metric color difference. (c) Sigmoid-metric color difference. (d) Average pixel saliency. (e)(f) Corresponding seams. (g)(h) Corresponding stitching results.

II. APPROACH

In this section, we first show more details of the normal seam-cutting framework, then a novel perception-based energy function is described, and finally we propose our seam-cutting framework.

A. Normal Seam-cutting Framework

Given a pair of aligned images denoted by I_0 and I_1, let P be their overlapping region and L = {0, 1} be a label set, where "0" corresponds to I_0 and "1" corresponds to I_1; then a seam means assigning a label l_p ∈ L to each pixel p ∈ P. The goal of seam-cutting is to find a labeling l (i.e., a map from P to L) that minimizes the energy function

    E(l) = Σ_{p∈P} D_p(l_p) + Σ_{(p,q)∈N} S_{p,q}(l_p, l_q),    (1)

where N ⊂ P × P is a neighborhood system of pixels. The data term D_p(l_p) represents the cost of assigning a label l_p to a pixel p ∈ P, and the smoothness term S_{p,q}(l_p, l_q) represents the cost of assigning a pair of labels (l_p, l_q) to a pair of pixels (p, q) ∈ N.

The data term is defined as

    D_p(1) = 0, D_p(0) = μ,   if p ∈ ∂I_0 ∩ ∂P,
    D_p(0) = 0, D_p(1) = μ,   if p ∈ ∂I_1 ∩ ∂P,    (2)
    D_p(0) = D_p(1) = 0,      otherwise,

where μ is a very large penalty to avoid mislabeling, and ∂I_k ∩ ∂P is the common border of I_k (k = 0, 1) and P (marked in red and blue respectively in Fig. 1(a)). In fact, the data term D_p(l_p) fixes the endpoints of the seam as the intersections of the two colored polylines.

The smoothness term is defined as

    S_{p,q}(l_p, l_q) = (1/2) |l_p − l_q| (I∗(p) + I∗(q)),    (3)

    I∗(·) = ‖I_0(·) − I_1(·)‖_2,    (4)

where I∗(·) denotes the Euclidean-metric color difference (see Fig. 2(b)).

Finally, the normal energy function (1) is minimized by graph-cut optimization [19] to obtain the seam (see Fig. 2(e)) and the composed result (see Fig. 2(g)). Obviously, the definition of the energy function plays the most important role in the seam-cutting framework.

Fig. 3. Toy example. (a)(c) Visualizations of the Euclidean-metric color difference and the sigmoid-metric color difference. (b)(d) Corresponding seams.

Fig. 4. Toy example. (a) Visualization of the sigmoid-metric color difference. (c) Visualization of the average pixel saliency. (b)(d) Corresponding seams.

B. Perception-based Energy Function

In experiments, the seam denoted by l∗ that minimizes the normal energy function (1) is sometimes not the most invisible in P. In other words, there exists a seam denoted by l† that is more invisible but has a greater energy than l∗ (see Fig. 2(e) and (f)). Therefore, we desire to define a perception-consistent energy function, such that the most invisible seam possesses the minimum energy.

1) Sigmoid metric: Fig. 3 shows a toy example where l∗ is not the most invisible. In fact, the seam l∗ shown in (b) crosses the local misalignment area (marked in light blue in (a)), because the Euclidean-metric color difference does not give it a large enough penalty. In contrast, the seam l† shown in (d) avoids the local misalignment area (marked in red in (c)), because the sigmoid-metric color difference successfully distinguishes it from the alignment area.

In particular, the perception of colors is nonlinear as it has a color discrimination threshold, which means human eyes cannot differentiate some colors from others even if they are different. Let τ denote the threshold; then the perception of color discrimination can be characterized as
• if I∗(·) < τ, the color difference is invisible,
• if I∗(·) ≈ τ, the sensitivity of discrimination rises rapidly,
• if I∗(·) > τ, the color difference is visible.
We want to define a quality metric to measure the visibility of color difference, such that the cost of invisible terms approximates zero while the cost of visible terms approximates one. Fortunately, the sigmoid function

    sigmoid(x) = 1 / (1 + e^{−4κ(x−τ)})    (5)

is a suitable quality metric for our purpose.

Next, we show how to determine the parameters τ and κ. Briefly, given an overlapping region P of aligned images, the threshold τ plays the role of roughly dividing P into an alignment area and a misalignment area by its color difference, which is similar to determining a threshold to divide a binary image into a background region and a foreground region. Thus, we employ the well-known Otsu's algorithm [25] to determine a suitable τ with the maximum between-class variance. On the other hand, κ represents how rapidly the sensitivity of color discrimination rises around τ. Normally, κ = 1/ε gives good practical performance, where ε is the width of the bins of the histogram used in Otsu's algorithm.

Now, the smoothness term is modified as

    S̃_{p,q}(l_p, l_q) = (1/2) |l_p − l_q| (I†(p) + I†(q)),    (6)

    I†(·) = sigmoid(I∗(·)),    (7)

where I†(·) denotes the sigmoid-metric color difference. Fig. 2(c) shows that I†(·) makes the misalignment area more distinguishable from the alignment area than I∗(·) does, which effectively helps the seam avoid crossing the misalignment area.

2) Saliency weights: Fig. 4 shows another toy example where l∗ is not the most invisible. In fact, the seams l∗ and l† shown in (b) and (d) respectively both cross the local misalignment area. Though the energy of l† is greater, it is more invisible than l∗ from the aspect of human perception, because the location where its artifact arises is less remarkable than that of l∗.

In particular, the perception of images is nonuniform, which means that human eyes incline to pay more attention to salient objects. Thus, artifacts in salient regions are more remarkable than artifacts in non-salient regions. In order to benefit from these observations, we define a saliency weight

    W_{p,q} = 0,                     if p or q ∈ ∂♯P,
    W_{p,q} = 1 + (ω(p) + ω(q))/2,   otherwise,    (8)

where ω(·) denotes the average pixel saliency of P (see Fig. 2(d)). We normalize W_{p,q} in the range [1, 2] to avoid over-penalizing saliency weights. As stitching results are usually cropped into rectangles in consumer-level photography, we assign W_{p,q} = 0 if either p or q is located in the common border ∂♯P of the canvas and P (marked in green in Fig. 2(a)).

Finally, the perception-based energy function is defined as

    Ẽ(l) = Σ_{p∈P} D_p(l_p) + Σ_{(p,q)∈N} W_{p,q} · S̃_{p,q}(l_p, l_q),    (9)

where W_{p,q} raises the penalty of S̃_{p,q}(l_p, l_q) according to ω(·). Fig. 2(f) shows that the endpoints of the seam have more freedom on ∂♯P than the seam shown in Fig. 2(e).

C. Proposed Seam-cutting Framework

Our seam-cutting framework is summarized in Algorithm 1.

Algorithm 1 Perception-based seam-cutting framework.
Input: An overlapping region P of aligned images I_0 and I_1.
Output: A stitching result.
1) Calculate I∗(P) in Eq. (4);
2) Calculate τ in Eq. (5) via Otsu's algorithm [25];
3) Calculate I†(P) in Eq. (7) and S̃_{p,q} in Eq. (6);
4) Calculate ω(P) via salient object detection [26] and W_{p,q} in Eq. (8);
5) Calculate D_p(P) in Eq. (2);
6) Minimize Ẽ(l) in Eq. (9) via graph-cut optimization [19], and blend I_0 and I_1 via gradient-domain fusion [12].

III. EXPERIMENTS

In our experiments, we first use SIFT [27] to extract and match features, and use RANSAC [28] to determine a global homography and align the input images. Then, for the overlapping region, we use Otsu's algorithm [25] to estimate a threshold τ (ε = 0.06), and use salient object detection [26] to calculate pixel saliency weights. Finally, we use graph-cut optimization [19] to obtain a seam, and blend the aligned images via gradient-domain fusion [12] to create a mosaic.

Fig. 5. An experimental comparison between the normal seam-cutting framework and our perception-based seam-cutting framework. All stitching results are cropped into rectangles.

Fig. 6. User study. Red represents that the normal seam-cutting framework wins. Blue represents that our perception-based seam-cutting framework wins. Yellow represents a tie.

Fig. 5 shows some experimental comparisons between the two seam-cutting frameworks. Input images in the second group come from the dataset in [23]. Due to unconstrained shooting environments, there exists large parallax in these examples, such that a global homography can hardly align them. In such cases, the normal seam-cutting framework fails to produce artifact-free results, while our perception-based seam-cutting framework successfully creates plausible mosaics. More results and the original input images are available in the supplementary material.

In order to investigate whether our proposed method is more consistent with human perception, we conduct a user study comparing the two seam-cutting frameworks. We invite 15 participants to rank 15 unannotated groups of stitching results (making a choice among three options: 1. A is better than B; 2. B is better than A; 3. A and B are even). Fig. 6 shows the user study result, which demonstrates that our stitching results win most users' favor.

IV. CONCLUSION

In this paper, we propose a novel perception-based energy function in the seam-cutting framework, to handle image stitching challenges in consumer-level photography. Experiments show that our method outperforms the seam-cutting method of the normal energy function, and a user study demonstrates that our results are more consistent with human perception. In the future, we plan to generalize our method in the seam-driven framework to deal with image alignment.

REFERENCES

[1] R. Szeliski, "Image alignment and stitching: A tutorial," Technical Report MSR-TR-2004-92, Microsoft Research, 2004.
[2] R. Szeliski and H.-Y. Shum, "Creating full view panoramic image mosaics and environment maps," in Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '97), 1997, pp. 251–258.
[3] M. Brown and D. G. Lowe, "Automatic panoramic image stitching using invariant features," Int. J. Comput. Vision, vol. 74, no. 1, pp. 59–73, 2007.
[4] J. Gao, S. J. Kim, and M. S. Brown, "Constructing image panoramas using dual-homography warping," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2011, pp. 49–56.
[5] J. Zaragoza, T.-J. Chin, M. S. Brown, and D. Suter, "As-projective-as-possible image stitching with moving DLT," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2013, pp. 2339–2346.
[6] S. Peleg, "Elimination of seams from photomosaics," Computer Graphics and Image Processing, vol. 16, no. 1, pp. 90–94, 1981.
[7] M.-L. Duplaquet, "Building large image mosaics with invisible seam lines," in Aerospace/Defense Sensing and Controls, International Society for Optics and Photonics, 1998, pp. 369–377.
[8] J. Davis, "Mosaics of scenes with moving objects," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 1998, pp. 354–360.
[9] A. A. Efros and W. T. Freeman, "Image quilting for texture synthesis and transfer," in Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '01), 2001, pp. 341–346.
[10] A. Mills and G. Dudek, "Image stitching with dynamic elements," Image and Vision Computing, vol. 27, no. 10, pp. 1593–1602, 2009.
[11] P. J. Burt and E. H. Adelson, "A multiresolution spline with application to image mosaics," ACM Transactions on Graphics, vol. 2, no. 4, pp. 217–236, 1983.
[12] P. Pérez, M. Gangnet, and A. Blake, "Poisson image editing," ACM Transactions on Graphics, vol. 22, no. 3, pp. 313–318, 2003.
[13] A. Levin, A. Zomet, S. Peleg, and Y. Weiss, "Seamless image stitching in the gradient domain," in Proc. 8th Eur. Conf. Comput. Vision, 2004, pp. 377–389.
[14] V. Kwatra, A. Schödl, I. Essa, G. Turk, and A. Bobick, "Graphcut textures: image and video synthesis using graph cuts," ACM Transactions on Graphics, vol. 22, no. 3, pp. 277–286, 2003.
[15] A. Agarwala, M. Dontcheva, M. Agrawala, S. Drucker, A. Colburn, B. Curless, D. Salesin, and M. Cohen, "Interactive digital photomontage," ACM Transactions on Graphics, vol. 23, no. 3, pp. 294–302, 2004.
[16] A. Eden, M. Uyttendaele, and R. Szeliski, "Seamless image stitching of scenes with large motions and exposure differences," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., vol. 2, 2006, pp. 2498–2505.
[17] J. Jia and C.-K. Tang, "Image stitching using structure deformation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 4, pp. 617–631, Apr. 2008.
[18] G. Zhang, Y. He, W. Chen, J. Jia, and H. Bao, "Multi-viewpoint panorama construction with wide-baseline images," IEEE Transactions on Image Processing, vol. 25, no. 7, pp. 3099–3111, 2016.
[19] Y. Boykov, O. Veksler, and R. Zabih, "Fast approximate energy minimization via graph cuts," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 11, pp. 1222–1239, Nov. 2001.
[20] Y. Boykov and V. Kolmogorov, "An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 9, pp. 1124–1137, Sept. 2004.
[21] V. Kolmogorov and R. Zabih, "What energy functions can be minimized via graph cuts?" IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 2, pp. 147–159, Feb. 2004.
[22] J. Gao, Y. Li, T.-J. Chin, and M. S. Brown, "Seam-driven image stitching," Eurographics, pp. 45–48, 2013.
[23] F. Zhang and F. Liu, "Parallax-tolerant image stitching," in Proc. IEEE Conf. Comput. Vision Pattern Recognit., 2014, pp. 3262–3269.
[24] K. Lin, N. Jiang, L.-F. Cheong, M. Do, and J. Lu, "Seagull: Seam-guided local alignment for parallax-tolerant image stitching," in Proc. 14th Eur. Conf. Comput. Vision, 2016, pp. 370–385.
[25] N. Otsu, "A threshold selection method from gray-level histograms," Automatica, vol. 11, no. 285-296, pp. 23–27, 1975.
[26] J. Zhang, S. Sclaroff, Z. Lin, X. Shen, B. Price, and R. Mech, "Minimum barrier salient object detection at 80 fps," in Proc. IEEE Int. Conf. Comput. Vision, 2015, pp. 1404–1412.
[27] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, 2004.
[28] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Commun. ACM, vol. 24, no. 6, pp. 381–395, 1981.
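As a companion to Algorithm 1, the perceptual ingredients of Eqs. (4)–(8) — the Euclidean-metric color difference, an Otsu-style estimate of τ with κ = 1/ε, the sigmoid remapping, and the saliency weight — can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the salient-object detector of [26] and the graph-cut minimization of Eq. (9) are omitted, and all function names are hypothetical.

```python
import numpy as np

def euclidean_color_difference(img0, img1):
    # Eq. (4): per-pixel Euclidean norm of the color difference.
    return np.linalg.norm(img0.astype(float) - img1.astype(float), axis=-1)

def otsu_threshold(values, eps=0.06):
    # Parameter tau for Eq. (5): Otsu's method on a histogram with bin
    # width eps, maximizing the between-class variance. Returns the upper
    # edge of the best "aligned" (background) bin.
    lo, hi = float(values.min()), float(values.max())
    nbins = max(2, int(np.ceil((hi - lo) / eps)))
    hist, edges = np.histogram(values, bins=nbins, range=(lo, hi))
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(p)              # class-0 (background) probability
    m = np.cumsum(p * centers)     # cumulative first moment
    mt = m[-1]                     # global mean
    valid = (w0 > 0) & (w0 < 1)
    var_b = np.zeros_like(w0)
    var_b[valid] = (mt * w0[valid] - m[valid]) ** 2 / (w0[valid] * (1.0 - w0[valid]))
    return float(edges[int(np.argmax(var_b)) + 1])

def sigmoid_metric(diff, tau, kappa):
    # Eqs. (5) and (7): remap the Euclidean difference so that costs below
    # tau approach 0 (invisible) and costs above tau approach 1 (visible).
    return 1.0 / (1.0 + np.exp(-4.0 * kappa * (diff - tau)))

def saliency_weight(omega_p, omega_q, on_canvas_border=False):
    # Eq. (8): 0 on the canvas border, otherwise 1 + (w(p)+w(q))/2 in [1, 2].
    if on_canvas_border:
        return 0.0
    return 1.0 + 0.5 * (omega_p + omega_q)
```

Following the paper, κ would be set to 1/ε, and the weighted smoothness W_{p,q} · S̃_{p,q} would then enter the graph-cut energy of Eq. (9) alongside the data term of Eq. (2).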