Decomposition of bipartite and multipartite unitary gates into the product of controlled unitary gates Lin Chen1,2 and Li Yu2,3,∗ 1Department of Mathematics, Beijing University of Aeronautics and Astronautics, Beijing 100191, P. R. China 2Singapore University of Technology and Design, 20 Dover Drive, Singapore 138682 3National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan (Dated: March 19, 2015) We show that any unitary operator on the dA ×dB system (dA ≥ 2) can be decomposed into the product of at most 4dA − 5 controlled unitary operators. The number can be reduced to 5 2dA−1 when dA is a power of two. We also prove that three controlled unitaries can implement a bipartite complex permutation operator, and discuss the connection to an analogous result on 1 0 classicalreversiblecircuits. Wefurthershowthatanyn-partiteunitaryonthespaceCd1⊗···⊗Cdn 2 istheproductof atmost [2Qnj=−11(2dj−2)−1]controlled unitarygates, each ofwhich iscontrolled from n−1 systems. We also decompose any bipartite unitary into the product of a simple type of r bipartite gates and some local unitaries. We derive dimension-independent upper bounds for the a M CNOT-gate cost or entanglement cost of bipartite permutation unitaries (with the help of ancillas of fixed size) as functions of the Schmidt rank of the unitary. It is shown that such costs under a simple protocol are related to the log-rank conjecture in communication complexity theory via the 8 link of nonnegative rank. 1 ] PACSnumbers: 03.65.Ud,03.67.Lx,03.67.Mn h p - I. INTRODUCTION taryU astheminimumbipartitedepthamongallunitary t n circuits for U that do not use ancillas. Formally, it is a u Theimplementationofunitaryoperationsisakeytask c(U):=min{k|U =U U ···U , U ∈S}, (1) 1 2 k i q in quantum information processing. Unitary operators [ can be implemented by passive linear optical devices [1]. 
whereS isthe setofbipartite controlledunitariesonthe It is known that any unitary operation on two or more samespacethatU actson. Studyingtheboundsforc(U) 2 partiescanbedecomposedintotheproductofcontrolled and the corresponding decomposition of U is the main v 8 unitary gates [2, 3]. Two-qubit controlled unitaries can problem in this paper. Indeed, it is a special case of the 0 beimplementedwithhighcoherenceanddynamicalcou- problem of quantum circuit decomposition using general 7 pling [4]. Suppose that a bipartite unitary U on systems controlled unitaries with the help of local unitaries. It 2 A,B istheproductofk bipartitecontrolledunitaries,in- is special in the sense that there are only two systems 0 terspersed with local unitaries [5]. We call the integer k but the general problem allows many systems. There . 1 as the bipartite depth of the circuit under the bipartite has been study on decompositions using CNOT or other 0 cut A-B. The depth, width and total number of basic two-qubitcontrolledgates,orspecificclassesoftwo-qudit 5 gates are often quantities of interest in quantum circuit controlled gates [2, 3, 6, 7]. For example, Shende et al. 1 design, where the basic gates refer to some fixed type [8] shows that any three-qubit unitary can be written : v of two-qubit gates such as the controlled-NOT (CNOT) as the product of 20 CNOT gates and some one-qubit i gate. For implementing the same unitary operation, it unitaries. Another motivation to study the problem is X is conceivable that there may be a tradeoff between the to better understand the structure of nonlocal unitaries r a depth and the total number of basic gates. Nonetheless and the resources needed to implement them, see the the bipartite depth does give anupper bound for the to- comment just before Section IIIA. tal number of basic gates, as discussed in Sec. V of this We restrictto bipartite controlledgates as the type of paper. 
The nonlocal gates need much longer time than nonlocalgatesinthe definitionofbipartite depth for the local gates to implement, because the systems may be followingreasons. First,itiseasytodefine,andasmaller far fromeach other. Then the bipartite depth is a rough class of gates seems not powerful enough. It is hard to measure of time needed by the circuit. By allowinglocal find a larger class of easily definable gates that do not unitary freedom in the definition of controlled unitaries includeallbipartitegates. TheFourierhierarchy[9]con- (in Sec. II), from now on we will drop the phrase “inter- cerns the number of tensor products of Hadamard gates spersed with local unitaries” from the definition of the inacircuitthatalsocontainsbasis-preservinggates. The bipartite depth. basis-preservinggatesarealsocalledthecomplexpermu- We define the bipartite depth of a given bipartite uni- tation gates, and are discussed later in this paper. They permute among computational-basis states and apply a phase to each state. However the basis-preserving gates are generally nonlocal with respect to a bipartite parti- ∗Electronicaddress: [email protected] tion of the qubits. If we modify the definition of Fourier 2 hierarchy and apply it to the bipartite scenario so as to edge. InSec.III westudy the decompositionofbipartite allow some finite set of bipartite gates and arbitrary lo- unitary operators using controlled unitaries, and com- cal gates, then such a set of bipartite gates would have ment on the connections with results in the literature. a discrete set of entangling power,which is not desirable In Sec. IV we define the “controlled-type” multipartite for defining a smooth depth measure. Second, the con- unitaries and discuss the decomposition of multipartite trolled unitaries are analogous to some components in operators into the product of these gates. 
We also show protocolswithlocaloperationsandclassicalcommunica- thatthreecontrolled-permutationmatricesareenoughto tion(LOCC).Theyareamajortypeofprotocolsstudied decompose any complex permutation matrix. In Sec. V inquantuminformationtheory. TheLOCCprotocolsof- we define the standard gates and discuss the decompo- ten allow projective measurements on some subsystems. sition of bipartite unitaries using these gates and local Aprojectivemeasurementandthesubsequentclassically unitaries. In Sec. VI we discuss the relationshipbetween controlled unitary operations can be made part of a co- the Schmidt rank of the unitary and the form of the de- herentquantumcircuitbyrewritingthemasacontrolled composition, and we discuss bipartite permutation uni- unitary. Thus our measure is analogousto the rounds of taries in particular. In Sec. VII we discuss the use of classical communications in such protocols. local ancillas. We conclude in Sec. VIII. Generally we consider unitaries acting on d × d A B dimensional systems. The results of [2, 3] imply that c(U) ≤ µd4 when d = d , where µ is a positive con- II. PRELIMINARIES A A B stant, and the type of bipartite controlled gates used are limited to controlled-increment gates. In Theorem In this section we introduce the preliminary knowl- 4, we obtain a tighter bound c(U) ≤ 4d −5 for arbi- A edge used in the paper. Denote the computational-basis trary d ,d at the cost of allowing the use of arbitrary A B states of the bipartite Hilbert space H = H ⊗H by A B controlled-unitarygatesinthe decomposition. Thesame |i,ji,i = 1,··· ,d , j = 1,··· ,d . Let I and I be A B A B theoremshowsthatthe boundcanbe further reducedto theidentityoperatorsonthe spacesH andH ,respec- A B 2d −1 when d is a power of 2. We also prove that A A tively. 
Any bipartite unitary gate U acting on H has c(U) ≤ 3 when U is a complex permutation matrix in Schmidt rank (denoted as Sch(U)) equal to n if there is Theorem 7, based on the concept of absolute singularity an expansion of the form U = n A ⊗B where the studiedinLemma6. Thisresultisappliedtoclassicalre- j=1 j j dA ×dA matrices A1,··· ,An aPre linearly independent, versiblecircuits[10,11]inCorollary8. Theaboveresults andthed ×d matricesB ,··· ,B arealsolinearlyin- B B 1 n are based on the sandwich form of bipartite unitaries, dependent. An equivalent definition is in [13, 14], where constructed in Definition 2 and Lemma 3. We further it is called the operator-Schmidtrank. Next, U is a con- generalize our observationto multipartite systems based trolled unitary gate, ifU isequivalentto dA |jihj|⊗U on the generalized sandwich form. We show that any n- j=1 j partite unitaryonthe spaceCd1⊗···⊗Cdn has a gener- or dj=B1Vj ⊗ |jihj| via local unitaries. PTo be specific, alized[2 n−1(2d −2)−1]-sandwichforminProposition U iPs a controlled unitary from the A or B side, respec- j=1 j tively. In particular, U is controlled in the computa- 9. We alsQoproposeamoreefficientgeneralizedsandwich form for n = 4 in Proposition 10. In Proposition 11, tional basis from the A side if U = dj=A1|jihj| ⊗ Uj. we show that any n-partite complex permutation uni- Bipartite unitary gates of Schmidt raPnk two or three tary has a generalized(2n−1)-sandwichform composed are in fact controlled unitaries [15–17]. We have gen- of controlled-complex-permutationunitaries. eralized controlled unitaries to block-controlled unitary gates [16]. We split the space H into a direct sum: We also discuss the decompositionof any unitary gate A H = ⊕m H , m > 1, DimH = m , and H ⊥ H using “standard” gates proposed in Definition 12. They A i=1 i i i i j for distinct i,j =1,··· ,m. 
Then U is a block-controlled effectively only act ontwo qubits as controlledunitaries, unitary (BCU) gate controlled from the A side, if U is and may be more easily carried out in experiments. locally equivalent to m mi |u ihu |⊗V where We show that any bipartite unitary is the product of i=1 j,k=1 ij ik ijk 2(dA−1)2⌊d2B⌋+(2dA−3)(dB−1)⌊d2A⌋ standard gates {|ui,1i,··· ,|ui,mii} isPan oPrthonormal basis of Hi. Note interspersed with local unitaries in Proposition 15. The that the Vijk are not necessarily unitary. By definition number reduces to three for dA = dB = 2, which is the every controlled unitary with dA,dB ≥ 2 is a BCU. The smallestnumberofcontrolledunitariesneededforthede- BCU will be used in the proof of Theorem 4, as well as composition of two-qubit unitary gates [12]. In Sec. VI in the decomposition of any bipartite unitary into the we discuss the relationship between the Schmidt rank of product of three BCUs in Corollary 5. theunitaryandthenumberofcontrolledunitariesneeded to decompose it. We give a class of examples where the number of controlled unitaries is upper bounded by a III. DECOMPOSITION OF BIPARTITE constant, but the Schmidt rank of the target unitary is UNITARY OPERATORS arbitrarily large. Therestofthepaperisorganizedasfollows. InSec.II It is known [12] that three controlled gates are suffi- we introduce some definitions and preliminary knowl- cient and necessary for the decomposition of a general 3 two-qubit unitary, and there is always a decomposition and the party that does the controlling alternates between using 3 CNOT gates and some one-qubit unitaries. For A for odd i and B for even i. implementing a two-qubit SWAP gate by local unitaries (ii) We refer to the m-A form of a bipartite unitary U, and some number of CNOT gates without the use of an- in the sense that U = U U ···U , where any U is a 1 2 m i cillas (this conditionof no ancillas is implied throughout controlled unitary controlled from the A side. 
the paper unless stated otherwise), three CNOT gates are necessary and sufficient [12]. We generalize this fact Using this definition we present the following result as to the SWAP gates of arbitrary dimension. the first step to our question. Lemma 1 Denote the two-qudit SWAP gate acting on Lemma 3 (i)Any2×dB unitaryhasa3-sandwichform; d×d system as SWAPd. Then (ii) Any 2×dB unitary has a 3-A form; (i) the product of the SWAP gate and any controlled (iii)Thereexistsa2×2unitarythatcannotbetheproduct d unitary has Schmidt rank d2; of two controlled unitaries. (ii) For implementing a SWAP gate by local unitaries d Proof. (i)Forany2×d unitaryM,therearetwolocal and some number of controlled unitary gates, three con- B unitaries E,F on H such that M = (I ⊗E)U(I ⊗ trolled unitaries are necessary and sufficient. B A A F), where U = 1 |iihj|⊗U and U is a d ×d i,j=0 ij 00 B B Proof. (i) There are orthonormal bases of HA and diagonal matrix.PSince U is unitary, the columns of U10 HB (denotedby{|ii}A and{|ji}B)suchthatthe matrix are pairwise orthogonal, and the rows of U01 are also representation of the SWAP gate in such bases has el- pairwise orthogonal. Let V,W be two d ×d unitaries d B B ements of the form hi|Ahj|BU|kiA|liB = δilδjk. Because such that both VU10 and U01W are diagonal matrices theSWAP gateeffectivelyperformsthephysicalswapof withallelementsrealandnon-negative. LetU =|0ih0|⊗ d 1 two systems, which is basis-independent, the above par- IB+|1ih1|⊗V and U2 =|0ih0|⊗IB+|1ih1|⊗W be two ticular matrix representation is invariant under simulta- controlled unitaries from the A side, we have neousunitarysimilaritytransform(simultaneousunitary changeofbasis)onthetwolocalsystems. Thenassertion U =U UU = U00 U01W . (2) (i) follows from straightforwardcomputation, by writing 3 1 2 (cid:18)VU10 VU11W (cid:19) the matrix for the SWAP gate in the form above and d assuming one of the local bases is the local controlling Since U is unitary, we have U01W = VU10. 
The ma- basis for the controlled unitary. trix U3 is a 2 × dB bipartite unitary of Schmidt rank (ii) Any controlled unitary on H has Schmidt rank at at most 3, so it is a controlled unitary from the B side most d. It follows from assertion (i) that the SWAP [15, 16]. We have proved that U is the product of three d gate is the product of at least three controlled unitaries. controlled unitaries U1†,U3, and U2†. There exist suit- It is knownthat the SWAPd gate is the product of three able local unitaries S = IA ⊗ XB and T = IA ⊗ YB, controlledunitarygates[18]. Soassertion(ii)holds. This so that SU3T is controlled in the computational basis completes the proof. ⊓⊔ of H . Hence U = (U†S†)(SU T)(T†U†) is a decom- B 1 3 2 position with each of the three parts controlled in the For the two-qubit SWAP gate, using the general con- computational basis of H or H . Therefore M = trolled unitaries in its decomposition does not save any A B controlled unitary compared to using CNOT gates. One (I ⊗E)(U†S†) (SU T) (T†U†)(I ⊗F) isexactlya A 1 3 2 A might expect that this is the general case, i.e., the im- h3-sandwich form.iHence thhe assertion holdsi. plementation of a bipartite unitary is the same when we (ii)Fromtheproofof(i),weknowthatany2×d uni- B use controlled unitaries or only CNOT gates. However, tary U has a 3-sandwich form. Let U = V V V where 1 2 3 the two-qubit gate exp(iaσ1 ⊗ σ1) with the Pauli ma- V1,V3 are controlled unitaries controlled in the compu- trix σ = 0 1 any a 6= kπ/4, k ∈ Z cannot be im- tational basis of HA, and V2 is a controlled unitary con- 1 (cid:18)1 0(cid:19) trolled in the computational basis of H . Since V is B 2 plemented using one CNOT gate and single qubit gates controlled in the computational basis of H , one can B only,sincetheentanglingpowerofsuchgateis notequal write V = 1 |iihj|⊗V where all V are diagonal 2 i,j=0 ij ij to that of the CNOT gate. We will show in Theorem 4 matrices. 
BPy multiplying V2 with two suitable diagonal thatforthegenerald×dbipartitesystemthatusingcon- controlled unitaries respectively from the left and right trolled unitaries might be better than the d-dimensional side, we can make allentries of V ,V and V realand 00 01 10 CNOT gates, in the sense that they require fewer such non-negative,andtheentriesofV realandnon-positive. 11 two-qudit gates. For this purpose we introduce a special Since V is unitary, we have V = −V and V = V . 2 00 11 01 10 decomposition of bipartite unitaries. So V has Schmidt rank at most two. It is controlled 2 from the A side [15]. The inverse of all diagonal unitary Definition 2 (i) We refer to the m-sandwich form of a operators taken above are also diagonal, so they can be bipartite unitary U, in the sense that U = U U ···U , absorbed by V and V . The latter are still controlled 1 2 m 1 3 where each U is a controlled unitary, being controlled in unitaries from the A side in the computational basis. So i the computational basis on the respective Hilbert space, U =V V V is a 3-A form and the assertion holds. 1 2 3 4 (iii)The assertionfollowsfromLemma1,whichshows where X′ is a unitary acting on H. So X is a BCU thatthe two-qubitSWAP gateis aproductofthreecon- controlled from the A side, and trolledunitaries,andnofewer. Thiscompletestheproof. ⊓⊔ U =XW†V†. (7) When dB = 2 namely the unitary acts on two-qubit ByregardingW′ asa2×yd bipartiteunitaryandusing B states, assertion (ii) has been proved as the statement Lemma3,weobtainthatW′ hasa3-sandwichform. Let that any two-qubit unitary has the so-called canonical form[19, 20]. It has been shownthatany two-qubituni- (W′)† =CTD, (8) tary does not have Schmidt rank three [13]. For readers’ reference,theSchmidt-rank-threemultiqubitunitaryhas whereC,D areboththedirectsumoftwounitarieseach been investigated and constructed in [15, 17]. 
of order yd , and B Now we are in a position to give an upper bound of c(U) and the associated method of decomposing the bi- ydB T = W ⊗|iihi| (9) partite unitary U. i Xi=1 Theorem 4 Let U be a bipartite unitary on the dA×dB with some unitaries Wi of order two. So C,D and T system. Then can all be regardedas 2y×d bipartite unitaries on the B (i) U has a (2⌈log2dA⌉+1−1)-sandwich form. Hence subspace H ⊗H . Using (5), (7), and (8), we have A2 B c(U)≤2⌈log2dA⌉+1−1≤4dA−5, (3) U =X(CTD+IA2⊥ ⊗IB)V† =(XC˜)T˜(D˜V†), (10) for any d ≥ 2. In particular, c(U) ≤ 2d −1 when d where A A A is an integer power of 2. (ii) If all bipartite unitaries on the dA×dB system with C˜ =C+IA2⊥ ⊗IB, (11) aod(d2ddA−≥13)-hsaanvedw(i2cdhAf−orm1)-fsoarnadnwyicehvefonrmds,≥th2e.n U has D˜ =D+IA2⊥ ⊗IB, (12) A A T˜ =T +IA2⊥ ⊗IB. (13) Proof. (i)Onecaneasilyshowthatthesecondinequal- ity in (3) holds. In particular its equality holds when Itfollowsfrom(9)thatT canberegardedasacontrolled dA = 2n+1 with any nonnegative integer n. Since the unitary on HA2 ⊗HB, controlled from the B side in the firstinequalityin(3)andthelastassertionof(i)bothfol- computationalbasis. Thisfactand(13)implythatT˜isa low from the first assertionof (i), it is sufficient to prove controlled unitary from the B side in the computational the latter. The assertion is trivial if d or d = 1, so basis. Next, it follows from (6) and (11) that XC˜ is a A B we assume d ,d ≥ 2. The proof is by induction over BCU, i.e., A B d . The assertion for d =2 with any d ≥2 is proven inALemma 3. In the follAowing we prove tBhe assertion for XC˜ =X1+X2, (14) a fixed d ≥ 3, under the induction hypothesis that the A where the bipartite unitaries X and X act on the sub- k×d bipartite unitary with any 2 ≤ k ≤ d −1 and 1 2 B A spaces H⊥ and H, respectively. Since DimH =y and d ≥2hasag(k)–sandwichform,whereforanypositive A1 B DimH⊥ = d −y, they are both smaller than d for integer j we define A1 A A any y = 1,2,··· ,⌊d /2⌋. 
It follows from the induction A g(j)=2⌈log2j⌉+1−1. (4) hypothesis that X1 and X2 have g(y) and g(dA − y)- sandwich forms, respectively. We have two decomposi- Let H ,H ⊆ H be two subspaces spanned by the tion A1 A2 A first y (y ≤ ⌊d /2⌋) and 2y computational basis kets, respectively. LeAt V =I ⊗I +V′ be a BCU where V′ g(y) g(dA−y) A1 B X = X , X = X , (15) is a bipartite unitary on the subspace H = H⊥ ⊗H . 1 1,i 2 2,i A1 B iY=1 iY=1 Let where for any odd and even i, the X is a controlled j,i W =W′+IA2⊥ ⊗IB (5) unitary from the A and B side, respectively. Then so is X +X , because X and X act on the subspaces 1,i 2,i 1,i 2,i be another BCU, where W′ is a bipartite unitary on the H⊥ and H, respectively. It follows from (4) and the subspaceHA2⊗HB,andIA2⊥ istheidentityoperatoron condition y ≤ ⌊dA/2⌋ that g(y) ≤ g(dA −y). This in- tthheestuopbsypdacerHowA⊥s2.ofWtheecamnafitrnixdparsoudituacbtlUeVV,,stuhcehntohnazterino equality, (14) and (15) imply XC˜ = gi=(y1)(X1,i+X2,i)· B g(dA−y) (I ⊗I +X ). These faQcts imply that XC˜ entries occur only in the first 2ydB columns. Then we j=g(y)+1 A1 B 2,j can find a suitable W such that the matrix product Qhas a g(dA−y)-sandwichform. Next using the same ar- gumentexceptthat(11)isreplacedby(12),onecanshow X :=UVW =I ⊗I +X′, (6) that D˜V† also has a g(d −y)-sandwich form. Third it A1 B A 5 followsfrom(4)thatg(j)isoddforanypositiveintegerj. d d2 freerealparametersinit,lessthanwhatisinacon- B A Fourthin the paragraphbelow (13), we haveshownthat trolled gate from the A side (so that a larger number of T˜ is a controlled unitary from the B side in the compu- thesewouldbeusediftheyareusedinsteadofcontrolled tational basis. Applying these four facts to (10) implies gates from the A side). 
Note that for two adjacent con- that the unitary U has an x-sandwich form where trolled gates, we have overestimated the number of free parameters,sincewhentheyarebothcontrolledfromthe x = min 2g(dA−y)+1 A side, the change of controlling basis on HA could be 1≤y≤⌊dA/2⌋(cid:0) (cid:1) viewed as a change in either of the controlled gates, and =2g(⌈d /2⌉)+1=g(d ). (16) A A generally,abipartitediagonalgatebetweentwoadjacent controlledgates can be absorbed into any of the two ad- The last two equalities in (16) follow from (4), and the jacent controlled gates. But such issues only affect the fact that ⌈log d ⌉ =⌈log (d +1)⌉ for odd d ≥3. So 2 A 2 A A count above by a lower order factor. (16) is exactly the first assertion of (i). We commentonthe connectionwiththe resultsin the (ii)Theproofisbyinductionoverevend ≥2. Theas- A literature. Our Lemma 3(i) in the special case that d sertionford =2withanyd ≥2isproveninLemma3. B A B is an integer power of 2 is the same as Theorem 10 of In the following we prove the assertion for a fixed even Shende et al. [8] (see also [21]). Our Theorem 4(i) in d ≥ 4, under the induction hypothesis that the k×d A B the case that d is an integer power of 2 can also be de- bipartiteunitarywithanyevenk ∈[2,d −1]andd ≥2 A A B rivedby recursivelyapplyingTheorem10of[8](the first hasa(2k−1)–sandwichform. Onecanverifythatthear- step of recursion is illustrated in Theorem 11 of [8], and gument fromthe paragraphbelow (4) to the secondsen- note that a gate controlled by multiple qubits belonging tence below (14) still applies here. We choose y = d /2 A to the same party is a controlled gate in our language). in the argument. If y is odd (respectively, even), then We abbreviate the details here. Therefore our result can the condition in (ii) (respectively, the induction hypoth- be viewed as a generalization of the results in [8] to the esis) implies that X and X in (14) both have(d −1)- 1 2 A general dimensions. 
Based on our result, it may be pos- sandwich forms, respectively. Hence (15) and the sub- sible to decompose any qudit circuit (with dimensions of sequent paragraph hold, except that g(j) is replaced by qudits notrequiredto be allequal)usingcontrolledtwo- 2j −1 for any positive integer j. Since d ≥ 2 is even, A quditunitaries. ThefollowingSec.IVcanbeviewedasa applying these facts to (10) implies that the unitary U stepinthisdirection,butwedonotdecomposethegates has an x-sandwich form where fullythere,allowingsomegatecontrolledbymultiplequ- x=2(d −1)+1=2d −1. (17) dits. There may be some extensions of the techniques A A in [8] to the case of higher dimensional qudits that can This completes the proof of assertion (ii). ⊓⊔ helpdecomposesuchmultiply-controlledgate. Thereare We do not know whether the condition in Theorem 4 some papers on decomposition of qudit circuits, such as (ii) can be satisfied, and we leave it as an open problem. [3, 6, 7]. It is possible that the methods in those papers As a byproduct of the theorem, it follows from (7) that may be combined with the results in this paper to give a better upper bound of the number of two-qudit (con- Corollary 5 Anybipartiteunitaryistheproductofthree trolled) gates needed. Apart from the application to cir- BCUs controlled from theA, B andA sides, respectively. cuit decomposition, the other potential application is to help study the nonlocal resource usage in implementing It is known that any two-qubit BCU is a controlled uni- nonlocal unitaries. Here the usage of nonlocal resources tary. Hence Lemma 3 (iii) implies that the two-qubit is to be optimized, and the local resources such as local CNOT gate cannot be the product of only two BCUs. unitaries are deemed as cheap. Section V is a step in In other word, the upper bound three in Corollary 5 is this direction,but itonlydiscussesthe costintermsofa tight. 
particular type of nonlocal gate (whose implementation The upper bound obtained in Theorem 4 is 4d −5 A cost is upper bounded by a constant), and not in terms and it is polynomially smaller than 4d4 obtained in [2]. A ofthemoreconventionalresourcessuchasentanglement. Compared to the latter, the implementation of a bipar- tite unitary by arbitrary controlled unitaries can indeed save quantum resources. Since the systems A and B are symmetricintheproblem,4d −5isalsoanupperbound A. Decomposition of complex permutation B matrices forthenumberofcontrolledgates. Weconsidertheopti- mality of the bound 4d −5 under the assumptions that A d ≤ d and that the number of controlled gates is a TheupperboundinTheorem4worksforarbitrarybi- A B function ofd only. By parametercounting,the 4d −5 partiteunitaries,anditincreaseslinearlywiththedimen- A A is already optimal up to a constant factor, because the sion. One may expect to have a constant upper bound entire unitary has d2d2 free real parameters in it, and forsomespecialbipartiteunitaries. Inthissubsectionwe A B eachcontrolledunitaryfromtheAsideandcontrolledin givesuchaboundforanycomplex permutation matrix in thecomputationalbasisofH hasd d2 freerealparam- Theorem7. Itis a unitary matrixwith one andonly one A A B etersinit,whileeachcontrolledgatefromtheB sidehas nonzero element on each row and column. When the 6 nonzero elements have no phases and are equal to one, absolutely singular, then the assertion follows from the it becomes the standard permutation matrix. The com- inductionhypothesisonX. Suppose X is absolutelysin- plex permutation matrix is mathematically known as a gular. By performing two suitable product permutation special monomial matrix, and has been used to charac- operators,respectively,fromtheleft-andright-handside terize the mutually unbiased bases [22, 23]. The com- of V, we may assume that V = 0 where j = 2,··· ,s, j,k plexpermutationgateisofinteresttothestudyofquan- k = t + 1,··· ,d , and d ≥ s > t ≥ 1. 
Since V is A A tum computation, as it is a somewhat classical part of not absolutely singular, we have s = t + 1. Using a a quantum circuit; see its use in the definition of the suitable product permutation operator on the left-hand Fourier hierarchy in [9]. The diagonal unitary, which is V 0 side of V, we may assume that V = 1 , where a special complex permutation matrix, can be efficiently (cid:18)V2 V3 (cid:19) simulated in terms of the Clifford+T basis by the al- V and V are, respectively, (s−1)d ×(s−1)d and 1 3 B B gorithm in [24]. We define the controlled-permutation (d −s+1)d ×(d −s+1)d submatrices. Since V A B A B matrices to be bipartite controlled unitaries controlled is not absolutely singular, neither are V and V . The 1 3 in the computational basis of one system and with the hypothesis induction implies that the assertionholds for termsonthecontrolledsidebeingpermutationmatrices. both V and V . Hence the assertion holds for V. This 1 3 Thecontrolled-complex-permutation matrices aredefined completes the proof. similarly. Equivalence of the lemma to Hall’s marriage To study the decomposition of complex permutation theorem. We use the combinatorial formulation of matrices,wepresentapreliminarylemma,whichisactu- Hall’s marriage theorem in [25]. It involves some given ally a form of the Hall’s marriage theorem [25]. Suppose elements, each of which may be in one or more of some V = dA |jihk|⊗V is a bipartite operator on the given sets. There is a marriage condition that says the j,k=1 j,k spacePH = HA ⊗HB. We say that V is absolutely sin- numberofdistinctelementscontainedinksetsisatleast gular if there are integers j ,··· ,j and k ,··· ,k with k, for any integer k ≥ 0. A system of distinct represen- 1 s 1 t s+t>d ,suchthatV =0. Theabsolutesingularity tatives is a set of distinct elements, each of which is in A ja,kb of V is unchanged up to any product permutation oper- a different set. 
ators on the left- and right-hand sides of $V$ (a product permutation operator is of the form $P_A \otimes Q_B$, where $P_A$ and $Q_B$ are local permutation operators; in what follows we only need $Q_B$ to be an identity matrix). Hence an absolutely singular $V$ is locally equivalent to another bipartite operator whose upper-left $s d_B \times t d_B$ submatrix is zero. Evidently an absolutely singular operator is singular, but the converse is not true. We characterize absolute singularity as follows.

Lemma 6 $V = \sum_{j,k=1}^{d_A} |j\rangle\langle k| \otimes V_{j,k}$ is not absolutely singular if and only if there are $d_A$ distinct integers $k_1,\dots,k_{d_A}$ such that the blocks $V_{1,k_1},\dots,V_{d_A,k_{d_A}}$ are all nonzero.

Proof. We first present a matrix-based proof, and then provide a proof of the equivalence of the lemma to Hall's marriage theorem, which is known to have several different proofs.

Matrix-based proof. The "if" part follows from the definition of absolute singularity. Let us prove the assertion in the "only if" part. Assume $V$ is not absolutely singular. This assumption and the assertion are both unchanged under any product permutation operators on the left- and right-hand sides of $V$. We will still refer to the $d_B \times d_B$ blocks in $V$ as $V_{j,k}$, since there is no confusion. The assertion is trivial for $d_A = 1, 2$. Next we use induction over $d_A$. The induction hypothesis is that the assertion holds when $d_A$ is replaced by $2,\dots,d_A-1$, and we will prove the assertion for $d_A$. Since $V$ is not absolutely singular, we may assume that $V_{11}$ is nonzero up to a suitable product permutation operator on the right-hand side of $V$. If the submatrix $X = \sum_{j,k=2}^{d_A} |j\rangle\langle k| \otimes V_{j,k}$ is not

the table are to be transferred to row $j$ of the table after the permutation gate $T$. The first, second, and third controlled-permutation gates permute among elements in the same row, column, and row, respectively. Each column of the rectangular table after the first gate contains elements that are to be permuted under the second gate. Each permutation in a column corresponds to one permutation matrix in the decomposition of $M$ as the sum of permutation matrices. The argument above roughly describes the proof in [26] for Corollary 8. In comparison, our matrix-based approach for obtaining the circuit decomposition hints at some connections to the sandwich form of general unitaries of the sort in Lemma 3 and Theorem 4.

IV. DECOMPOSITION OF MULTIPARTITE UNITARY OPERATORS

In this section, we study the decomposition of $n$-partite unitary operators $U$ on the space $\otimes_{j=1}^{n} \mathcal{H}_j$ with $\mathrm{Dim}\,\mathcal{H}_i = d_i$. We define a generalized $m$-sandwich form of $U$ to be a decomposition of the form $U = U_1 U_2 \cdots U_m$, where any $U_i$ is a controlled unitary controlled in the computational basis from $n-1$ fixed systems. For example, $U_1$ may be controlled from the systems of $\mathcal{H}_1,\dots,\mathcal{H}_{n-1}$, $U_2$ may be controlled from the systems of $\mathcal{H}_1,\dots,\mathcal{H}_{n-2},\mathcal{H}_n$, etc. The computational basis in $\otimes_{j=1}^{n} \mathcal{H}_j$ consists of the product states $|j_1,\dots,j_n\rangle$ where $j_i = 1,\dots,d_i$ for each $i$.

Hall's marriage theorem says that a system of distinct representatives exists if and only if the marriage condition is satisfied. Let us now describe the equivalence of the current lemma to the above theorem. Take the sets to be the big rows of $V$ labeled by $j$, and the elements to be the big columns labeled by $k$, and let an element $k$ be in a set $j$ if and only if $V_{j,k}$ is nonzero. Then the marriage condition corresponds to $V$ not being absolutely singular, and a system of distinct representatives corresponds to a sequence of $d_A$ distinct big column labels $k_i$ ($i = 1,\dots,d_A$) such that $V_{i,k_i}$ is nonzero. This establishes the equivalence. $\square$

Theorem 7 Any bipartite complex permutation unitary has a 3-sandwich form, composed of controlled-complex-permutation matrices. In particular, if the unitary is a permutation matrix, the 3-sandwich form is composed of controlled-permutation matrices.

Proof. The second claim implies the first claim, since any complex permutation unitary is the product of a permutation matrix and a diagonal unitary, and the latter can be absorbed into one of the controlled-permutation matrices in the decomposition of the complex permutation unitary. Therefore it suffices to prove the second claim. Suppose $U$ is a bipartite permutation unitary on the $d_A \times d_B$ system.

Let $U = \sum_{j,k=1}^{d_A} |j\rangle\langle k| \otimes U_{j,k}$. Since it is not absolutely singular, it follows from Lemma 6 that there are $d_A$ distinct integers $k_1,\dots,k_{d_A}$ such that the blocks $U_{1,k_1},\dots,U_{d_A,k_{d_A}}$ are all nonzero. There are two controlled-permutation matrices $V = \sum_{j=1}^{d_A} |j\rangle\langle j| \otimes V_j$ and $W = \sum_{j=1}^{d_A} |j\rangle\langle j| \otimes W_j$ from the $A$ side, such that the first entry of any one of the blocks $V_1 U_{1,k_1} W_{k_1},\dots,V_{d_A} U_{d_A,k_{d_A}} W_{k_{d_A}}$ of $VUW$ is one. If $d_B = 2$ then $VUW$ is a controlled-permutation unitary from the $B$ side. So the assertion holds. We use induction over $d_B \ge 2$. We have $VUW = X \otimes |1\rangle\langle 1| + Y$, where $X$ is a permutation matrix on $\mathcal{H}_A$, and $Y$ is a permutation matrix on $\mathcal{H}_A \otimes |1\rangle^{\perp}$. The induction hypothesis on $Y$ implies that $Y = Y_1 Y_2 Y_3$, where $Y_1$, $Y_2$ and $Y_3$ are controlled-permutation matrices from the $A$, $B$ and $A$ side, respectively. Hence

$$U = V^{\dagger} (I_A \otimes |1\rangle\langle 1| + Y_1)(X \otimes |1\rangle\langle 1| + Y_2)(I_A \otimes |1\rangle\langle 1| + Y_3) W^{\dagger}, \tag{18}$$

which is a 3-sandwich form of $U$ composed of controlled-permutation matrices. So we have proved the second claim. This completes the proof. $\square$

It is known that the $\mathrm{SWAP}_d$ gate defined in Lemma 1 has a decomposition using three bipartite controlled gates [18]. In agreement with the construction in [18], Theorem 7 shows that the three gates can be chosen as controlled-permutation gates in a 3-sandwich form.

The theorem also has implications for classical circuits. Define a classical reversible circuit (classical permutation gate) to be a classical circuit that is a permutation on the allowed set of input data. In the bipartite case, suppose $d_A$ and $d_B$ are the numbers of possible states on the systems $A$ and $B$, respectively; then we say the circuit acts on a $d_A \times d_B$ system. For example, when $n_A$ and $n_B$ are the numbers of bits on the two systems, we have $d_A = 2^{n_A}$ and $d_B = 2^{n_B}$.
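Lemma 6 turns the question of whether $V$ is absolutely singular into a search for distinct block-column labels $k_1,\dots,k_{d_A}$, which is a bipartite matching problem. The following is a minimal Python sketch under that reading, not a procedure taken from the paper: it looks only at the zero/nonzero pattern of the blocks $V_{j,k}$, and the input format, the 0-based indexing and the function name `distinct_block_columns` are assumptions made for illustration. It uses a standard augmenting-path search.

```python
def distinct_block_columns(nonzero):
    """Search for distinct column labels in the sense of Lemma 6.

    nonzero[j][k] is True iff the block V_{j,k} is nonzero (0-based labels).
    Returns a list ks with ks[j] pairwise distinct and nonzero[j][ks[j]] True
    for every big row j, or None when no such choice exists.
    """
    d = len(nonzero)
    owner = [None] * d  # owner[k] = big row currently assigned to big column k

    def try_assign(j, seen):
        # Try to give row j a column, re-routing earlier rows if necessary.
        for k in range(d):
            if nonzero[j][k] and k not in seen:
                seen.add(k)
                if owner[k] is None or try_assign(owner[k], seen):
                    owner[k] = j
                    return True
        return False

    for j in range(d):
        if not try_assign(j, set()):
            return None  # some big row cannot be matched

    ks = [None] * d
    for k, j in enumerate(owner):
        ks[j] = k
    return ks


if __name__ == "__main__":
    pattern = [[True, True, False],
               [True, False, False],
               [False, True, True]]
    print(distinct_block_columns(pattern))
```

A return value of `None` means that no set of distinct labels exists, which by Lemma 6 is the absolutely singular case.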
From Theorem 7, and noting that the proof of Theorem 7 places no requirement of coherence on either the target unitary or the controlled unitaries in the decomposition, we have

Corollary 8 Suppose $T$ is a classical reversible circuit on a $d_A \times d_B$ bipartite system. Then $T$ can be implemented using the product of 3 bipartite classical controlled-permutation gates.

Note that the classical controlled-permutation gates are controlled in the computational basis, as one would expect. Corollary 8 is also stated in Sec. 3.15 and Appendix E of [26], where the proof approach is by considering the permutation accomplished by the circuit and directly using the Birkhoff-von Neumann theorem (explained below), which has an integer-arithmetic version that says the following: Any matrix of size $n \times n$ with non-negative integer entries and with row and column sums equal to $q$ can be decomposed as the sum of $q$ permutation matrices of size $n \times n$. Such a statement appears in [27], and a simple proof is by repeated use of Hall's marriage theorem, each time finding a permutation matrix, which is to be subtracted from the original matrix, and this process terminates when the resulting matrix becomes the zero matrix. The construction of the 3 classical permutation gates is as follows: arrange the $d_A \times d_B$ computational-basis states in a rectangular table with $d_A$ rows and $d_B$ columns, and define a matrix $M$ to contain integer elements $M_{ij}$ that indicate how many elements in row $i$ of

The word "fixed" means that the choices of controlling parties are fixed for each gate $U_i$. Such choices are a function of the generalized $m$-sandwich form that we choose. In the results in this section, we always fix such choices. We have

Proposition 9 Any $n$-partite unitary has a generalized $[2\prod_{j=1}^{n-1}(2d_j-2)-1]$-sandwich form.

Proof. Let $f(n) = 2\prod_{j=1}^{n-1}(2d_j-2)-1$. The assertion is trivial for $n=1$, and follows from Theorem 4 for $n=2$. We use induction on $n$. Assume that any $(n-1)$-partite unitary has a generalized $f(n-1)$-sandwich form. Let $U$ be an $n$-partite unitary. By regarding $\mathcal{H}_A = \mathcal{H}_1$ and $\mathcal{H}_B = \otimes_{j=2}^{n} \mathcal{H}_j$ in Theorem 4, we obtain the $(4d_1-5)$-sandwich form

$$U = \prod_{j=1}^{4d_1-5} U_j, \tag{19}$$

where $U_j$ is controlled in the computational basis of $\mathcal{H}_A$ for odd $j$, and of $\mathcal{H}_B$ for even $j$, respectively. In particular, the computational basis in the latter is realized by performing suitable unitaries on $\mathcal{H}_B$ that can be absorbed by the $U_j$ with odd $j$. Then

$$U_j = \bigoplus_{k=1}^{d_1} |k\rangle\langle k| \otimes U_{jk}, \quad \forall \text{ odd } j, \tag{20}$$

where each $U_{jk}$ is a unitary on $\mathcal{H}_B$. From the induction assumption, $U_{jk}$ has a generalized $[2\prod_{j=2}^{n-1}(2d_j-2)-1]$-sandwich form. Then (20) implies that $U_j$ with any odd $j$ has a generalized $[2\prod_{j=2}^{n-1}(2d_j-2)-1]$-sandwich form. Since $U_j$ with any even $j$ is a controlled unitary controlled in the computational basis of $\mathcal{H}_B$, (19) implies that $U$ has a generalized $m$-sandwich form where

$$m = (2d_1-2)\Big[2\prod_{j=2}^{n-1}(2d_j-2)-1\Big] + 2d_1 - 3 = f(n). \tag{21}$$

This completes the proof. $\square$

The proof above first divides the systems into two groups of one party and $(n-1)$ parties each. When $n \ge 4$, there are also other ways of dividing the systems at the first step that may give rise to fewer gates in the generalized sandwich form. The following result is for the case of $n=4$.

Proposition 10 Any unitary on four parties $A,B,C,D$ has a generalized $[4(d_A d_B-1)(2d_A+2d_C-5)-4d_A+5]$-sandwich form.

Proof. Let $U$ be a unitary on these four parties. By regarding $\mathcal{H}_A$ and $\mathcal{H}_B$ in Theorem 4 as $\mathcal{H}_{AB}$ and $\mathcal{H}_{CD}$ respectively, we obtain the following sandwich form

$$U = \prod_{j=1}^{4d_A d_B-5} U_j, \tag{22}$$

where $U_j$ is controlled in the computational basis of $\mathcal{H}_{AB}$ for odd $j$, and in the computational basis of $\mathcal{H}_{CD}$ for even $j$, respectively. Then

$$U_j = \bigoplus_{k=1}^{d_A d_B} |k\rangle\langle k|_{AB} \otimes U_{jk}, \quad \forall \text{ odd } j, \tag{23}$$

where $|k\rangle\langle k|_{AB}$ are projectors onto the computational basis of $\mathcal{H}_{AB}$, and each $U_{jk}$ is a unitary on $\mathcal{H}_{CD}$. From Theorem 4, $U_{jk}$ has a generalized $(4d_C-5)$-sandwich form. Then (23) implies that $U_j$ with any odd $j$ has a generalized $(4d_C-5)$-sandwich form. Similarly, $U_j$ with any even $j$ has a generalized $(4d_A-5)$-sandwich form. Therefore $U$ is the product of

$$(2d_A d_B-2)(4d_C-5)+(2d_A d_B-3)(4d_A-5) = 4(d_A d_B-1)(2d_A+2d_C-5)-4d_A+5 \tag{24}$$

unitaries that are controlled in the computational basis of 3 parties. This completes the proof. $\square$

To compare the two Propositions above, assume $n=4$ in Proposition 9 with the subscripts $1,2,3,4$ replaced by $A,B,C,D$, respectively, and that $d_A \le d_B \le d_C \le d_D$. Then Proposition 9 gives that $U$ is the product of $16(d_A-1)(d_B-1)(d_C-1)-1$ unitaries that are controlled from 3 parties. Therefore, at least when $d_B \ll d_C$ and $d_A$ is a large constant (say $d_A \ge 20$), Proposition 10 gives a smaller number than Proposition 9.

The proofs of the results above imply that, if we could reduce the number of bipartite controlled unitaries in the sandwich form in Theorem 4, then the number of multipartite controlled unitaries in the generalized sandwich form could also be reduced. In particular, from Theorem 7, we have

Proposition 11 Any $n$-partite complex permutation unitary has a generalized $(2n-1)$-sandwich form composed of controlled-complex-permutation unitaries controlled by $n-1$ parties.

Proof. It suffices to consider permutation unitaries, for the same reason as stated in the proof of Theorem 7. From Theorem 7, the claim holds for $n=2$. The proof is by induction over $n$. The induction hypothesis is that the claim holds when $n$ is replaced by any positive integer less than $n$. Now consider $n \ge 3$, and take a bipartite cut of the first $n-1$ parties versus the last party. From Theorem 7, the permutation unitary has a 3-sandwich form, and the first and the last gates in the 3-sandwich form are controlled permutations controlled from the first $n-1$ parties. The middle gate in the 3-sandwich form is a controlled permutation controlled from the last party, so it is of the form $U_1 \otimes |1\rangle\langle 1| + U_2 \otimes |2\rangle\langle 2|$, where the permutation operators $U_1$ and $U_2$ on the first $n-1$ parties can each be decomposed into $2(n-1)-1$ controlled-permutation gates controlled by $n-2$ parties, and the choices of those controlling $n-2$ parties are always the same for the decompositions of $U_1$ and $U_2$, according to the induction hypothesis. Therefore the permutation unitary on $n$ parties has a generalized $(2n-1)$-sandwich form composed of controlled-permutation gates controlled by $n-1$ parties. The case with phases is similar, just adding the word "complex". This completes the proof. $\square$

The result above has a corresponding statement for classical reversible circuits. In the special case that each party is one bit, it is illustrated by a sample circuit in Fig. 2 of [28] (note the sequence of lines is opposite from that in the proof above).

As mentioned in Sec. III, it is possible that the literature results on the decomposition of qudit circuits [3, 6, 7] could be combined with the results in this section to give better upper bounds on the number of two-qudit gates.

V. DECOMPOSITION USING A SIMPLE TYPE OF GATES

In this section, we apply our result on decomposition using controlled unitaries to the decomposition using a more basic type of gates defined below. One of our motivations is to characterize the nonlocal part of the cost for implementing bipartite unitaries using some measure with a fixed unit, rather than using the number of controlled unitaries, which is a measure with its unit dependent on the dimensions. The cost measure that we use is the number of standard gates defined below, and we do not allow any ancillary systems in the circuit. The case with ancillas will be discussed in Sec. VII. In the

In the case $d_A = d_B = 2$, it is well known that three Schmidt-rank-2 gates are sufficient and necessary for a general two-qubit unitary [12], as mentioned in Sec. III. An example that needs three Schmidt-rank-2 gates is the two-qubit SWAP gate ([12], also see Lemma 1). Our main result for a general $d_A \times d_B$ system is as follows.
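The gate counts in Propositions 9, 10 and 11 of Sec. IV come from simple recursions, so they can be sanity-checked numerically. The sketch below is illustrative only: the function names are hypothetical, every local dimension is assumed to be at least 2, and the recursions are transcribed from the proofs as we read them (Eq. (21) for Proposition 9, Eq. (24) for Proposition 10, and "two outer gates plus the expanded middle gate" for Proposition 11).

```python
from math import prod


def prop9_closed(dims):
    """Closed form f(n) = 2 * prod_{j=1}^{n-1} (2 d_j - 2) - 1, dims = [d_1, ..., d_n]."""
    return 2 * prod(2 * d - 2 for d in dims[:-1]) - 1


def prop9_recursive(dims):
    """Gate count obtained by following the induction in the proof of Proposition 9."""
    if len(dims) == 2:
        return 4 * dims[0] - 5            # bipartite base case (Theorem 4)
    d1 = dims[0]
    inner = prop9_recursive(dims[1:])     # cost of each U_jk on parties 2..n
    # Eq. (19) has 2*d1 - 2 odd-numbered gates, each expanded into `inner` gates,
    # and 2*d1 - 3 even-numbered gates, each kept as a single gate.
    return (2 * d1 - 2) * inner + (2 * d1 - 3)


def prop10_count(dA, dB, dC):
    """Left-hand side of Eq. (24): odd gates cost 4 dC - 5, even gates cost 4 dA - 5."""
    return (2 * dA * dB - 2) * (4 * dC - 5) + (2 * dA * dB - 3) * (4 * dA - 5)


def prop10_closed(dA, dB, dC):
    """Right-hand side of Eq. (24)."""
    return 4 * (dA * dB - 1) * (2 * dA + 2 * dC - 5) - 4 * dA + 5


def prop11_recursive(n):
    """Proposition 11: 3 gates for n = 2, then two extra outer gates per added party."""
    return 3 if n == 2 else prop11_recursive(n - 1) + 2


if __name__ == "__main__":
    dims = [3, 2, 5, 2]
    print(prop9_closed(dims), prop9_recursive(dims))
    print(prop10_count(3, 2, 5), prop10_closed(3, 2, 5))
    print([prop11_recursive(n) for n in range(2, 7)])
```

For instance, `prop10_closed(20, 20, 2000) < prop9_closed([20, 20, 2000, 2000])` evaluates to `True`, in line with the comparison made after Proposition 10.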
following definitions, $I_X$ stands for the identity operator on system $X$.

Definition 12 A standard gate is a unitary acting on the Hilbert space $\mathcal{H}_{AB} = \mathcal{H}_A \otimes \mathcal{H}_B$ of the form $U = U_{AB} = (V_{ab} \oplus I_{AB\backslash ab})$, where $\mathcal{H}_{AB} = \mathcal{H}_{ab} \oplus \mathcal{H}_{AB\backslash ab}$, and $\mathcal{H}_a \subseteq \mathcal{H}_A$ and $\mathcal{H}_b \subseteq \mathcal{H}_B$ are two-dimensional each, and $V_{ab}$ is a Schmidt-rank-2 unitary on the $2 \times 2$ space $\mathcal{H}_{ab} = \mathcal{H}_a \otimes \mathcal{H}_b$. The $V_{ab}$ is called the nontrivial part of $U$.

Note that the word "Schmidt-rank-2" above can be replaced by "controlled", as Schmidt-rank-2 unitaries are controlled unitaries ([15]; also see an alternative proof in [17]), and two-qubit unitaries of Schmidt rank greater than 2 must have Schmidt rank 4 [13] and thus cannot be controlled unitaries. The case of $\mathcal{H}_{AB}$ being strictly larger than $\mathcal{H}_{ab}$ is useful, for example, in the decomposition of the Toffoli gate [29], and has been experimentally realized [30]. The definition above can be extended to a more general definition below:

Definition 13 A bipartite elementary gate is a unitary acting on the Hilbert space $\mathcal{H}_A \otimes \mathcal{H}_B = (\mathcal{H}_a \otimes \mathcal{H}_C \oplus \mathcal{H}_D) \otimes (\mathcal{H}_b \otimes \mathcal{H}_E \oplus \mathcal{H}_F)$ of the form $U = (V_{ab} \otimes I_{CE}) \oplus I_{AB\backslash abCE}$, where $\mathcal{H}_a$ and $\mathcal{H}_b$ are two-dimensional each, and $V_{ab}$ is a Schmidt-rank-2 unitary, and $\mathcal{H}_{AB} = \mathcal{H}_{AB\backslash abCE} \oplus \mathcal{H}_{abCE}$.

In the following we consider the decomposition of bipartite unitary operators into the product of bipartite standard gates defined in Definition 12 and arbitrary local gates, with the goal of minimizing the number of nonlocal standard gates. The more general Definition 13 will not be studied in this paper except that we define some gate cost using it in Definition 14 and raise some open questions.

We define the following gate-cost measures for a bipartite unitary.

Definition 14 Let $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$ be the complex Hilbert space of a finite-dimensional bipartite quantum system, with $\mathrm{Dim}\,\mathcal{H}_A = d_A$ and $\mathrm{Dim}\,\mathcal{H}_B = d_B$. For any given bipartite unitary $U : \mathcal{H} \to \mathcal{H}$,

$$c_s(U) := \min\{k \mid U = U_1 U_2 \cdots U_k,\ U_i \in S_s\},$$
$$c_e(U) := \min\{k \mid U = U_1 U_2 \cdots U_k,\ U_i \in S_e\}, \tag{25}$$

where $S_s$ (respectively, $S_e$) is the set of bipartite unitaries on the same space that are equivalent to the standard (respectively, bipartite elementary) gates under local unitaries.

Proposition 15 (i) Any bipartite unitary on a $d_A \times d_B$ system is the product of $f(d_A,d_B)$ standard gates interspersed with local unitaries on $\mathcal{H}_A$ or $\mathcal{H}_B$, where

$$f(d_A,d_B) = 2(d_A-1)^2 \left\lfloor \frac{d_B}{2} \right\rfloor + (2d_A-3)(d_B-1) \left\lfloor \frac{d_A}{2} \right\rfloor. \tag{26}$$

(ii) If the unitary is a controlled unitary controlled from the $A$ side, then

$$f(d_A,d_B) = (d_A-1) \left\lfloor \frac{d_B}{2} \right\rfloor. \tag{27}$$

(iii) If the unitary is a complex permutation unitary, then

$$f(d_A,d_B) = 2(d_A-1) \left\lfloor \frac{d_B}{2} \right\rfloor + (d_B-1) \left\lfloor \frac{d_A}{2} \right\rfloor. \tag{28}$$

(iv) If the nontrivial part of the standard gates is required to be CNOT, then at most $3(d_A-1)(d_B-1)$ such standard gates together with local permutation gates can implement any bipartite permutation unitary on the $d_A \times d_B$ space.

Proof. (i). Let $U$ be the bipartite unitary. Theorem 4 implies that $U$ has the following sandwich form

$$U = \prod_{j=1}^{4d_A-5} U_j, \tag{29}$$

where $U_j$ is controlled in the computational basis of $\mathcal{H}_A$ for odd $j$, and in the computational basis of $\mathcal{H}_B$ for even $j$, respectively. For all odd $j$, we have

$$U_j = \bigoplus_{k=1}^{d_A} |k\rangle\langle k|_A \otimes U_{jk} = \prod_{k=1}^{d_A} \big[\, |k\rangle\langle k|_A \otimes U_{jk} \oplus (I_A - |k\rangle\langle k|_A) \otimes I_B \,\big], \tag{30}$$

where $|k\rangle\langle k|_A$ are projectors onto the computational basis of $\mathcal{H}_A$, and each $U_{jk}$ is a unitary on $\mathcal{H}_B$. We can apply a local unitary $U_{j d_A}$ on $\mathcal{H}_B$ before performing the other steps below. In order to implement $U_j$, the operator that remains to be implemented is still given by (30) but with $U_{j d_A}$ becoming the identity matrix, and the other operators $U_{jk}$ also changed, but we still denote the changed matrices as $U_{jk}$, with $1 \le k \le d_A-1$. The $U_j$ is to be implemented using the product of $d_A-1$ operators, as shown in the second line of (30). Then each of the $U_{jk}$ with $1 \le k \le d_A-1$ can be assumed to be a diagonal unitary, because we can apply a suitable local unitary similarity transform on $\mathcal{H}_B$ so that $U_{jk}$ is diagonal and $I_B$ is unchanged. By a local diagonal unitary gate on $\mathcal{H}_A$ which only applies a phase on $|k\rangle_A$, we can set the last diagonal element of $U_{jk}$ to be 1, while the $I_B$ corresponding to the basis kets in $\mathcal{H}_A$ other than $|k\rangle_A$ are unchanged. Therefore we have

$$U_{jk} = \mathrm{diag}\big(x^{(jk)}_1, x^{(jk)}_2, \dots, x^{(jk)}_{d_B-1}, 1\big), \tag{31}$$

where $x^{(jk)}_i$ are complex phases, $i = 1,2,\dots,d_B-1$. Then we choose $\lfloor d_B/2 \rfloor$ standard gates as follows:

$$V^{(jk)}_r = I_A \otimes I_B + \big(x^{(jk)}_{2r-1}-1\big) |k\rangle\langle k|_A \otimes |2r-1\rangle\langle 2r-1|_B + \big(x^{(jk)}_{2r}-1\big) |k\rangle\langle k|_A \otimes |2r\rangle\langle 2r|_B, \quad \text{for } 1 \le r \le \left\lfloor \frac{d_B}{2} \right\rfloor. \tag{32}$$

Each gate $V^{(jk)}_r$ applies phases on the two states $|k\rangle_A \otimes |2r-1\rangle_B$ and $|k\rangle_A \otimes |2r\rangle_B$, but keeps other computational basis states of $\mathcal{H}_{AB}$ unchanged. It is easy to verify that such a gate has Schmidt rank at most 2 when viewed as a unitary acting on the $2 \times 2$ system with basis $\{|k'\rangle_A, |k\rangle_A\} \times \{|2r-1\rangle_B, |2r\rangle_B\}$, where $k' \ne k$. Hence for each $(j,k)$ pair with odd $j$ and $1 \le k \le d_A-1$, we need $\lfloor d_B/2 \rfloor$ standard gates to implement the operator $|k\rangle\langle k|_A \otimes U_{jk} \oplus (I_A - |k\rangle\langle k|_A) \otimes I_B$ in the last line of (30). Therefore, for each odd $j$, $U_j$ needs $(d_A-1)\lfloor d_B/2 \rfloor$ standard gates to implement, assisted by local unitaries. Similarly, for each even $j$, $U_j$ needs $(d_B-1)\lfloor d_A/2 \rfloor$ standard gates to implement, assisted by local unitaries. The assertion then follows by counting the numbers of $U_j$ in (29) in terms of odd and even $j$. This completes the proof of (i).

(ii). The claim follows from the proof of (i) by setting the upper bound for $j$ in (29) to 1.

(iii). The claim follows from Theorem 7 and the result of (ii) applied to the $A$ and $B$ sides.

(iv). From Theorem 7, every bipartite permutation unitary is the product of 3 controlled-permutation unitaries, controlled from the $A$, $B$ and $A$ side, respectively. Every permutation on $n$ elements is the product of at most $n-1$ transpositions (swaps of two elements). Define a controlled-transposition gate to be a bipartite unitary of the form $|1\rangle\langle 1|_A \otimes I_B + |2\rangle\langle 2|_A \otimes V_B$, where $V_B = |j\rangle\langle k| + |k\rangle\langle j| + \sum_{i \ne j,k} |i\rangle\langle i|$, for some $j \ne k$ ($\{|i\rangle\}$ is the computational basis of $\mathcal{H}_B$). For the special case $d_A = 2$, up to a local permutation on $\mathcal{H}_B$ we can write a controlled-permutation gate from the $A$ side as $|1\rangle\langle 1| \otimes I_B + |2\rangle\langle 2| \otimes P_2$, where $P_2$ is a permutation unitary on $\mathcal{H}_B$. This controlled-permutation gate can be written as the product of at most $d_B-1$ controlled-transposition gates, which are standard gates with their nontrivial part being the CNOT. For larger $d_A$, up to a local permutation on $\mathcal{H}_B$ we can write the controlled-permutation gate as $|1\rangle\langle 1| \otimes I_B + \sum_{j=2}^{d_A} |j\rangle\langle j| \otimes P_j$, where $P_j$ are permutation unitaries on $\mathcal{H}_B$. Take the subspace $\mathrm{span}\{|1\rangle_A, |j\rangle_A\}$ ($2 \le j \le d_A$) as the $A$ side space in the $d_A = 2$ result above; we have that at most $d_B-1$ standard gates with the nontrivial part being CNOT can implement $(I_A - |j\rangle\langle j|) \otimes I_B + |j\rangle\langle j| \otimes P_j$. Repeating this $d_A-1$ times for $j = 2,\dots,d_A$, a controlled-permutation gate from the $A$ side can be implemented using at most $(d_A-1)(d_B-1)$ such standard gates. The last result is the same for the $B$ side. Hence the claim follows. $\square$

It can be verified that in Proposition 15 (iv), the phrase "together with local permutation gates" can be dropped by allowing the nonlocal unitary to be implemented up to local permutations before and after it. Since a permutation gate on a $d$-dimensional space requires at most $d-1$ transpositions of the type $|j\rangle\langle k| + |k\rangle\langle j| + \sum_{i \ne j,k} |i\rangle\langle i|$, the four local permutations on $\mathcal{H}_A$ or $\mathcal{H}_B$ require at most $2d_A + 2d_B - 4$ local transpositions in total. Therefore the total number of standard gates of the CNOT type and the local transpositions is at most $3(d_A-1)(d_B-1) + 2d_A + 2d_B - 4 = 3d_A d_B - d_A - d_B - 1$. It could potentially be further reduced by a constant factor, and this is listed as an open problem in the Conclusions.

From [18] and Proposition 15 (ii), the $\mathrm{SWAP}_d$ gate has a decomposition using $3(d-1)\lfloor d/2 \rfloor$ standard gates across the two systems, together with some local unitaries. On the other hand, if we are not restricted to writing the $\mathrm{SWAP}_d$ gate as a product of some gates, but consider the actual cost of implementation, we could also make use of tensor products. Suppose $d = \prod_{j=1}^{m} p_j$, where $m \ge 1$ is an integer and $p_j$ are primes. Then the $\mathrm{SWAP}_d$ gate is the tensor product of the SWAP gates on $p_j \times p_j$ systems. The SWAP gate on a $p_j \times p_j$ system has a decomposition using $3(p_j-1)\lfloor p_j/2 \rfloor$ bipartite standard gates, together with some local unitaries. Hence the total implementation cost is $\sum_{j=1}^{m} 3(p_j-1)\lfloor p_j/2 \rfloor$ bipartite standard gates, together with some local unitaries.

VI. THE ROLE OF SCHMIDT RANK IN DECOMPOSITION OF BIPARTITE UNITARIES

The Schmidt rank of a bipartite unitary $U$ sometimes determines the number of bipartite controlled unitaries needed to decompose $U$, as it is proved in [15, 17] that $c(U) = 1$ when $\mathrm{Sch}(U) = 2$ or $3$. To investigate the relation between $c(U)$ and $\mathrm{Sch}(U)$ for a general bipartite unitary $U$, we discuss the different cases characterized by how large $r := \mathrm{Sch}(U)$ is compared to the dimensions $d_A$ and $d_B$. If $r \ge \min\{d_A,d_B\}$, then it follows from Theorem 4 (applied to the $A$ or $B$ side) that $c(U) \le 4r-5$. On the other hand, if $r < \min\{d_A,d_B\}$, then we need to count the number of parameters in $U$. It is equal to $(d_A^2 - r + d_B^2)\,r$, which is smaller than $2 d_A d_B^2$ when $d_A \le d_B$. A controlled unitary from the $A$ side contains $d_A d_B^2$ parameters, and noting that there are some redundant parameters when counting consecutive controlled unitaries in a product, theoretically $U$ could be the product of only three controlled unitaries (or even two when $r$ is further restricted to smaller values). But the actual number may be higher. A possible class of candidate examples that may need more than three con-