
T-count optimization and Reed-Muller codes

Matthew Amy∗
Institute for Quantum Computing, University of Waterloo, Waterloo, ON, Canada and
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada

Michele Mosca†
Institute for Quantum Computing, University of Waterloo, Waterloo, ON, Canada
Department of Combinatorics & Optimization, University of Waterloo, Waterloo, ON, Canada
Perimeter Institute for Theoretical Physics, Waterloo, ON, Canada and
Canadian Institute for Advanced Research, Toronto, ON, Canada

arXiv:1601.07363v1 [quant-ph] 27 Jan 2016

We prove an equivalence between exactly minimizing the number of T gates in n-qubit quantum circuits over CNOT and T gates, together with the Clifford group powers of T, and minimum distance decoding in the length $2^n-1$ punctured Reed-Muller code of order $n-4$. The equivalence arises from an analysis of the particular form of phase polynomials generated by CNOT and T gates. As a consequence, we derive an algorithm for the optimization of T-count in $\{\mathrm{CNOT}, T\}$ quantum circuits based on Reed-Muller decoders, along with a new upper bound of $O(n^2)$ on the number of T gates required to implement an n-qubit unitary over CNOT and T gates. We further generalize this result to show that minimizing finer angle Z-rotations corresponds to decoding lower order binary Reed-Muller codes. In particular, we show that minimizing the number of $R_z(\pi/2^k)$ gates in circuits over $\{\mathrm{CNOT}, R_z(\pi/2^k)\}$ corresponds to minimum distance decoding in the length $2^n-1$ punctured Reed-Muller code of order $(n-k-2)$.

I. INTRODUCTION

The synthesis and optimization of quantum circuits has generated a great deal of interest in recent years. As qubit technologies become more stable and experimentalists increase the size of their systems, actually running algorithms on these machines becomes a practical concern. Moreover, we want to know how to efficiently run these algorithms on the given systems, or conversely how large and stable a system we need to run a particular algorithm.
Given the prevalence of the circuit model within quantum computing, quantum circuit optimization is an important tool in answering these questions.

Due to the great effect of noise on quantum computations in the standard circuit model, much research has shifted its focus from optimizing physical circuits to optimizing logical ones with respect to fault-tolerance schemes meant to mitigate the errors due to this noise. These schemes usually have striking differences from physical gates in terms of resource costs. In particular, most of the common schemes implement Clifford group gates transversally, that is, by performing one physical gate on each physical qubit or group of qubits. This allows the logical operation to be performed precisely and with time proportional to the physical gate time. The additional operation needed to make a universal gate set is then typically implemented probabilistically with state distillation and gate teleportation, a less accurate procedure which requires both additional time and space compared to a single physical gate. The two-qubit controlled-NOT (CNOT) gate, as an element of the Clifford group, is thus a relatively cheap operation in this paradigm, compared to the $T = \mathrm{diag}(1, e^{i\pi/4})$ gate which is commonly chosen as the canonical non-Clifford gate. This is a reversal of the computational costs inherent in most physical implementations, where entangling gates are typically more difficult to implement than single-qubit rotations, and hence requires different circuit optimizations. While alternative fault-tolerance methods such as Paetznick and Reichardt's completely transversal Clifford+T scheme [2] and anyonic quantum computing [3] are gaining in popularity, minimization of the number of T gates in quantum circuits, called the T-count, is an important and widely studied goal.

∗ [email protected]
† [email protected]

We build on previous work by Amy, Maslov and Mosca on the reduction of T gates in quantum circuits.
In [4] it was shown that unitaries implementable over CNOT and T gates may be described in the computational basis as a bit-wise (linear) permutation together with a phase rotation that is an 8th root of unity to a power given by a pseudo-Boolean function of the input bits. This function, called the circuit's phase polynomial, was shown to be expressible as a sum of linear Boolean functions, the multiplicity of each function giving the power of the T gate required to produce the term. This idea was later used in [1] to optimize both T-count and T-depth (the number of stages of parallel T gates in a circuit) by computing a circuit's phase polynomial, simplifying it, then synthesizing a new circuit from the polynomial with maximally parallelized T gates. While their benchmarks showed significant reduction of T gates, it was noted that this approach was not optimal, as there exist distinct phase polynomials that give rise to the same unitary. In particular, it was observed that for all $x_1, x_2, x_3, x_4 \in \mathbb{F}_2$,

\[ e^{i\frac{\pi}{4} \sum_{f \in \mathbb{F}_2^4 \to \mathbb{F}_2} f(x_1, x_2, x_3, x_4)} = 1 = e^0, \]

where $\mathbb{F}_2^n \to \mathbb{F}_2$ is the space of all n-bit linear Boolean functions. It was left as an open question whether there exist other such identities, and whether such identities can be used to further reduce instances of T gates.

In this paper we prove that the identities on n qubits that are useful for reducing T-count form exactly the length $2^n-1$ punctured Reed-Muller code of order $n-4$. In doing so, we derive a new T-count optimization algorithm based on Reed-Muller decoding which is optimal for CNOT and T gate circuits when a minimum distance decoder is used. We implemented this optimization algorithm as a module in the quantum circuit optimizer T-par [5] and tested it on general Clifford+T circuits with different Reed-Muller decoders. The results show modest savings in T-count while still remaining tractable for circuits of non-trivial size.
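The four-bit identity quoted above can be checked by brute force. The sketch below is only an illustration, not code from the paper or from T-par; the function name `linear_function_sum` is hypothetical, and linear functions are indexed by a bit vector y with $f_{\mathbf{y}}(\mathbf{x}) = y_1x_1 \oplus \cdots \oplus y_nx_n$.

```python
from itertools import product

def linear_function_sum(x):
    """Integer sum of f_y(x) = y.x mod 2 over all 2^n linear functions f_y."""
    n = len(x)
    total = 0
    for y in product((0, 1), repeat=n):
        total += sum(yi & xi for yi, xi in zip(y, x)) % 2
    return total

# For n = 4 the sum is a multiple of 8 on every input, so the phase is trivial.
sums = {x: linear_function_sum(x) for x in product((0, 1), repeat=4)}
all_trivial = all(s % 8 == 0 for s in sums.values())
print(all_trivial)  # True

# For n = 3 the same sum is 4 on non-zero inputs, so the phase is not trivial.
print(linear_function_sum((1, 0, 0)))  # 4
```

For a non-zero input exactly half of the linear functions evaluate to 1, which gives 8 of 16 for four bits and only 4 of 8 for three bits.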
As a consequence of this equivalence, we also obtain a new upper bound on the number of T gates required to implement a circuit over $\{\mathrm{CNOT}, T\}$, as well as evidence for the intractability of exact T-count minimization through a polynomial-time equivalence to the minimum distance decoding problem for the length $2^n-1$ punctured Reed-Muller code of order $n-4$.

Our proof naturally generalizes to the case when the T gate is replaced with a Z rotation by any angle of the form $\pi/2^k$. These gate sets are closely related to the Clifford-cyclotomic gate sets studied in [6], and are widely used in quantum algorithms including Shor's algorithm [7]. We show that the number of $\pi/2^k$ rotation gates for each value of k can be minimized by decoding Reed-Muller codes of different orders, opening up the possibility of optimizing such circuits at the high level, before decomposing them into a lower level gate set such as Clifford+T. Further generalization to rotation angles that are arbitrary roots of unity is also possible, with any composite order $2^x 3^y 5^z \cdots$ reducing to the case for a root of order $2^x$.

A. Related work

Much work has gone into T-count and T-depth reduction in recent years. Amy et al. [4] identified the T-count and T-depth as important quantities in the efficiency of a logical quantum circuit, and gave new implementations of 2–4 bit quantum operations reducing T-count and T-depth from the previously best known. Their search-based algorithm was later optimized and extended by Gosset et al. [8] to directly optimize T-depth, leading to proofs of T-depth minimality for various 2–4 bit circuits. Selinger [9] showed that the Toffoli gate, as well as a general class of Clifford+T circuits, can be parallelized to T-depth 1 with sufficiently many ancillas. Constructions for adding controls to quantum gates were also given which lowered the T-count and T-depth compared to best known practices using Toffoli gates.
Amy, Maslov and Mosca later used similar ideas to create an automated, polynomial-time tool for reducing and parallelizing T gates called T-par, which uses matroid partitioning to parallelize the T gates. More recently, Abdessaied, Soeken and Drechsler [10] studied the effect of Hadamard gates on T-count and T-depth reductions, developing a tool that reduces Hadamard gates in quantum circuits, leading to further T gate optimizations. Maslov [11] examined Toffoli gate implementations up to relative phase and used them to develop new designs for multiple-control Toffolis using fewer ancillas, CNOT gates, and in some cases T gates, than standard designs.

A great deal of work optimizing T-count and T-depth in single-qubit circuits has also been done recently, with series of works on exact [12] and approximate [13–15] minimal synthesis, as well as repeat-until-success circuits [16, 17].

B. Overview

The rest of the paper is organized as follows. Section II gives definitions and notation that will be used throughout the paper. Section III defines the linear phase operators and details their representation as weighted sums of linear Boolean functions and their synthesis. Section IV defines an additive subgroup of $\mathbb{Z}_8^{2^n-1}$ whose cosets correspond to the unique linear phase operators, then characterizes its binary residue as a Reed-Muller code and gives applications. Section V formally proves the equivalence of this (binary residue) subgroup with the length $2^n-1$ punctured Reed-Muller code of order $n-4$. Finally, Section VI generalizes the result to circuits over CNOT gates and phase rotations with angles of the form $\pi/2^k$, and Section VII details the experimental evaluation of our technique.

II. PRELIMINARIES

We assume basic knowledge of quantum computing, but no knowledge of coding theory. For background on quantum computing, the reader is referred to [18].

A. Quantum gates

In this paper we will primarily be interested in two gates: the controlled-NOT ($\mathrm{CNOT} : |x\rangle|y\rangle \mapsto |x\rangle|x \oplus y\rangle$) and the T-gate ($T : |x\rangle \mapsto e^{i\frac{\pi}{4}x}|x\rangle$).
These two gates, together with the $P := T^2$ and $Z := T^4$ gates, comprise what we refer to for brevity as the $\{\mathrm{CNOT}, T\}$ gate set. We include the P and Z gates in this set to distinguish them from sequences of T gates, which are generally much more expensive to implement in most fault-tolerance schemes. Given any power $k \in \mathbb{Z}_8$ of the T gate, we define a minimal T-gate expansion by

\[ T^k := Z^{k_2} P^{k_1} T^{k_0}, \]

where $k_2k_1k_0$ is the binary expansion of k. Note that $T^8 = I$, so $T^k = T^{k \bmod 8}$ for all integers k.

By a circuit over a particular set of gates we mean a sequential list of gates taken from the set, each with a list of qubits the gate is to be applied to. In this way, two distinct circuits may correspond to the same unitary matrix; we call such circuits equivalent. The problem of optimizing quantum circuits is then to find, given a circuit, an equivalent circuit minimizing some cost function. In this work we consider optimizing strictly the T-count of quantum circuits, defined as the number of T gates appearing in a circuit. In particular, we define the problem of T-gate minimization as follows:

Problem 1 (T-gate minimization). Given a circuit over a gate set containing the T gate, find an equivalent circuit over the same gate set with a minimal number of T gates.

For completeness we also define the Hadamard gate in sum-over-paths style,

\[ H : |x\rangle \mapsto \frac{1}{\sqrt{2}} \sum_{x' \in \{0,1\}} (-1)^{x x'} |x'\rangle. \]

The addition of the Hadamard gate to the $\{\mathrm{CNOT}, T\}$ gate set gives a universal set, in the sense that any unitary operator may be approximated efficiently with gates in this set. These gates form a set of generators of the Clifford+T gate set commonly used in fault-tolerant quantum computing.

B. Coding theory

We provide only the necessary definitions from coding theory; for a more complete introduction, see MacWilliams & Sloane [19].

A length n binary linear code is a subspace C of $\mathbb{F}_2^n$, where $\mathbb{F}_2$ is the unique 2-element field $(\{0,1\}, \oplus, \cdot)$ with addition ($\oplus$) and multiplication ($\cdot$) modulo 2.
The vectors of C are called the codewords of C. Note that $\mathbb{F}_2$ is the set of Boolean values with addition corresponding to exclusive-OR and multiplication corresponding to AND. We also define Boolean negation in $\mathbb{F}_2$ as $\bar{x} := 1 \oplus x$. Addition, multiplication and negation are extended to vectors component-wise; that is, $\mathbf{x}\mathbf{y}$ is the component-wise product of vectors $\mathbf{x}$ and $\mathbf{y}$, as opposed to vector space multiplication.

We denote binary vectors by boldface letters, e.g., $\mathbf{x} = x_1x_2\cdots x_n \in \mathbb{F}_2^n$, and use them interchangeably as bit strings. In particular, we denote the n-qubit computational basis vectors by $|\mathbf{x}\rangle$ where $\mathbf{x}$ is a binary vector/bit string. The (Hamming) weight of a binary vector $\mathbf{x} \in \mathbb{F}_2^n$, denoted $\mathrm{wt}(\mathbf{x})$, is defined as the number of non-zero entries it contains, and the (Hamming) distance between two binary vectors $\mathbf{x}, \mathbf{y} \in \mathbb{F}_2^n$ is the weight of their sum: $d(\mathbf{x}, \mathbf{y}) := \mathrm{wt}(\mathbf{x} \oplus \mathbf{y})$.

Given a received vector $\mathbf{x} \oplus \mathbf{e}$ where $\mathbf{x} \in C$ and $\mathbf{e} \in \mathbb{F}_2^n$ is some error vector, we wish to find $\mathbf{x}$; this process is known as decoding. In this work, we are only concerned with minimum distance decoding, as it relates directly to T-count optimization.

Problem 2 (Minimum distance decoding for a binary linear code C). Given a received vector $\mathbf{x} \in \mathbb{F}_2^n$, find a codeword $\mathbf{y} \in C$ such that for all $\mathbf{z} \in C$, $d(\mathbf{x}, \mathbf{y}) \leq d(\mathbf{x}, \mathbf{z})$.

This problem is closely related to the more general closest vector problem (CVP) over a lattice, and in fact coincides with the CVP over the lattice C with the Hamming weight as the norm. Minimum distance decoding is commonly studied as it reasonably approximates maximum likelihood decoding when bit flip errors are independent of one another.

We give one more definition from coding theory which will be relevant to our work: the maximum distance of any vector from a codeword, called the covering radius.

Definition 1. The covering radius of a length n binary code C is

\[ \rho(C) = \max_{\mathbf{x} \in \mathbb{F}_2^n} \min_{\mathbf{y} \in C} d(\mathbf{x}, \mathbf{y}). \]

C. Reed-Muller codes

Many different presentations of the binary Reed-Muller codes [20, 21] are known; we use a presentation based on multivariate polynomials as it will provide a convenient setting for our proofs. For more details the reader is referred to [19].

Let $\mathbb{F}_2[v_1, v_2, \ldots, v_n]$ be the ring of polynomials in n variables over $\mathbb{F}_2$. We use the symbols $v_1, v_2, \ldots, v_n$ to denote Boolean variables so as to differentiate them from elements of binary vectors. Given $f \in \mathbb{F}_2[v_1, v_2, \ldots, v_n]$ we define the evaluation vector of f, when viewed as an n-ary function, to be the length $2^n-1$ vector of evaluations of f for inputs of non-zero Hamming weight, i.e.

\[ (f(1,0,\ldots,0), f(0,1,\ldots,0), \ldots, f(1,1,\ldots,1)). \]

We denote the evaluation vector of a polynomial function f by $\mathbf{f}$. Since $v^2 = v$ for all $v \in \mathbb{F}_2$, without loss of generality we only consider polynomials $f \in \mathbb{F}_2[v_1, v_2, \ldots, v_n]$ that contain binary exponents 0 or 1 on variables. Identifying the variable $v_i$ with the Boolean function $f(v_1, v_2, \ldots, v_n) = v_i$, we denote the evaluation vector of $v_i$ by $\mathbf{v}_i$. It can be easily verified that for any Boolean polynomial $f = \bigoplus_{\mathbf{y} \in \mathbb{F}_2^n} a_{\mathbf{y}} v_1^{y_1}v_2^{y_2}\cdots v_n^{y_n}$ with coefficients $a_{\mathbf{y}} \in \mathbb{F}_2$, the evaluation vector of f is equal to $\bigoplus_{\mathbf{y} \in \mathbb{F}_2^n} a_{\mathbf{y}} \mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n}$; again, exponentiation of a Boolean vector is defined as component-wise exponentiation.

                (1,0,0,...,0)  (0,1,0,...,0)  (1,1,0,...,0)  (0,0,1,...,0)  ···  (1,1,1,...,1)
    1                 1              1              1              1        ···        1
    v1                1              0              1              0        ···        1
    v2                0              1              1              0        ···        1
    v1v2              0              0              1              0        ···        1
    v3                0              0              0              1        ···        1
    ...              ...            ...            ...            ...       ...       ...
    v1v2···vn         0              0              0              0        ···        1

TABLE I. Evaluation vectors for monomials over n Boolean variables.

We define the total degree of a monomial $v_1^{y_1}v_2^{y_2}\cdots v_n^{y_n}$ to be the sum of its exponents:

\[ \deg(v_1^{y_1}v_2^{y_2}\cdots v_n^{y_n}) = \sum_{i=1}^{n} y_i = \mathrm{wt}(\mathbf{y}). \]

The degree of a polynomial function $f \in \mathbb{F}_2[v_1, v_2, \ldots, v_n]$, denoted $\deg(f)$, is defined as the maximum total degree of its monomials. Table I illustrates the evaluation vectors of the $2^n$ monomials on n variables. Note that the set of non-constant monomial evaluation vectors forms a basis for the space $\mathbb{F}_2^{2^n-1}$.
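The evaluation vectors of Table I are easy to generate for small n. The following is a minimal sketch, assuming the input ordering of Table I (input number i has $x_1$ as its least significant bit); the helper names `nonzero_inputs`, `monomial_eval_vector` and `punctured_rm` are hypothetical and not taken from the paper. The last function spans all monomial evaluation vectors of degree at most r over $\mathbb{F}_2$, which yields the punctured Reed-Muller code as it is defined in terms of evaluation vectors of bounded-degree polynomials.

```python
from itertools import product

def nonzero_inputs(n):
    """Non-zero inputs in Table I order: bit j of the index i is x_{j+1}."""
    return [tuple((i >> j) & 1 for j in range(n)) for i in range(1, 2 ** n)]

def monomial_eval_vector(y, n):
    """Evaluation vector of v1^y1 ... vn^yn on all non-zero inputs."""
    return tuple(int(all(x[j] for j in range(n) if y[j])) for x in nonzero_inputs(n))

def punctured_rm(r, n):
    """All F_2-combinations of monomial evaluation vectors of degree <= r."""
    basis = [monomial_eval_vector(y, n)
             for y in product((0, 1), repeat=n) if sum(y) <= r]
    code = {tuple([0] * (2 ** n - 1))}
    for b in basis:
        # Add b to every word found so far (component-wise XOR).
        code |= {tuple(c ^ bi for c, bi in zip(word, b)) for word in code}
    return code

print(monomial_eval_vector((1, 1, 0), 3))  # v1v2 -> (0, 0, 1, 0, 0, 0, 1)
print(len(punctured_rm(0, 4)))             # 2: the all-zero and all-one words
print(len(punctured_rm(1, 3)))             # 16
```

Enumerating the whole code this way is exponential in the number of monomials, so it is only meant for checking small cases by hand.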
Definition 2. The punctured Reed-Muller code of order r and length $2^n-1$, denoted $\mathcal{RM}(r,n)^*$, is the set of evaluation vectors for all polynomials $f \in \mathbb{F}_2[v_1, v_2, \ldots, v_n]$ of degree at most r.

The non-punctured, length $2^n$ Reed-Muller code of order r is defined in a similar fashion, using evaluation vectors consisting of all $2^n$ distinct evaluations for a given polynomial function instead.

D. Statement of main result

We now state the main result.

Theorem 1. The T-count minimization problem over n-qubit $\{\mathrm{CNOT}, T\}$ circuits is polynomial-time (in $2^n$) equivalent to minimum distance decoding for $\mathcal{RM}(n-4,n)^*$.

We prove Theorem 1 by showing that the number of T gates in a $\{\mathrm{CNOT}, T\}$ circuit corresponds directly to the number of odd entries of a tuple in $\mathbb{Z}_8^{2^n-1}$. Moreover, this tuple can be computed from a circuit, or vice versa, in polynomial time. An equivalence relation is then defined between tuples that implement the same unitary, which we show is in direct correspondence with cosets of the $\mathcal{RM}(n-4,n)^*$ code. We then use this equivalence to apply the wealth of research performed on Reed-Muller decoding to quantum circuit optimization.

III. LINEAR PHASE OPERATORS

In this section we introduce linear phase operators as the subclass of unitaries implementable by $\{\mathrm{CNOT}, T\}$ which require T gates. We review their representation as pseudo-Boolean polynomial functions and define the canonical T-count for a particular polynomial. Finally we show that a minimal T-count implementation of a linear phase operator corresponds to a minimal weight vector of a vector space coset.

We define $\mathcal{P}_4(n)$ to be the set of diagonal $2^n \times 2^n$ unitaries implementable over $\{\mathrm{CNOT}, T\}$; we restrict our attention to this subset as any circuit over $\{\mathrm{CNOT}, T\}$ may be decomposed into a diagonal unitary followed by a permutation implementable using only CNOT gates. Amy et al.
[4, Lemma 2] showed that each such unitary $U \in \mathcal{P}_4(n)$ has the effect of applying a pseudo-Boolean function P to a computational basis state $|\mathbf{x}\rangle$, viewed as a vector $\mathbf{x} = x_1x_2\cdots x_n \in \mathbb{F}_2^n$, and kicking the result into the phase:

\[ U : |\mathbf{x}\rangle \mapsto e^{i\frac{\pi}{4}P(\mathbf{x})}|\mathbf{x}\rangle. \]

Moreover, it was shown that the phase polynomial $P : \mathbb{F}_2^n \to \mathbb{Z}_8$ necessarily has a presentation as a weighted sum of (non-zero) linear Boolean functions:

\[ P(\mathbf{x}) = \sum_{\mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{0}\}} a_{\mathbf{y}} \cdot \iota(y_1x_1 \oplus y_2x_2 \oplus \cdots \oplus y_nx_n), \]

where the coefficients $a_{\mathbf{y}}$ are integers modulo 8. We call the $(2^n-1)$-tuple $\mathbf{a} = (a_1, a_2, \ldots, a_{2^n-1}) \in \mathbb{Z}_8^{2^n-1}$ an implementation of P, and conversely denote the phase polynomial defined by a set of coefficients $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$ by $P_{\mathbf{a}}$. As the function P involves both $\mathbb{F}_2$ and $\mathbb{Z}_8$ arithmetic, we use the natural inclusion $\iota$ of $\mathbb{F}_2$ in $\mathbb{Z}_8$ to coerce the binary valued result of $y_1x_1 \oplus y_2x_2 \oplus \cdots \oplus y_nx_n$ into an integer; we leave the inclusion implicit whenever no confusion is likely to arise.

We call unitaries in $\mathcal{P}_4(n)$ $\pi/4$ linear phase operators, as they may be expressed as a sequence of ($\pi/4$) phase rotations conditioned on linear Boolean functions of the input basis state. We drop the $\pi/4$ until Section VI, when we generalize the result to $\pi/2^k$ linear phase operators. Given a particular phase polynomial P, we denote the linear phase operator with phase polynomial P by $U_P$.

Example 1. The doubly-controlled Z gate is a $\pi/4$ linear phase operator with phase function $P(x_1, x_2, x_3) = 4 \cdot x_1x_2x_3$. Using the identity $2 \cdot xy = x + y + 7 \cdot (x \oplus y) \bmod 8$ [9], the phase function may be presented as the following weighted sum of linear Boolean functions:

\[ P(x_1, x_2, x_3) = x_1 + x_2 + 7(x_1 \oplus x_2) + x_3 + 7(x_1 \oplus x_3) + 7(x_2 \oplus x_3) + (x_1 \oplus x_2 \oplus x_3). \]

Writing the coefficients above as a 7-tuple over $\mathbb{Z}_8$ we get $(1,1,7,1,7,7,1)$. Note that this implementation corresponds to the following circuit, taken from [1]. The state of each qubit after an update is shown to illustrate the relation between the state of a qubit as a Boolean function of the inputs and the application of phase gates.
[Figure: three-qubit circuit over CNOT, T and T† gates implementing the doubly-controlled Z gate with seven T/T† gates; each wire is annotated with the linear function of $x_1, x_2, x_3$ it holds when a phase gate is applied.]

Amy, Maslov and Mosca [1] showed that a linear phase operator $U_P$ can be synthesized over $\{\mathrm{CNOT}, T\}$ given an implementation $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$ of P in time polynomial in the number of non-zero entries of $\mathbf{a}$; moreover, this number is linear in the size of the circuit. Their procedure applies each (non-trivial) phase shift $e^{i\frac{\pi}{4}a_{\mathbf{y}} \cdot \iota(y_1x_1 \oplus y_2x_2 \oplus \cdots \oplus y_nx_n)}$ by first computing $y_1x_1 \oplus y_2x_2 \oplus \cdots \oplus y_nx_n$, then applying $T^{a_{\mathbf{y}}}$ and uncomputing $y_1x_1 \oplus y_2x_2 \oplus \cdots \oplus y_nx_n$. Recall that

\[ T^k := Z^{k_2} P^{k_1} T^{k_0}, \]

where $k_2k_1k_0$ is the binary expansion of k. Since each $y_1x_1 \oplus y_2x_2 \oplus \cdots \oplus y_nx_n$ is a linear function of the basis state $x_1x_2\cdots x_n$, each value $y_1x_1 \oplus y_2x_2 \oplus \cdots \oplus y_nx_n$ may be computed solely with CNOT gates, giving a total T-count equal to the number of odd elements of $\mathbf{a}$; we call this the T-count of an implementation. While in this work we are only concerned with the T-count of the synthesized circuit, T-depth can be minimized while keeping T-count the same by parallelizing this process through matroid partitioning [1].

The authors used this synthesis algorithm to optimize T-count in $\{\mathrm{CNOT}, T\}$ circuits by first computing an implementation (set of coefficients) for the associated linear phase operator $U_P$ from the circuit in polynomial time, then synthesizing an equivalent circuit. The remaining linear permutation is also computed and synthesized separately in polynomial time. This procedure has the crucial property that the set of coefficients computed has T-count at most the T-count of the original circuit (often much lower, due to coefficients in the phase polynomial adding and cancelling), hence the resulting circuit has equal or lesser T-count. In particular, we have the following lemma, which relates the T-count of a $\{\mathrm{CNOT}, T\}$ circuit to the T-count of an implementation of the associated phase operator.

Lemma 1. Let $U_P$ be a linear phase operator in $\mathcal{P}_4(n)$.
There exists a circuit over $\{\mathrm{CNOT}, T\}$ implementing $U_P$ with T-count k if and only if there exists $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$ such that $P(\mathbf{x}) = P_{\mathbf{a}}(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{F}_2^n$, and $\mathbf{a}$ has T-count at most k.

IV. DECODING-BASED T-COUNT OPTIMIZATION

While effective at reducing T-count, it was noted that the procedure in [1] does not always find the minimal T-count, as the phase polynomial P in question may have many different representations as a weighted sum of linear Boolean functions. For instance,

\[ 4 \cdot x_1 + 4 \cdot x_2 + 4 \cdot (x_1 \oplus x_2) = 0 \bmod 8 \]

for all values of $x_1, x_2 \in \mathbb{F}_2$, so $P(x_1, x_2) = 4 \cdot x_1 + 4 \cdot x_2 + 4 \cdot (x_1 \oplus x_2)$ is an alternative presentation of the zero-everywhere ($\pi/4$) phase polynomial. This opens up the possibility of further T-count optimization by first finding an implementation of the target phase polynomial with a minimal number of odd coefficients, then synthesizing a circuit. By Lemma 1, this in fact finds a minimal T-count circuit. In this section we reduce this problem to a minimum-distance decoding problem and give a T-count optimization algorithm based on this decoding.

Given a tuple $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$, we define $[\mathbf{a}]$ to be the equivalence class of implementations of $P_{\mathbf{a}}$, i.e., $\mathbf{b} \in [\mathbf{a}]$ if and only if $P_{\mathbf{a}}(\mathbf{x}) = P_{\mathbf{b}}(\mathbf{x})$ for all $\mathbf{x} \in \mathbb{F}_2^n$ (hence $U_{P_{\mathbf{a}}} = U_{P_{\mathbf{b}}}$). Moreover, we define $\mathcal{C}_n$ to be the set of implementations of the zero-everywhere phase polynomial. Clearly for any $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$, $[\mathbf{a}] = \mathbf{a} + \mathcal{C}_n$, since we have $P_{\mathbf{a}}(\mathbf{x}) + P_{\mathbf{b}}(\mathbf{x}) = P_{\mathbf{a}+\mathbf{b}}(\mathbf{x})$ for any $\mathbf{a}, \mathbf{b} \in \mathbb{Z}_8^{2^n-1}$, $\mathbf{x} \in \mathbb{F}_2^n$. As a result we see that the problem of finding an implementation of $P_{\mathbf{a}}$ minimizing T-count is equivalent to finding an element $\mathbf{c} \in \mathcal{C}_n$ minimizing the number of odd coefficients in $\mathbf{a} + \mathbf{c}$.

While this optimization could be performed directly over $\mathcal{C}_n$, we can do better by reducing the optimization problem to a decoding problem over a binary code where minimum-distance decoding corresponds exactly to T-count optimization. In particular, recall that the T-count of an implementation is given by the number of odd coefficients.
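The notions of implementation, T-count and equivalence class can be made concrete for small n. The sketch below is illustrative only: the function names are hypothetical, and it assumes the coefficient ordering used for the 7-tuple of Example 1, where entry number y (starting from 1) multiplies the parity of the input bits selected by the binary digits of y, with $x_1$ as the least significant bit.

```python
from itertools import product

def parity(y, x):
    """y1x1 xor ... xor ynxn, where bit j of the integer y is y_{j+1}."""
    return sum(x_j for j, x_j in enumerate(x) if (y >> j) & 1) % 2

def phase_poly(a, x):
    """P_a(x) mod 8 for an implementation a indexed by y = 1 .. 2^n - 1."""
    return sum(a_y * parity(y, x) for y, a_y in enumerate(a, start=1)) % 8

def t_count(a):
    """T-count of an implementation: its number of odd coefficients."""
    return sum(a_y % 2 for a_y in a)

def same_phase_poly(a, b, n):
    """True when a and b implement the same phase polynomial on n bits."""
    return all(phase_poly(a, x) == phase_poly(b, x)
               for x in product((0, 1), repeat=n))

ccz = (1, 1, 7, 1, 7, 7, 1)    # implementation of 4*x1*x2*x3 from Example 1
zero = (4, 4, 4, 0, 0, 0, 0)   # 4*x1 + 4*x2 + 4*(x1 xor x2), zero everywhere
shifted = tuple((p + q) % 8 for p, q in zip(ccz, zero))

print(phase_poly(ccz, (1, 1, 1)))                # 4
print(same_phase_poly(zero, (0,) * 7, 3))        # True
print(same_phase_poly(ccz, shifted, 3))          # True
print(t_count(ccz), t_count(shifted))            # 7 7
```

Here `shifted` lies in the same class as `ccz` because it differs by an implementation of the zero-everywhere polynomial; on three qubits this particular shift does not change the number of odd coefficients.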
Defining $\mathrm{Res}_2 : \mathbb{Z} \to \mathbb{F}_2$ as the function taking the binary residue of an integer, and extending this component-wise to tuples, we see that the T-count of a tuple $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$ is equal to the weight of the binary residue vector, i.e. $\mathrm{wt}(\mathrm{Res}_2(\mathbf{a}))$.

Since $\mathrm{Res}_2$ is a ring homomorphism, we further see that

\[ \mathrm{wt}(\mathrm{Res}_2(\mathbf{a} + \mathbf{c})) = \mathrm{wt}(\mathrm{Res}_2(\mathbf{a}) \oplus \mathrm{Res}_2(\mathbf{c})) = d(\mathrm{Res}_2(\mathbf{a}), \mathrm{Res}_2(\mathbf{c})), \]

that is, the T-count of a tuple $\mathbf{a} + \mathbf{c}$ is the Hamming distance from $\mathrm{Res}_2(\mathbf{a})$ to $\mathrm{Res}_2(\mathbf{c})$. As a result, optimizing $\mathrm{wt}(\mathrm{Res}_2(\mathbf{a} + \mathbf{c}))$ over all $\mathbf{c} \in \mathcal{C}_n$ is exactly the problem of minimum distance decoding $\mathrm{Res}_2(\mathbf{a})$ over $\mathrm{Res}_2(\mathcal{C}_n)$, the set of binary residue vectors of $\mathcal{C}_n$. Moreover, we note that $\mathrm{Res}_2(\mathcal{C}_n)$ is a binary linear code, since $\mathrm{Res}_2(\mathbf{a}) \oplus \mathrm{Res}_2(\mathbf{b}) = \mathrm{Res}_2(\mathbf{a} + \mathbf{b}) \in \mathrm{Res}_2(\mathcal{C}_n)$ for any $\mathbf{a}, \mathbf{b} \in \mathcal{C}_n$. In particular, this code turns out to be the $(n-4)$th order, length $2^n-1$ punctured Reed-Muller code:

Theorem 2. $\mathrm{Res}_2(\mathcal{C}_n) = \mathcal{RM}(n-4,n)^*$.

We defer the proof of Theorem 2 until Section V and discuss some consequences of this equivalence in the following.

A. Upper bounds

As a result of Lemma 1 and Theorem 2, the covering radius of $\mathrm{Res}_2(\mathcal{C}_n) = \mathcal{RM}(n-4,n)^*$ gives a tight upper bound on the number of T gates required to implement a linear phase operator over $\{\mathrm{CNOT}, T\}$. Here we mean tight in the sense that there exists a linear phase operator which requires a minimum of $\rho(\mathcal{RM}(n-4,n)^*)$ T gates to implement over $\{\mathrm{CNOT}, T\}$. While no analytic formula has yet been found for the covering radius of higher-order Reed-Muller codes, some asymptotic upper bounds are known. The best upper bound, to the authors' knowledge, is due to Cohen and Litsyn [22]. In particular, Cohen and Litsyn showed that for large n and orders r where $n - r \geq 3$,

\[ \rho(\mathcal{RM}(r,n)) \leq \frac{n^{n-r-2}}{(n-r-2)!}. \]

Since the covering radius of $\mathcal{RM}(r,n)^*$ is trivially bounded above by $\rho(\mathcal{RM}(r,n))$, we see that for sufficiently large n, $\rho(\mathcal{RM}(n-4,n)^*) \leq \frac{n^2}{2} - 1$. As a result we obtain a new asymptotic bound on the number of T gates required to implement a circuit over $\{\mathrm{CNOT}, T\}$.

Theorem 3.
Any linear phase operator $U_P \in \mathcal{P}_4(n)$ can be implemented with $O(n^2)$ T-gates.

B. T-count optimization

Completing a T-count optimization procedure based on Reed-Muller decoding takes one further step. In particular, decoding the binary residue $\mathrm{Res}_2(\mathbf{a})$ of a target tuple $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$ over $\mathcal{RM}(n-4,n)^*$ produces a minimal residue $\mathbf{c}' = \mathrm{Res}_2(\mathbf{c})$ of an element $\mathbf{c}$ of $\mathcal{C}_n$. To actually produce a minimal T-count implementation of $P_{\mathbf{a}}$ we need to compute $\mathbf{c} \in \mathcal{C}_n$ from $\mathrm{Res}_2(\mathbf{c})$ and then synthesize $\mathbf{a} + \mathbf{c}$. Fortunately, there is an easy way to do this, given the following lemma.

Lemma 2. For all $\mathbf{y} \in \mathbb{F}_2^n$ with $\mathrm{wt}(\mathbf{y}) \leq n-4$ we have $\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} \in \mathcal{C}_n$.

Using Lemma 2, which we prove in Section V, and the fact that the set of all monomial evaluation vectors with degree at most $n-4$,

\[ \mathcal{B} = \{\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} \mid \mathbf{y} \in \mathbb{F}_2^n, \mathrm{wt}(\mathbf{y}) \leq n-4\}, \]

forms a basis for $\mathcal{RM}(n-4,n)^*$, we can write a decoded word $\mathbf{c}'$ over this basis and then reinterpret the sum over $\mathbb{Z}_8$. In particular, if $\mathbf{c}' = \mathbf{b}_1 \oplus \mathbf{b}_2 \oplus \cdots \oplus \mathbf{b}_k$ for some $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k \in \mathcal{B}$, then $\mathbf{c} = \mathbf{b}_1 + \mathbf{b}_2 + \cdots + \mathbf{b}_k \in \mathcal{C}_n$, and

\[ \mathrm{Res}_2(\mathbf{c}) = \mathrm{Res}_2(\mathbf{b}_1 + \mathbf{b}_2 + \cdots + \mathbf{b}_k) = \mathrm{Res}_2(\mathbf{b}_1) \oplus \mathrm{Res}_2(\mathbf{b}_2) \oplus \cdots \oplus \mathrm{Res}_2(\mathbf{b}_k) = \mathbf{c}'. \]

Algorithm 1 T-optimize(C)
    1: Compute coefficients a ∈ Z_8^{2^n−1} from C
    2: c′ ← RM-DECODE(n−4, n, Res_2(a))
    3: Write c′ over basis B: c′ = b_1 ⊕ b_2 ⊕ ··· ⊕ b_k
    4: c ← b_1 + b_2 + ··· + b_k (mod 8)
    5: SYNTHESIZE(a + c)

Algorithm 1 summarizes our algorithm for T-count optimization in $\{\mathrm{CNOT}, T\}$ circuits. For simplicity the algorithm assumes the input circuit implements a linear phase operator; for more general $\{\mathrm{CNOT}, T\}$ circuits the extra linear permutation is synthesized and appended to the end. The algorithm works by computing a set of coefficients implementing the linear phase operator $U_P$ computed by the circuit. The vector of residues modulo 2 is then decoded as $\mathbf{c}'$ in $\mathcal{RM}(n-4,n)^*$ using the procedure RM-DECODE($n-4$, $n$, $\mathrm{Res}_2(\mathbf{a})$). Any variable order Reed-Muller decoder may then be used in RM-DECODE. In particular, using a minimum distance decoder gives a minimal T-count synthesis procedure.
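Steps 2 to 4 of Algorithm 1 can be sketched for very small n by replacing RM-DECODE with an exhaustive search over all combinations of the basis $\mathcal{B}$. This is an assumption made purely for illustration: it is not the decoder used by the authors, it is exponential in the size of the basis, and the names `eval_vector` and `t_optimize_coefficients` are hypothetical. The input tuple is the 15-entry implementation from Example 2, with entries ordered by the binary index of y and $x_1$ as the least significant bit.

```python
from itertools import product

def eval_vector(y, n):
    """0/1 evaluation vector of the monomial v^y on the 2^n - 1 non-zero inputs."""
    return [int(all((i >> j) & 1 for j in range(n) if y[j]))
            for i in range(1, 2 ** n)]

def t_optimize_coefficients(a, n):
    """Steps 2-4 of Algorithm 1 with exhaustive decoding (tiny n only)."""
    basis = [eval_vector(y, n)
             for y in product((0, 1), repeat=n) if sum(y) <= n - 4]
    residue = [a_y % 2 for a_y in a]
    best = None
    for choice in product((0, 1), repeat=len(basis)):
        chosen = [b for b, use in zip(basis, choice) if use]
        # Integer sum mod 8 of the chosen basis vectors: the lift c of c'.
        c = [sum(col) % 8 for col in zip(*chosen)] if chosen else [0] * len(a)
        # Hamming distance from Res_2(a) to Res_2(c).
        distance = sum((r + c_y) % 2 for r, c_y in zip(residue, c))
        if best is None or distance < best[0]:
            best = (distance, c)
    distance, c = best
    return [(a_y + c_y) % 8 for a_y, c_y in zip(a, c)], distance

a = [2, 6, 6, 1, 7, 7, 1, 3, 7, 7, 1, 0, 0, 0, 0]
optimized, t_count = t_optimize_coefficients(a, 4)
print(optimized)  # [3, 7, 7, 2, 0, 0, 2, 4, 0, 0, 2, 1, 1, 1, 1]
print(t_count)    # 7
```

The correctness of the lift relies on Lemma 2: each basis vector is itself in $\mathcal{C}_n$, so adding their integer sum to the input leaves the phase polynomial unchanged.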
A vector $\mathbf{c} \in \mathcal{C}_n$ with binary residue equal to $\mathbf{c}'$ is then computed and added to the original set of coefficients, and a circuit is synthesized for the new implementation of P. In particular, the procedure SYNTHESIZE takes a set of coefficients $\mathbf{a}$ and synthesizes a circuit over $\{\mathrm{CNOT}, T\}$ implementing $U_{P_{\mathbf{a}}}$. The T-par algorithm [1], for instance, may be used to implement SYNTHESIZE.

Example 2. Consider the circuit below:

[Figure: four-qubit circuit on $x_1, \ldots, x_4$ containing controlled gates together with Z and P gates, shown equal to an expanded circuit over CNOT, T, T†, P and Z gates with T-count 14.]

By iterating through the circuit and tracking the qubit states (see, e.g., [1]), we compute the phase polynomial for this operator as

\[ P(\mathbf{x}) = 2x_1 + 6x_2 + 6(x_1 \oplus x_2) + x_3 + 7(x_1 \oplus x_3) + 7(x_2 \oplus x_3) + (x_1 \oplus x_2 \oplus x_3) + 3x_4 + 7(x_1 \oplus x_4) + 7(x_2 \oplus x_4) + (x_1 \oplus x_2 \oplus x_4). \]

Writing the coefficients of P as a $(2^n-1)$-tuple $\mathbf{a}$ we get

\[ \mathbf{a} = (2,6,6,1,7,7,1,3,7,7,1,0,0,0,0), \]

which has a canonical T-count of 8, a reduction of 6 T gates.

Now we optimize the implementation of P further by decoding

\[ \mathrm{Res}_2(\mathbf{a}) = (0,0,0,1,1,1,1,1,1,1,1,0,0,0,0) \]

in the code $\mathcal{RM}(0,4)^*$. As $\mathcal{RM}(0,4)^*$ is the set of evaluation vectors for degree 0 binary polynomials, there are exactly two vectors to choose from, corresponding to the zero (zero-everywhere) and constant (one-everywhere) functions. Since the all 1 vector achieves the minimum distance of 7 from $\mathrm{Res}_2(\mathbf{a})$, we choose $\mathbf{c}'$ to be the all 1 vector. By Lemma 2, $\mathbf{c}' = \mathbf{1} \pmod 2$ is already in the space of zero-everywhere polynomials $\mathcal{C}_n$, so steps 3 & 4 are trivial and we set $\mathbf{c} = \mathbf{1} \pmod 8$. Finally we synthesize a circuit for the tuple $\mathbf{a} + \mathbf{c} = (3,7,7,2,0,0,2,4,0,0,2,1,1,1,1)$, corresponding to the phase polynomial

\[ P'(\mathbf{x}) = 3x_1 + 7x_2 + 7(x_1 \oplus x_2) + 2x_3 + 2(x_1 \oplus x_2 \oplus x_3) + 4x_4 + 2(x_1 \oplus x_2 \oplus x_4) + (x_3 \oplus x_4) + (x_1 \oplus x_3 \oplus x_4) + (x_2 \oplus x_3 \oplus x_4) + (x_1 \oplus x_2 \oplus x_3 \oplus x_4). \]
A possible circuit implementing $P'$ is shown below:

[Figure: four-qubit circuit over CNOT, T, T†, P and Z gates implementing $P'$ with T-count 7.]

Note that this decoding reduces the T-count from 14 (or 8, as T-par would obtain) to 7. Moreover, the number of T gates is equal to the distance from $\mathrm{Res}_2(\mathbf{a})$ to the decoded word $\mathbf{c}'$.

C. Complexity

It may be noted that Algorithm 1 gives a polynomial-time (in $2^n$) reduction from T-count optimization over $\{\mathrm{CNOT}, T\}$ to minimum-distance decoding in $\mathcal{RM}(n-4,n)^*$. We may likewise reduce the minimum-distance decoding problem for $\mathcal{RM}(n-4,n)^*$ to T-count optimization: given a binary vector $\mathbf{x} \in \mathbb{F}_2^{2^n-1}$, synthesize $U_{P_{\mathbf{x}}}$ over $\{\mathrm{CNOT}, T\}$, then optimize the circuit and compute the coefficients $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$ for the optimized circuit. As a consequence of Theorem 2, the vector $\mathbf{x} \oplus \mathrm{Res}_2(\mathbf{a})$ is a minimum distance decoding of $\mathbf{x}$. Assuming the optimized circuit does not have exponentially more gates than a canonical circuit,¹ this reduction is also polynomial in the word length $2^n$, so we see that the problems are in this sense equivalent.

¹ The canonical circuit for any linear phase operator uses $O(n2^{n-1})$ gates.

This equivalence lends evidence to the difficulty of T-count optimization, even in the restricted setting of circuits over CNOT and T gates. In particular, any sub-exponential algorithm for exact optimization of T-count over n-qubit $\{\mathrm{CNOT}, T\}$ circuits induces a polynomial-time minimum-distance decoding algorithm for $\mathcal{RM}(n-4,n)$. This can be further reduced to a linear-time algorithm by noting that the unitary $U_{P_{\mathbf{x}}}$ above can be implemented with $O(2^n)$ gates using one ancilla and the Gray code to cycle through each of the $2^n$ binary sums of n variables with one CNOT gate each.

In either case it appears very unlikely that an efficient algorithm for minimum-distance decoding the order $n-4$ punctured Reed-Muller code exists.
No minimum distance decoding algorithms running in time polynomial in $2^n$ or in the Hamming weight of the received word are currently known for arbitrary order, length $2^n$ binary Reed-Muller codes. While some particular orders of Reed-Muller codes have efficient decoders, e.g., order 1, it was shown in [23] that minimum-distance decoding for $\mathcal{RM}(n-4,n)^*$ is equivalent to the problem of finding a minimal decomposition of a symmetric 3-tensor into symmetric triads (rank 1 3-tensors), a known hard problem.

V. PROOF OF THEOREM 2

In this section we prove Theorem 2, and as a corollary Lemma 2.

A. The monomial basis

Our proof relies on a connection between the binary evaluations of polynomials over $\mathbb{Z}_8$ and the module $\mathbb{Z}_8^{2^n-1}$. In particular, consider the set of degree at most $n-1$ monomial (Boolean) evaluation vectors

\[ \{\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} \mid \mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{1}\}\}. \]

We show that this set of vectors, under the natural inclusion of $\mathbb{F}_2$ in $\mathbb{Z}_8$, forms a generating set for $\mathbb{Z}_8^{2^n-1}$; moreover, since these vectors are linearly independent in $\mathbb{F}_2^{2^n-1}$ and hence also linearly independent in $\mathbb{Z}_8^{2^n-1}$, this set is in fact a basis. We call this basis the monomial basis and give a proof.

Lemma 3. $\{\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} \mid \mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{1}\}\}$ is a basis of $\mathbb{Z}_8^{2^n-1}$.

Proof. We first note that the set of all non-constant monomial evaluation vectors, $\{\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} \mid \mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{0}\}\}$, is a basis for the module $\mathbb{Z}_8^{2^n-1}$. In particular, for any $\mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{0}\}$ the vector $\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n}$ contains a leading 1 at the $\mathbf{y}$th index (e.g., Table I), and hence any tuple of $\mathbb{Z}_8^{2^n-1}$ may be written as a linear combination over this set. It therefore suffices to prove that $\mathbf{v}_1\mathbf{v}_2\cdots\mathbf{v}_n$ is in the span of $\{\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} \mid \mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{1}\}\}$.

It may be observed that over $\mathbb{F}_2$, the set of all monomial evaluation vectors is linearly dependent, and in particular that

\[ \bigoplus_{\mathbf{y} \in \mathbb{F}_2^n} \mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} = \mathbf{0}, \]

since every input evaluates to 1 for an even number of monomials.
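Further, as $\mathrm{Res}_2$ is homomorphic we have

\[ \bigoplus_{\mathbf{y} \in \mathbb{F}_2^n} \mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} = \mathrm{Res}_2\Big(\sum_{\mathbf{y} \in \mathbb{F}_2^n} \mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n}\Big) = \mathbf{0}, \]

and so $\sum_{\mathbf{y} \in \mathbb{F}_2^n} \mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} = \mathbf{a}$ for some $\mathbf{a} \in \mathbb{Z}_8^{2^n-1}$ such that $\mathrm{Res}_2(\mathbf{a}) = \mathbf{0}$. If we write $\mathbf{a}$ over the basis $\{\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} \mid \mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{0}\}\}$ and move all instances of $\mathbf{v}_1\mathbf{v}_2\cdots\mathbf{v}_n$ to the left we see

\[ b \cdot \mathbf{v}_1\mathbf{v}_2\cdots\mathbf{v}_n = \mathbf{a}' - \sum_{\mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{1}\}} \mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n}, \]

where $\mathbf{a}'$ is in the span of $\{\mathbf{v}_1^{y_1}\mathbf{v}_2^{y_2}\cdots\mathbf{v}_n^{y_n} \mid \mathbf{y} \in \mathbb{F}_2^n \setminus \{\mathbf{0}, \mathbf{1}\}\}$ and $b \in \mathbb{Z}_8$.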
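The two facts used so far in the proof can be checked numerically for a small case. The sketch below takes n = 3 and uses hypothetical helper names; it confirms that the integer sum of all monomial evaluation vectors has only even entries, and that the vectors with $\mathbf{y} \neq \mathbf{1}$ have full rank over $\mathbb{F}_2$. A square integer matrix is invertible modulo 8 exactly when its determinant is odd, that is, when it is invertible modulo 2, so full rank over $\mathbb{F}_2$ is consistent with the basis claim of Lemma 3 for this n. It is a sanity check under these assumptions, not a substitute for the proof.

```python
from itertools import product

def eval_vector(y, n):
    """0/1 evaluation vector of the monomial v^y on the 2^n - 1 non-zero inputs."""
    return [int(all((i >> j) & 1 for j in range(n) if y[j]))
            for i in range(1, 2 ** n)]

def rank_f2(rows):
    """Rank over F_2 of 0/1 vectors, by Gaussian elimination on bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        v = int("".join(map(str, row)), 2)
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

n = 3
monomials = list(product((0, 1), repeat=n))
vectors = [eval_vector(y, n) for y in monomials]

# Integer sum of all 2^n monomial vectors: entry for input x is 2^wt(x).
integer_sum = [sum(col) for col in zip(*vectors)]
xor_is_zero = all(s % 2 == 0 for s in integer_sum)

# Drop the top monomial v1v2...vn, keep the constant monomial.
without_top = [eval_vector(y, n) for y in monomials if y != (1,) * n]
rank = rank_f2(without_top)

print(integer_sum)   # [2, 2, 4, 2, 4, 4, 8]
print(xor_is_zero)   # True
print(rank)          # 7
```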