Semantic Aspects of Parallelism for SuperCollider Tim BLECHMANN Vienna, Austria [email protected] Abstract (synths or nested groups). The groups there- fore form a hierarchical tree data structure, the Supernova is a new implementation of the Super- node graph with a group as root of the tree. Colliderserverscsynth,withamulti-threadedaudio synthesis engine. To make use of this thread-level Groups are used for two purposes. First, parallelism, two extensions have been introduced to they define the order of execution of their child theconceptoftheSuperCollidernodegraph,expos- nodes, which are evalutated sequentially from ing parallelism explicitly to the user. This paper head to tail using a depth-first traversal algo- discusses the semantic inplications of these exten- rithm. The node graph therefore defines a to- sions. tal order, in which synths are evaluated. The second use case for groups is to structure the Keywords audio synthesis and to be able to address multi- SuperCollider, Supernova, parallelism, multi-core ple synths as one entity. When sending a node command to a group it is applied to all its child 1 Introduction nodes. Groups can be moved inside the node graph like a single node. These days, the development of audio synthe- sis applications is mainly focussed on off-the- 2.1 Semantic Constraints for shelf hardware and software. While some em- Parallelization bedded,low-powerormobilesystemsusesingle- The node graph is designed as data structure core CPUs, most computer systems which are for structuring synths in a hierarchical man- actually used in musical production use multi- ner. Traversing the tree structure is used to corehardware. Exceptforsomenetbooks, most determine the order of execution, but it does laptops use dual-core CPUs, single-core work- not contain any notion of parallelism. While stations are getting rare. synths may be able to run in parallel, it is im- Traditionally, audio synthesis engines are de- possible for the synthesis engine to know this signedtouseasinglethreadforaudiocomputa- in advance. Synths do not communicate with tion. In order to use multiple CPU cores for au- each other directly, but instead they use global dio computation, this design has to be adapted busses to exchange audio data. So any auto- by parallelizing the signal processing work. matic parallelization would have to create a de- This paper is divided into the following sec- pendency graph depending on the access pat- tions: Section 2 describes the SuperCollider tern of synths to global resources. The current nodegraphwhichisthebasefortheparalleliza- implementationlacksapossibilitytodetermine, tion of Supernova. Sections 3 introduces the which global resources are accessed by a synth. Supernova extensions to SuperCollider with a But even if it would be possible, the resources focus on their semantic aspects. Section 4 dis- which are accessed by a synth are not constant, cusses different approaches of other parallel au- but can change at control rate or even at au- dio synthesis systems. dio rate. Introducing automatic parallelization would therefore introduce a constant overhead 2 SuperCollider Node Graph and the parallelism would be limited by the SuperCollider has a distinction between instru- granularity in which resource access could be ment definitions, called SynthDefs, and their predicted by the runtime system. instantiations, called Synths. Synths are orga- Using pipelining techniques to increase the nizedingroups,whicharelinkedlistsof nodes throughput would only be of limited use, ei- ther. The synthesis engine dispatches com- 3.2 Satellite Nodes mands at control rate and during the execu- Parallel groups have one disadvantage. Each tion of each command, it needs to have a syn- member of a parallel group is still synchronized chronized view of the node graph. In order with two other nodes, it is executed after the to implement pipelining across the boundaries parallel group’s predecessor and before its suc- of control rate blocks, a speculative pipelining cessor. For many use cases, only one relation with a rollback mechanism would have to be is actually required. Many generating synths used. This approach would only be interesting can be started without waiting for any prede- for non-realtime synthesis. Introducing pipelin- cessor, while synths for disk recording or peak ing inside control-rate blocks would only be of followersforGUIapplicationscanstartrunning limited use, since control rate blocks are typi- after their predecessor has been executed, but callysmall(usually64samples). Alsothewhole no successor has to wait for its result. unit generator API would need to be restruc- These use cases can be formulated using tured, imposing considerable rewrite effort. satellite nodes. These satellite nodes, are Since neither automatic graph parallelization nodes which are in dependency relation with nor pipelining a feasible, we introduced new only one reference node. The resulting depen- concepts to the node graph in order to expose dency graph has a more fine-grained structure, parallelism explicitly to the user. compared to a dependency graph, which is only using parallel groups. 3 Extending the SuperCollider Node Listing 2 shows, how the example of Listing 1 Graph canbeformulatedwithsatellitenodesunderthe assumption, that none of the generator synths To make use of thread-level parallelism, Super- depends on the result of any earlier synth. In- novaintroducestwoextensionstotheSuperCol- stead of packing the generators into a parallel lider node graph. This enables the user to for- group, they are simply defined as satellite pre- mulate parallelism explicitly when defining the decessors of the effect synth. synthesis graph. It is even possible to prioritize dependency graph nodes to optimize graph progress. In or- 3.1 Parallel Groups der to achive the best throughput, we need to The first extension to the node graph are par- ensure, that there are always at least as many allel groups. AsdescribedinSection2, groups parallel jobs available as audio threads. To en- are linked lists of nodes which are evaluated in sure this, a simple heuristic can be used, which sequential order. Parallel groups have the same alwaystriestoincreasethenumberofjobs, that semantics as groups, but with the exception, are actually runnable. that their child nodes are not ordered. This im- plies that they can be executed in in separate • Nodes with successors have a higher prior- threads. This concept is similar to the SEQ ity than nodes without. and PAR statements, which specify blocks of • Nodes with successors early in the depen- sequential and parallel statements in the con- dency graph have a high priority. current programming language [Hyde, 1995]. Parallel groups are very easy to use in ex- These rules can be realized with a heuris- isting code. Especially for additive synthesis tic that splits the nodes into three categories or granular synthesis with many voices, it is quite convenient to instantiate synths inside a parallel groups, especially since many users al- Listing 1: Parallel Group Example ready use groups for these use cases in order var generator_group, fx; to structure the synthesis graph. For other use generator_group = ParGroup.new; cases like polyphonic phrases, all independent 4.do { phrasescouldbecomputedinsidegroups,which Synth.head(generator_group, are themselves part of a parallel group. \myGenerator) Listing 1 shows a simple example, how paral- }; lel groups can be used to write a simple poly- fx = Synth.after(generator_group, phonic synthesizer of 4 synths, which are evalu- \myFx); ated before a effect synth. with different priorities: ‘regular’ nodes having less additional functionality is implemented to the highest priority, satellite predecessors with read always those data, which are written dur- medium priority and satellite successors with ing the previous cycle. Satellite nodes cannot low priority. While it is far from optimal, this be used to model the data flow between JITLIB heuristic can easily be implemented with three nodes, since they cannot be used to formulate lock-free queues, so it is easy to use it in a real- cycles. time context. 4 Related Work 3.3 Common Use Cases & Library During the last few years, support for multi- Integration core audio synthesis has been introduced into The SuperCollider language contains a huge different systems, that impose different seman- class library. Some parts of the library are de- tic constraints. signed to help with the organization of the au- 4.1 Max/FTS, Pure Data & Max/MSP dio synthesis like the pattern sequencer library Oneoftheearliestcomputer-musicsystems,the or the Just-In-Time programming environment Ircam Signal Processing Workstation (ISPW) JITLIB. [Puckette, 1991], used the Max dataflow lan- The pattern sequencer library is a powerful guage to control the signal processing engine, library, that can be used to create sequences of running on a multi-processor extension board Events. Events are dictionaries, which can be of a NeXT computer. FTS was implementing interpeted as musical events, with specific keys explicit pipelining, so patches could be defined havingpredefinedsemanticsasmusicalparame- to run on a specific CPU. When audio data was ters [McCartney, ]. Events may contain are the sent from one CPU to another, it was delayed keysgroupandaddAction, whichifpresentare by one audio block size. used to specify the position of a node on the Recently a similar approach has been imple- server. With these keys, both parallel groups mentedforPureData[Puckette,2008]. Thepd~ and satellite nodes can be used from a pattern object can be used to load a subpatch as a sep- environment. In many cases, the pattern se- arate process. Moving audio data between par- quencerlibraryisusedinawaythatthecreated ent and child process adds one block of latency, synths are mutually independent and do not re- similar to FTS. Therefore it is not easily possi- quire data from other synths. In these cases bletomodifyexistingpatcheswithoutchanging both parallel groups and satellite predecessors the semantics, unless a latency compensation is can safely be used. taken into account. The situation is a bit different with JITLIB. The latest versions of Max/MSP contains a When using JITLIB, the handling of the syn- poly~object,whichcanrunseveralinstancesof thesisgraphiscompletelyhiddenfromtheuser, the same abstraction in multiple threads. How- since the library wraps every syntesis node in- ever, it is not documented, if the signal is de- side a proxy object. JITLIB nodes communi- layed by a certain ammount of samples or not. cate with each other using global busses. This And since only the same abstraction can be dis- approach makes it easy to take the output of tributed to multiple processors, it is not a gen- one node as input of another and to quickly re- eral purpose solution. configurethesynthesisgraph. JITLIBtherefore An automatic parallelization of max-like sys- requiresadeterministicorderfortheread/write tems is rather difficult to achieve, because max- access to busses, which cannot be guaranteed graphshavebothexplicitdependencies(thesig- wheninstantiating nodesinparallel groups, un- nal flow) and implicit ones (resource access). In order to keep the semantics of the sequential program, one would have to introduce implicit Listing 2: Satellite Node example dependencies between all objects, which access var fx = Synth.new(\myFx); a specific resource. 4.do { 4.2 CSound Synth.preceeding(fx, Recent versions of CSound implement auto- \myGenerator) matic parallelization in order to make use of }; multicore hardware [ffitch, 2009]. This is fea- sible, because the CSound parser has a lot of knowledge about resource access patterns and the instrument graph is more constrained com- pared to SuperCollider. Therefore the CSound compilercaninfermanydependenciesautomat- ically, but if this is not the case, the sequential implementation needs to be emulated. The automatic parallelization has the advan- tage, that existing code can make use of multi- core hardware without any modifications. 4.3 Faust Faust supports backends for parallelization, an open-mp based code generator and a custom work-stealingscheduler[Letzetal.,2010]. Since Faust is only a signal processing language, with little notion of control structures. Since Faust is a compiled language, it cannot be used to dy- namically modify the signal graph. 5 Conclusions The proposed extensions to the SuperCollider node graph enable the user to formulate signal graphparallelismexplicitly. Theyintegratewell into the concepts of SuperCollider and can be used to parallelize many use cases, which regu- larly appear in computer music applications. References John ffitch. 2009. Parallel Execution of Csound. In Proceedings of the International Computer Music Conference. DanielC.Hyde,1995.Introductiontothepro- gramming language Occam. Stphane Letz, Yann Orlarey, and Dominique Fober. 2010. WorkStealingSchedulerforAu- tomatic Parallelization in Faust. In Proceed- ings of the Linux Audio Conference. James McCartney. SuperCollider Manual. Miller Puckette. 1991. Combining Event and Signal Processing in the MAX Graphical ProgrammingEnvironment. ComputerMusic Journal, 15(3):68–77. Miller Puckette. 2008. Thoughts on Parallel Computing for Music. In Proceedings of the International Computer Music Conference.