=========================
ALSA Compress-Offload API
=========================

Pierre-Louis.Bossart <pierre-louis.bossart@linu

Vinod Koul <vinod.koul@linux.intel.com>


Overview
========
Since its early days, the ALSA API was defined
constant bitrates payloads such as IEC61937 in
returned values in frames are the norm, making
extend the existing API to compressed data str

In recent years, audio digital signal processo
in system-on-chip designs, and DSPs are also i
codecs. Processing compressed data on such DSP
reduction of power consumption compared to hos
processing. Support for such hardware has not
mostly because of a lack of a generic API avai
kernel.

Rather than requiring a compatibility break wi
ALSA PCM interface, a new 'Compressed Data' AP
provide a control and data-streaming interface

The design of this API was inspired by the 2-y
Intel Moorestown SOC, with many corrections re
API in the mainline kernel instead of the stag
usable by others.


Requirements
============
The main requirements are:

- separation between byte counts and time. Com
  a header per file, per frame, or no header a
  may vary from frame-to-frame. As a result, i
  estimate reliably the duration of audio buff
  compressed data. Dedicated mechanisms are re
  reliable audio-video synchronization, which
  reporting of the number of samples rendered

- Handling of multiple formats. PCM data only
  of the sampling rate, number of channels and
  contrast, compressed data comes in a variety
  may also provide support for a limited numbe
  decoders embedded in firmware, or may suppor
  dynamic download of libraries.

- Focus on main formats. This API provides sup
  popular formats used for audio and video cap
  likely that as audio compression technology
  will be added.

- Handling of multiple configurations.
  Even fo
  AAC, some implementations may support AAC mu
  stereo. Likewise WMA10 level M3 may require
  cycles. The new API needs to provide a gener
  formats.

- Rendering/Grabbing only. This API does not p
  hardware acceleration, where PCM samples are
  user-space for additional processing. This A
  streaming compressed data to a DSP, with the
  decoded samples are routed to a physical out

- Complexity hiding. Existing user-space multi
  have existing enums/structures for each comp
  API assumes the existence of a platform-spec
  to expose, translate and make use of the cap
  DSP, e.g. Android HAL or PulseAudio sinks. By
  applications are not supposed to make use of


Design
======
The new API shares a number of concepts with t
control. Start, pause, resume, drain and stop
semantics no matter what the content is.

The concept of memory ring buffer divided in a
borrowed from the ALSA PCM API. However, only
specified.

Seeks/trick modes are assumed to be handled by

The notion of rewinds/forwards is not supporte
ring buffer cannot be invalidated, except when

The Compressed Data API does not make any assu
is transmitted to the audio DSP. DMA transfers
embedded audio cluster or to a SPI interface f
possible. As in the ALSA PCM case, a core set
each driver implementer will have to write sup
mandatory routines and possibly make use of op

The main additions are

get_caps
  This routine returns the list of audio forma
  codecs on a capture stream will return encod
  listed for playback streams.

get_codec_caps
  For each codec, this routine returns a list
  capabilities. The intent is to make sure all
  correspond to valid settings, and to minimiz
  configuration failures.
  For example, for a c
  the number of channels supported may depend
  the capabilities were exposed with a single
  that a specific combination of profiles/chan
  supported. Likewise, embedded DSPs have limi
  it is likely that some implementations make
  dynamic and dependent on existing workloads.
  settings, this routine returns the minimum b
  implementation. This information can be a fu
  sizes, the number of bytes required to synch
  used by userspace to define how much needs t
  buffer before playback can start.

set_params
  This routine sets the configuration chosen f
  most important field in the parameters is th
  cases decoders will ignore other fields, whi
  comply with the settings

get_params
  This routine returns the actual settings us
  the settings should remain the exception.

get_timestamp
  The timestamp becomes a multiple field struc
  of bytes transferred, the number of samples
  of samples rendered/grabbed.
  All these value
  the average bitrate, figure out if the ring
  refilled or the delay due to decoding/encodi

Note that the list of codecs/profiles/modes wa
OpenMAX AL specification instead of reinventin
Modifications include:

- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make de
- Addition of set_params for decoders (missing
- Addition of AMR/AMR-WB encoding modes (missi
- Addition of format information for WMA
- Addition of encoding options when required (
- Addition of rateControlSupported (missing in

State Machine
=============

The compressed audio stream state machine is d

[State machine diagram not recoverable from the source. Surviving labels,
some truncated: states OP, SET, PREPARE, RUNN, DRA, PAUSE, FREE;
transitions compr_write(), compr_start(), compr_pause(), compr_free().]


Gapless Playback
================
When playing through an album, the decoders have
delay and padding and directly move from one t
user can perceive this as gapless playback as
switching from one track to another

Also, there might be low-intensity noises due
difficult to reach with all types of compresse
music content. The decoder needs to know the e
So we need to pass this to DSP. This metadata
and are not present by default in the bitstrea
interface to pass this information to the DSP.
switch from one track to another and start usi

The main additions are:

set_metadata
  This routine sets the encoder delay and enco
  decoder to strip the silence. This needs to
  is written.

set_next_track
  This routine tells DSP that metadata and wri
  correspond to subsequent track

partial drain
  This is called when end of file is reached.
  EOF is reached and now DSP can start skippin
  data would belong to next track

Sequence flow for gapless would be:

- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finished sending all,
- Indicate next track data by sending set_next
- Set metadata of the next track
- then call partial_drain to flush most of buf
- Fill data of the next track
- DSP switches to second track

(note: order for partial_drain and write for n

Gapless Playback SM
===================

For Gapless, we move from running state to par
with setting of meta_data and signalling for n

[Gapless state machine diagram not recoverable from the source. Surviving
labels, some truncated: states RUN, NEXT_, PART DRA; transitions
compr_drain_notify(), compr_set_params().]

Not supported
=============
- Support for VoIP/circuit-switched calls is n
  API. Support for dynamic bit-rate changes wo
  coupling between the DSP and the host stack,

- Packet-loss concealment is not supported. Th
  additional interface to let the decoder synt
  are lost during transmission.
  This may be ad

- Volume control/routing is not handled by thi
  compressed data interface will be considered
  volume changes and routing information will
  ALSA kcontrols.

- Embedded audio effects. Such effects should
  manner, no matter if the input was PCM or co

- multichannel IEC encoding. Unclear if this i

- Encoding/decoding acceleration is not suppor
  above. It is possible to route the output of
  stream, or even implement transcoding capabi
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API d
  hooks to query the utilization of the audio
  mechanisms.

- No notion of underrun/overrun. Since the byt
  in nature and data written/read doesn't tran
  rendered output in time, this does not deal
  may be dealt in user-library


Credits
=======
- Mark Brown and Liam Girdwood for discussions
- Harsha Priya for her work on intel_sst compr
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasan
  demonstrating and quantifying the benefits o
  real platform.
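
As an illustration of the sequence flow listed under Gapless Playback, the
ordering constraints can be modelled as a small user-space checker. This is
only a sketch under stated assumptions: the step names, the ``gapless_model``
structure and both functions are hypothetical and are not kernel or library
identifiers. The model only encodes what the list above states (metadata is
set before any data of a track is written, set_next_track is signalled while
running, partial drain follows the next track's metadata), and it assumes
that the write for the next track may come either before or after the
partial drain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names mirroring the gapless sequence in the text. */
enum gapless_step {
    STEP_OPEN,
    STEP_GET_CAPS,
    STEP_SET_PARAMS,
    STEP_SET_METADATA,
    STEP_WRITE,
    STEP_START,
    STEP_NEXT_TRACK,
    STEP_PARTIAL_DRAIN
};

/* Hypothetical bookkeeping for one stream; not a kernel structure. */
struct gapless_model {
    bool opened;
    bool params_set;
    bool metadata_set;    /* metadata set for the track being filled */
    bool track_has_data;  /* data already written for that track */
    bool running;
    bool next_signalled;  /* set_next_track sent, partial drain pending */
};

/* Apply one step; return false if it violates the modelled ordering. */
bool gapless_apply(struct gapless_model *m, enum gapless_step s)
{
    assert(m != NULL);
    switch (s) {
    case STEP_OPEN:
        if (m->opened)
            return false;
        m->opened = true;
        return true;
    case STEP_GET_CAPS:
        return m->opened;
    case STEP_SET_PARAMS:
        if (!m->opened || m->running)
            return false;
        m->params_set = true;
        return true;
    case STEP_SET_METADATA:
        /* Must happen before any data of the track is written. */
        if (!m->params_set || m->track_has_data)
            return false;
        m->metadata_set = true;
        return true;
    case STEP_WRITE:
        if (!m->params_set || !m->metadata_set)
            return false;
        m->track_has_data = true;
        return true;
    case STEP_START:
        if (!m->track_has_data || m->running)
            return false;
        m->running = true;
        return true;
    case STEP_NEXT_TRACK:
        if (!m->running || m->next_signalled)
            return false;
        /* Later metadata and writes belong to the next track. */
        m->next_signalled = true;
        m->metadata_set = false;
        m->track_has_data = false;
        return true;
    case STEP_PARTIAL_DRAIN:
        if (!m->running || !m->next_signalled || !m->metadata_set)
            return false;
        m->next_signalled = false;
        return true;
    }
    return false;
}

/* Run a sequence; return how many steps were accepted before a rejection. */
size_t gapless_run(const enum gapless_step *steps, size_t n)
{
    struct gapless_model m = { 0 };
    size_t i;

    for (i = 0; i < n; i++) {
        if (!gapless_apply(&m, steps[i]))
            break;
    }
    return i;
}
```

Passing the steps in the order given in the list makes ``gapless_run()``
return the full step count. A sequence that writes track data before setting
its metadata is rejected at the write step.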