=========================
ALSA Compress-Offload API
=========================

Pierre-Louis.Bossart <pierre-louis.bossart@linux.intel.com>

Vinod Koul <vinod.koul@linux.intel.com>


Overview
========
Since its early days, the ALSA API was defined with PCM support or
constant-bitrate payloads such as IEC61937 in mind. Arguments and
returned values in frames are the norm, making it a challenge to
extend the existing API to compressed data streams.

In recent years, audio digital signal processors (DSP) were integrated
in system-on-chip designs, and DSPs are also integrated in audio
codecs. Processing compressed data on such DSPs results in a dramatic
reduction of power consumption compared to host-based
processing. Support for such hardware has not been very good in Linux,
mostly because of a lack of a generic API available in the mainline
kernel.

Rather than requiring a compatibility break with an API change of the
ALSA PCM interface, a new 'Compressed Data' API is introduced to
provide a control and data-streaming interface for audio DSPs.

The design of this API was inspired by the 2-year experience with the
Intel Moorestown SOC, with many corrections required to upstream the
API in the mainline kernel instead of the staging tree and make it
usable by others.


Requirements
============
The main requirements are:

- separation between byte counts and time. Compressed formats may have
  a header per file, per frame, or no header at all. The payload size
  may vary from frame-to-frame. As a result, it is not possible to
  estimate reliably the duration of audio buffers when handling
  compressed data. Dedicated mechanisms are required to allow for
  reliable audio-video synchronization, which requires precise
  reporting of the number of samples rendered at any given time.

- Handling of multiple formats. PCM data only requires a specification
  of the sampling rate, number of channels and bits per sample. In
  contrast, compressed data comes in a variety of formats.
  Audio DSPs may also provide support for a limited number of audio
  encoders and decoders embedded in firmware, or may support more
  choices through dynamic download of libraries.

- Focus on main formats. This API provides support for the most
  popular formats used for audio and video capture and playback. It is
  likely that as audio compression technology advances, new formats
  will be added.

- Handling of multiple configurations. Even for a given format like
  AAC, some implementations may support AAC multichannel but HE-AAC
  stereo. Likewise WMA10 level M3 may require too much memory and cpu
  cycles. The new API needs to provide a generic way of listing these
  formats.

- Rendering/Grabbing only. This API does not provide any means of
  hardware acceleration, where PCM samples are provided back to
  user-space for additional processing. This API focuses instead on
  streaming compressed data to a DSP, with the assumption that the
  decoded samples are routed to a physical output or logical back-end.

- Complexity hiding.
  Existing user-space multimedia frameworks all
  have existing enums/structures for each compressed format. This new
  API assumes the existence of a platform-specific compatibility layer
  to expose, translate and make use of the capabilities of the audio
  DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
  applications are not supposed to make use of this API.


Design
======
The new API shares a number of concepts with the PCM API for flow
control. Start, pause, resume, drain and stop commands have the same
semantics no matter what the content is.

The concept of a memory ring buffer divided into a set of fragments is
borrowed from the ALSA PCM API. However, only sizes in bytes can be
specified.

Seeks/trick modes are assumed to be handled by the host.

The notion of rewinds/forwards is not supported. Data committed to the
ring buffer cannot be invalidated, except when dropping all buffers.

The Compressed Data API does not make any assumptions on how the data
is transmitted to the audio DSP.
DMA transfers from main memory to an
embedded audio cluster or to a SPI interface for external DSPs are
possible. As in the ALSA PCM case, a core set of routines is exposed;
each driver implementer will have to write support for a set of
mandatory routines and possibly make use of optional ones.

The main additions are:

get_caps
  This routine returns the list of audio formats supported. Querying the
  codecs on a capture stream will return encoders; decoders will be
  listed for playback streams.

get_codec_caps
  For each codec, this routine returns a list of
  capabilities. The intent is to make sure all the capabilities
  correspond to valid settings, and to minimize the risks of
  configuration failures. For example, for a complex codec such as AAC,
  the number of channels supported may depend on a specific profile. If
  the capabilities were exposed with a single descriptor, it may happen
  that a specific combination of profiles/channels/formats may not be
  supported.
  Likewise, since embedded DSPs have limited memory and CPU cycles,
  it is likely that some implementations make the list of capabilities
  dynamic and dependent on existing workloads. In addition to codec
  settings, this routine returns the minimum buffer size handled by the
  implementation. This information can be a function of the DMA buffer
  sizes, the number of bytes required to synchronize, etc., and can be
  used by userspace to define how much needs to be written in the ring
  buffer before playback can start.

set_params
  This routine sets the configuration chosen for a specific codec. The
  most important field in the parameters is the codec type; in most
  cases decoders will ignore other fields, while encoders will strictly
  comply with the settings.

get_params
  This routine returns the actual settings used by the DSP. Changes to
  the settings should remain the exception.

get_timestamp
  The timestamp becomes a multi-field structure. It lists the number
  of bytes transferred, the number of samples processed and the number
  of samples rendered/grabbed.
  All these values can be used to determine
  the average bitrate, figure out if the ring buffer needs to be
  refilled or the delay due to decoding/encoding/io on the DSP.

Note that the list of codecs/profiles/modes was derived from the
OpenMAX AL specification instead of reinventing the wheel.
Modifications include:

- Addition of FLAC and IEC formats
- Merge of encoder/decoder capabilities
- Profiles/modes listed as bitmasks to make descriptors more compact
- Addition of set_params for decoders (missing in OpenMAX AL)
- Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
- Addition of format information for WMA
- Addition of encoding options when required (derived from OpenMAX IL)
- Addition of rateControlSupported (missing in OpenMAX AL)

State Machine
=============

The compressed audio stream state machine is described below ::

                                       +----------+
                                       |          |
                                       |   OPEN   |
                                       |          |
                                       +----------+
                                            |
                                            |
                                            | compr_set_params()
                                            |
                                            v
         compr_free()                  +----------+
  +------------------------------------|          |
  |                                    |   SETUP  |
  |          +-------------------------|          |<-------------------------+
  |          |       compr_write()     +----------+                          |
  |          |                              ^                                |
  |          |                              | compr_drain_notify()           |
  |          |                              |        or                      |
  |          |                              |     compr_stop()               |
  |          |                              |                                |
  |          |                         +----------+                          |
  |          |                         |          |                          |
  |          |                         |   DRAIN  |                          |
  |          |                         |          |                          |
  |          |                         +----------+                          |
  |          |                              ^                                |
  |          |                              |                                |
  |          |                              | compr_drain()                  |
  |          |                              |                                |
  |          v                              |                                |
  |     +----------+                   +----------+                          |
  |     |          |   compr_start()   |          |       compr_stop()       |
  |     | PREPARE  |------------------>|  RUNNING |--------------------------+
  |     |          |                   |          |                          |
  |     +----------+                   +----------+                          |
  |          |                            |    ^                             |
  |          |compr_free()                |    |                             |
  |          |              compr_pause() |    | compr_resume()              |
  |          |                            |    |                             |
  |          v                            v    |                             |
  |     +----------+                   +----------+                          |
  |     |          |                   |          |       compr_stop()       |
  +---->|   FREE   |                   |  PAUSE   |--------------------------+
        |          |                   |          |
        +----------+                   +----------+


Gapless Playback
================
When playing through an album, the decoders have the ability to skip the encoder
delay and padding and directly move from one track content to another.
The end
user can perceive this as gapless playback as we don't have silence while
switching from one track to another.

Also, there might be low-intensity noises due to encoding. Perfect gapless is
difficult to reach with all types of compressed data, but works fine with most
music content. The decoder needs to know the encoder delay and encoder padding,
so we need to pass this to the DSP. This metadata is extracted from ID3/MP4 headers
and is not present by default in the bitstream, hence the need for a new
interface to pass this information to the DSP. Also, the DSP and userspace need to
switch from one track to another and start using data for the second track.

The main additions are:

set_metadata
  This routine sets the encoder delay and encoder padding. This can be used by
  the decoder to strip the silence. This needs to be set before the data in the
  track is written.

set_next_track
  This routine tells the DSP that metadata and write operations sent after this
  would correspond to the subsequent track.

partial drain
  This is called when end of file is reached.
  The userspace can inform the DSP that
  EOF is reached and that the DSP can now start skipping padding delay. Also,
  the next write data would belong to the next track.

Sequence flow for gapless would be:

- Open
- Get caps / codec caps
- Set params
- Set metadata of the first track
- Fill data of the first track
- Trigger start
- User-space finishes sending all data of the first track
- Indicate next track data by sending set_next_track
- Set metadata of the next track
- Then call partial_drain to flush most of the buffer in the DSP
- Fill data of the next track
- DSP switches to second track

(note: order for partial_drain and write for next track can be reversed as well)

Gapless Playback SM
===================

For Gapless, we move from running state to partial drain and back, along
with setting of meta_data and signalling for next track ::


                                       +----------+
               compr_drain_notify()    |          |
             +------------------------>|  RUNNING |
             |                         |          |
             |                         +----------+
             |                              |
             |                              |
             |                              | compr_next_track()
             |                              |
             |                              V
             |                         +----------+
             |                         |          |
             |                         |NEXT_TRACK|
             |                         |          |
             |                         +----------+
             |                              |
             |                              |
             |                              | compr_partial_drain()
             |                              |
             |                              V
             |                         +----------+
             |                         |          |
             +------------------------ | PARTIAL_ |
                                       |  DRAIN   |
                                       +----------+

Not supported
=============
- Support for VoIP/circuit-switched calls is not the target of this
  API. Support for dynamic bit-rate changes would require a tight
  coupling between the DSP and the host stack, limiting power savings.

- Packet-loss concealment is not supported. This would require an
  additional interface to let the decoder synthesize data when frames
  are lost during transmission. This may be added in the future.

- Volume control/routing is not handled by this API. Devices exposing a
  compressed data interface will be considered as regular ALSA devices;
  volume changes and routing information will be provided with regular
  ALSA kcontrols.

- Embedded audio effects. Such effects should be enabled in the same
  manner, no matter if the input was PCM or compressed.

- multichannel IEC encoding.
  Unclear if this is required.

- Encoding/decoding acceleration is not supported as mentioned
  above. It is possible to route the output of a decoder to a capture
  stream, or even implement transcoding capabilities. This routing
  would be enabled with ALSA kcontrols.

- Audio policy/resource management. This API does not provide any
  hooks to query the utilization of the audio DSP, nor any preemption
  mechanisms.

- No notion of underrun/overrun. Since the bytes written are compressed
  in nature and data written/read doesn't translate directly to
  rendered output in time, this API does not deal with underrun/overrun;
  these may be dealt with in a user-space library.


Credits
=======
- Mark Brown and Liam Girdwood for discussions on the need for this API
- Harsha Priya for her work on intel_sst compressed API
- Rakesh Ughreja for valuable feedback
- Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
  demonstrating and quantifying the benefits of audio offload on a
  real platform.
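

Illustrative sketch
===================
The transitions drawn in the State Machine section can be modelled as a
small lookup table, which may help when reasoning about which call is
valid in which state. The Python sketch below is only a toy model under
stated assumptions: it is not kernel code and not a user-space library
API. The ``TRANSITIONS`` table and the ``next_state()`` and ``run()``
helpers are hypothetical names introduced for this illustration; only
the state names and the ``compr_*`` event names are taken from the
diagram.

```python
# Toy model of the compressed stream state machine from the diagram.
# Keys are (current state, event) pairs; values are the resulting state.
# Everything here is illustrative; none of it is an actual ALSA API.
TRANSITIONS = {
    ("OPEN", "compr_set_params"): "SETUP",
    ("SETUP", "compr_write"): "PREPARE",
    ("SETUP", "compr_free"): "FREE",
    ("PREPARE", "compr_start"): "RUNNING",
    ("PREPARE", "compr_free"): "FREE",
    ("RUNNING", "compr_drain"): "DRAIN",
    ("RUNNING", "compr_pause"): "PAUSE",
    ("RUNNING", "compr_stop"): "SETUP",
    ("PAUSE", "compr_resume"): "RUNNING",
    ("PAUSE", "compr_stop"): "SETUP",
    ("DRAIN", "compr_drain_notify"): "SETUP",
    ("DRAIN", "compr_stop"): "SETUP",
}


def next_state(state, event):
    """Return the state reached from `state` on `event`.

    Raises ValueError when the diagram has no such edge.
    """
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event}() is not valid in state {state}") from None


def run(events, state="OPEN"):
    """Apply a sequence of events in order and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


if __name__ == "__main__":
    # A simple playback session: configure, fill, start, pause/resume,
    # then drain until the DSP signals completion.
    session = [
        "compr_set_params", "compr_write", "compr_start",
        "compr_pause", "compr_resume",
        "compr_drain", "compr_drain_notify",
    ]
    print(run(session))
```

In this model a drained or stopped stream returns to SETUP, so new data
has to be written before the stream can be started again, which matches
the edges shown in the diagram.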