Audio Multimedia Sound Stacks
The title of this post is intentionally devoid of meaning and full of ambiguity. It seems that misunderstanding abounds when we talk about the various components of the sound architecture on the free desktop. In this post, I am going to cherry-pick on some comments posted to Aaron Seigo’s post regarding PulseAudio which seem to indicate severe misunderstanding of just what everything in the stack does.
This post is going to be as unbiased as possible, and at no point will I recommend that you should use a particular configuration or set of programs/libraries/servers. My only goal here is to correct factual inaccuracies, but like any piece of writing, my own opinions will of course play into my selection of facts. And just in case you want to interpret me as pushing a particular agenda, think again; please consider that for the past 6 months I have almost exclusively used a DMIX-based sound setup on my laptop, and a PulseAudio-based sound setup on my desktop. Different hardware, different use cases, different programs. If someone really presses me, I might write a separate article expounding why I think particular solutions are good for my own particular use cases, but that’s not the focus of this article.
I will preface this article with the following argument. It may not be universally agreed upon, but it is so broadly held that I am confident using it as a groundwork postulate:
(1) Hardware mixing solutions are very rare these days.
(2) In the absence of hardware mixing, the only way to play sound from two processes at once is to use software mixing.
(3) The mixing of sound from two processes at once is a desirable goal.
(4) Furthermore, the loss of mixing capability should be avoided at all times during the user’s experience. That is, the only process that should acquire the non-mixed “lock” on the raw sound hardware, is the facility that performs the software mixing.
(5) Therefore we must deploy some mechanism for software mixing as a de facto standard feature on all Linux desktops, much as we have accepted the de facto standard of having a mouse cursor associated with a graphical interface.
If you do not accept this argument, you may have difficulty seeing the soundness of any of my assertions below.
PLEASE NOTE: Nothing said in this post is intended to reflect upon the opinions of Aaron Seigo or his claims. I am leaving Aaron’s argument “as is” and focusing on some of the really problematic assertions made in the vast compendium of comments that followed.
First, Jeff says:
Now, what with it being new distro season and all, the first thing I do when I get a new distro up and running is tell Pulse to screw off. I use KDE4, and Phonon Just Works 95% of the time (the other five? Not sure if that’s Phonon or the app messing things up, but they’re minor issues at best)
Could there be an implicit assumption? Paraphrased: If I use KDE4 and Phonon, I’m not using PulseAudio.
In reality: PulseAudio is a perfectly valid backend to one of the Phonon backends; in other words, until you uninstall PulseAudio, you could very well have all of your applications use Phonon _and_ (at a lower level) PulseAudio as well. The two serve different purposes, and Phonon is really designed to “hand off” media to one of the media frameworks (GStreamer or Xine) which, in turn, hands off the decoded sound to PulseAudio.
Of course, Jeff could also be aware that Phonon has this capability, and he is simply remarking that, by not using PulseAudio but instead using some other software PCM mixing alternative (possibly dmix), he is getting better results. This would be a coherent assertion.
My question in this case is, assuming that you accept my assertion that you need software mixing, what software mixing solution are you using? Certainly not Phonon, because Phonon doesn’t provide any software mixing solution! At least, not across processes. If you have software mixing at all (and this post assumes you need to), it’s coming from somewhere else. We should then weigh the perceived negatives of PulseAudio against this alternative’s own negatives.
Second, kriko remarked:
So pulseaudio was uninstalled from my machine very quickly.
Phonon + xine = win for me.
Implicit assumption: Using Phonon + xine is mutually exclusive with running PulseAudio.
In reality: Wrong. See above.
Third, saschpe remarked:
PulseAudio has just no use-case. Honestly, for whatever reason does we need this one? Back in history, one or two years ago, I installed it when everybody did because it was cool and TheNewThing ™. Now I still have it installed , this time because the (K)Ubuntu packagers force me to do so, for whatever reason there are dependencies on libpulse for OpenJDK and kdebase. Maybe I should file a bug on launchpad.net on it …
This is a juicy one with many deep misconceptions. Let me start with the easy one:
Implicit assumption #1: Any program that links against libpulse is somehow using PulseAudio.
In reality: This is simply false. libpulse is only the client side of the PulseAudio architecture; without the pulseaudio server (the package ‘pulseaudio’ on Debian and *buntu), this functionality is present but never used. It is conceivable that some packages link against libpulse as their only method of audio output; however, so far this is not true for anything except the native PulseAudio utilities. Even OpenJDK provides other (non-PulseAudio) alternatives for audio I/O. It’s just linkage – if you aren’t running a PulseAudio daemon, it isn’t doing anything.
Filing a bug like this would be like saying, I run an Ubuntu Server instance, and I never want to start an X.Org X Server instance for a graphical desktop. So maybe I should file a bug against the Java stack because its AWT library links against libX11? This sort of reasoning is complete nonsense.
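If you want to check the difference between "links against libpulse" and "uses PulseAudio" on your own machine, here is a rough sketch. The helper names (links_libpulse, pulse_server_installed) are my own inventions for illustration, and the canned ldd output is made up so that the example runs anywhere.

```shell
#!/bin/sh
# Sketch: linkage against libpulse and actual use of PulseAudio are separate questions.
# links_libpulse and pulse_server_installed are hypothetical helper names.

# Reads `ldd` output on stdin; succeeds if libpulse is among the listed libraries.
links_libpulse() {
    grep -q 'libpulse\.so'
}

# Succeeds if a pulseaudio server binary can be found on the PATH.
pulse_server_installed() {
    command -v pulseaudio >/dev/null 2>&1
}

# Canned ldd output (illustrative only), so the sketch runs on any machine:
sample='	libpulse.so.0 => /usr/lib/libpulse.so.0 (0x00007f0000000000)
	libc.so.6 => /lib/libc.so.6 (0x00007f0000200000)'

if printf '%s\n' "$sample" | links_libpulse; then
    echo "binary is linked against libpulse"
fi

if pulse_server_installed; then
    echo "a pulseaudio server is installed; libpulse clients can talk to it"
else
    echo "no pulseaudio server found; the linkage is never exercised"
fi
```

On a real system you would feed it real output, for example `ldd /path/to/some/binary | links_libpulse`, where the path is whatever program you are curious about.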
Now for the harder one to tackle:
Implicit assumption #2: PulseAudio has just no use-case.
In reality: This is simply false. You could have stated more accurately: “There are other facilities that perform the same functions as the subset of PulseAudio that I do need; as for the rest of it, I don’t want it.”
There are two big problems with the assertion that PA has no use-case:
1. Just because you don’t want it, shouldn’t mean that distributions should abandon that feature altogether. As a concrete example, the in-flight stream swapping feature of PulseAudio is completely unmatched by any other software. Just point me to some other software that will switch your sound-playing streams from one sound card to another without interrupting the stream at all (AND without causing any hiccups or restarting the playback of the client). There are a few other features that PulseAudio pioneers that other users do have a use-case for. Whether or not you particularly have a use-case for them shouldn’t dictate what distributions are allowed to ship. If you want a distribution that includes all, and only all the features that you specifically have a use-case for, then start your own distribution!
2. You might be asserting that any features PulseAudio provides that you want, are overlapped by other programs. For example, if the only feature of PulseAudio you really want is the software mixing (and nothing else), you can use ALSA’s dmix. That’s a fine thing to say, but like in the previous point, that doesn’t mean no one else needs those features. And what’s more, just because another program does it equally well (or perhaps better if you have found some bugs in PA) doesn’t mean that there is no use-case for the features PA provides! What you are essentially claiming is that there is no use case for any of PulseAudio’s features, such as software mixing – which I have to completely disagree with; see above.
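To make the stream-switching feature from point 1 concrete, here is a minimal sketch of moving every playing stream to another sink from the command line. It assumes a pactl recent enough to support `list short sink-inputs` and `move-sink-input`; the helper name move_commands and the sink name are made up for illustration. The helper only prints the commands, so you can read them before running anything.

```shell
#!/bin/sh
# Sketch: build the pactl commands that move every playing stream to another sink.
# move_commands is a hypothetical helper; the sink name below is made up.

# stdin:  output of `pactl list short sink-inputs` (first column = stream index)
# $1:     name or index of the target sink
# stdout: one `pactl move-sink-input` command per stream (printed, not executed)
move_commands() {
    awk -v sink="$1" 'NF { print "pactl move-sink-input " $1 " " sink }'
}

# Canned input so the sketch runs without a PulseAudio daemon:
printf '5\t0\t12\tprotocol-native.c\ts16le 2ch 44100Hz\n' |
    move_commands alsa_output.usb-headset
```

With a running daemon, the real pipeline would look like `pactl list short sink-inputs | move_commands SINK_NAME | sh`, with SINK_NAME taken from `pactl list short sinks`. The client programs keep playing while their streams are moved.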
Implicit assumption #3: The (K)Ubuntu packagers force me [to install and use PulseAudio].
In reality: This, too, is simply false. Just ask Aaron Seigo, or one of the many other Linux users who are not “locked in” to PulseAudio any more than you, and they will tell you that you can have a perfectly functional desktop with software mixing but devoid of any PulseAudio daemons. The idea that somehow you are having things forced upon you is erroneous; it is a simple fact of the matter that distributions must choose some default configuration that they think will be good for the masses of the users, but this is simply a default configuration; it is by no means binding. If you really think that they are forcing PulseAudio on you, then you must also think they are forcing GNOME (or KDE) on you, as well as X.Org, and Firefox (or Konqueror), and HAL, and D-Bus, and the Linux kernel, etc. But this is absurd.
You are free to alter the system not to use PulseAudio, and that is actually a fairly easy task that does not even require you to remove any of your favorite programs. So there isn’t even a practical problem here, much less in principle.
Fourth, Samuel Mukoti remarked:
I believe it [PulseAudio] has the same design goals (to some extent) of what he developer of aRts had.
Yes and no. I like how you qualified your statement with “to some extent”. In that sense, this isn’t really an erroneous assertion at all, so I’m not going to claim that you have any misconceptions. I’ll just say that this was a fairly brief comparison of aRts and PulseAudio that hides some of their differences.
In reality, aRts tried to do lots of things PulseAudio doesn’t try (and won’t ever try) to do. For instance, aRts had the ability to decode non-PCM media (mp3, ogg, etc.) into PCM data. It would then do software mixing on it, similar to Esound or PulseAudio. So, while there is overlap between aRts and PulseAudio, I just wanted to make it clear that, unless hell freezes over or the core devs of PulseAudio (Lennart) have a major rework of their opinions, I don’t think PulseAudio will ever have this feature which (crucially) distinguishes aRts and PulseAudio.
Also, PulseAudio tries to do lots of things that aRts never tried to do. Since aRts is mostly dead, I can’t rule out that aRts might have tried to do these things had it been more successful; however, judging from the direction aRts was taking, it seems plausible to say that:
* aRts would never care to do hot swapping of streams from one card to another.
* aRts would never care to support bluetooth headsets.
* aRts would never care to support automatic discovery of sinks via HAL and Avahi.
* aRts would never care to support a vast number of interoperability libraries that help programs using other APIs to use aRts underneath. I never saw any work started on an aRts-ALSA plugin like we see with Pulse-ALSA.
Fifth, maninalift remarked:
I have had the same experience with openSUSE. Your blog post gave me hope that I could rip-out PulseAudio but apparently not
In reality, there is a very simple way to get rid of PulseAudio, while retaining software mixing through dmix and dsnoop:
Get to a root shell, or use the sudo command. I don’t care how you do it, but you need to run all of these commands as root:
Code:
mv /usr/bin/pulseaudio /usr/bin/pulseaudio-disabled
mv -f /etc/asound.conf /etc/asound.conf_bak
echo 'pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "dmix:0"
    }

    capture.pcm {
        type plug
        slave.pcm "dsnoop:0"
    }

    hint {
        show on
        description "DMIX"
    }
}' > /etc/asound.conf
killall pulseaudio
mv -f /usr/share/alsa/pulse.conf /usr/share/alsa/pulse.conf_bak
You now have to deal with all of the associated problems of dmix; for example, if you plug in a USB headset, you have to edit this configuration file to get sound to play out of the headset. Oh, and if you want some sound to play out of the headset and other sound out of the internal soundcard, you’re really in a bind. There are some hacks you can do, but in the general case you’re hosed if that’s what you want. That’s one of the problems PulseAudio was designed to fix in the first place!
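For the simple USB headset case, the edit in question amounts to changing the card number in the two slave.pcm lines. A small sketch of that edit follows; retarget_card is a hypothetical helper name, and the card index 1 is only an assumption (check `aplay -l` for the real number on your system).

```shell
#!/bin/sh
# Sketch: point the dmix/dsnoop slaves in an asound.conf at a different card.
# retarget_card is a hypothetical helper; card numbers come from `aplay -l`.

# stdin:  an asound.conf like the one above
# $1:     ALSA card index to use (e.g. 1 for a USB headset)
# stdout: the same config with "dmix:N" and "dsnoop:N" rewritten
retarget_card() {
    sed -E "s/\"(dmix|dsnoop):[0-9]+\"/\"\\1:$1\"/"
}

# Demo on two sample lines:
printf 'slave.pcm "dmix:0"\nslave.pcm "dsnoop:0"\n' | retarget_card 1
```

In practice you would run something like `retarget_card 1 < /etc/asound.conf > /tmp/asound.conf.new`, look the result over, and copy it into place as root. Note that this moves all sound to the other card; it does nothing for the "some sound here, other sound there" case.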
Sixth, behavedave remarked:
I thought PulseAudio was a sound server like AlSA but better. Now I find it’s somewhere around the Xine/Phonon level. What does it do then?
ALSA is not a sound server. There is a completely deprecated sound server still in the ALSA codebase last I checked, but it hasn’t been maintained for many, many years; every other part of ALSA (the stuff that actually works) is not a sound server.
The similarity between PulseAudio and ALSA can be summarized thus:
1. They both provide an API (Application Programming Interface) for developers to write programs that need to play sound.
2. They both provide an API that is a bit too complex for most sound-using programs, so in general you should use neither ALSA nor PulseAudio’s API, directly, unless you want to make bug-prone software. On the other hand, certain “special” requirements in the audio arena do require directly using these APIs. You can find more info on these special requirements by googling “realtime audio”, “writing your own sound server”, “not invented here”, or “I AM an inventor and I want to do new exotic things with digital audio”.
3. They both provide software mixing capability to an extent. The specific part of ALSA that provides software mixing is called dmix. dmix is neither required nor automatically used in most configurations; only a vanilla install of ALSA (unmodified by the distro) will activate dmix without any changes. PulseAudio, on the other hand, can not be used without the software mixing piece. In other words, I can write an ALSA application that breaks software mixing. I can not write a PulseAudio application that breaks software mixing.
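As an illustration of point 3, the usual way an ALSA application "breaks" software mixing is by opening the hardware device by name instead of going through the configured default. The sketch below only inspects device names; bypasses_dmix is a hypothetical helper, and the list of names is just a sample.

```shell
#!/bin/sh
# Sketch: which ALSA device names skip software mixing?
# bypasses_dmix is a hypothetical helper that only looks at the name.

# Succeeds if the PCM name addresses the hardware directly (hw: or plughw:),
# which takes the card exclusively when the card has no hardware mixing.
bypasses_dmix() {
    case "$1" in
        hw:*|plughw:*) return 0 ;;
        *)             return 1 ;;
    esac
}

for pcm in default hw:0,0 plughw:0,0 dmix:0; do
    if bypasses_dmix "$pcm"; then
        echo "$pcm: opens the card directly; other programs may get 'Device or resource busy'"
    else
        echo "$pcm: goes through whatever the configuration defines for that name"
    fi
done
```

You can try it for real with `aplay -D hw:0,0 some.wav` while another program is playing sound, where some.wav is any WAV file you have lying around.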
Is it [PulseAudio] the reason why:
Audio just drops off on my PC? (reboot)
What do you mean by this? Does audio simply stop working altogether with no way to fix it other than reboot? If so, this could be PulseAudio. It could also be a failure inside of ALSA. Either possibility is likely. ALSA is not bug-free either!
I can no longer mirror front and rear audio in KMix?
Maybe, yes. But KMix is not aware of PulseAudio. There are ways to do this with PulseAudio that do not involve KMix.
The starting sound in KDE4 only plays half a second?
Another post addressed this issue.
behavedave continued as follows:
It does seem complex but it handles so much legacy compatibility it was always going to end up that way. It seems a mystery to me that all those differing systems worked harmoniously without pulse not despite of it.
The only curious thing is what or where is the native protocol for pulse surely it would provide its own control layer so that eventually that would become the standard bringing together all the other applications as they left their legacy interfaces.
In fact, all those disparate systems did not work harmoniously without PulseAudio! Admittedly, PulseAudio does not resolve the problem, as the problem itself poses an almost insurmountable design challenge:
Give me an audio stack on which (assuming that I have enough hardware resources like RAM) I can install every audio-using Linux program, run them all at the same time, and not experience any dropouts or degradation of sound quality.
This is the challenge that software mixing solutions must ultimately face. This is the challenge that OSSv4, dmix, pulseaudio, aRts, and Esound all hope to eventually conquer. This is the challenge that we are so jealous of the proprietary operating systems about, because Mac has CoreAudio and Windows has DirectX; in 99.999% of the cases, these “other OSes” can meet the challenge above, given enough system resources. Windows never says “Device or resource busy.”
The division of the audio stack is the reason why we can’t claim the same. Incompatibility is our plague, and every additional piece to the puzzle increases the necessary complexity of any solution that wants to unify the free desktop audio architecture.
Now, as to your question of “where” the PulseAudio native protocol is, the answer is simple: libpulse is a client library that gives instructions to a PulseAudio server, such as “play this clip of PCM data”. The PulseAudio native protocol is used whenever any application (except Esound applications) wants to play sound through PulseAudio. Keep in mind the number of indirections possible in the sound stack per the diagram; just because your end-user tools say that your program is using Phonon, that doesn’t mean Phonon isn’t using PulseAudio through gstreamer through libpulse in the backend. In fact, it probably is by default. PulseAudio is like the X server of audio, if you understand the graphics architecture any better.
Seventh, rf4nI5QRtMlfRg3rv00S9IGZVQNadQ– (what a name!) remarked:
Plus Fedora had to rewrite most of the PulseAudio-Core to stop it from randomly clipping(which is still happening btw.) which left me with a big WTF in my mouth. They need distributions to fix stuff that would be considered “basic” by 99% of all peole?
Just so you know:
Fedora is Red Hat’s officially sponsored test bed distribution. Fedora percolates ideas and new software to prepare it for inclusion in stable Red Hat releases. And yes, I’d bet money on the fact that Red Hat Enterprise Linux 6.0 will probably ship PulseAudio in some form or another.
Lennart Poettering does 90+% of the work on PulseAudio.
Lennart Poettering also works for Red Hat.
In order to prepare PulseAudio for inclusion in Red Hat, Lennart is also the maintainer for Fedora’s PulseAudio.
So it’s not a “distribution” fixing it; the guy who basically is the PulseAudio project is working on it, but he’s being paid by Red Hat to do it. What’s the difference? I don’t see any.
BTW, glitch-free (also known as time-based scheduling) is not designed to stop all glitches from ever occurring. Its true benefits are not to prevent glitches (it can’t do that on certain kernel/hardware configurations, no matter how hard it tries.) The benefit of glitch-free is to:
1. Reduce power consumption and CPU usage significantly.
2. Deliver sound with lower latency than before: what you need, when you need it. Nothing more, nothing less.
The primary growing pains of glitch-free (its much-maligned brokenness) are due to the following sort of development model:
1. Lennart happily hums a tune while developing large amounts of code based on his understanding of how ALSA-lib works. (His understanding is dependent on how well ALSA-lib is documented, among other things; ALSA-lib is almost completely undocumented.)
2. Lennart tries the new code on a few of his sound cards and runs into problems. Hmm, this is not good. Maybe he misunderstood some of the semantics of ALSA calls.
3. He finds a mixture of “that’s supposed to work that way, but it doesn’t… oops. File a bug and we’ll get back to it in a year.” and “Nah, you’re mistaken; it actually works this-a-way.”
4. He goes and fixes his code so that, with a combination of hacks and dependency on some upstream bugfixes, glitch-free seems to work on his reference hardware.
5. He releases PulseAudio versions with glitch-free enabled by default.
6. Users, who inherently have much more time and resources and variety of sound cards and kernel configurations, run into myriad problems because the code was not designed to work on these configurations, or there are bugs in these hardware drivers, and so forth.
And you can see how it gets ugly from there. That’s why glitch-free is currently, well, glitchy. No one’s to blame; it’s simply the state of flux of software that’s still in development. If we have the requirement of having everything work 100%, I would agree with assessments that glitch-free is being shipped prematurely on production distributions. But it needed to get out there to get the wide testing that Lennart needs to fix it… hence, chicken-and-the-egg problem. Lennart simply can’t test all the hardware out there; there are too many different cards.
So that’s my summary of why glitch-free is currently a bit broken in some cases. But if you think about the situation with a clear mind, do you see anything that would permanently block glitch-free from being a plausible design in the long run?
1. Kernel maintainers for distros can configure the kernel in a way that makes glitch-free happy.
2. ALSA maintainers (and Lennart) can fix bugs in ALSA so that glitch-free works better.
3. Lennart can tweak PulseAudio to accommodate some non-optimal situations in which the above two actions do not resolve the problem.
4. Distributors can tweak PulseAudio’s user-facing configuration to accommodate situations that are not addressed otherwise (Example: disabling time-based scheduling in some environments).
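As an example of point 4, time-based scheduling can be switched off by passing tsched=0 to the hardware-detection module in PulseAudio’s startup script. The sketch below assumes your distribution’s default.pa loads module-udev-detect or module-hal-detect on a line of its own; disable_tsched is a hypothetical helper name, and the input here is a two-line sample rather than a real file.

```shell
#!/bin/sh
# Sketch: turn off time-based scheduling by appending tsched=0 to the
# hardware-detection module line of a PulseAudio startup script.
# disable_tsched is a hypothetical helper name.

# stdin:  contents of a default.pa-style file
# stdout: same contents, with tsched=0 added where it is not already set
disable_tsched() {
    awk '/^load-module module-(udev|hal)-detect/ && !/tsched=/ { $0 = $0 " tsched=0" }
         { print }'
}

# Demo on a two-line sample:
printf 'load-module module-udev-detect\nload-module module-native-protocol-unix\n' |
    disable_tsched
```

To apply it, you would filter your real file (commonly /etc/pulse/default.pa, though the location is an assumption about your distribution) into a temporary copy, review it, put it in place, and restart the daemon.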
Summary
I’m tired of typing, so I hope some of this made a bit of sense. I’ve just cherry picked seven comments off of Aaron’s blog and made some responses to them. Some of it is more informative/factual in nature, while other parts are pure opinion. Take it as you will.
Comments are enabled but must be approved by me first, to prevent spam.
-Sean