(I originally included a link to the PC-ABX site, but this no longer exists.)
There are many different ways to conduct listening tests, but the most convincing are those which can separate real from imagined effects. There are of course reasons other than imagination why listening test results can be wrong. Unfortunately arguments about 'subjectivism' tend to degenerate into accusations that those on one side are liars or fools, while those on the other side are deaf. Maybe these accusations are sometimes true, but without reliable testing methods how would we know?
Although it has its limitations, ABX testing, carried out on a computer using the soundcard and headphones, can eliminate some sources of listening test errors, and can demonstrate the audibility of known, measurable effects such as inversion and low-pass filtering. ABX can be used to compare any two selected sound files, A and B, by listening to file X, which is identical to either A or B, and trying to decide which one it is. The A, B and X samples can be listened to any number of times over any period of time before making a decision, and breaks of any duration can be taken. The test can then be repeated enough times to give a good probability that something more than guessing was involved. (It is obviously possible to lie about ABX results or make errors in the setup, so although ABX is useful for investigating audibility, it is of limited help in trying to convince other people unless they set up and witness the event themselves.) We can also learn from ABX results, and with practice can improve the sensitivity of our hearing and learn to identify effects such as notch filtering.
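To put a number on 'more than guessing', the usual approach is a one-sided binomial calculation: if the listener were guessing, each trial would be a 50/50 coin toss, so we can ask how likely a given score would be by chance alone. The sketch below is only an illustration; the function name and the example scores are my own assumptions, not taken from any particular ABX program.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of scoring at least `correct` out of `trials`
    by pure guessing (chance of 0.5 on every trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    # Count the outcomes with `correct` or more right answers,
    # then divide by the total number of equally likely outcomes.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials

# Hypothetical scores, chosen only to illustrate the calculation.
for correct, trials in [(8, 10), (12, 16), (14, 16)]:
    p = abx_p_value(correct, trials)
    print(f"{correct}/{trials} correct: p = {p:.4f}")
```

A small value means the score would rarely occur by guessing. A common convention is to look for a value below 0.05, but that threshold is a choice, not something the test itself dictates.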
A low-distortion soundcard and reasonably good headphones are needed. I have a test file downloaded from Hydrogenaudio which demonstrates whether high frequency signals interfere with lower frequency audible signals. Some highly regarded soundcards are reported to fail this test.
Be careful when trying this test: it includes a high-level, high-frequency component which may be inaudible to most of us but can damage speakers, headphones or ears if played at too high a level.
As mentioned, a computer soundcard has its limitations. For example, it is impossible to use one to compare a digital signal to an analogue version, or to test whether an amplifier with a 1MHz bandwidth sounds better than one limited to 50kHz. Even with a 16-bit 44.1kHz card, though, we can still find evidence to help with these questions. If we start with a 16-bit digital signal and carry out ABX comparisons with the same signal reduced first to 15, then 14, then 13 bits, we are moving further away from the original analogue version and increasing quantisation distortion. If an increase from 14 bits to 16 bits is found to make no audible difference, it is unlikely that something even closer to the original analogue version would sound any better. (Analogue to digital conversion often uses a technique known as 'dithering', which can effectively eliminate quantisation distortion but replaces it with random noise. This noise can to some extent be confined to frequencies where it is less obtrusive by 'noise shaping', so this simple comparison of different bit numbers could be misleading.)
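For anyone wanting to prepare their own reduced-bit files for this comparison, the processing amounts to rounding each sample to a coarser step. The following is a minimal sketch, assuming samples are held as floating-point values between -1.0 and 1.0; the function names are my own, and the optional dither shown (triangular, plus or minus one step, no noise shaping) is just one simple choice among several.

```python
import math
import random

def requantise(samples, bits, dither=False, seed=0):
    """Round samples in [-1.0, 1.0) to the nearest step of a `bits`-bit word.
    With dither=True, triangular (TPDF) noise of +/-1 step peak is added
    before rounding; this is an illustrative choice, not a recommendation."""
    rng = random.Random(seed)
    step = 2.0 / (2 ** bits)  # size of one least significant bit
    out = []
    for x in samples:
        if dither:
            # Difference of two uniform values gives a triangular distribution.
            x += (rng.random() - rng.random()) * step
        q = round(x / step) * step
        out.append(max(-1.0, min(1.0 - step, q)))  # keep within the word's range
    return out

def rms_error_db(original, processed):
    """RMS of the difference between two signals, in dB relative to 1.0."""
    n = len(original)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(original, processed)) / n)
    return 20 * math.log10(err)

# One second of a 1kHz tone at half of full scale, sampled at 44.1kHz.
rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
for bits in (16, 15, 14, 13):
    print(bits, "bits:", round(rms_error_db(tone, requantise(tone, bits)), 1), "dB")
```

Each bit removed should raise the error by roughly 6dB. A steady tone is used here only because it makes the arithmetic easy to check; a listening comparison would use music.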
Similarly, if an 18kHz low-pass filter is undetectable compared to 22kHz, it is unlikely that extending the bandwidth far beyond 22kHz has any benefit. Trying the filter test which was at one time available on the PC-ABX site, I had difficulty detecting even a 12kHz filter applied to a transient signal, even though I can hear sinewaves above this frequency. Even those with younger ears are likely to find an 18kHz filter difficult to detect using this test. Trying this sort of test is an excellent antidote to some of the more extreme 'audiophile' claims, e.g. that amplifiers need a bandwidth of several hundred kHz to adequately reproduce transients.
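Since the PC-ABX files are no longer available, a filtered version of a signal can be made at home. The sketch below builds a low-pass filter by the windowed-sinc method with a Hamming window. This is my own choice of technique and not necessarily what the PC-ABX samples used; the function names, the 101-tap length and the 44.1kHz rate are likewise assumptions.

```python
import math

def lowpass_fir(cutoff_hz, sample_rate=44100, taps=101):
    """Windowed-sinc (Hamming) low-pass coefficients, scaled to unity gain
    at DC. `taps` should be odd so the filter has a centre tap."""
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = (taps - 1) / 2
    h = []
    for n in range(taps):
        t = n - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (taps - 1))
        h.append(ideal * window)
    total = sum(h)
    return [c / total for c in h]

def gain_db(h, freq_hz, sample_rate=44100):
    """Magnitude response of the filter at one frequency, in dB."""
    w = 2 * math.pi * freq_hz / sample_rate
    re = sum(c * math.cos(w * n) for n, c in enumerate(h))
    im = sum(c * math.sin(w * n) for n, c in enumerate(h))
    return 20 * math.log10(max(math.hypot(re, im), 1e-12))

def apply_fir(h, samples):
    """Plain convolution; the output has the same length as the input."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, c in enumerate(h):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

h18 = lowpass_fir(18000)
for f in (1000, 10000, 17000, 21000):
    print(f, "Hz:", round(gain_db(h18, f), 1), "dB")
```

Applying `apply_fir` to the samples of a music file gives the B version for an ABX comparison against the unfiltered A. The filter delays the signal by 50 samples, so the original should be delayed by the same amount if the two are to line up exactly.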
An interesting variation on the ABX test was demonstrated on a recent BBC-1 TV programme (The Making of Me, 29-Aug-2008). As with the standard ABX test there were three sounds, two identical and one different. These were called A, B and C, and they were played just once in sequence, the question being which one was different. The first test used a large, easily heard difference, and the test was then repeated many times in random order with a gradually reducing difference, so that for any listener the results started all correct, then at some level started to include errors, and eventually became random guesses. Results using violin sounds were shown for typical subjects, which were all fairly similar, but also for a professional violinist (Vanessa-Mae), whose results were far better. This demonstrated that there can be big differences in hearing ability between different subjects, for whatever reason, but more to the point it showed that such differences can be revealed effectively by ABX-type testing. This ABC test should be more difficult than the standard ABX because there was no opportunity to repeat the sounds many times before making a decision. Starting with a big difference and gradually making the sounds more similar can sometimes be done with standard ABX, which is helpful because we then have a better idea what effect we are listening for.
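Scoring a test of this kind differs from ABX in one respect: with three sounds and one odd one out, a guess is right one time in three instead of one in two. The sketch below works down from the largest difference and reports the smallest level at which the score is still unlikely to be guesswork. The function names, the 0.05 threshold and the example scores are all my own inventions for illustration; the programme did not give figures in this form.

```python
from math import comb

def tail_probability(correct, trials, chance):
    """Probability of at least `correct` right answers out of `trials`
    when each answer is right with probability `chance`."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

def threshold_level(results, chance=1 / 3, alpha=0.05):
    """`results` is a list of (difference, correct, trials) tuples.
    Working from the largest difference downwards, return the smallest
    difference still scored better than guessing would explain, stopping
    at the first level that fails. Returns None if even the largest fails."""
    threshold = None
    for level, correct, trials in sorted(results, reverse=True):
        if tail_probability(correct, trials, chance) >= alpha:
            break
        threshold = level
    return threshold

# Hypothetical scores: 12 trials at each of five difference levels.
scores = [(8.0, 12, 12), (4.0, 11, 12), (2.0, 9, 12), (1.0, 5, 12), (0.5, 4, 12)]
print("smallest reliably detected difference:", threshold_level(scores))
```

With these made-up scores the listener is reliable down to a difference of 2.0, and at 1.0 the score of 5 out of 12 is no better than guessing would often produce.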
There are many people who regularly use ABX and other similar methods and find them invaluable. The developers of MP3 and other 'lossy' encoders are a good example. For these encoders there is no simple measurement, equivalent to a distortion test, which could give a figure for the performance, so listening test results are used to identify problems, make progress, and test new versions as the encoders evolve. A good place to find out about that sort of thing is the Hydrogenaudio website, where there is also an interesting discussion forum. This has an unusual feature, which is that the 'terms of service' include the requirement (not always enforced) that any claims about audible sound quality must be supported by ABX or something equivalent.
I have mentioned a few times that the nonlinear distortion extracted from my MJR-6 amplifier when driving a typical speaker with a music signal was inaudible even when listened to alone without the masking effect of the undistorted component of the output, and it would need to be increased many times before becoming audible. At this level conventional listening tests are unlikely to be helpful for comparing amplifier distortion, even ignoring the fact that the speakers will almost certainly have harmonic distortion a hundred or even a thousand times higher, plus far worse types such as phase modulation. If we were to extract and listen to typical speaker distortion that would certainly be audible.