
Talk:Characteristic function (probability theory)

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 87.119.247.109 (talk) at 20:00, 16 May 2010 (→‎Notation). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

This article is within the scope of WikiProject Mathematics, a collaborative effort to improve the coverage of mathematics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. It has been rated as B-class on Wikipedia's content assessment scale and as Mid-priority on the project's priority scale.

This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks. It has been rated as B-class on Wikipedia's content assessment scale and as High-importance on the importance scale.


I think the formula for calculating the n-th moment based on the characteristic function was not correct. It was:

E[X^n] = i^n φ_X^(n)(0).

I changed it to

E[X^n] = (−i)^n φ_X^(n)(0).

Please compare with [1].

In fact, neither of those is correct. I will correct both occurrences of the formula in the article to E[X^n] = i^(−n) φ_X^(n)(0). —Preceding unsigned comment added by 188.36.27.144 (talk) 14:00, 11 October 2009 (UTC)[reply]
Why?! We have (assuming analyticity) φ_X(t) = Σ_{n=0}^∞ (it)^n E[X^n] / n!, so φ_X^(n)(0) = i^n E[X^n]. Right? Boris Tsirelson (talk) 16:45, 11 October 2009 (UTC)[reply]
Right? Boris Tsirelson (talk) 16:45, 11 October 2009 (UTC)[reply]
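
A quick SymPy sketch of this relation (an illustration only, not from the discussion above: it checks E[X^n] = (−i)^n φ_X^(n)(0) for the exponential distribution, whose cf λ/(λ − it) and moments n!/λ^n are standard facts):

    import sympy as sp

    t = sp.symbols('t', real=True)
    lam = sp.symbols('lambda', positive=True)
    phi = lam / (lam - sp.I * t)      # cf of the exponential distribution Exp(lambda)

    for n in range(1, 5):
        moment = (-sp.I)**n * sp.diff(phi, t, n).subs(t, 0)
        # difference from the known moment n!/lambda^n; should print 0 each time
        print(n, sp.simplify(moment - sp.factorial(n) / lam**n))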

"Different change of parameter"

What does this phrase mean?: “which is essentially a different change of parameter”. —Preceding unsigned comment added by 130.207.104.54 (talk) 15:56, 20 August 2009 (UTC)[reply]

It means that one function is just a rescaled version of the other: if φ_X(t) = E[e^(itX)] and f̂(t) = E[e^(−2πitX)], then f̂(t) = φ_X(−2πt). The second definition is more standard in Fourier analysis, while the former is generally used in the statistical sciences. ... stpasha » talk » 10:50, 21 August 2009 (UTC)[reply]
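
A short SymPy check of this rescaling (illustrative; N(0,1) is used because both integrals come out in closed form):

    import sympy as sp

    t, x = sp.symbols('t x', real=True)
    f = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)     # N(0,1) density

    # statistics convention: phi(t) = E[exp(i t X)]
    phi = sp.simplify(sp.integrate(sp.exp(sp.I * t * x) * f, (x, -sp.oo, sp.oo)))
    # Fourier-analysis convention: fhat(t) = E[exp(-2 pi i t X)]
    fhat = sp.simplify(sp.integrate(sp.exp(-2 * sp.pi * sp.I * t * x) * f,
                                    (x, -sp.oo, sp.oo)))

    print(sp.simplify(fhat - phi.subs(t, -2 * sp.pi * t)))   # 0: fhat(t) = phi(-2*pi*t)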

Inversion theorem

I have a bit of a different statement of the inversion theorem that seems to contradict the one in this article. From "Probability and Random Processes" by Grimmett I have: if

∫ |φ(t)| dt < ∞,

then X has a density given by

f(x) = (1/2π) ∫ e^(−itx) φ(t) dt.

--Faisel Gulamhussein 22:17, 20 October 2006 (UTC)[reply]

Probably you're correct. I don't think there is a contradiction exactly. Many authors just assume their random variables have a density; if that is the case, the statements would be the same. Thenub314 13:07, 21 October 2006 (UTC)[reply]
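
The absolutely-integrable case is easy to check numerically. A minimal NumPy sketch (illustrative only; the N(0,1) cf is used, and the integration grid is an arbitrary truncation):

    import numpy as np

    phi = lambda t: np.exp(-t**2 / 2)     # cf of N(0,1); integrable, so the theorem applies
    ts = np.linspace(-40, 40, 20001)

    def f(x):
        # f(x) = (1/2pi) * integral of exp(-i t x) phi(t) dt, by the trapezoidal rule
        return np.trapz(np.exp(-1j * ts * x) * phi(ts), ts).real / (2 * np.pi)

    for x in (0.0, 1.0, 2.0):
        exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
        print(x, f(x), exact)             # the two columns agree to high accuracy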

Would someone like to write a brief explanation of the meaning of a characteristic function, for those of us without such strong statistics and mathematics backgrounds? I have no idea of its concept or application. An example that appeals to intuition would be great, thanks! BenWilliamson 10:28, 21 February 2007 (UTC)[reply]

Bochner-Khinchin theorem

There is something strange as it is formulated.

The would-be characteristic function

φ(t) = exp(−t⁴)

satisfies all the requested properties, but it implies that the corresponding distribution has vanishing second moment, E[X²] = −φ″(0) = 0

--MagnusPI (talk) 09:25, 24 April 2008 (UTC)[reply]

I believe the problem lies in the condition "φ is a positive definite function". I suggest that this doesn't hold, and there are results that indirectly prove it isn't ... but I don't know of a direct approach to showing this. Note that "positive definite function" is a non-straightforward math thing if you are not familiar with it. Melcombe (talk) 09:34, 25 April 2008 (UTC)[reply]

On any compact set exp(−t⁴) is positive definite; the only points where it is not are t → ±∞, but in this case I do not understand why the characteristic functions of some Lévy stable distributions can be accepted, since they show the same behaviour at t → ±∞

--MagnusPI (talk) 08:44, 28 April 2008 (UTC)[reply]

I suggest you follow the link to positive definite function to see what is required ... it is not just φ(t) ≥ 0. Melcombe (talk) 08:56, 28 April 2008 (UTC)[reply]

Interesting, but this is not what a physicist would call a positive definite function. I will therefore add a note to the main page.

--MagnusPI (talk) 08:55, 29 April 2008 (UTC)[reply]
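
The distinction can be made concrete numerically. A small sketch (an illustration, not from the discussion: Bochner positive definiteness requires every matrix [φ(t_j − t_k)] to be positive semi-definite, and the grid below is an arbitrary choice):

    import numpy as np

    ts = np.linspace(-6, 6, 121)
    D = ts[:, None] - ts[None, :]          # all pairwise differences t_j - t_k

    for name, phi in [("exp(-t^2/2)", lambda u: np.exp(-u**2 / 2)),   # a genuine cf
                      ("exp(-t^4)  ", lambda u: np.exp(-u**4))]:      # the would-be cf
        print(name, np.linalg.eigvalsh(phi(D)).min())
    # the normal cf gives a minimum eigenvalue of ~0 (up to roundoff), while
    # exp(-t^4) gives a clearly negative one: pointwise positive, yet not
    # positive definite in Bochner's sense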

History of Characteristic Functions

Historia Matematica has a nice little thread on the history of characteristic functions which could be used to give some background on where c.f.'s came from. --Michael Stone 19:22, 10 April 2007 (UTC)[reply]

Please check your work.


Inverse fourier transform

The article mentions that you can calculate the characteristic function by taking the conjugate of the Fourier transform of the pdf, but isn't this simply the inverse Fourier transform?... This should be made clearer. We tend to think about transforming using only the "direct" transform first, and then transforming back again, but you can perfectly well say that you can find the characteristic function by taking the inverse Fourier transform, then come back again by taking the direct Fourier transform...

It's like the pdf is already the "transform", and then the characteristic function is the "original" function, that we could see as a signal to be transformed...

The idea is just that we save some words by using the idea of "inverse Fourier transform" instead of "conjugate of the transform", and it also helps to see the inverse transform as something as natural as the "direct" transform. No transform domain is intrinsically better! There is no reason to insist on seeing the pdf as the correlate of a signal to be transformed by the direct Fourier transform, and not the inverse one... —Preceding unsigned comment added by Nwerneck (talkcontribs) 22:53, 17 November 2007 (UTC)[reply]

Pathological examples

The article gives the impression that every random variable can be described by a characteristic function; I am almost sure (no, that's too strong to say on a statistics page - let me rephrase it: I have more than 95% confidence) that a pathological example can be designed that does not have a characteristic function... Something along the lines of the perforated interval on the standard probability space page. Darth Albmont (talk) 19:06, 18 November 2008 (UTC)[reply]

I suggest you see page 11 of the Lukacs reference. He cites a theorem of Cramér (1946) which he says ensures the existence of the cf for "every distribution function". The previous context makes it clear that this includes even dfs with singular components. See also pages 20, 36 and 64 for other results on the relation between the cf and whether the df has a singular component. Melcombe (talk) 10:01, 19 November 2008 (UTC)[reply]
Does every random variable defined in the real numbers have a distribution function? Darth Albmont (talk) 12:25, 19 November 2008 (UTC)[reply]
You linked to density function but wrote distribution function; they are not equivalent. Every real valued RV has a distribution function; not all have density functions (e.g., Cantor distribution). Your original question was about characteristic functions; they always exist for any distribution with a Lebesgue measure due to the dominated convergence theorem. I cannot say for sure about distributions with more pathological measures, but I suspect it's true there too. I cleaned up the wikilinks in your original post. Baccyak4H (Yak!) 15:02, 19 November 2008 (UTC)[reply]
You linked to density function but wrote distribution function; they are not equivalent - Baccyak4H, yes, but the page distribution function is about physics, and it gives both cumulative and density as the probability concept. I don't know if there are pathological real valued random variables so pathological that it's not possible to compute its characteristic function - but let's try a counting argument. Let X be any set that is dense in [0,1]; is it possible to define a real-valued random variable whose sample space is X? If we accept the Axiom of Choice (the psychomath's favourite tool in creating monsters) and the Continuum hypothesis, then the answer is yes: either X is countable or has the cardinality of the continuum; in the first case, it's trivial, in the second, well-order both X and [0,1], sample uniformly from [0,1] and get back to X using the bijection (probably there's no need to use the continuum hypothesis to create this monster). So, by counting, there are 2c real-valued random variables. However, since every characteristic function is continuous, there are only c characteristic functions. Unless I made some big mistake, I think I proved that not all real-valued random variables have a characteristic function (in fact, I guess I showed that most of them don't - the problem with pathologies is that usually pathologies are the rule). Darth Albmont (talk) 16:33, 19 November 2008 (UTC)[reply]
The characteristic function is just the Fourier transform of the distribution. (Thus it is really about distribution rather than random variable itself.) It is well-defined always. At least if we treat a distribution as a probability measure on the Borel sigma-field (be it completed or not). No place for pathologies here. Boris Tsirelson (talk) 20:17, 23 March 2009 (UTC)[reply]
The characteristic function is the integral φ_X(t) = E[e^(itX)] = ∫ e^(itx) dF(x). Since the expectation operator exists for any random variable, and the corresponding integral is absolutely convergent for real values of t (|e^(itx)| = 1, so ∫ |e^(itx)| dF(x) = 1), the characteristic function is well-defined for any random variable, even a "pathological" one. Stpasha (talk) 21:20, 16 June 2009 (UTC)[reply]
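
The singular case Melcombe mentions can even be simulated. A Monte Carlo sketch (illustrative; it takes the Cantor distribution as X = Σ 2ε_k/3^k with iid fair bits ε_k, and compares against its known cf e^(it/2) Π cos(t/3^k)):

    import numpy as np

    rng = np.random.default_rng(0)
    K = 40                                   # truncation depth of the base-3 expansion
    eps = rng.integers(0, 2, size=(200_000, K))
    X = (2 * eps / 3.0 ** np.arange(1, K + 1)).sum(axis=1)   # ~ Cantor distribution

    for t in (1.0, 5.0, 20.0):
        mc = np.exp(1j * t * X).mean()       # E[exp(itX)] exists: |exp(itX)| = 1
        exact = np.exp(1j * t / 2) * np.prod(np.cos(t / 3.0 ** np.arange(1, K + 1)))
        print(t, mc, exact)                  # agree to Monte Carlo accuracy, ~1/sqrt(200000)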

Matrix-valued random variables

In the above-named section, what are the dimensions of the matrix T in relation to those of X? Are the dimensions the obvious ones, or are other sizes used occasionally? Melcombe (talk) 15:45, 9 December 2008 (UTC)[reply]

Hi Melcombe. I've added some material to the page; hope it is clear. Let us know if not. Best wishes, Robinh (talk) 08:05, 10 December 2008 (UTC)[reply]
Yes it is clear. But can I ask if this formulation is used in practice anywhere? It seems that if T were replaced by T-transpose then not only would the formula be directly comparable to the vector case, but both of the following would apply:
  • each X_ij would be associated with T_ij rather than T_ji,
  • the formulas would be equivalent to the vector case if both X and T were strung out into vectors using the usual vec() operator.
Melcombe (talk) 09:41, 10 December 2008 (UTC)[reply]
Hi. Well you could define a function like the characteristic function that works as you specify. But this would not take advantage of the structure present in matrices. The characteristic function as specified in the article is the standard, which crops up all over the place, at least in distributions such as the Wishart distribution. Best wishes, Robinh (talk) 13:16, 10 December 2008 (UTC)[reply]
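
Melcombe's vec() point is a one-line identity, since tr(T′X) = Σ_ij T_ij X_ij. A tiny NumPy check (illustrative; the random 3×4 matrices are arbitrary):

    import numpy as np

    rng = np.random.default_rng(1)
    T = rng.normal(size=(3, 4))
    X = rng.normal(size=(3, 4))
    # tr(T' X) equals the entrywise sum of T_ij * X_ij, i.e. vec(T) . vec(X)
    print(np.trace(T.T @ X), (T * X).sum())   # identical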

Moment generating function

I would like to have any mention of the moment-generating function (mgf) removed from the "definition" section. The concept of the characteristic function can and should be defined without any reference to the mgf; in fact such a reference is only confusing, because unlike the cf, the mgf may not always exist, or may be unbounded, etc.

The reference itself is quite dubious as well. The expression M_{iX}(t) denotes the mgf of the complex-valued random variable iX. I'm not entirely sure how the mgf of a complex-valued r.v. is supposed to be defined, but most likely the argument t will be complex as well. The expression M_X(it) is not correct. The function M_X is defined over a domain of real numbers; it cannot be evaluated at an imaginary point (at least not from the point of view of strict mathematics). Of course if the function M_X is given by some simple formula then we can always plug in it instead of t and hopefully get a meaningful result. But what if there is no simple formula? What if M_X is defined graphically, for example? It is difficult to interpret M_X(it) without going back to the definition of M_X, but then such an interpretation turns into a tautology.

Anyway, I believe that the mgf is nothing more than a "related concept", and thus should be mentioned in the corresponding section at the end of the article.

Stpasha (talk) 11:24, 17 June 2009 (UTC)[reply]

Some of the old notation involving M_X(it) was wrong and I have taken it out. Many people do define mgf's as allowing complex arguments. The similarity in the formulae for cf's and mgf's means that mgf's have to be mentioned early on, to prevent readers wondering why they are not mentioned. And there is no reason why they should not be mentioned more than once if there is an increase in technicality towards the end of the article. It is wrong to think that either cf's or mgf's are only defined for real-valued arguments, and I have tried to introduce something about this. Melcombe (talk) 13:27, 17 June 2009 (UTC)[reply]
I do agree that moment-generating functions are treated in many current textbooks, although the reasons for that are either historical (the mgf was invented earlier than the cf) or that the authors are unwilling to go into the area of complex analysis.
However, the mgf does not bring anything new to the theory compared to the cf. Surely the mgf can be used to calculate moments, or to formulate a convergence theorem. But the cf already does that -- we can calculate moments by differentiating the cf, and we have the Lévy continuity theorem. On the other hand the mgf has quite a few drawbacks compared to the cf.
And it is wrong to think that cf's or mgf's are defined for non-real arguments. In fact, if we try to plug complex-valued t into the cf, we arrive at something called an analytic characteristic function (see Lukacs, ch. 7), which loses some of the nice properties of the cf (for example, existence) but gains some extra useful properties (the ridge property, existence of moments of all orders, sometimes entireness). It is when we consider analytic cf's that the connection with the mgf becomes well-defined. However, analytic cf's are different from regular cf's; they deserve their own section, and should not be implied in the definition of the regular cf. Stpasha (talk) 21:14, 17 June 2009 (UTC)[reply]
Remember that this is not a textbook. It needs to be useful to many types of people and to include what is notable about the topic and related topics; the connection to mgf's is definitely that. It is wrong to say that "to plug complex-valued t into cf" will arrive at analytic cf's; it arrives at cf's that may or may not be analytic over regions to be defined, as per Lukacs. Just because you like to think of real t only is no reason for the article to give the impression that only that case is important. And considering that the major work on characteristic functions defines them for complex t, this needs to be included in the main definition. Also, before you add even more unexplained maths in a private notation into the initial section, read WP:LEDE. Melcombe (talk) 08:50, 18 June 2009 (UTC)[reply]
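
The existence point stpasha raises is easy to illustrate. A NumPy sketch (an illustration only: the standard Cauchy is used, whose cf is e^(−|t|) while E[e^(tX)] diverges for every real t ≠ 0):

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_cauchy(1_000_000)
    t = 1.0
    # cf: |exp(i t X)| = 1, so the sample mean always converges
    print(np.exp(1j * t * X).mean())   # ~ exp(-1) = 0.3679...
    # mgf: exp(t X) has infinite expectation; the sample mean blows up
    # (typically overflowing to inf as soon as one huge Cauchy draw appears)
    print(np.exp(t * X).mean())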

Standard normal

Does anybody know if standard normal N(0,1) is the only random variable whose characteristic function coincides with its pdf? Then it would be a fixed point of the Fourier transformation; wonder what this would imply. Stpasha (talk) 18:35, 30 June 2009 (UTC)[reply]

Well, you have to be a little careful: N(0,1) is not quite the same as its characteristic function, depending on how you define the characteristic function. The characteristic function defined here is missing the normalizing factor involving 2π. It turns out there are many distributions whose characteristic functions are themselves except for a normalizing factor in this way. Thenub314 (talk) 06:53, 1 July 2009 (UTC)[reply]
Yep, my bad. This should use the alternative definition of the characteristic function, φ(t) = E[e^(−2πitX)] (because only then is the Fourier transform unitary), and then the fixed-point distribution will be N(0, (2π)^(−1)) (seeing as we need f_X(0) = 1). But still, what other "many distributions whose cf's are themselves except for a normalizing factor"? // Stpasha (talk) 07:56, 1 July 2009 (UTC)[reply]
The unitary operator of the Fourier transform has four eigenvalues: 1, i, −1, −i (and no continuous spectrum). The corresponding eigenfunctions are Hermite polynomials times the normal density. Their eigenvalues are: 1, i, −1, −i, 1, i, −1, −i, 1, i, −1, −i, ... Thus, you may add to the normal density the fourth Hermite polynomial (times the normal density) with a small coefficient, of such a sign that for large x the addition is positive. This way you get the example you seek. Boris Tsirelson (talk) 12:47, 1 July 2009 (UTC)[reply]
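
The eigenfunction claim can be checked numerically. A sketch (illustrative only; it uses the unitary transform (Ff)(s) = (1/√(2π)) ∫ f(x) e^(isx) dx, physicists' Hermite polynomials H_n, and an arbitrary test point):

    import numpy as np
    from numpy.polynomial.hermite import hermval

    x = np.linspace(-15, 15, 4001)
    s = 1.3                                   # arbitrary test point
    for n in range(5):
        c = np.zeros(n + 1); c[n] = 1
        h = hermval(x, c) * np.exp(-x**2 / 2)             # Hermite function H_n(x) e^{-x^2/2}
        Fh = np.trapz(h * np.exp(1j * s * x), x) / np.sqrt(2 * np.pi)
        print(n, Fh / (hermval(s, c) * np.exp(-s**2 / 2)))  # ratio ~ i^n: 1, i, -1, -i, 1
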
Another way is to consider any Schwartz function ƒ and take ƒ + Fƒ + F²ƒ + F³ƒ; then when you apply a Fourier transform to such a thing you get out what you started with. Thenub314 (talk) 13:07, 1 July 2009 (UTC)[reply]
Yes, but we also want it to be positive (and integrable). Boris Tsirelson (talk) 18:40, 1 July 2009 (UTC)[reply]
I hope you'll agree integrability is handled by the fact that I said Schwartz. I should have said something about positivity, but to ensure that one simply needs to start from a function whose Fourier transform is positive. For completeness of this discussion, here are a few examples besides the Gaussian: e^(−|x|); sech(x); ƒ∗ƒ for any even positive Schwartz function ƒ; and many others. For the concerned reader I should mention why, when you sum the iterates of Fourier transforms for these examples, the result is not necessarily the Gaussian: one easy way to see this is to look at the asymptotics as x → ∞; the first example will decay like 1/x², the second will decay like e^(−x), and for the third it of course depends on what you pick for ƒ. The example suggested by Boris is much better in the sense that it is much more systematic and much more complete. I just thought this was an interesting way to look at it, and it perhaps generates some closed formulas that are not covered by Boris's argument. Thenub314 (talk) 09:27, 2 July 2009 (UTC)[reply]
Alright, so Hermite polynomials didn't really work; however a density of the form f(x) = c e^(−πx²) (1 + ε H₄(√(2π) x)) (where H₄(u) = 16u⁴ − 48u² + 12, c = 1/(1 + 12ε), and 0 < ε ≤ 1/24 so that f ≥ 0) indeed coincides with its own cf. Although it certainly cannot be expressed as a Hermite polynomial, since they have integer coefficients.
I thought about Boris's example some more, and am now fairly sure that any function of the form
e^(−πx²) H_n(√(2π) x)
(where H_n is a physicists' Hermite polynomial) is an eigenfunction of the cf transform. Although not an appropriate pdf, we can always add a normal distribution to it to ensure positiveness. // Stpasha (talk) 19:28, 1 July 2009 (UTC)[reply]
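
A numerical check of this construction (illustrative; it uses the alternative convention φ(t) = E[e^(−2πitX)] from earlier in the thread, H₄(u) = 16u⁴ − 48u² + 12, and ε = 1/30, which keeps f ≥ 0 since min H₄ = −24):

    import numpy as np

    eps = 1 / 30
    c = 1 / (1 + 12 * eps)     # normalization: integral of H4(sqrt(2pi)x) e^{-pi x^2} dx = 12
    H4 = lambda u: 16 * u**4 - 48 * u**2 + 12
    f = lambda x: c * np.exp(-np.pi * x**2) * (1 + eps * H4(np.sqrt(2 * np.pi) * x))

    x = np.linspace(-10, 10, 4001)
    print(np.trapz(f(x), x), f(x).min())        # integrates to 1 and stays >= 0: a valid pdf
    for t in (0.3, 1.0, 2.0):
        phi = np.trapz(np.exp(-2j * np.pi * t * x) * f(x), x)
        print(t, phi.real, f(t))                # the cf reproduces the density itself
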
I am a bit confused by this last comment; we seem to have moved from showing there are a few examples other than the Gaussian distribution to describing the space of all probability distributions with this property. This problem is a bit harder. Here are some random thoughts about that. First, we are looking for probability measures whose Fourier transform is itself, but the Fourier transform of a probability measure is a bounded function, and the measure has integral one, so in fact our probability measure must have a bounded pdf. This pdf, being bounded and in L¹, must also be in L². Following Boris's suggestion, we could consider a basis in L² for the eigenspace associated to the eigenvalue 1. If you want to describe all elements of this eigenspace that are positive and integrable, then it depends on what you mean by describe (aka why is it not enough to say take everything in the eigenspace that is positive and integrable). I suppose it could be useful to notice that we are looking for positive definite integrable functions in this space. By a theorem of Bochner, if the function is positive definite then the characteristic function will be positive; since these functions are themselves under the Fourier transform, this turns into a condition for the function itself to be positive. What would really be nice is to have a condition on the coefficients of the expansion in this basis that guaranteed that the resulting function would be positive and integrable. Unfortunately nothing springs to mind that would tell me when this was the case. Thenub314 (talk) 09:27, 2 July 2009 (UTC)[reply]
I suppose that actually we might be in better shape than I first thought. We have that P = (1/4)(I + F + F² + F³) is a projection in L² onto the functions that are their own Fourier transform. For the reasons above we are interested in finding positive functions that are integrable with integral 1 (from being a probability measure), and are bounded (from being the Fourier transform of a probability measure). All such functions are in L², so take your favorite one and apply the projection. We still can't categorize them exactly, but we know that doing this we hit all of them. Here are some properties of all such functions ƒ that might be interesting.
  • ƒ must be positive, and have integral one since it defines a probability measure.
  • ƒ must be bounded by 1, because it is the Fourier transform of a probability measure (itself).
  • ƒ(0)=1 because its value at zero is equal to the integral of its Fourier transform, which is in turn the integral of itself.
  • ‖ƒ‖_p ≤ 1 for all 1 ≤ p ≤ ∞ (interpolation).
  • ƒ is uniformly continuous and tends to 0 as |x| → ∞, because it is the Fourier transform of an integrable function.
  • Applying our projection to the triangular function gives an example to show that ƒ need not be differentiable, or decay faster than 1/x² (see the sketch after this list).
  • ƒ must be positive definite because by Bochner's theorem a positive function is the Fourier transform of a measure if and only if it is positive definite and continuous.
  • ƒ is real-valued, so it must be even, because its Fourier transform (itself) is real-valued.
  • ƒ must be supported on the entire real line.
The list could probably be extended. For example, it probably follows from the uncertainty principle that the function also cannot decay too rapidly (e^(−x⁴) is out). On the other hand, if the function is smooth it cannot decay too slowly either. I guess this above paragraph goes under the "what would this imply" part of your original question. Thenub314 (talk) 11:28, 2 July 2009 (UTC)[reply]
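
A sketch of the triangular-function example (illustrative; the grid and truncation are arbitrary, and tri(x) = max(0, 1 − |x|) is even, so F²·tri = tri and P·tri reduces to (tri + F·tri)/2, with (F·tri)(s) = (1/√(2π))(sin(s/2)/(s/2))² in closed form):

    import numpy as np

    x = np.linspace(-200, 200, 80001)
    tri = np.clip(1 - np.abs(x), 0, None)
    # unitary Fourier transform of tri in closed form
    Ftri = np.sinc(x / (2 * np.pi))**2 / np.sqrt(2 * np.pi)   # np.sinc(u) = sin(pi u)/(pi u)
    g = (tri + Ftri) / 2                                      # P applied to tri

    for s in (0.0, 1.0, 2.5):
        Fg = np.trapz(g * np.exp(1j * s * x), x) / np.sqrt(2 * np.pi)
        gs = (max(0.0, 1 - abs(s)) + np.sinc(s / (2 * np.pi))**2 / np.sqrt(2 * np.pi)) / 2
        print(s, Fg.real, gs)     # F g = g, up to the truncation error of the integral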

This can't be right, surely?

The article says that, if phi is a characteristic function, then so is Im(phi). But one property all characteristic functions phi have is phi(0) = 1, so Im(phi)(0) = Im(phi(0)) = Im(1) = 0, which is not 1. So, surely, if phi is a characteristic function, then Im(phi) never is? I tried correcting this by removing Im(phi) from the list, but I never saved my edit: I previewed it, only to find out I'd messed up the code on all the other symbols for Re(phi) etc. Could someone who knows what they're doing with the code please remove Im(phi)? (Unless it actually does belong there, but for the life of me I cannot see how!) 90.206.183.244 (talk) 15:38, 6 July 2009 (UTC)[reply]

Thank you. You are right. I did. Boris Tsirelson (talk) 18:36, 6 July 2009 (UTC)[reply]

Properties - symmetry

Under "properties" the article states "the characteristic function of a symmetric random variable is real-valued and even". A similar remark is made in the caption for graph of the characteristic function of a uniform distribution. But is this really generally true, or true only if the random variable is symetric around zero. For example, a normal or Cauchy distribution with a non-zero mean is symetric, but it appears that the characteristic functions would retain complex values. But I admit that I am weak on complex math, so maybe those i's that seem to be there drop out. Rlendog (talk) 13:54, 21 August 2009 (UTC)[reply]

You are right. Thanks. I just added "around the origin". Boris Tsirelson (talk) 15:07, 21 August 2009 (UTC)[reply]
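
A quick SymPy illustration of why "around the origin" matters (the shifted normal N(μ, 1) is symmetric about μ, yet its cf picks up a complex factor e^(iμt)):

    import sympy as sp

    t, x, mu = sp.symbols('t x mu', real=True)
    f = sp.exp(-(x - mu)**2 / 2) / sp.sqrt(2 * sp.pi)     # N(mu, 1) density
    phi = sp.simplify(sp.integrate(sp.exp(sp.I * t * x) * f, (x, -sp.oo, sp.oo)))
    print(phi)                  # ~ exp(I*mu*t - t**2/2): real-valued only when mu = 0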

The same space, or the dual space?

"The argument of the characteristic function will always belong to the same space where random variable X takes values" (Sect. "Definition") — really? In principle, it belongs to the dual space. In practice, the given space usually is endowed with an inner product and so, may be treated as dual to itself. But not always; the distinction may be essential for random processes (because of infinite dimension). In fact, the distinction manifests itself already in finite dimension, if the given space is just an abstract linear space (with no preferred basis). Boris Tsirelson (talk) 17:42, 24 September 2009 (UTC)[reply]

You may be right; I haven’t seen a book which defines the cf for generic random elements. This sentence is just my attempt to describe those several definitions that follow. In most cases of course the dual space is isomorphic to the “primal” space, so that for example if X ∈ R then the argument of the cf is also t ∈ R, and if X ∈ R^k then also t ∈ R^k. In any case it may be just a matter of point of view: whether you want to see the product t′X as a linear map acting on the variable X, or as an inner product (t, X) between two elements of the same space over the field of reals.
The case of the cf of a stochastic process is trickier though. Instead of having X ∈ L² and t ∈ L², we have X arbitrary and t belonging to some strange class of functions integrable in product with X. Maybe this is what the dual space is.  … stpasha »  18:32, 24 September 2009 (UTC)[reply]

Notation

How about we denote the characteristic function with letter χ instead of φ? The reason being that φ is a reserved symbol for the pdf of the standard normal distribution.  … stpasha »  00:04, 5 October 2009 (UTC)[reply]

And χ is reserved for the indicator function; and by the way, non-probabilists often call it characteristic function of a set! It is impossible to reserve every symbol once and for all. But if you see χ for the characteristic function in some textbooks, then we can follow these. Boris Tsirelson (talk) 09:42, 5 October 2009 (UTC)[reply]
Luckily we have an unambiguous notation for the characteristic function of a set: 1(A) or 1_A. Now I don't know if any textbook actually uses χ for the c.f., and besides it clashes with the chi-square distribution... As an alternative, we can use ϕ and φ to denote the c.f. and the standard normal pdf. Only need to decide which one is which (I personally vote for ϕ being the standard normal pdf, since it is more similar to Φ).  … stpasha »  22:34, 5 October 2009 (UTC)[reply]

How about providing a brief explanation of terms, such as i (the square root of −1) and t in e^(it)? That would be great for improving understanding of the cf for those who have not taken advanced courses in math. Another suggestion is to provide examples of deriving the cf for, say, Bernoulli, binomial and normal distributions. Such examples are useful for quick learning of what cfs are. NoName (talk) 17 May 2010 (UTC)
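
In that spirit, a small SymPy sketch of two such derivations (an illustration: the Bernoulli cf follows from a two-term expectation, the normal one from a Gaussian integral):

    import sympy as sp

    t, x = sp.symbols('t x', real=True)
    p = sp.symbols('p', positive=True)

    # Bernoulli(p): E[exp(itX)] is a two-point sum, X = 0 or 1
    phi_bern = (1 - p) * sp.exp(sp.I * t * 0) + p * sp.exp(sp.I * t * 1)
    print(sp.simplify(phi_bern))      # 1 - p + p*exp(I*t)

    # N(0,1): E[exp(itX)] = integral of exp(itx) times the density
    phi_norm = sp.integrate(sp.exp(sp.I * t * x) * sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi),
                            (x, -sp.oo, sp.oo))
    print(sp.simplify(phi_norm))      # exp(-t**2/2)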

Definition for k×p-dimensional random matrix

What's T supposed to denote? How different is this from t? —Preceding unsigned comment added by 161.53.64.70 (talk) 08:53, 7 October 2009 (UTC)[reply]

It was supposed to be small t — the k×p matrix argument of the c.f.  … stpasha »  19:58, 7 October 2009 (UTC)[reply]