Just 1 bit of info ...

Sunday, October 30, 2016

Probabilistically zero events

In this short post, we show that events which are theoretically impossible to occur can be present. That is, practically there can be events which theoretically should not have occurred! Sounds fun?

Mathematically, if the probability that an event occurs is zero, does not mean that, that event cannot occur. Often, I cite the example of events that have continuous probability distribution. In such distributions, the probability that any point from sample space is selected is, in fact, zero. For example, consider [from wiki] a species of bacteria that have lifetime of about 4 to 6 hours. Thus, the probability distribution of the lifetime of these bacteria is continuous as it can take any value, say 2 hours, 7 hours, 3.0005 hours etc. Now, if we ask the following question -

"What is the probability that a bacterium lives exactly 5 hours?".

This probability is, hold-your-breath, ZERO(0). Why? Because a lot of bacteria live for approximately 5 hours, but there is no chance that any given bacterium dies at exactly 5.0000000000... hours.

Instead, if we ask -

"What is the probability that a bacterium dies between 5 and 5.01 hours?"

This probability could be zero and can be found by integrating the probability density function of the lifetime from 5 to 5.01.

So, what did we see just now? We saw that even when an event that has probability zero (bacterium dying exactly at 5th hour), it could occur in practise (there would be bacteria that die at 5th hour)

And finally, the converse - Impossibility implies zero probability, is true!

Thursday, October 20, 2016

Generating probability distributions from a given probability distributions

Here, we contemplate on generating a new uniform distribution given another uniform distribution. Specifically, for example, when we have a function rand7(), which generates random numbers from 1 to 7 (or 0 to 6), we want to compute rand5(), that generated random numbers from 1 to 5. Later, we look at the reverse problem - given rand5(), can we generated rand7().

First problem: Given rand7(), can we generate rand5() i.e. random numbers between 1 and 5?. This is easy - use rand7() to generate a random number, say x, and then output the result as y = x if x <=5. What if x = 6 or 7? Then, we simply repeat the experiment, i.e. we keep calling rand7() until we get a random number between 1 to 5. So that was simple? But, we still have to prove mathematically that this definition of rand5(), generates uniform random numbers i.e. the function outputs the numbers between 1 and 5 with probability 1/5 each. Let's try!

Let, A be the probability that rand5() outputs number 1, then

Pr[rand5() outputs 1]
= Pr[y = 1 ]
= Pr[ x= 1 ] + Pr[ x = 6 or 7] . Pr [ rand5() outputs 1]

= 1/7 + 2/7 * A

Thus, A = 1/5. Similarly, Probability for other numbers 2 till 5 is same as 1/5 ! That was easy too! Right? Let's see a tough problem.

Now consider the reverse problem - Given rand5(), we want to implement rand7(). How do we go about it? Intuitively, we see that we need to use rand5() multiple times since we want to generate numbers 6 and 7 as well, which cannot be generated by a single call to rand5(). The idea is, since rand5() generates numbers till 5, we want a formula such that it produces the remaining numbers i.e. 6 and 7. For example, if we wanted to implement rand19() using rand5(), then this formula should generate all the numbers after 5 as well. We won't mind if this formula generates number beyond 19, since we now know, how to generate only 19 numbers randomly from a big random generator (see first problem!). Coming back to rand7(), consider the formula:

F = 5 * [rand5() - 1] + rand5()

This formula uses rand5() two times. We see that, it generates all the numbers from 1 to 25 only once. You can verify this. (Actually, this formula resembles, formula for Base-5 representation, and, in fact, it draws inspirations from it). Hence this is rand25(), each number has probability of 1/25. Now, using rand25(), we can easily generate rand7(), by using first problem's solution i.e. use rand25() and output the number if it is less than equal to 7 or else call rand25() again. But, we will be little more smarter for this problem. This approach has a subtle issue that we will talk about later. But for now consider new solution for generating rand7() from rand25():

Use rand25() and let y be the output number of rand25(). If y <= 21, then rand7() outputs number x = y mod 7. A subtlety here, set x = 1 when (y mod 7) = 0. Or else, we repeat the call to rand25(). The idea is, 21 is multiple of 7 and hence when we do mod 7 to the number between 1 and 21, then we see that each number between 1 and 7 has occurred equally i.e. 3 times

Why is second idea more smarter than the first one? Because, if we implement rand7() using rand25() in computer program, then second solution is faster! The reason is, for first solution, the probability of getting a number between 1 and 7 using rand25() in first call, is only 7/25 whereas for the second solution, it is 21/25. Hence, on an expectation, second solution would produce output within 2 iterations whereas first solution would take more than 3 iterations!

Generating probability distributions from a given probability distributions

Pr[rand5() outputs 1]
= Pr[y = 1 ]
= Pr[ x= 1 ] + Pr[ x = 6 or 7] . Pr [ rand5() outputs 1]

= 1/7 + 2/7 * A

F = 5 * [rand5() - 1] + rand5()

This formula uses rand5() two times. We see that, it generates all the numbers from 1 to 25 only once. You can verify this. (Actually, this formula resembles, formula for Base-5 representation, and, in fact, it draws inspirations from it). Hence this is rand25(), each number has probability of 1/25. Now, using rand25(), we can easily generate rand7(), by using first problem's solution i.e. use rand25() and output the number if it is less than equal to 7 or else call rand25() again. But, we will be little more smarter for this problem. This approach has a subtle issue that we will talk about later. But for now consider, new solution for generating rand7() from rand25():

Use rand25(), let y be the output number of rand25(). If y <= 21, then rand7() outputs number x = y mod 7. Or else, we repeat the call to rand25(). The idea is, 21 is multiple of 7 and hence when we do mod 7 to the number between 1 and 21, then we see that each number between 1 and 7 has occurred equally i.e. 3 times

Why is second idea more smarter than the first one? Because, if we implement rand7() using rand25() in computer program, then second solution is faster! The reason is, for first solution, the probability of getting a number between 1 and 7 using rand25() is only 7/25 whereas for the second solution, it is 21/25. Hence, on an expectation, second solution would produce output within 2 iterations whereas first solution would take more than 3 iterations!

Generating probability distributions from a given probability distributions

Here, we contemplate on generating a new uniform distribution given another uniform distribution. Specifically, for example, when we have a function rand7(), which generates random numbers from 1 to 7 (or 0 to 6), we want to compute rand5(), that generated random numbers from 1 to 5. Later, we look at the reverse problem - given rand5(), can we generated rand7().

First problem: Given rand7(), can we generated rand5() i.e. random numbers between 1 and 5?. This is easy - use rand7() to generate a random number, say x, and then output the result as y = x if x <=5. What if x = 6 or 7? Then, we simply repeat the experiment, i.e. we keep calling rand7() until we get a random number between 1 to 5. So that was simple? But, we still have to prove mathematically that this definition of rand5(), generates uniform random numbers i.e. the function outputs the numbers between 1 and 5 with probability 1/5 each. Let's try!

Let, A be the probability that rand5() outputs 1, then

Pr[rand5() outputs 1] = Pr[y = 1 ]
= Pr[ x= 1 ] + Pr[ x = 6 or 7] . Pr [ rand5() outputs 1]

= 1/7 + 2/7 * A

Thus, A = 1/5. Similarly, Probability for other numbers 2 to 5 is same as 1/5 ! That was easy too! Let's see a tough problem.

Now consider the reverse problem - Given rand5(), we want to implement rand7(). How do we go about it? Intuitively, we see that we need to use rand5() multiple times since we want to generate numbers 6 and 7, which cannot be generated by a single call of rand5(). The idea is, since rand5() generates numbers till 5, we want a formula such that it produces the next consequent numbers i.e. 6 and 7. For example, if we wanted to implement rand19() using rand5(), then this formula should generate all the numbers 6 onwards. We won't mind if this formula generates number beyond 19, since we now, know how to generate only 19 numbers randomly from a big random generator (see first problem!). Coming back to rand7(), consider the formula:

F = 5 * [rand5() - 1] + rand5()

This formula uses rand5() two times. We see that, it generates all the numbers from 1 to 25 only once. You can verify this. Hence this is rand25(), each number has probability of 1/25. Now, using rand25(), we can easily generate rand7(), by using first problem's solution i.e. use rand25() and output the number if it is less than equal to 7 or else call rand25() again. But, we will be little more smarter for this problem. This approach has a subtle issue that we will talk about later. But for now consider, new solution for generating rand7() from rand25():

Use rand25(), let y be the output number of rand25(). If y <= 21, then rand7() outputs number x = y mod 7. Or else, we repeat the call to rand25(). The idea is, 21 is multiple of 7 and hence when we do mod 7 to the number between 1 and 21, then we see that each number between 1 and 7 has occurred equally i.e. 3 times

Why is second idea more smarter than the first one? Because, if we implement rand7() using rand25() in computer program, then second solution is faster! The reason is, for first solution, the probability of getting a number between 1 and 7 using rand25() is only 7/25 whereas for the second solution, it is 21/25. Hence, on an expectation, second solution would produce output within 2 iterations whereas first solution would take more than 3 iterations!

Tuesday, September 20, 2016

more on if...and only if

This is the follow-up post on my previous post.

In this post, I will try to explain a simple and practical idea of the statement: p -> q.

p -> q means, in simple words, that p implies q or if p then q. Check more details about this here. So what, exactly, is necessary here? And what is sufficient in this context? Behold the following text.

This example, I will give, comes from SQL - the language that we use to write RDBMS queries, to fetch data from databases. If you are familiar with SQL, you skip to next para. Here I give a vague intro. One of the reports that we frequently use on such data is of the kinds where we look for what the data is indicating, in its entirety. Such functions in SQL are called aggregate function. For example, suppose we have bank transaction data in number of records, each record has month (1-12), day(1-31) and amount credited(+) or debited(-) on that day. Now suppose we want to MAXIMUM amount of transaction credited in each month. Then we have a simple query-

SELECT MAX(amount)

from bank_transactions

group by month;

That's it - for each month I have the maximum amount that was credited. Notice - MAX is a aggregate function and the aggregation is done on the basis on month. And suppose if we skip the group by clause - then the query will give the transaction that had maximum credit in all the transactions. So group by clause is not mandatory is we are using aggregate functions. But, we are using group by clause then we need to use some aggregrate function - MAX, AVG, SUM etc since this functions tells SQL - what data must be aggregrated and how it should be aggregrated. So feeling better now. Let's come back to p -> q now.

Now we know two things:

1. We can use aggregate functions without any group-by clause.

2. If we are using group-by clause, then we need to use some aggregate function.

So can we put this information in terms of p -> q? Surely. Let:
p: group-by clause is used
q: aggregate function is present

then p -> q means
“if group-by clause is used then aggregate function is present”

Wow! That was easy, right. Now:

1. If I see a query where group-by is used, then I am damn sure that it has to have some aggregate function. That means, aggregate function must necessarily be present if the query has a group by clause. So, in this case, p is necessary for q. It must happen, if q has happened. Although, there could be another way p could happen, but what I am sure is, that if q has happened p should follow too. But if I see an aggregate function(q) in a query, then I cannot say that there is a group-by clause(p) in it. Hence, for q, p is not necessary.

2. Another way to look is – if I see a group-by clause in a query then, it is sufficient for me to conclude that there was an aggregate function used. In this sense, p is sufficient for q to happen. Again, there are other ways for q to happen. But that doesn’t mean that seeing just aggregate function I can say if there was a group-by clause involved in it or not. Hence, q is not sufficient for p.

Hope, you enjoyed this post!