We often think of computers as working with numbers, and in the purest sense we are, but those numbers are simply 1s and 0s. Normally these bits can be grouped together to represent much larger numbers but as a developer, you'll often times need to use them as bits and you forego addition and division for "bitwise" (sometimes referred to as Logical) operations. Bitwise simply means that you're dealing with individual bits and not the number as a whole, the representation of the bits working together to make a larger number.
There are all sorts of bitwise operations: AND, OR, XOR (pronounced ex-or), NOT, NAND, NOR, etc. I could talk all day about any one of these, but I'm going to focus on AND, OR, and XOR. I'm a C# fanatic so I'm going to use C#'s operators to reference them. You've probably seen AND and OR as && and ||, respectfully. C#, as well as many other languages, use two of each of them to separate them as Conditional operators as oppose to Bitwise operators. && and || work with boolean values, however & and | are the bitwise alternatives and can work with primitive types (and objects that overload the operator). C# does have a bitwise XOR operator ^ (caret), however, using two of them (^^) does not make it a logical operator. MSDN has an extensive list of C# operators.
AND and OR work just as they do in everyday language and just like it does for logical operators. The difference is that a logical operator compares one boolean value to another and results in a third boolean value, whereas a bitwise operator can take in two primitive type values, and produce a third value that is the same type of the two input values. For example:
int x = 5;
// x = 00000101
int y = 8;
// y = 00001000
int z = x | y;
// z now holds 13 or 00001101
Bitwise operators apply their operation to each pair of bits that are in the same location in each operands value. In the above example, an OR acts on 5 and 8 to produce 13. Each bit in the output is decided by the bits of the input. Since this is an OR operation, if either of the two inputs is a 1, then the output is a 1, i.e. "If bit x OR bit y is a 1 then the output is a 1, otherwise 0." AND is expectedly different in that both values must be 1 for the output to be one, or better worded as "If bit x AND bit y are both 1s then the output is a 1, otherwise 0." XOR is less familiar to most people because we don't use it as much in daily language as we use AND and OR. XOR stands for "eXclusive OR" meaning one but not the other: "If bit x is 1 or bit y is 1, but not both, then the output is a 1, otherwise a 0." There are many different ways to view XOR that all result in the same meaning: by inversion such as "If the first bit is a 1, invert the second bit and that's the output, if the first bit is a 0, the second bit is the output" or by inequality, "if bit x does not equal bit y (or vice versa) the output is 1, if bit x does equal bit y, regardless what that value is, the output is 0." These values result in tables that look like this:
| x |
y |
AND |
OR |
XOR |
| 0 |
0 |
0 |
0 |
0 |
| 0 |
1 |
0 |
1 |
1 |
| 1 |
0 |
0 |
1 |
1 |
| 1 |
1 |
1 |
1 |
0 |
Honestly, you probably knew this. If you didn't, I doubt it'll take you long to get it. The reason I write this article is more about the statistical importance of using AND vs. OR vs. XOR. There are countless uses for bitwise operations. From compacting stored booleans as flags in an int or long to complex data obfuscation. Primarily I'll be talking about its purposes in encryption and why XOR is the prime candidate over AND or OR.
The simplest thing to note is the odds for each output, for example AND has a 75% chance of output a 0 and 25% chance of outputting a 1. This is only true if the inputs each have equal possibilities of being 1 or 0. If either x or y has a better chance of being a 0, then AND is even more likely to output a 0 than a 1. OR is quite the opposite being 75% chance of outputting a 1 and 25% chance of outputting a 0, again, only if inputs are not biased. XOR is 50/50, but there's a more interesting principle at play here...
Before going any further, let me define two functions: Max() and Min(). They're simple and you probably already have experience with them, but for full disclosure, I'll explain. Max() accepts two numbers and returns the number that is bigger, Min() accepts two numbers and returns the number that is smaller. So Max(5, 1) will return 5 and Min(5,1) will return 1. For Max() or Min(), if both numbers are the same, that number is returned. Simple, just like Math.Min() and Math.Max().
What I'm about to tell you, I had to find out for myself. I shouldn't have to, as part of a computer science degree, I'd expect this to be fundamental and taught in Programming 1 when they go over all the other operators. Take a look at this:
byte a = 100;
byte b = 255;
byte c = a | b;
byte m = 200;
byte n = 250;
byte o = m ^ n;
byte x = 0;
byte y = 75;
byte z = x & y;
The numbers are arbitrary. What's important is the relationship between the input values and the output values. Obviously, the input is directly responsible for the output, but with enough information about one operand, the other operand may not matter. Look at the first set of variables, a through c. Because b is 255 (11111111 in binary) we already know that any OR operation involving that number will result in 255. OR has an interesting property: Regardless of the values of a and b, if c = a | b, then its a fact that c >= Max(a, b). Take a minute to soak that in. Look at the table above, if there is an OR with an operand of 1, then the output will be 1, it is impossible to OR a 1 and get a 0. This means a number cannot be made smaller by being ORed with another number, it can only be made bigger. The corollary is true for AND. Looking at the variables x, y, and z, you can tell that since one operand is 0, whatever it is ANDed with cannot alter any 0 to become a 1. If z = x & y, then its a fact that z <= Min(x, y). Again, in the table above, it is impossible to AND a 0 and get a 1. These properties deem AND and OR as highly not recommended in cryptography. Cryptography is about removing all tells about what the original data was. If you knew that some number, I'll reuse c, was the result of an AND and had the value of 10, you know that both operands of that AND were greater than or equal to 10. If it were an OR, you know that both operands must have been less than or equal to 10. The only saving grace for AND and OR is that if you know the output and you know one operand, you do not know the other operand, XOR falls short there but makes up for it because of the fact that regardless the input values, the output value is in the range of the min value and the max value of the data type (byte, int, long, etc.). XOR cannot be simplified to a formula like AND and OR, it's result is not guaranteed to pre-determined relative to either input operand. XOR does have special cases, such as if 0 is an operand, the output is simply the other operand. If 255 is an operand, the inversion of the other operand is the result. Note that even in these special cases, the output could be any value between the min value and the max value.
There are two tricks worth pointing out about XOR: in-place swapping of variables, and simple (VERY SIMPLE) basics to encryption. Here's two more code samples to talk about:
byte x = 100;
byte y = 200;
x = x ^ y;
y = x ^ y;
x = x ^ y;
byte a = 100;
byte b = 200;
byte c = a ^ b;
bool match1 = ((c ^ a) == b);
bool match2 = ((c ^ b) == a);
In the first example, x starts as 100 and y starts as 200, after the 5th line, x holds 200 and y holds 100. In a sense, during the XORing process, the variables "shadow" each other and take up the same place. This is better displayed in the second example. Both match1 and match2 are true. After XORing a and b together, if you XOR the result, c, with one of the initial values, you get the other initial value. If one initial value is a character of text and the other value is a byte from a key, the output is an encrypted byte. If you XOR that encrypted byte with the key value again, you get the original unencrypted character. This is absolute fundamentals of implementations of encryptions, there's a lot more involved if that encryption is going to be secure, but I'd say this is square one.
I hope this isn't too much info all at once. It turned out to be more writing than I thought it would be. As simple as these bitwise operations are, they're applications are endless and as basic as they seem, there's always something new to learn about them.
C#, Computer Security, Design Pattern, Development, Misc