Can anyone explain what a byte stream actually contains? Does it contain bytes (hex data), binary data, or only English letters? I am also confused about the term “raw data”. If someone asked me to “reverse the 4 byte data”, should I assume the data is hex code or binary code?
4
Byte streams contain, well, bytes. A byte is 8 bits, each a 1 or a 0. Read as an unsigned number, it can hold any value from 0 to 255 (which, I may add, is exactly why each of the 4 numbers in an IPv4 address ranges from 0 to 255). In many libraries, a byte stream is an interface that hides an underlying byte array used as a buffer, often a circular one: you fill up the buffer and wait for someone to empty it, at which point it fills up again.
What the heck does that represent? Well, it could represent a text file, or an image, or a live video stream. What it is depends entirely on the context of whoever is reading it. Hex is just another way of writing the same values: it is sometimes more convenient to handle bytes in hex than in decimal, but the data is the same.
When you’re referring to raw data, you are usually referring to byte data. The data comes without a tag saying “I am an image file!” Usually you only deal with raw data when you don’t really care what the data represents overall. For example, if I wanted to convert an image to grayscale, I might read the image’s raw pixel data and, for every 3 bytes read (the red, green, and blue values of one pixel), add the three values, divide by 3, and write the result 3 times. Essentially I’d be averaging each pixel’s red, green, and blue values to make its gray equivalent. When you perform operations on data “byte by byte” like this, you don’t really care about the big picture, so to speak.
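The averaging described above can be sketched in a few lines of Python. This is only an illustration: it assumes the input is already a flat run of uncompressed 3-byte RGB pixels (real image files have headers and usually compression), and the function name `to_grayscale` is made up for this example.

```python
def to_grayscale(rgb: bytes) -> bytes:
    """Average each (R, G, B) triple and write the average three times."""
    if len(rgb) % 3 != 0:
        raise ValueError("expected a whole number of 3-byte RGB pixels")
    out = bytearray()
    for i in range(0, len(rgb), 3):
        r, g, b = rgb[i], rgb[i + 1], rgb[i + 2]
        gray = (r + g + b) // 3          # integer average, stays in 0..255
        out += bytes([gray, gray, gray])
    return bytes(out)


# Two hypothetical pixels: pure red, then a dark bluish one.
pixels = bytes([255, 0, 0, 10, 20, 30])
gray_pixels = to_grayscale(pixels)
print(list(gray_pixels))  # [85, 85, 85, 20, 20, 20]
```

Notice that the function never needs to know it is handling an image; it just walks the bytes three at a time.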
Or, perhaps you wish to save a file in a database, but it asks you to insert its “raw data” in a blob data type. This simply means converting the contents of the file into a large byte array that the database can store and manage. When you retrieve that value from the database, it will simply be one large byte array, just as you provided it. If that data was a file, then you, the programmer, must reinterpret that byte data as if you were reading the file one byte at a time.
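As a rough sketch of that round trip, here is what it might look like with Python's built-in `sqlite3` module and an in-memory database. The table name, column names, and the example bytes are all made up for illustration.

```python
import sqlite3

# Arbitrary example bytes standing in for the contents of a file.
original = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT, data BLOB)")
conn.execute("INSERT INTO files VALUES (?, ?)", ("example.bin", original))

# The blob comes back as a plain bytes object, with no hint of what it means.
(stored,) = conn.execute(
    "SELECT data FROM files WHERE name = ?", ("example.bin",)
).fetchone()
conn.close()

print(stored == original)  # True
```

The database hands back exactly the bytes it was given; deciding that they are a PNG, a text file, or anything else is up to the program.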
If someone asked you to “reverse the 4 byte data”, I would assume it refers to big-endian vs little-endian interpretation of numbers, that is, whether a multi-byte number is written starting with its most significant byte (big-endian) or its least significant byte (little-endian). It does not matter which one is used, just that every system reading the number interprets it consistently.
This isn’t to say that the value of any individual byte (or its hex representation, for that matter) changes, simply that the order of the 4 bytes that make up the number is reversed. So say you have 0x01, 0x02, 0x03, and 0x04. Reversed, you’d have 0x04, 0x03, 0x02, 0x01 instead. The receiving system would presumably read these 4 bytes in the opposite order, and since you’ve already reversed them, the value is interpreted as the very same number that was intended in the raw data.
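To make that concrete, here is a small Python sketch using the built-in `int.from_bytes`. The variable names are just for illustration.

```python
data = bytes([0x01, 0x02, 0x03, 0x04])
reversed_data = data[::-1]               # 0x04 0x03 0x02 0x01

# The same 4 bytes give different numbers depending on the byte order assumed.
as_big = int.from_bytes(data, "big")         # 0x01020304
as_little = int.from_bytes(data, "little")   # 0x04030201

# Reversing the bytes AND switching the byte order gives the original value.
round_trip = int.from_bytes(reversed_data, "little")

print(as_big, as_little, round_trip)  # 16909060 67305985 16909060
```

The individual bytes never change; only their order, and therefore the number they add up to under a given convention, does.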
I hope that explains it!
4
A byte is simply a unit of information – it can be anything. A byte by itself doesn’t mean anything, you have to attach some sort of meaning to it.
So, to expand on that –
Does it contain bytes (hex data) or binary data or english letters only?
Hex data is the same as binary data. It’s just a different way of displaying the data. For example, 0x41 = 0b01000001 = ‘A’ = 65 (decimal). English letters would be just a subset of that.
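A quick way to convince yourself of this is to print one value in several notations. A minimal Python sketch:

```python
value = 0x41  # the same number could be written 0b01000001 or 65

print(value)                 # 65
print(hex(value))            # 0x41
print(format(value, "08b"))  # 01000001
print(chr(value))            # A

# All of these are the same single byte.
print(bytes([value]) == b"A")  # True
```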
If someone asked me to “reverse the 4 byte data”, then what should I assume the data is hex code or binary code?
Since hex is just a representation of the data, it doesn’t matter how you think about it. If you have data of 0x41 0x42 0x43 0x44, to reverse it you would get 0x44 0x43 0x42 0x41. If you were looking at this data in terms of characters, you would originally have A B C D, but now you have D C B A.
Back to a byte stream – it’s just a sequence of data. You need to know what the data represents in order to use it. If we’re reading a text file, the byte stream that you would get when you’re reading the file would just be characters of some kind. An executable file would have a bunch of unprintable characters in it, which is why it would be called a binary file. Clearly, it’s possible to open up an executable in a text editor, but it doesn’t do anything useful.
3
A byte stream is an ordered sequence of bytes. There is a first byte, which has no predecessor. Its successor is the second byte, and so on. Nowadays, a byte is widely understood to consist of eight bits. If we want to be more precise, we use the term octet stream and octet. There still exist computers with bytes that aren’t eight bits wide.
Hexadecimal is a way of writing numbers, and serves as a printed representation for binary data. Hexadecimal is actually text. For instance, the hexadecimal value FE might represent a byte: the bits 11111110, which have the decimal value 254. However, FE is actually a character string consisting of the characters F and E, which requires two bytes in the US-ASCII or ISO-646 character set! These two bytes are what FE is, and the single byte with value 254 is what FE represents, as a printed notation.
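The difference between the text FE and the byte it stands for can be seen directly in Python. This sketch uses the built-in `str.encode` and `bytes.fromhex`; the variable names are made up.

```python
hex_text = "FE"

# The notation itself: two ASCII characters, so two bytes (0x46 and 0x45).
as_text_bytes = hex_text.encode("ascii")

# The value the notation represents: one byte, 254.
as_value = bytes.fromhex(hex_text)

print(len(as_text_bytes), list(as_text_bytes))  # 2 [70, 69]
print(len(as_value), list(as_value))            # 1 [254]
```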
If a communication channel, or file handle or some such device is described as carrying a byte stream, and no other information is given, it almost certainly does not mean that bytes are represented as hexadecimal text, so that each abstract byte in the stream requires two physical bytes.
And raw data simply means bits which are not interpreted to have any structure beyond just “array of bits”. Raw data usually has a structure and represents something, but when we are looking at it as raw data, we are either ignoring the interpretation for the moment (for instance, we are looking at the raw representation of a data type to verify its correctness down to the bit level detail), or the interpretation is not available (we have some data, but we do not understand the structure of the data and what it represents).
1
A byte is 8 bits. A bit is 0 or 1. The “raw data” is just a flow of one byte after another. A byte stream can come from a file, a network connection, a serialized object, a random number generator, etc.
-
There are several ways to display a byte: binary (01110110), hex = hexadecimal (76), octal (0166), or decimal (118). Those four are all the same byte. In all cases, the maximum value is 255 (base 10).
-
Sometimes bytes are assigned to characters, as in ASCII. Type “man ascii” on a unix command line, and you’ll get a big table that maps the byte values 0-127 (0-7F hex) to the associated characters. For example, space is x20 and “A” is x41. Note that some byte values map to control characters and aren’t printable. But the bytes themselves aren’t characters; they’re just a bundle of bits. A number.
-
“reverse 4 bytes” would be to take some bytes 123 42 231 0 and flip the order: 0 231 42 123. Applied to a byte stream, I’d probably read 4 bytes, reverse them, read the next 4 bytes, etc.
(BTW that problem is relevant, because if you want to represent a number bigger than 255 as bytes, you need to use more than one byte. But then the question is, does the “biggest” byte come first, or last? That’s called big endian or little endian; look those up for more background on why it’s useful to shuffle around the bytes in a raw byte stream.)
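The “read 4 bytes, reverse them, read the next 4” idea might look like this in Python. It is only a sketch: `reverse_each_group` is a made-up helper, `io.BytesIO` stands in for a real file or connection, and a short final group is simply reversed as-is, which may or may not be what the person asking wants.

```python
import io


def reverse_each_group(stream, size=4):
    """Read `size` bytes at a time and yield each group reversed."""
    while True:
        chunk = stream.read(size)
        if not chunk:          # empty read means end of stream
            break
        yield chunk[::-1]


source = io.BytesIO(bytes([123, 42, 231, 0, 1, 2, 3, 4]))
result = b"".join(reverse_each_group(source))
print(list(result))  # [0, 231, 42, 123, 4, 3, 2, 1]
```

With a real network socket a read can return fewer bytes than requested, so production code would need to keep reading until it has a full group.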