The word length of characters to be stored in a data processing system is 8 bits (one byte), i.e., 00000000 is labeled 0 and 11111111 is labeled 255. This data processing system generates characters and stores them on a hard drive (HD). The characters labeled 0 to 31 are the most likely generated characters, with probability 2/256 each; the characters labeled 32 to 127 are the second most likely, with probability 1/256 each; and the remaining characters labeled 128 to 255 are used equally, with probability 1/512 each. The data processing system generates 1024 characters every second, and the HD needs to store at least 24 hours of data.
(a) If you need to store the data without any processing, what should be the minimum size of the HD?
(b) If you need to implement the most efficient lossless compression technique, how much space is required on the HD?
(c) If you use Huffman coding, how much space is required on the HD?
(d) Assuming the read and write process introduces errors at an average of 1 character per 1000 characters, how many character errors can there be on average in a 24-hour recording?
(e) In order to correct the errors, you apply an (8, 4) extended Hamming code after data compression (using the compression rate found in part (b)). What would be the minimum size of the HD?
(f) What would be the character error rate after the Hamming code, given that the minimum distance of the extended Hamming code is 4 (d_min = 4)?
(g) Please briefly discuss the trade-offs of the systems with “only source coding”, “source coding + channel coding”, and “no source and channel coding”.
Expert Answer
Answer:
Given word length = 8 bits = 1 byte
a) Characters generated per second: 1024
Hence,
Characters generated in a minute = 1024*60
Characters generated in an hour = 1024*60*60
Then, characters generated in a day = 1024*60*60*24 = 88473600
Hence the size required = 88473600 bytes = 88.4736 MB (taking 1 MB = 10^6 bytes; this is 84.375 MiB)
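The arithmetic for part (a) can be checked with a few lines of Python. This is only a sketch: the constant and function names are made up for illustration, and the numbers are taken directly from the problem statement.

```python
# Raw storage for 24 hours of 8-bit characters (values from the problem statement).
CHARS_PER_SECOND = 1024
BYTES_PER_CHAR = 1                  # 8-bit word length
SECONDS_PER_DAY = 24 * 60 * 60

def raw_bytes_per_day() -> int:
    """Bytes needed to store one day of characters with no processing."""
    return CHARS_PER_SECOND * BYTES_PER_CHAR * SECONDS_PER_DAY

total = raw_bytes_per_day()
print(total, "bytes")
print(total / 10**6, "MB (decimal)")
print(total / 2**20, "MiB (binary)")
```

Whether the drive size is quoted in MB or MiB is a convention choice; the byte count is the same either way.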
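b) The most efficient lossless compression is limited by the entropy of the source, H = -sum(p_i * log2(p_i)) bits per character, not by a fixed 1:2 ratio. Taking the stated probabilities at face value:
H = 32*(2/256)*7 + 96*(1/256)*8 + 128*(1/512)*9 = 1.75 + 3 + 2.25 = 7 bits per character
Each character then needs 7 bits instead of 8 on average, so the required size is 88473600*7/8 = 77414400 bytes = 77.4144 MB.
Note that the stated probabilities sum to 7/8 rather than 1. If they are rescaled to sum to 1, the entropy becomes 5 + log2(7) = 7.81 bits per character (approximately) and the required size is about 86.34 MB.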
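For part (b), the entropy bound on lossless compression can be evaluated directly from the probabilities in the question. The sketch below is an illustration with hypothetical names, not a definitive solution: it computes the entropy once with the probabilities exactly as stated and once after rescaling them to sum to 1 (the rescaling is an assumption, added because the stated values sum to 7/8).

```python
import math

# (number of characters, probability of each character), as stated in the question
GROUPS = [(32, 2 / 256), (96, 1 / 256), (128, 1 / 512)]
RAW_BYTES = 1024 * 60 * 60 * 24

def entropy_bits(groups, normalize=False):
    """Entropy in bits per character; optionally rescale probabilities to sum to 1."""
    total = sum(n * p for n, p in groups)
    scale = total if normalize else 1.0
    return -sum(n * (p / scale) * math.log2(p / scale) for n, p in groups)

total_probability = sum(n * p for n, p in GROUPS)
h_stated = entropy_bits(GROUPS)                     # probabilities as given
h_rescaled = entropy_bits(GROUPS, normalize=True)   # assumption: rescaled to sum to 1

print("sum of stated probabilities:", total_probability)
for label, h in (("as stated", h_stated), ("rescaled", h_rescaled)):
    print(label, h, "bits/char,", RAW_BYTES * h / 8 / 10**6, "MB")
```

The two results bracket the answer depending on how the inconsistency in the question is resolved.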
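c) Huffman coding has to be built over the 256 individual characters, not over 3 group symbols, because every character needs its own codeword. The characters fall into 3 groups by probability:
first group A (characters 0 to 31) with probability 2/256 = 2^-7 each
second group B (characters 32 to 127) with probability 1/256 = 2^-8 each
third group C (characters 128 to 255) with probability 1/512 = 2^-9 each
Since every probability is a power of two, codeword lengths of -log2(p) can be used: 7 bits for group A, 8 bits for group B and 9 bits for group C. These lengths satisfy the Kraft inequality, 32/2^7 + 96/2^8 + 128/2^9 = 7/8 <= 1, so a prefix code with these lengths exists. Taking the stated probabilities at face value, the average number of bits per character is:
(32*7*2/256)+(96*8*1/256)+(128*9*1/512) = 1.75 + 3 + 2.25 = 7
The overall compression ratio is 7/8 = 0.875, which equals the entropy -sum(p_i * log2(p_i)) computed from the same probabilities.
Hence the required size of the hard disk is 88473600*7/8 = 77414400 bytes = 77.4144 MB. Note that the stated probabilities sum to 7/8 rather than 1; if they are rescaled to sum to 1, a Huffman code averages 55/7 = 7.86 bits per character (approximately), which gives about 86.89 MB, only slightly less than in the first case.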
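For part (c), a Huffman code can also be constructed over all 256 characters rather than over the three groups. The following is a minimal sketch using Python's heapq; the function and variable names are hypothetical, and the weights are integers proportional to the stated probabilities (units of 1/512). Because the stated probabilities do not sum to 1, the average is taken relative to the total weight, which is an assumption about how the question is meant to be read.

```python
import heapq

# Integer weights proportional to the stated probabilities (units of 1/512):
# 32 characters at 2/256, 96 at 1/256, 128 at 1/512.
WEIGHTS = [4] * 32 + [2] * 96 + [1] * 128

def huffman_lengths(weights):
    """Return the codeword length of each symbol in a binary Huffman code."""
    # Heap entries: (weight, unique tie-breaker, symbol indices under this node)
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    counter = len(weights)
    while len(heap) > 1:
        w1, _, s1 = heapq.heappop(heap)
        w2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1          # every symbol under the merge gains one bit
        heapq.heappush(heap, (w1 + w2, counter, s1 + s2))
        counter += 1
    return lengths

lengths = huffman_lengths(WEIGHTS)
total_cost = sum(w * l for w, l in zip(WEIGHTS, lengths))
average_bits = total_cost / sum(WEIGHTS)   # bits per character, weights rescaled to sum to 1
raw_bytes = 1024 * 60 * 60 * 24

print("average bits per character:", average_bits)
print("required size:", raw_bytes * average_bits / 8 / 10**6, "MB")
```

Individual codeword lengths depend on how ties are broken, but the total cost of a Huffman code does not, so the average length is the same for any valid construction.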
d) Total number of characters generated in a day = 88473600
The error rate is 1 error per 1000 characters, i.e. 1/1000 errors per character on average.
Hence the average number of character errors in a 24-hour recording = 88473600/1000 = 88473.6, i.e. about 88474 characters.
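The expected error count in part (d) is the number of characters multiplied by the per-character error rate. A short sketch with illustrative names, using exact fractions to avoid rounding:

```python
from fractions import Fraction

CHARS_PER_DAY = 1024 * 60 * 60 * 24
ERROR_RATE = Fraction(1, 1000)      # one character error per 1000 characters

# Expected (average) number of character errors in 24 hours
expected_errors = CHARS_PER_DAY * ERROR_RATE
print(float(expected_errors))
```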