My man Bob “I’ve got Hockey Hair” Elder has written an fantastic post on performing a jtag examination on Android Phones.
Check it out – its completely awesome!
My man Bob “I’ve got Hockey Hair” Elder has written an fantastic post on performing a jtag examination on Android Phones.
Check it out – its completely awesome!
Check it out! ViaForensics has just enhanced ViaExtract with the ability to brute force swipe passcodes. Its available to licensed users. Check it out at the below link.
While working on revisions for my Smart Phone Course, I thought I could perhaps craft a primer that would help explain Huffman Coding in SQLite a bit better than I had explained it during the class. Below is the result of that endeavor, written between flights to Saudia Arabia. I hope that it is helpful for you.
1. Introduction
This article is designed to provide you with an introduction to a concept know as Huffman Coding. Understanding the basics of how to decode Huffman coded variable length integers within SQlite database records is of relevance to any forensic examiner seeking to test and validate forensic software or to carve for SQLite records.
The examiner is encouraged to explore the references listed after the primer for further understanding and information related to developing search algorithms.
2. What is “Huffman Coding”?
Huffman coding is an algorithm used for lossless data compression. It is used in SQLite to store the values of the record size, key and record header variable length integers. It is also used in other file formats such as PKZIP (and its variants) JPEG and MP3.
Huffman coding uses one byte for a decimal value from 0 to 127, two bytes for a value up to 16,383, three bytes up to 2097152, continuing up to a nine byte maximum.
3. SQLite Record Structure
In SQLite 64 bit variable length integers – which use Huffman coding -are used to store the value of the record size, the key and the header data of an individual record. Below is a graphical representation of a SQLite data record
The bytes that make up the individual header follow in order the Schema Layer of the database. The values of the bytes will be interpreted as in the below table. The value column refers to the hex value of the byte not the decimal value. To find the length of the bytes listed in the last two rows you need to first convert the byte to decimal and then follow the formula listed in the third right most column.
Remember that we said that any value above 127 decimal requires a special conversion because the decimal value has been compressed using the Huffman algorithm. This will be detailed in the following sections.
4. Huffman Coding Example
We will now look at an instance of an SMS message that employs Huffman coding to store the decimal value of the record size. The total record length does not factor in the record size byte or the bytes that make up the key (record number). The size of the record reflects the record header plus the data section. The first byte of the below record has been highlighted to show the beginning value with which we start to calculate our record size.
5. Decoding the sequence – Two methods
I will show you two methods for calculating the decimal value of the total byte length of the record. The first method, while lengthy will give you a detailed understanding of how huffman coding works within SQLite. The second method is equally valid and provides for a shortcut. I reccomend trying a few on your own using the lengthier method to fully understand the ins and outs before heading to the “shortcut”.
The variable integers in the SQLite records are stored in big endian format regardless of the host operating system. This means that the most significant bit will be stored first and the least significant bit stored last.
The first method we will be looking at converts our byte of interest from hexidecimal to binary to determine if the most significant bit is set. If this bit is set we then know that the next byte in the sequence is also part of the calculation. This byte is then converted to binary and checked to see its most significant bit is set. If it is, we proceed onto the next byte and so on.
Lets look at our above text message and the byte we have highlighted. The value of our byte is \x81. Converted to binary its value is 1000 0001. As we can see from the below graphic the most significant bit – shown in red – is set. Therefore we need to look at the next byte for our calculation
We now need to look at the next byte in the sequence. in this case it is \x36. Converted to binary its value is 0011 0110.
As is shown in red, this byte’s most significant bit is not set. Therefore our calculation uses only the first two bytes in the record to calculate size. To do this calculation we leave off the most significant bit of each byte- shown below in green and red – and concatinate the remaining seven bits of each (starting with the binary of the first byte \x81). The resulting binary is then converted to decimal for the size of our record.
The next method uses a calculation for determining the stored compressed decimal value. You will recall from our previous discussion that a byte that is greater than 127 decimal uses 2 bytes to store its value up to decimal 16,383 and three bytes up to 2,097,151 etc. 127, 16,383, and 2,097,151 are special because they are related to powers of 2^7; i.e., 127 = (2^7-1), 16,383 = (2^14-1), and 2,097,171 = (2^21-1). Given this, when we examine the first byte of our record – the one that determines the size of the record – /x81 or decimal 129 we see that it is greater than 127 and smaller than 16,383 and will therefore use 2 variable integer bytes to store its uncompressed value. The formula to calculate the uncompressed value using two bytes in the calculation is as follows
The value of “x” is the value of the first byte converted to decimal and the value of “y” is the second byte converted to decimal. The calculation is continued below
As you can see from the above calculation the value obtained from this method is the same as the other method. It is unlikely that values above 16,383 will be encountered, however the formula for calculating a compressed value stored within three variable integers is as follows ( x x 128) x 16384 + ( y – 128) x 128 + z. Again the value of “x” is the value of the first byte converted to decimal, the value of “y” is the second byte converted to decimal and the value of “z” is the third byte converted to decimal.
6. Conclusion
We have now explored the basics of how SQLite stores decimal numbers over 127 in 64 bit variable integers using an algorithm called Huffman Coding. Understanding this concept will help forensic practitioners in testing and validating software used to analyse SQLite databases and in crafting searches for deleted and unallocated records that may remain extant on suspect media. The reader is encouraged to explore the references listed below for further understanding and insight into SQLite and searching for records in unallocated space.
Comments or questions are welcome and can be directed to linuxchimp at gmail dot com.
7. References
8. Acknowledgements
No article is written without help. I would like to thank Dr. Gary Kessler for his valuable eye and suggestions as well as DC Shafik Punja of the Calgary Police and Sheran Gunasekera of Zenconsult, LTD for our discsussions regarding variable length integers.
The phrase “profile forcing” refers to using an extraction profile of mobile model numbers in and around the model you are seeking to examine when that model is either officially unsupported or for some reason won’t work with the existing profile. Though not a sure thing, this has proved to be a successful attack on problematic phones and yielded information on what might have been a lost cause otherwise.
I used this tactic just today when trying to grab a physical image of a Samsung GT-s5322A. It was officially not supported for a physical or logical download in XRY and Cellebrite. In fact, the phone was being problematic with the NSPRO box as well. As my frustration mounted, I reached for the UFED again and tried a physical dump using the s5230 profile.
Suddenly a beam of light shot from the sky, the clouds parted, and a heavenly choir began to sing…
Well, the extraction started and I did the Geek dance of joy…over 250 MBs of juicy data – yum!
When using this method always be sure to validate the findings and report the success to the vendor so they can do additional research and add it into their profiles for others to enjoy equally in their forensic endeavors.
Mahalo nui loa Cellebrite for making me look like a rock star in Saudi Arabia today!
According to CNET’s by Don Reisinger in his online blog post on 6/6/11, Microsoft is suing companies in order to force them to pay royalties on alleged patent violations of Android. I have to say, though I’m not surprised by this tactic – Microsoft is stagnant – I find it highly irritating and disappointing.
All this patent suing is stifling innovation and ultimately harms the consumer. Do we really all want to have Windows mobiles? Don’t you want a choice? Don’t you want a healthy competitive market place where ideas flow and prices stay low? This is yet another instance of big companies gone wild, abusing the system and sticking their middle finger up at the rest of us as they grab more and more of the pie.
This is ugly poor form by Microsoft and other greedy selfish grabbing companies. Read more on Don’s article by clicking the link below
According to CNET’s Lance Whitney, the iPhone-dev team announced that the popular RedSn0w utility had been released with an update for jailbreaking the still developer only iOS5. The jailbreak is tethered only at the moment.
Read more on CNET by clicking on the below link -
RedSn0w update
Stephen Shankland in his “Deep Tech” blog for CNET is reporting that Andy Rubin, the leader of Google’s Android Project has stated that Android is open source from a “software” perspective not a “community” perspective.
Shankland writes:
The reason for Google’s approach is so the company can control Android’s interfaces, the underlying features that Android apps use, Rubin told reporters at the Google I/O conference.
These application programming interfaces (APIs) are best developed in private and released only when Google deems them done, a process that ensures app compatibility, he said.
“Open source is different than a community-driven project,” Rubin said, in which a broad collection of people collectively determine the software’s future. “Android is light on the community-driven side and heavy on open source.”
Read more: Android Quasi Open Source
Josh Lowensohn is reporting in his CNET Apple Talk blog that Apple has addressed the public outcry over its devices’ location-tracking behavior including that information’s security in its latest update to iOS.
Lowensohn writes “iOS 4.3.3, which is a free update delivered through Apple’s iTunes software, reduces the size of the “crowdsourced” location cache, no longer backs up the cache to iTunes in the form of a device recovery image, and deletes the cache when a user turns Location Services off. “
Read more: http://news.cnet.com/8301-27076_3-20059752-248.html#ixzz1LVBlzoIV
Looks like RIM and Steve Balmer have something that is keeping them awake at night. As Marguerite Reardon writes about in Signal Strength RIM and Microsoft are desperately trying to outdo Google and Apple. Ms. Reardon states that the two new allies are likely only to close the gap in the smartphone race. That’s right – catch up, not pass over. From the sounds of Ms. Reardon’s article, there is a lot of ho-hum going on.
In a related note, I was watching Castle last night and was surprised to see the Apple iPhone replaced by a Windows 7 mobile phone. Rick Castle has been a staunch Apple user – interesting. How much was paid by Lord Balmer for that switch?
Oh yeah and Body of Proof? iPads galore.
Redmond and Waterloo Canada may need some Tums.
I am facinated by SQLite. But before you say “Mike you really need to get out more and get a life! ” consider this -the two most popular smartphone operating systems of today, iOS and Android, use SQLite databases to store important information such as contacts, SMS, and call records. That’s a lot of users – both victims and perpetrators. Have you ever wondered how the SQLite structures its records? An understanding of the SQLite record architecture is crucial to validating the output of forensic tools and for knowing where to look for evidence – including that elusive brass ring, deleted information.
SQLite database images always begin with a well know 16 byte signature which in ASCII is represented by “SQLite format 3″ followed by a null byte. The database image header is 100 bytes in length. A full discussion of the database image header can be found on the official SQLite page. Amongst other things a well formed serialized database image will have a database schema or table layout. Knowing the schema of tables is a key bit of knowledge as it will provide the guide to the record header and subsequent record contents.
We will be using a record from the sms database of a first generation iPhone running iOS version 3.13 as an example for this post. Below is a screen shot taken of the schema of the message table using my favorite Mac SQLite editor Base.
Having noted the schema of the message table within the sms database, lets take a look at a record in a hex viewer.
The record is structured as is show in the below graphic.
In addition the record uses the values in the below table to represent the values of the bytes.
So let’s take our example and work through the record header. The first byte of course is our record length – in this case the record is 110 bytes long. The second byte is our key.
The next byte is the length of our record header including this byte – which is 19 bytes.
The next byte is null which indicates that the value is not included in the record. This would have been the ROWID from the schema. The next byte corresponds to the address row in the schema. The byte 0×27 is odd and greater than 13. According to our table this corresponds to text and the byte length is derived by taking the decimal value subtracting thirteen and then dividing by two.
39 -13 = 26 / 2 = 13
We can see from the below graphic that the address or telephone number is indeed 13 bytes in length.
The next byte, 0×4 corresponds in the schema to the date of the SMS. This is a four byte value and is stored in epoch time. The value here is 1296980309 and translates to Sun, 06 Feb 2011 08:18:29 GMT.
The next byte, 0×81 is, as is indicated in the schema, the text message – but it is unique. SQLite uses a compression method based on Huffman coding to store values greater than 127 bits. In this instance the byte in the record header indicating the text message is \x81 or 129 and therefore greater than 127. Since the method uses 2 bytes up to a decimal value of 16,383, we can assume the next byte \x0D is also for the length of the text message. The method to calculate the length of the text message is as follows – where X = the first byte value and Y = the second byte value -
(X-128) x 128 + Y
Calculated out this comes to the below
(129-128) x 128 + 13
\x142
To find the length of the header we now refer to the table and do our calculation as normal
(N-13)/2
(129)/2
64
This is indeed the length of the text message. The message is straight hex to ascii.
The next byte in the header is 0×1 and in the schema refers to the flags row. The value in this case is 0×02 which means that it is an incoming text.
The last byte of significance in this record occurs at offset 17 in the record and as our table indicates is a value greater than 13 (0×11 = decimal 17). Since the calculation is N-13/2 the value we have here is 2 bytes and this refers in the schema to the country. In our example this is 0x6A 0x6F or “jo” for Jordan.
I hope that you find this post useful in your forensic endeavors. This post would not have been possible without the generous help and counsel of DC Shafik Punja of Calgary and Sheran Gunasekera of ZenConsult Pte Ltd.
Research regarding Huffman encoding in SQLite records was conducted using Murilo Tito Pereira’s article “Forensic analysis of the Firefox 3 Internet history and recovery of deleted SQLite records” published by Elselvier.