1

Brand New Day

-

It’s a brand new day with no novelty! Back to the lab today trying to now get access to the packet data to calculate the hash values. I suspect that inside netfilter’s sk_buff structure there’s an unsigned char* data field. This probably is exactly what I need to get the hash values. There’s this awesome link which has great information about sk_buff structure. The unsigned int len; has the size of the complete input data including the headers. I guess if this len value == size of the actual data for the IP header (which could be TCP header / UDP header / ICMP header) then if we are using chunks of this data to find hashes then the following algorithm could be used:

no_of_chunks = len / BYTE_SIZE_FOR_SIGN;

addendum = len % BYTE_SIZE_FOR_SIGN;


for (int i = 0; i < no_of_chunks; i++)
{
storeInTable(hashRabin(data,i*BYTE_SIZE_FOR_SIGN,
(i+1)*BYTE_SIZE_FOR_SIGN - 1 ,0));
}
storeInTable(
hashRabin(data,no_of_chunks*BYTE_SIZE_FOR_SIGN,
no_of_chunks*BYTE_SIZE_FOR_SIGN+addendum, 0)
);


This are my initial thoughts let’s see how it works out!

-Rajat.
Rajat’s Homepage

0

Die Another Day!

-

Back again in the lab to get the module completed as this part needs a lot of effort.
The RabinHash available at Jaspell was very helpful in getting me started with the actual coding of the whole thing. Now since the Rabin Hash values are really varied I need to first figure out ways to search the packet hash presence effectively. Today I’ll try an idea where I’ll mod out the hashes to 3 distinct prime numbers and see the values they hash to. These indices from the table of pointers would point to respective hash values.


mod p1 mod p3
|_______________| |_______________|
|_______________|--->[val1]<--+ |_______________|
|_______________|--->[val2] |__|_______________|
|_______________|--->[val3]<--+
: : : | :
: : : |___ :

First I needed to read through how kernel memory allocation works.
Kernel Korner – Allocating Memory in the Kernel | Linux Journal was a fantastic link that got me right into the mem allocation principles!.
Let’s see how the day goes!

0

Nutch…too much Nutch

-

Yesterday the whole day was spent in trying to go through the Nutch source code. Chris and Ashish helped me out alongwith this link
Dissecting the Nutch Crawler
. This showed me that :
The file Fetcher.java has a reference to the “content” variable (which is of type Content). I found that initially only the URLs are stored during the crawl, then a request is sent. Then based on the MIME type of the content returned, the ParserFactory class creates a parser (html parser, pdf parser etc.). The code for these parsers can be found at nutch-0.6/src/plugin/. These plugins do the parsing and get the content as a “Parse” object. Using the Parse.getText() method (which we also felt was interesting) we can get the text content of any page!!!!!

0

Nutching Nutching Nutching

-

The whole day today was spent in analyzing Nutch Source Code with Anshul. It is almost 8:00 pm now and nothing has been done yet! Have received an e-mail from Chris Mattman and Ashish Vaidya giving some pointers. Hopefully, it’s gonna help!
Had problems while compiling the code as well. It’s strange that when I installed j2sdk from Java Sun site I did not get javac in the /usr/java/jre1.5.0_02/bin directory. So since I did not have enough time to look for the files, I simply downloaded Netbeans and got the thing compiled with Daddu’s suggestion!
Will blog later when I can get some success with Nutch.
-Rajat.
Rajat’s Abode