Many members of the steering committee have been inactive for some time, so after discussing it with the remaining members I was given permission to take charge and start work on production of the Okopipi project. After some research and work with Kwinter, we have come up with what we feel will be the best structure for the Okopipi network. This paper explains how everything will work. However, we are just two people, so if you find any holes in the paper or have any suggestions or comments, please let us know.
Comments
Project paper
1) Give the doc a title so people can refer to it as more docs evolve...
2) Typo at VI compliant (complaint)
3) Distribute everything - I can see the argument for the assigned servers, but all that does is give a hook for the spammers (and others) to lever against - distribute everything. Any other approach is vulnerable to in-band or out-of-band attacks (even through foreign courts etc.)
4) Think about some form of block-hashing, rather than hashing the entire body. That way you can create multiple blocks from a single message, and declare anything matching more than (say) one block-hash as spam... This gets around changing in-message text (which I have seen also) and allows header information to contribute to identifying what is spam. You would need some rules around what a block is: either a sentence or, in the absence of punctuation, x chars to the next whitespace or something.
5) Remember that you can tweak and improve the spam detection algorithms (as per 4) but that the security has to be solid from day one. While my head might work better around how to identify spam, this should NOT be your primary focus at this stage - I'll happily support and install an improvable mediocre system, so long as I'm comfortable with the security mechanisms around it - THAT must be your primary focus in getting the network off the ground.
6) I wonder how much you can 'borrow' from existing networks to speed up development - e.g. eDonkey search or the Azureus database - but I suspect you are much more expert than me in these matters - just PLEASE don't delay Okopipi go-live by reinventing (and then designing, coding, debugging, etc.) any wheels
6a) Also, if 3rd-party network software providers are prepared to play along (even as 'secondaries'), launching the network distributed across hundreds of thousands of nodes from day one protects the Okopipi network from the certain take-down attempts which will occur before it gains sufficient traction (unlike with Blue Frog, the evil ones will be arming up for this in advanced ways long before go-live)
7) I miss Blue Frog - my spam levels have now tripled and are back to what they were before... Please keep up the momentum!
--J
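A rough sketch of the block-hashing idea from point 4, for anyone who wants to play with it. Nothing here comes from the paper: the function names, the 60-character block size standing in for "x chars", the lower-casing step, and the use of SHA-256 are all assumptions made for illustration, and the threshold of two shared blocks is just one reading of "more than (say) one block-hash".

```python
import hashlib
import re

BLOCK_CHARS = 60      # assumed stand-in for "x chars" when there is no punctuation
MATCH_THRESHOLD = 2   # "more than one block-hash" read as at least two shared blocks


def split_blocks(text):
    """Split text into sentence blocks; cut over-long pieces at the next space."""
    normalized = " ".join(text.split())  # collapse runs of whitespace
    blocks = []
    for piece in re.split(r"(?<=[.!?]) ", normalized):
        # Any piece longer than BLOCK_CHARS is cut at the first space after
        # BLOCK_CHARS characters, repeatedly, until what is left is short enough.
        while len(piece) > BLOCK_CHARS:
            cut = piece.find(" ", BLOCK_CHARS)
            if cut == -1:
                break
            blocks.append(piece[:cut])
            piece = piece[cut + 1:]
        if piece:
            blocks.append(piece)
    return blocks


def block_hashes(text):
    """Return the set of SHA-256 hex digests of the lower-cased blocks."""
    return {
        hashlib.sha256(block.encode("utf-8")).hexdigest()
        for block in split_blocks(text.lower())
    }


def looks_like_same_spam(a, b, threshold=MATCH_THRESHOLD):
    """True when two messages share at least `threshold` block hashes."""
    return len(block_hashes(a) & block_hashes(b)) >= threshold


# Two made-up messages that differ only in the middle sentence.
msg_a = "Cheap pills available now. Increased weight and less energy? Order today at our shop."
msg_b = "Cheap pills available now. Remember the times of youth? Order today at our shop."
print(looks_like_same_spam(msg_a, msg_b))
```

A whole-body hash of those two messages would differ, while the block hashes still overlap on the first and last sentence. Header fields could be fed in as extra blocks in the same way.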
Well said!
5) Remember that you can tweak and improve the spam detection algorithms (as per 4) but that the security has to be solid from day one. While my head might work better around how to identify spam, this should NOT be your primary focus at this stage - I'll happily support and install an improvable mediocre system, so long as I'm comfortable with the security mechanisms around it - THAT must be your primary focus in getting the network off the ground.
I second that.
Comments/questions on okopipi.pdf
Comments:
Section III
-----------
Hashing the body of the message won't work - most spam contains random text.
You need to do something like pick the URLs out of the message.
Be careful though because URLs can easily be randomized, eg. http://ioeiuewusadn.cheapviagra.com/
I'd take just the first two parts of the domain name, ie. "cheapviagra.com" in the above example.
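A minimal sketch of that "take the registrable part of the domain" step, assuming Python and its standard urllib.parse module. The function name base_domain is made up for illustration. Note that keeping the last two labels is naive: it gives the wrong answer for suffixes like .co.uk and for bare IP addresses, so a real implementation would probably need a public-suffix list.

```python
from urllib.parse import urlsplit


def base_domain(url):
    """Return the last two labels of the URL's hostname (naive approach)."""
    # urlsplit only recognises the host when a "//" precedes it, so add one
    # for scheme-less input such as "somewhere.something.com/anywhere.php".
    if "//" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    if len(labels) < 2:
        return host
    return ".".join(labels[-2:])


print(base_domain("http://ioeiuewusadn.cheapviagra.com/"))  # cheapviagra.com
```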
Section IV
----------
There are two databases: the database of complaints and the database of client keys.
The complaints database is short-lived and constantly being rebuilt so it isn't particularly valuable in itself. I think the fewer copies of the complaints database, the better. Synchronizing multiple copies of a high-churn database will be very difficult.
The database of client keys has very different needs. It's low-churn and very valuable. You need multiple signed/encrypted copies distributed around the world. In case of compromise of the client database it should be easy to revoke the keys and get users to generate a new one.
Section VII
-----------
How do clients obtain a key? If the process can be automated then it's pointless - a spammer can set up his botnet to obtain millions of keys then use them to flood the servers.
Other comments:
Updates
-------
Will the clients be able to update themselves if weaknesses/problems are found? Can the network deliver these updates automatically?
Master key
----------
Loss/compromise of the master key would be catastrophic. Where is this key stored/who guards it? (e.g. Verisign?)
Quote:Hashing the body of
Random text per email? I haven't seen this in my spam (which isn't to say it isn't out there). Some spams have multiple URLs, so which one would be hashed? All of them together? I think we are just as well off hashing the whole body.
We have decided not to store user keys in the database; each report sent by a client will be sent along with the client's signed public key.
The clients will generate their own keys and have them signed by an SU node; yes, it would be automated for the most part. Why would it be pointless? The point of keys is not to stop people from connecting to the network; it's to prevent tampering and spoofing. GNUnet has features that will help us fight drones, such as being able to detect abusive nodes. It would be very hard to set up millions of bots to connect to our network: they would have to make their own "evil client" and distribute it. Just because they have a bunch of infected spambot machines doesn't mean they have direct access to install whatever they want on them. Their botnets are fairly limited in what they can do; it's not like they can just use them with any protocol they want on a whim.
-master key
As far as the master key goes, the project admin would have a copy of it. It would be stored on read-only media, and it would also be encrypted, probably using TrueCrypt or something similar.
-= Journeyman =-
Random text
Random text per email? I haven't seen this in my spam (which isn't to say it isn't out there)
You won't see it if you only have one account, but yes, I've got spams in my inbox which are identical apart from a single line of text in the middle.
eg. One says: "Increased weight, less energy, bad mood and wrinkles on your face? "
The other says: "Just how often did you recall the times of youth when you felt a lot better and full of energy?"
Apart from that they're identical.
Domain
I think that each unique domain name listed in the email should generate a report, or each separate domain should be tied to the report for that email, so that any email with the same URL in it would be matched to it. Doing it this way would identify the offending domain and not rely on spammers using the same email over and over. Every email I get has something different in it to make it more likely to pass filters, like C1alis, Cial1s, C1al1s, C!alis, and Cialis.
It's not very hard to create an email script to email anonymously and generate a piece of random text in the body to make it different every time. It is hard to get someone to go to your website if they get a different address every time. The domain names are always the same.
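To make the per-domain reporting idea concrete, here is a small sketch assuming Python. The names domains_in and index_reports, the URL regex, and the message ids are invented for illustration and are not from the Okopipi paper; the domain is taken naively as the last two labels of the hostname. Each message contributes one entry per unique domain, so messages with different wording but the same advertised domain end up in the same bucket.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumed pattern: http(s) URLs up to the next whitespace, quote or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def domains_in(body):
    """Return the set of unique domains (last two host labels) found in a body."""
    domains = set()
    for url in URL_RE.findall(body):
        try:
            host = (urlsplit(url).hostname or "").rstrip(".")
        except ValueError:
            continue  # skip URLs that urlsplit rejects as malformed
        labels = host.split(".")
        if len(labels) >= 2:
            domains.add(".".join(labels[-2:]))
    return domains


def index_reports(messages):
    """Map each domain to the ids of the messages that mention it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for domain in domains_in(body):
            index[domain].add(msg_id)
    return index


# Made-up messages: different text and subdomains, same advertised domain.
messages = {
    "m1": "Get C1alis now http://xk3.pillshop.example/buy",
    "m2": "Cial1s deals: http://qq9.pillshop.example/?id=7 and http://other.example/",
    "m3": "Lunch at noon?",
}
reports = index_reports(messages)
print(sorted(reports["pillshop.example"]))
```

This does not follow redirects, so a spam that hides its real destination behind an unrelated forwarding URL would still land in a separate bucket.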
Domain names
Not always. Many times they add a subdomain or a /something/ after the URL, but they end up pointing to the same site. There probably needs to be a way to determine where the user is forwarded to from the offending URL (many times a site with an affiliate link). Then some type of algorithm could be used to determine which spams are linking to which sites, to be able to tie them to the correct spam ID or attack/campaign ID.
If you get 50 spams with 50 different URLs, it's going to be hard to have the system realize that this is the same spam, and therefore attack the same spammer for all of it.
Domains, not subdomains
I said domain, not subdomain. Okopipi would have to strip off anything besides the domain; in other words, somewhere.something.com/anywhere.php would become something.com. It's very easy to get the domain from a URL.