Learn about a 1200-year-old technique for breaking ciphers and its relevance today. Read more on Hackcave.net and expand your tech knowledge.
A scholar called al-Kindi, who worked in Baghdad, is credited as the author of the oldest surviving document on cryptology, A Manuscript on Deciphering Cryptographic Messages. In that manuscript he explains the use of letter frequency statistics to unlock ciphers, a technique still taught today in cryptography classes.
A translation of al-Kindi’s tree diagram classification of cipher types. Screenshot from Cryptologia.
How a Letter Frequency Attack Works
The concept is very simple. The most frequently used characters in a ciphertext are noted, and educated guesses are made about which plaintext letter each one stands for. In English, for instance, the most commonly used letter is ‘E’. So where the letter ‘A’ is substituted for ‘E’ in a basic ciphertext, ‘A’ will likely appear more often than any other character. Using informed guesses based on the statistics at hand, it’s a case of working out what has been substituted where until the original text reveals itself.
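To make the idea concrete, here is a minimal Python sketch. It assumes the simplest possible case, a Caesar shift cipher, and a made-up message in which ‘E’ really is the most common letter. The function names `caesar_encrypt` and `guess_shift` are illustrative only; real substitution ciphers need more guesses than this single step.

```python
from collections import Counter
import string

def caesar_encrypt(plaintext: str, shift: int) -> str:
    """Shift every letter by `shift` positions; leave other characters alone."""
    out = []
    for ch in plaintext.upper():
        if ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - 65 + shift) % 26 + 65))
        else:
            out.append(ch)
    return "".join(out)

def guess_shift(ciphertext: str) -> int:
    """Assume the most frequent ciphertext letter stands for 'E'."""
    counts = Counter(ch for ch in ciphertext if ch in string.ascii_uppercase)
    most_common_letter, _ = counts.most_common(1)[0]
    return (ord(most_common_letter) - ord("E")) % 26

# Made-up message, chosen so that 'E' dominates the letter counts.
message = "MEET ME HERE BEFORE THE SEVENTH EVENING BELL"
secret = caesar_encrypt(message, shift=22)   # with this shift, 'E' becomes 'A'

recovered_shift = guess_shift(secret)
recovered = caesar_encrypt(secret, shift=-recovered_shift)
print(secret)
print(recovered)
```

On short or unusual texts the most frequent letter may not be ‘E’, which is why the attack relies on informed guesses rather than a single lookup.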
In their recently released paper, Microsoft security researcher Seny Kamara, along with co-researchers Muhammad Naveed from the University of Illinois and Charles Wright from Portland State University, show how such frequency attacks can extract revealing medical information from hospital databases, even when those databases are protected by some of the most advanced encryption available.
Studies Conducted On Highly Secure Encryption
Their studies were conducted on searchable encrypted databases (EDBs), where operations can be performed on encrypted data without ever needing to use or share a decryption key. With conventional systems, anyone who wants to analyse encrypted data has to decrypt it first, which is not possible without the key.
Various methods have been proposed for computing on encrypted data. The most promising is fully homomorphic encryption (FHE), which lets one carry out any mathematical operation on encrypted data without ever revealing the hidden text. The problem with FHE is that it consumes an extraordinary amount of computing power, and it has yet to be determined whether homomorphic encryption can be put to practical use.
Another is CryptDB, created by MIT scientists in 2011. The software keeps things simple, allowing basic functions to operate on SQL databases, the most common form of database used by websites and business systems. CryptDB combines older forms of encryption, some of which allow certain kinds of calculation to be performed on the encrypted data.
By covering the database in different layers of encryption, CryptDB allows the user to peel away at the “onion” to get to the layers of encryption that allow those mathematical operations (the ones that make up search or analysis algorithms, for instance). Once that operation is performed, the peel is put back on and the database is thoroughly wrapped up in its multi-layered protective shield again. At all points the data remains scrambled. But when the layers are peeled away and queries are made, data is leaked.
Applying al-Kindi’s Ideas and Kamara’s Hacks to Modern Encrypted Systems
Microsoft researcher Seny Kamara uses techniques developed 1200 years ago, as well as a couple of his own, to expose supposedly state-of-the-art encryption. Kamara and his fellow researchers started by downloading real patient data from 200 U.S. hospitals. They then set about determining if their “inference attacks”, where leaked data is combined with publicly available information to infer the plain text, would work against the kinds of encryption used by CryptDB.
In particular, they targeted two of the “property preserving” elements of CryptDB. Those properties are necessary to allow operations on data whilst keeping it scrambled, but as a consequence they leak vital information. The first was the order preserving encryption (OPE) scheme, which encrypts a set of messages in such a way that their ciphertexts reveal the order of the messages. The second was the deterministic encryption (DTE) scheme, which reveals whether two encrypted values are equal or not.
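The two leaky properties can be illustrated with toy stand-ins in Python. These are not the schemes CryptDB uses: `toy_dte` is just a keyed hash (an assumption made for brevity, and not even decryptable), and `make_toy_ope` is a random increasing lookup table over a tiny domain. They only show what an observer can learn from the ciphertexts alone.

```python
import hashlib
import hmac
import random

KEY = b"demo-key-not-for-real-use"  # hypothetical key for the toy example

def toy_dte(value: str) -> str:
    """Deterministic stand-in: equal plaintexts always give equal tokens."""
    return hmac.new(KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

def make_toy_ope(domain_size: int, seed: int = 7) -> dict[int, int]:
    """Map 1..domain_size to random but strictly increasing integers."""
    rng = random.Random(seed)
    codes = sorted(rng.sample(range(1_000_000), domain_size))
    return dict(zip(range(1, domain_size + 1), codes))

ope = make_toy_ope(12)

# DTE leaks equality: anyone can see which rows hold the same value.
same = toy_dte("HIGH") == toy_dte("HIGH")
different = toy_dte("HIGH") != toy_dte("LOW")

# OPE leaks order: comparing ciphertexts is the same as comparing plaintexts.
order_kept = ope[3] < ope[9]
print(same, different, order_kept)
```

Equality is what makes lookups and joins possible on encrypted columns, and order is what makes range queries and sorting possible, which is why these properties are kept despite the leakage.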
By applying al-Kindi’s 1200-year-old technique to DTE-protected columns in the database, the researchers could look at the scrambled versions of the medical information and see which blobs of encrypted data occurred most often. They then compared the statistics with those of two freely-available auxiliary datasets. The comparison allowed them to make informed guesses at what lay behind the ciphertexts for a vast quantity of information.
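A rough Python sketch of this kind of frequency matching follows. It is a simplified illustration, not the procedure from the paper: the column values, the auxiliary frequencies, and the helper names `toy_dte` and `frequency_attack` are all invented for the example, and the keyed hash merely stands in for a deterministic encryption scheme.

```python
from collections import Counter
import hashlib
import hmac

def toy_dte(value: str, key: bytes = b"server-side-key") -> str:
    """Deterministic stand-in for DTE (a keyed hash, for illustration only)."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:12]

def frequency_attack(cipher_column, aux_frequencies):
    """Pair the i-th most common ciphertext with the i-th most common value
    in the auxiliary data. The attacker never sees the key."""
    cipher_ranked = [c for c, _ in Counter(cipher_column).most_common()]
    plain_ranked = sorted(aux_frequencies, key=aux_frequencies.get, reverse=True)
    return dict(zip(cipher_ranked, plain_ranked))

# Made-up "mortality risk" column as stored on the server.
plain_column = ["MINOR"] * 5 + ["MODERATE"] * 3 + ["MAJOR"] * 2 + ["EXTREME"]
cipher_column = [toy_dte(v) for v in plain_column]

# Made-up public statistics for the same attribute.
aux_frequencies = {"MINOR": 0.45, "MODERATE": 0.30, "MAJOR": 0.18, "EXTREME": 0.07}

mapping = frequency_attack(cipher_column, aux_frequencies)
guessed = [mapping[c] for c in cipher_column]
print(guessed == plain_column)
```

The guess is only as good as the match between the auxiliary statistics and the real column; values with similar frequencies can easily be swapped.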
The results were alarming. Their guesses uncovered the correct mortality risk and patient death attributes for 100 per cent of patients in at least 99 per cent of the 200 largest hospitals. They recovered the disease severity for 100 per cent of the patients for at least 51 per cent of the same hospitals.
Another simple attack revealed a lot of sensitive medical information. The “sorting” method takes advantage of the order left by OPE.
Successful Attacks Against CryptDB
Kamara explains the basic technique: “If I give you column encrypted using this order preserving stuff and suppose you have a range of one to 10 and all numbers are encrypted and appear once in this column, all that’s needed is to take a column and sort. Then I know the ciphertext that is the smallest corresponds to number one. It’s just kind of there.”
Putting that simple method into practice on the medical data proved devastatingly effective, recovering the admission month and mortality risk of 100 per cent of patients for at least 90 per cent of the 200 largest hospitals.
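The sorting idea Kamara describes can be sketched in a few lines of Python. The sketch assumes a toy order-preserving mapping (`make_toy_ope`, a random increasing lookup table, not a real OPE scheme) and a made-up admission-month column in which every month from 1 to 12 appears at least once, which is the condition the quote relies on.

```python
import random

def make_toy_ope(domain_size: int, seed: int = 11) -> dict:
    """Toy order-preserving mapping: 1..domain_size -> increasing integers."""
    rng = random.Random(seed)
    codes = sorted(rng.sample(range(1_000_000), domain_size))
    return dict(zip(range(1, domain_size + 1), codes))

def sorting_attack(cipher_column):
    """Sort the distinct ciphertexts; the i-th smallest must encrypt value i
    when every value in the range appears in the column."""
    distinct_sorted = sorted(set(cipher_column))
    return {code: rank for rank, code in enumerate(distinct_sorted, start=1)}

ope = make_toy_ope(12)

# Made-up admission months; all twelve values occur at least once.
true_months = [3, 12, 1, 7, 7, 5, 2, 9, 4, 6, 8, 10, 11, 3, 1]
cipher_column = [ope[m] for m in true_months]

# The attacker sees only cipher_column, never `ope`.
mapping = sorting_attack(cipher_column)
recovered_months = [mapping[c] for c in cipher_column]
print(recovered_months == true_months)
```

If some values in the range never appear in the column, the ranks no longer line up with the plaintexts, and the attacker needs additional information to fill the gaps.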
The researchers also created two somewhat more complex attacks of their own. The results were similar, with large portions of the databases deciphered. More details are available in the full paper.
CryptDB Is Not Secure: It Can Be Hacked
Hackers hoping to exploit these techniques would first have to gain access to the server on which the database was held. They would then have to wait for queries to be made on the vault to get to the right layer and successfully expose the information.
Where they want names, the hackers could use de-anonymization attacks, again correlating auxiliary datasets to expose whose details they had uncovered. CryptDB and crypto of its ilk are not as secure as had been assumed.
“CryptDB seems like a magic bullet, but what was missed with all the excitement about it was that there are some real trade offs here,” said Kamara. “People who are into crypto know this and feel uneasy about it in some sense, but nobody has explicitly shown how much leakage there was.”
What CryptDB’s Creators Say
One of the creators of CryptDB, Raluca Ada Popa, said she did not believe the findings proved CryptDB weak as the flawed pieces of the software were not designed to handle sensitive information. She said OPE encryption should be used for “high-entropy values” where the order does not reveal much and that CryptDB was still a worthy way to protect information.
“This is how the CryptDB paper says it should be used. Saying that CryptDB is broken when used for something it was not designed for is straight incorrect – and thousands of security systems would not work in this case. Prevail and none of CryptDB followers are affected because they either use the order encryption scheme in a correct way (for the right types of data), or do not use it. Everyone I was in touch with that used CryptDB was careful about the use of OPE.”
She said that database administrators can specify which data fields are sensitive, and CryptDB ensures those fields are encrypted with strong schemes, not the weaker ones. Admins are warned the vulnerable modes are only suitable for data fields that have high entropy. Database administrators should therefore be careful when using certain kinds of information in CryptDB.
Conclusion
The findings of Kamara and others who are trying to break modern systems should be taken as an effort to create safer and better systems. They don’t mean that the quest for cloud-based encryption schemes is pointless. By attacking existing systems and pointing out their weak points, researchers can develop stronger ones. It is also astonishing to realize how effective old-school methods of decryption are against supposedly ultra-secure modern encryption techniques.