EE Times-Asia

Search algorithm enables accurate Big Data searches

Posted: 05 Feb 2015

Keywords: Northwestern University, latent Dirichlet allocation, LDA, search algorithm, Big Data

A technique called latent Dirichlet allocation (LDA) is currently used by almost every algorithm that searches unstructured Big Data. However, LDA-based searches have major problems with accuracy and repeatability. Now a Northwestern University professor has worked out why the method seems to be 90 per cent inaccurate and unrepeatable 80 per cent of the time, often delivering different "hit lists" for the same search string.

By taking LDA apart, Luis Amaral identified its flaws and fixed them. His improved version not only returns more accurate results but also returns exactly the same list every time it is run on the same database. He is offering it for free to Google, Yahoo, Watson and any other maker of search engines, for uses ranging from recommendation systems to spam filtering, digital image processing and scientific investigation.

Luis Amaral claims to have discovered bugs in the algorithm behind Google's, Yahoo's and nearly every other search engine, and says he has corrected them to produce an algorithm that returns optimal, consistent results more quickly. (Source: Northwestern University)

"The common algorithmic implementation of the LDA model is incredibly naive," stated Amaral. "First, there is this unrealistic belief that one is able to detect topics when documents have a significant mixture of topics. Our systematic analysis reveals that as soon as the corpus is generated with a large value of alpha (which in LDA controls the amount of mixing of topics in documents), its algorithms fail miserably."

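The role of alpha that Amaral describes can be illustrated with a minimal sketch. It is not his analysis or any search engine's code; it only assumes the textbook LDA setup in which each document's topic mixture is drawn from a symmetric Dirichlet distribution with parameter alpha. The function names, the choice of five topics and the sample size are hypothetical, picked for illustration.

```python
import random


def sample_dirichlet(alpha, num_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet(alpha).

    Uses the standard construction: normalise independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_topics)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_dominant_share(alpha, num_topics=5, num_docs=2000, seed=0):
    """Average share of the single largest topic across simulated documents."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same number
    shares = [max(sample_dirichlet(alpha, num_topics, rng)) for _ in range(num_docs)]
    return sum(shares) / num_docs


# Illustrative alpha values only; none of them come from the article.
for alpha in (0.1, 1.0, 10.0):
    share = mean_dominant_share(alpha)
    print(f"alpha={alpha:>4}: mean dominant-topic share = {share:.2f}")
```

With a small alpha, most simulated documents are dominated by one topic; as alpha grows, the mixtures flatten towards an even split across topics, which is the heavily mixed regime in which Amaral says the common algorithms fail.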
The other big problem with LDA is that it relies on an optimisation technique that, more often than not, gets stuck in what are called local maxima. Consider a search for the highest mountain in the U.S. that can only move uphill: if it starts on the east coast, it gets stuck in the Appalachians and never reaches the Rockies, because no path leads continuously uphill from the Appalachians to the Rockies. Had it started on the west coast and moved east, it might have found the highest peak. Because the answer depends on the starting point, the algorithm is unreliable and can give different results each time it is run.
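The mountain analogy can be sketched as a toy greedy ascent. This is not LDA inference; the one-dimensional landscape, its two peaks and the function names are hypothetical, chosen only to show how the starting point decides which peak an uphill-only search reaches.

```python
def elevation(x):
    """Hypothetical 1-D landscape: a lower peak at x=20 and a higher peak at x=80."""
    lower_peak = max(0, 2000 - 50 * abs(x - 20))
    higher_peak = max(0, 4000 - 100 * abs(x - 80))
    return max(lower_peak, higher_peak)


def hill_climb(start, lo=0, hi=100):
    """Greedy ascent: step to the higher neighbour until neither neighbour is higher."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=elevation)
        if elevation(best) <= elevation(x):
            return x  # no uphill step left: a local maximum
        x = best


# Same landscape, two different starting points.
for start in (0, 100):
    peak = hill_climb(start)
    print(f"start={start:>3} -> stops at x={peak}, elevation={elevation(peak)}")
```

Starting at x=0 the climb stops on the lower peak at x=20, while starting at x=100 it reaches the higher peak at x=80, so the answer depends entirely on the initial position.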

"The common algorithm assumes that by pretty much using steepest ascent it can find the global maximum in the likelihood function landscape. Physicists know from the study of disordered systems that when the landscape is rough, one gets trapped in local maxima and that the specific local maxima found depends on the initial state. In the specific case of LDA, what this means is that depending on the initial guess of the parameter values one is estimating, one gets a different estimate of the parameters," noted Amaral.

- R. Colin Johnson
EE Times
