Clustering and Similarity of Wikipedia Articles
In this article we use nearest neighbors and clustering to retrieve documents that may interest a user, based on an analysis of their text. We explore two document representations: word counts and TF-IDF. In this IPython notebook we dig deeper into retrieving Wikipedia articles about various people and familiarize ourselves with the code needed to build a retrieval system.
import graphlab
Load some text data: Wikipedia pages about people
# CSV format dataset https://d396qusza40orc.cloudfront.net/phoenixassets/people_wiki.csv
people = graphlab.SFrame('coursera-notebooks/course-1/people_wiki.gl')
[INFO] GraphLab Server Version: 1.8
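For readers without GraphLab Create, the same records can be held in ordinary Python structures. The following is a minimal Python 3 sketch, assuming the CSV export mentioned above has the same three columns (`URI`, `name`, `text`); the helper name `load_people` and the two-row in-memory sample are illustrative and not part of the original notebook.

```python
import csv
import io

def load_people(csv_file):
    """Read a file-like CSV object into a list of dicts keyed by column name."""
    return list(csv.DictReader(csv_file))

# Tiny in-memory stand-in for the CSV file (sample rows, text truncated).
sample_csv = io.StringIO(
    "URI,name,text\n"
    "<http://dbpedia.org/resource/Digby_Morrell>,Digby Morrell,"
    "digby morrell born 10 october 1979 is a former\n"
    "<http://dbpedia.org/resource/Alfred_J._Lewy>,Alfred J. Lewy,"
    "alfred j lewy aka sandy lewy graduated from\n"
)

people_rows = load_people(sample_csv)
print(len(people_rows))         # number of rows, like len(people)
print(people_rows[1]['name'])   # second record, like people[1]
```

For the real file, passing `open('people_wiki.csv', newline='', encoding='utf-8')` in place of the in-memory sample should work, assuming the download is UTF-8 encoded.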
people.head()
URI | name | text |
---|---|---|
<http://dbpedia.org/resource/Digby_Morrell> | Digby Morrell | digby morrell born 10 october 1979 is a former ... |
<http://dbpedia.org/resource/Alfred_J._Lewy> | Alfred J. Lewy | alfred j lewy aka sandy lewy graduated from ... |
<http://dbpedia.org/resource/Harpdog_Brown> | Harpdog Brown | harpdog brown is a singer and harmonica player who ... |
<http://dbpedia.org/resource/Franz_Rottensteiner> | Franz Rottensteiner | franz rottensteiner born in waidmannsfeld lower ... |
<http://dbpedia.org/resource/G-Enka> | G-Enka | henry krvits born 30 december 1974 in tallinn ... |
<http://dbpedia.org/resource/Sam_Henderson> | Sam Henderson | sam henderson born october 18 1969 is an ... |
<http://dbpedia.org/resource/Aaron_LaCrate> | Aaron LaCrate | aaron lacrate is an american music producer ... |
<http://dbpedia.org/resource/Trevor_Ferguson> | Trevor Ferguson | trevor ferguson aka john farrow born 11 november ... |
<http://dbpedia.org/resource/Grant_Nelson> | Grant Nelson | grant nelson born 27 april 1971 in london ... |
<http://dbpedia.org/resource/Cathy_Caruth> | Cathy Caruth | cathy caruth born 1955 is frank h t rhodes ... |
len(people)
59071
people[1]
{'URI': '<http://dbpedia.org/resource/Alfred_J._Lewy>',
'name': 'Alfred J. Lewy',
'text': 'alfred j lewy aka sandy lewy graduated from university of chicago in 1973 after studying psychiatry pharmacology and ophthalmology he is a full professor and vicechair of the department of psychiatry at ohsu oregon health science university and holds an md and phd prior to moving to oregon in 1981 lewy was at the national institute of mental health nimh in bethesda maryland working with senior colleague thomas wehr in oregon he has worked closely with robert l sack as of december 2005 he had 94 publications available on pubmed he describes his research as follows my laboratory studies chronobiologic sleep and mood disorders these disorders include winter depression jet lag maladaptation to shift work and certain types of sleep disturbances relying on a very precise assay for plasma melatonin a hormone that has a clearly defined 24hour pattern of secretion biological rhythm disorders can be assessed and their treatment can be monitored current research is focused on developing bright light exposure and melatonin administration as treatment modalities for these disorders treatment must be precisely scheduled morning light exposure and evening melatonin administration cause circadian phaseadvance shifts evening light exposure and morning melatonin administration cause circadian phasedelay shifts totally blind individuals have 25hour circadian rhythms drifting an hour later each day unless they take a melatonin capsule at a certain time every day'}
Explore the dataset and check out the text it contains
obama = people[people['name'] == 'Barack Obama']
obama['text']
dtype: str
Rows: ?
['barack hussein obama ii brk husen bm born august 4 1961 is the 44th and current president of the united states and the first african american to hold the office born in honolulu hawaii obama is a graduate of columbia university and harvard law school where he served as president of the harvard law review he was a community organizer in chicago before earning his law degree he worked as a civil rights attorney and taught constitutional law at the university of chicago law school from 1992 to 2004 he served three terms representing the 13th district in the illinois senate from 1997 to 2004 running unsuccessfully for the united states house of representatives in 2000in 2004 obama received national attention during his campaign to represent illinois in the united states senate with his victory in the march democratic party primary his keynote address at the democratic national convention in july and his election to the senate in november he began his presidential campaign in 2007 and after a close primary campaign against hillary rodham clinton in 2008 he won sufficient delegates in the democratic party primaries to receive the presidential nomination he then defeated republican nominee john mccain in the general election and was inaugurated as president on january 20 2009 nine months after his election obama was named the 2009 nobel peace prize laureateduring his first two years in office obama signed into law economic stimulus legislation in response to the great recession in the form of the american recovery and reinvestment act of 2009 and the tax relief unemployment insurance reauthorization and job creation act of 2010 other major domestic initiatives in his first term included the patient protection and affordable care act often referred to as obamacare the doddfrank wall street reform and consumer protection act and the dont ask dont tell repeal act of 2010 in foreign policy obama ended us military involvement in the iraq war increased us troop levels in 
afghanistan signed the new start arms control treaty with russia ordered us military involvement in libya and ordered the military operation that resulted in the death of osama bin laden in january 2011 the republicans regained control of the house of representatives as the democratic party lost a total of 63 seats and after a lengthy debate over federal spending and whether or not to raise the nations debt limit obama signed the budget control act of 2011 and the american taxpayer relief act of 2012obama was reelected president in november 2012 defeating republican nominee mitt romney and was sworn in for a second term on january 20 2013 during his second term obama has promoted domestic policies related to gun control in response to the sandy hook elementary school shooting and has called for full equality for lgbt americans while his administration has filed briefs which urged the supreme court to strike down the defense of marriage act of 1996 and californias proposition 8 as unconstitutional in foreign policy obama ordered us military involvement in iraq in response to gains made by the islamic state in iraq after the 2011 withdrawal from iraq continued the process of ending us combat operations in afghanistan and has sought to normalize us relations with cuba', ... ]
clooney = people[people['name'] == 'George Clooney']
clooney['text']
dtype: str
Rows: ?
['george timothy clooney born may 6 1961 is an american actor writer producer director and activist he has received three golden globe awards for his work as an actor and two academy awards one for acting and the other for producingclooney made his acting debut on television in 1978 and later gained wide recognition in his role as dr doug ross on the longrunning medical drama er from 1994 to 1999 for which he received two emmy award nominations while working on er he began attracting a variety of leading roles in films including the superhero film batman robin 1997 and the crime comedy out of sight 1998 in which he first worked with a director who would become a longtime collaborator steven soderbergh in 1999 clooney took the lead role in three kings a wellreceived war satire set during the gulf warin 2001 clooneys fame widened with the release of his biggest commercial success the heist comedy oceans eleven the first of the film trilogy a remake of the 1960 film with frank sinatra as danny ocean he made his directorial debut a year later with the biographical thriller confessions of a dangerous mind and has since directed the drama good night and good luck 2005 the sports comedy leatherheads 2008 the political drama the ides of march 2011 and the comedydrama war film the monuments men 2014he won an academy award for best supporting actor for the middle east thriller syriana 2005 and subsequently earned best actor nominations for the legal thriller michael clayton 2007 the comedydrama up in the air 2009 and the drama the descendants 2011 in 2013 he received the academy award for best picture for producing the political thriller argo alongside ben affleck and grant heslov he is the only person ever to be nominated for academy awards in six categoriesclooney is sometimes described as one of the most handsome men in the world in 2005 tv guide ranked clooney no 1 on its 50 sexiest stars of all time list in 2009 he was included in times annual time 100 as one of the 
most influential people in the world clooney is also noted for his political activism and has served as one of the united nations messengers of peace since january 31 2008 his humanitarian work includes his advocacy of finding a resolution for the darfur conflict raising funds for the 2010 haiti earthquake 2004 tsunami and 911 victims and creating documentaries such as sand and sorrow to raise awareness about international crises he is also a member of the council on foreign relations', ... ]
Get the word counts for the Obama article
obama['word_count'] = graphlab.text_analytics.count_words(obama['text'])
print obama['word_count']
[{'operations': 1, 'represent': 1, 'office': 2, 'unemployment': 1, 'doddfrank': 1, 'over': 1, 'unconstitutional': 1, 'domestic': 2, 'major': 1, 'years': 1, 'against': 1, 'proposition': 1, 'seats': 1, 'graduate': 1, 'debate': 1, 'before': 1, 'death': 1, '20': 2, 'taxpayer': 1, 'representing': 1, 'obamacare': 1, 'barack': 1, 'to': 14, '4': 1, 'policy': 2, '8': 1, 'he': 7, '2011': 3, '2010': 2, '2013': 1, '2012': 1, 'bin': 1, 'then': 1, 'his': 11, 'march': 1, 'gains': 1, 'cuba': 1, 'school': 3, '1992': 1, 'new': 1, 'not': 1, 'during': 2, 'ending': 1, 'continued': 1, 'presidential': 2, 'states': 3, 'husen': 1, 'osama': 1, 'californias': 1, 'equality': 1, 'prize': 1, 'lost': 1, 'made': 1, 'inaugurated': 1, 'january': 3, 'university': 2, 'rights': 1, 'july': 1, 'gun': 1, 'stimulus': 1, 'rodham': 1, 'troop': 1, 'withdrawal': 1, 'brk': 1, 'nine': 1, 'where': 1, 'referred': 1, 'affordable': 1, 'attorney': 1, 'on': 2, 'often': 1, 'senate': 3, 'regained': 1, 'national': 2, 'creation': 1, 'related': 1, 'hawaii': 1, 'born': 2, 'second': 2, 'defense': 1, 'election': 3, 'close': 1, 'operation': 1, 'insurance': 1, 'sandy': 1, 'afghanistan': 2, 'initiatives': 1, 'for': 4, 'reform': 1, 'house': 2, 'review': 1, 'representatives': 2, 'ended': 1, 'current': 1, 'state': 1, 'won': 1, 'limit': 1, 'victory': 1, 'unsuccessfully': 1, 'reauthorization': 1, 'keynote': 1, 'full': 1, 'patient': 1, 'august': 1, 'degree': 1, '44th': 1, 'bm': 1, 'mitt': 1, 'attention': 1, 'delegates': 1, 'lgbt': 1, 'job': 1, 'harvard': 2, 'term': 3, 'served': 2, 'ask': 1, 'november': 2, 'debt': 1, 'by': 1, 'wall': 1, 'care': 1, 'received': 1, 'great': 1, 'signed': 3, 'libya': 1, 'receive': 1, 'of': 18, 'months': 1, 'urged': 1, 'foreign': 2, 'american': 3, 'protection': 2, 'economic': 1, 'act': 8, 'military': 4, 'hussein': 1, 'or': 1, 'first': 3, 'control': 4, 'named': 1, 'clinton': 1, 'dont': 2, 'campaign': 3, 'russia': 1, 'civil': 1, 'reinvestment': 1, 'into': 1, 'address': 1, 'primary': 2, 'community': 1, 
'mccain': 1, 'down': 1, 'hook': 1, '63': 1, 'americans': 1, 'elementary': 1, 'total': 1, 'earning': 1, 'repeal': 1, 'from': 3, 'raise': 1, 'district': 1, 'spending': 1, 'republican': 2, 'legislation': 1, 'three': 1, 'relations': 1, 'nobel': 1, 'start': 1, 'tell': 1, 'iraq': 4, 'convention': 1, 'resulted': 1, 'john': 1, 'was': 5, '2012obama': 1, 'form': 1, 'that': 1, 'tax': 1, 'sufficient': 1, 'republicans': 1, 'strike': 1, 'hillary': 1, 'street': 1, 'arms': 1, 'honolulu': 1, 'filed': 1, 'worked': 1, 'hold': 1, 'with': 3, 'obama': 9, 'ii': 1, 'has': 4, '1997': 1, '1996': 1, 'whether': 1, 'reelected': 1, 'budget': 1, 'us': 6, 'nations': 1, 'recession': 1, 'while': 1, 'taught': 1, 'marriage': 1, 'policies': 1, 'promoted': 1, 'called': 1, 'and': 21, 'supreme': 1, 'ordered': 3, 'nominee': 2, 'process': 1, '2000in': 1, 'is': 2, 'romney': 1, 'briefs': 1, 'defeated': 1, 'general': 1, '13th': 1, 'as': 6, 'at': 2, 'in': 30, 'sought': 1, 'organizer': 1, 'shooting': 1, 'increased': 1, 'normalize': 1, 'lengthy': 1, 'united': 3, 'court': 1, 'recovery': 1, 'laden': 1, 'laureateduring': 1, 'peace': 1, 'administration': 1, '1961': 1, 'illinois': 2, 'other': 1, 'which': 1, 'party': 3, 'primaries': 1, 'sworn': 1, 'relief': 2, 'war': 1, 'columbia': 1, 'combat': 1, 'after': 4, 'islamic': 1, 'running': 1, 'levels': 1, 'two': 1, 'involvement': 3, 'response': 3, 'included': 1, 'president': 4, 'law': 6, 'nomination': 1, '2008': 1, 'a': 7, '2009': 3, 'chicago': 2, 'constitutional': 1, 'defeating': 1, 'treaty': 1, 'federal': 1, '2007': 1, '2004': 3, 'african': 1, 'the': 40, 'democratic': 4, 'consumer': 1, 'began': 1, 'terms': 1}]
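The dictionary above maps each word in the article to the number of times it occurs. The same idea can be sketched in plain Python with `collections.Counter`. This is a minimal sketch, assuming lowercase, punctuation-free text split on whitespace (as in this dataset); it is not GraphLab's implementation, and the local `count_words` function is only named after the GraphLab call for readability.

```python
from collections import Counter

def count_words(text):
    """Return a dict mapping each whitespace-separated token to its count."""
    return dict(Counter(text.split()))

# A fragment copied from the Obama article text shown earlier.
snippet = ('the patient protection and affordable care act often referred to '
           'as obamacare the doddfrank wall street reform and consumer '
           'protection act')

counts = count_words(snippet)
print(counts['act'])         # occurrences of 'act' in the fragment
print(counts['protection'])  # occurrences of 'protection' in the fragment
```

Because tokens are split only on whitespace, run-together tokens in the source text such as `2012obama` are counted as separate words, which matches what the GraphLab output above shows.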
Sort the word counts for the Obama article
# Turn the word_count dictionary into one (word, count) row per entry
obama_word_count_table = obama[['word_count']].stack('word_count', new_column_name=['word', 'count'])
obama_word_count_table.head()
word | count |
---|---|
normalize | 1 |
sought | 1 |
combat | 1 |
continued | 1 |
unconstitutional | 1 |
8 | 1 |
californias | 1 |
1996 | 1 |
marriage | 1 |
defense | 1 |
obama_word_count_table.sort('count', ascending=False)
word | count |
---|---|
the | 40 |
in | 30 |
and | 21 |
of | 18 |
to | 14 |
his | 11 |
obama | 9 |
act | 8 |
a | 7 |
he | 7 |
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
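The stack-then-sort step above can be mirrored in plain Python: each dictionary entry becomes one (word, count) row, and the rows are ordered by count, largest first. The sketch below is illustrative only; `top_words` is a hypothetical helper, and the counts are a handful of values copied from the Obama output.

```python
def top_words(word_count, n=10):
    """Return the n (word, count) pairs with the highest counts, largest first."""
    rows = list(word_count.items())                   # one row per dictionary entry
    rows.sort(key=lambda row: row[1], reverse=True)   # descending, like ascending=False
    return rows[:n]

# A few counts copied from the Obama word-count output.
obama_counts = {'normalize': 1, 'act': 8, 'obama': 9, 'his': 11, 'to': 14,
                'of': 18, 'and': 21, 'in': 30, 'the': 40}

for word, count in top_words(obama_counts, n=5):
    print('%s | %d' % (word, count))
```

The top rows are dominated by very common words such as "the", "in", and "and", which say little about what the article is about. This is the weakness of raw word counts that TF-IDF is meant to address.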
Compute TF-IDF for the corpus
people['word_count'] = graphlab.text_analytics.count_words(people['text'])
people.head()
URI | name | text | word_count |
---|---|---|---|
<http://dbpedia.org/resource/Digby_Morrell> | Digby Morrell | digby morrell born 10 october 1979 is a former ... | {'since': 1, 'carltons': 1, 'being': 1, '2005' ... |
<http://dbpedia.org/resource/Alfred_J._Lewy> | Alfred J. Lewy | alfred j lewy aka sandy lewy graduated from ... | {'precise': 1, 'thomas': 1, 'closely': 1, ... |
<http://dbpedia.org/resource/Harpdog_Brown> | Harpdog Brown | harpdog brown is a singer and harmonica player who ... | {'just': 1, 'issued': 1, 'mainly': 1, 'nominat ... |
<http://dbpedia.org/resource/Franz_Rottensteiner> | Franz Rottensteiner | franz rottensteiner born in waidmannsfeld lower ... | {'all': 1, 'bauforschung': 1, ... |
<http://dbpedia.org/resource/G-Enka> | G-Enka | henry krvits born 30 december 1974 in tallinn ... | {'legendary': 1, 'gangstergenka': 1, ... |
<http://dbpedia.org/resource/Sam_Henderson> | Sam Henderson | sam henderson born october 18 1969 is an ... | {'now': 1, 'currently': 1, 'less': 1, 'being' ... |
<http://dbpedia.org/resource/Aaron_LaCrate> | Aaron LaCrate | aaron lacrate is an american music producer ... | {'exclusive': 2, 'producer': 1, 'tribe': ... |
<http://dbpedia.org/resource/Trevor_Ferguson> | Trevor Ferguson | trevor ferguson aka john farrow born 11 november ... | {'taxi': 1, 'salon': 1, 'gangs': 1, 'being': 1, ... |
<http://dbpedia.org/resource/Grant_Nelson> | Grant Nelson | grant nelson born 27 april 1971 in london ... | {'houston': 1, 'frankie': 1, 'labels': 1, ... |
<http://dbpedia.org/resource/Cathy_Caruth> | Cathy Caruth | cathy caruth born 1955 is frank h t rhodes ... | {'phenomenon': 1, 'deborash': 1, ... |
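TF-IDF reweights these raw counts so that words appearing in many documents count for less. The following is a minimal sketch, assuming the weighting count × ln(N / df), where N is the number of documents and df is the number of documents containing the word. That assumption appears consistent with the GraphLab output that follows: several words score 10.986495..., which equals ln(59071), the value expected for a word used once in exactly one of the 59,071 articles. The function below and its toy corpus are illustrative and are not GraphLab's implementation.

```python
import math

def tf_idf(word_counts):
    """Compute TF-IDF scores for a list of {word: count} dictionaries.

    Assumed weighting: count * ln(N / df), with N the number of documents
    and df the number of documents that contain the word.
    """
    n_docs = len(word_counts)
    doc_freq = {}
    for counts in word_counts:
        for word in counts:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return [
        {word: count * math.log(float(n_docs) / doc_freq[word])
         for word, count in counts.items()}
        for counts in word_counts
    ]

# Hypothetical three-document corpus, already converted to word counts.
toy_corpus = [
    {'the': 3, 'football': 2},
    {'the': 1, 'melatonin': 4},
    {'the': 2, 'blues': 1},
]

scores = tf_idf(toy_corpus)
print(scores[0])
```

In the toy corpus, "the" appears in every document, so its score is zero, while a word confined to a single document keeps a weight proportional to its count.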
tfidf = graphlab.text_analytics.tf_idf(people['word_count'])
print tfidf.head()
[{'since': 1.455376717308041, 'carltons': 7.0744723837970485, 'being': 1.7938099524877322, '2005': 1.6425861253275964, '2008': 1.5093391374786154, 'coach': 5.444264118987054, 'its': 1.6875948402695313, 'before': 2.9935647453367427, 'australia': 2.86858644684204, '21': 2.797250863489293, 'northern': 3.310021742836038, 'bullants': 7.489987827758714, 'to': 0.23472468840899613, 'perth': 5.051601193605607, 'sydney': 3.5981675296480873, 'selection': 3.836578553093086, '2014': 2.2073995783446634, 'has': 0.428497539744039, '2011': 1.7023470901042919, '2013': 1.9545642372230505, 'division': 2.7906099979103978, 'his': 0.7878343656409719, 'was': 0.3968289280609173, 'rules': 3.8272034844276295, 'assistant': 2.5220702633476124, 'spanned': 5.531174273867493, 'early': 1.929422753652229, 'game': 2.4168995190159084, 'five': 2.2137301792754096, 'during': 1.3174651479035495, 'continued': 2.720588055069447, '44game': 9.887883100557085, 'cause': 4.8023464982877115, 'twice': 3.3301582227950113, 'round': 2.897933583948961, 'parade': 5.510031837293684, 'born': 0.268196273764765, 'clubs': 3.4464050690798693, 'college': 1.5613662703175555, 'blues': 4.064837205074066, 'for': 0.29145011737314763, 'falcons': 5.868501576808439, 'currently': 1.637088969126014, 'hill': 3.794313330511949, 'drawn': 4.96062941539988, 'kangaroos': 20.726873835958425, 'kicked': 5.142950972193835, 'exchange': 4.113331555012676, 'mckernan': 9.600201028105303, '19982000': 6.509158574746988, 'losing': 3.773463729390325, 'essendon': 6.016682089649193, 'along': 2.5088749729287803, 'teaches': 3.7712554104950966, 'by': 0.37455341206197373, 'box': 4.576320507259028, 'league': 8.342695137239867, 'career': 1.3050270203415668, 'of': 0.016624704796446097, 'against': 4.015921958283749, 'david': 2.4512658353228582, 'melbourne': 3.8914310119380633, 'digby': 8.347438059609935, 'games': 2.2331239682242914, 'leading': 5.061504320250359, 'traded': 3.952107459309691, 'first': 0.6956048713993103, 'goalkicker': 6.859361004180102, 'morrell': 
46.88528738395547, 'corey': 6.486685718894929, 'acted': 4.137429106591736, 'football': 25.58570665567139, 'carlton': 10.682096983163913, 'from': 0.5875106759712689, 'district': 2.774469584601757, 'west': 5.205210090246754, 'hawks': 5.531174273867493, 'draft': 3.240194726993755, 'coburg': 7.851001173296044, 'forward': 3.8848194176057507, 'australian': 8.630007339620153, 'recruited': 4.4897203990393315, 'until': 1.7591012626831841, 'shifted': 5.634637255749128, 'club': 11.043158373020127, 'season': 5.6626008156972025, 'vflaffiliate': 7.431147327735781, 'western': 3.0880842964135953, 'with': 1.0123432126103034, 'former': 1.3573131120992086, 'he': 1.280211345952344, '10': 2.3157231098806563, 'october': 1.9182947844101343, 'third': 2.3506306680914584, 'footballer': 3.2388985505323085, 'victorian': 4.564873121418676, 'played': 4.636320601315451, '32': 4.3717697890214335, 'following': 1.9609195556941061, 'teague': 9.04058524016988, 'and': 0.002980575592194913, 'strathmore': 8.588600116426823, 'is': 0.02761625047551999, 'premier': 3.6766139034004075, 'delisted': 5.816011394187043, 'as': 0.2543390440248236, 'brisbaneafter': 10.986495389225194, 'at': 0.8612771466165147, 'in': 0.01255028255157884, 'education': 2.4487155642005685, 'physical': 4.001779069106929, 'afl': 4.70049729471633, 'end': 4.839120211828286, 'premiership': 4.5439552227569955, 'retiring': 3.8140708121003493, 'edflhe': 10.986495389225194, 'also': 0.4627270916162349, 'other': 1.4424007566948476, 'rookie': 4.104057918227347, 'play': 2.270287418073342, 'coached': 3.9925624140020046, 'who': 0.9098952189804214, 'senior': 6.897159666747814, 'a': 0.022476737890332586, 'vfl': 5.253154112327449, '1979': 2.6032908378122737, 'age': 2.138848033513307, '2002': 1.8753125887822302, '2003': 1.8013702663900752, '2000': 1.8763068991994527, '2001': 1.9280249665871378, '2006': 1.520737905384506, '2007': 1.4879730697555795, 'time': 1.3253342074200498, 'the': 0.0027426017494956603, '2009': 1.5644364836042695, 'aflfrom': 
10.986495389225194, 'playing': 4.182005515547123, 'goals': 3.4581636225179473}, {'precise': 6.44320060695519, 'thomas': 3.3202734635624696, 'closely': 4.591233791109745, 'disturbances': 7.808441558877249, 'wehr': 10.293348208665249, 'bright': 5.602000326436105, 'sleep': 11.773257922801992, 'nimh': 8.588600116426823, 'monitored': 7.460134864609033, 'disorders': 23.746557527902628, 'had': 1.177428412308558, 'to': 0.1408348130453977, 'chronobiologic': 10.986495389225194, 'treatment': 13.071396407363236, 'rhythm': 4.985080511264044, 'maryland': 4.251903729252247, 'include': 2.174545211685214, 'plasma': 7.0744723837970485, 'scheduled': 4.870603263742161, 'hormone': 7.489987827758714, 'assay': 8.683910296231149, 'pharmacology': 6.908957945319475, 'very': 2.811792506755733, 'ophthalmology': 8.153282045168979, 'every': 3.306319748788604, 'they': 1.8993401178193898, 'day': 5.322863391187994, 'follows': 5.421974981902501, 'university': 1.6946860096423695, 'l': 3.969885705330975, 'morning': 7.646645996757104, 'winter': 4.011081461769242, 'individuals': 4.577966598165696, 'each': 3.168869946171825, 'biological': 5.041074780618619, 'depression': 5.862531409821935, 'work': 1.3993637697254548, 'mental': 4.694926249666874, 'national': 1.1860931647723914, 'circadian': 25.765800349280468, 'certain': 9.82690171024958, 'laboratory': 4.290696472166703, '24hour': 6.669007275688884, 'sandy': 5.493433945884646, 'mood': 6.592046234552756, 'for': 0.14572505868657382, 'chicago': 2.9291179006672023, 'pattern': 6.158181651922893, 'robert': 2.802818806604537, 'research': 4.395194159760633, 'current': 2.8308461188591933, 'health': 5.989129738745434, 'cause': 9.604692996575423, 'publications': 3.253249742695399, 'available': 3.9468350393631186, 'be': 4.218744013624684, 'full': 3.003055326218652, 'drifting': 7.489987827758714, 'focused': 3.684673047087262, 'graduated': 2.2579073935292953, 'relying': 7.179832899454874, 'lag': 8.588600116426823, 'on': 0.5532382600518011, 'rhythms': 
6.166213823620158, 'jet': 5.868501576808439, 'institute': 2.1793227663084926, 'of': 0.02909323339378067, 'prior': 2.956736868784372, 'aka': 4.654993539331503, 'studies': 2.3430220318986272, 'blind': 5.522663584199584, 'lewy': 28.131172430373283, 'modalities': 8.588600116426823, 'phasedelay': 10.986495389225194, 'sack': 6.9975113426609195, 'precisely': 7.0162034756730725, 'he': 0.853474230634896, 'from': 0.29375533798563447, 'working': 2.074561053063768, '1981': 2.5978176200443843, 'their': 1.5201958369931787, 'describes': 4.564873121418676, 'alfred': 4.8596262051110095, 'secretion': 9.19473591999714, 'was': 0.07936578561218346, 'phaseadvance': 10.986495389225194, 'that': 0.6614069466714981, 'oregon': 14.163582529462454, 'worked': 1.553891853362109, 'with': 0.40493728504412135, '94': 5.89889905399281, 'must': 4.665727094974612, 'md': 4.75012579902149, 'has': 0.856995079488078, 'hour': 4.467348101284799, 'developing': 3.7973276508048714, 'these': 5.1951804362209755, 'science': 2.3447863231113892, 'j': 3.2558813231614545, 'evening': 8.825630444529097, 'can': 5.600061919606208, 'shift': 5.437419304329975, 'my': 2.903166780438819, '25hour': 10.986495389225194, 'and': 0.007451438980487283, 'colleague': 5.103173000736915, 'his': 0.2626114552136573, 'defined': 5.629909114553182, 'december': 2.001425829579395, 'is': 0.05523250095103998, 'an': 0.5964781781637942, 'as': 0.3815085660372354, 'pubmed': 9.600201028105303, 'at': 0.645957859962386, 'have': 1.4416847832984716, 'in': 0.003861625400485797, 'melatonin': 49.43941550278542, 'vicechair': 5.8565966743021205, 'clearly': 5.82170941530168, 'ohsu': 9.887883100557085, 'studying': 3.612493529875034, 'holds': 2.8415259721373194, 'administration': 9.885885875106995, 'shifts': 13.784301654006187, 'take': 3.112517009620693, 'bethesda': 6.7380001471758355, '2005': 1.6425861253275964, 'department': 2.3398541306220704, 'unless': 6.411784410721811, 'maladaptation': 10.986495389225194, 'after': 0.9443334420013064, 'phd': 
2.5603215961961254, 'moving': 3.1908488528906003, 'assessed': 6.9975113426609195, 'totally': 5.810345656651365, 'senior': 2.2990532222492712, 'types': 5.620519374203343, 'exposure': 16.443491559878495, 'a': 0.03371510683549888, 'light': 11.222520964891977, 'professor': 2.010865204934687, 'later': 1.4294496043477696, '1973': 2.808137223619358, 'capsule': 7.895452935866879, 'time': 1.3253342074200498, 'psychiatry': 11.737003153616879, 'the': 0.00020315568514782669}, {'just': 2.7007299687108643, 'issued': 4.429717033067152, 'mainly': 3.841299254228023, 'nominated': 2.8896779186528754, 'years': 1.0752380994247055, 'leads': 4.9486244693030566, 'tours': 4.057957571060529, 'teamed': 4.91345085512479, 'cds': 4.676577110998678, 'broadcaster': 4.608069205573607, 'harmonica': 13.04117454114122, 'to': 0.04694493768179923, 'voted': 3.922591427753126, '2014': 4.414799156689327, 'piano': 3.6534723748387132, 'has': 0.856995079488078, 'thousands': 4.676577110998678, '2010': 1.5928339601219734, 'brown': 7.309560838997457, 'his': 0.7878343656409719, 'big': 2.940266288149817, 'band': 7.816740648304755, 'hamilton': 4.586237943916373, 'they': 1.8993401178193898, 'association': 2.1546374540272892, 'northwestern': 4.9042764788487485, 'performing': 3.1247683114012137, 'comprising': 6.2679965179301, 'arthur': 4.274754994169014, 'miles': 4.658558605495999, 'either': 4.495771854722687, 'release': 2.9982917922026187, 'where': 1.089076212090673, 'j': 3.2558813231614545, 'honored': 4.140615513961145, 'their': 1.5201958369931787, 'society': 2.4448047262085693, 'year': 1.3423616371539895, 'home': 4.842167251076965, 'portland': 4.788016672732886, 'best': 6.134404289659914, 'harp': 6.361522575940923, 'canada': 5.37143685674748, 'what': 5.638286804338248, 'for': 0.29145011737314763, 'chicago': 5.858235801334405, 'since': 1.455376717308041, 'extensively': 3.7786355177927193, 'won': 1.3836400683164753, 'tens': 6.411784410721811, 'hailing': 7.157853992736099, 'full': 3.003055326218652, 'small': 
3.140296573727769, 'active': 2.7479584590534256, 'from': 0.5875106759712689, 'by': 0.37455341206197373, 'on': 0.18441275335060037, 'dates': 5.597423659408693, 'holger': 8.153282045168979, 'influential': 3.738702807457348, 'of': 0.012468528597334574, 'duo': 4.3375108392004185, 'or': 1.9128915408224825, 'blind': 5.522663584199584, 'canadas': 4.660345916070095, 'scene': 3.553420040326614, 'venues': 4.3362163406377725, 'cd': 3.769051957528662, 'germanyover': 10.986495389225194, 'festivals': 3.7508762481584443, 'maple': 5.868501576808439, '1982': 2.559664637889348, 'guest': 3.134445181959305, 'working': 2.074561053063768, 'gutsy': 8.789270811888976, 'muddy': 7.728398851203712, 'been': 0.9774773354796025, 'awarded': 2.354189390708452, 'harpdog': 21.972990778450388, 'few': 6.002022064982743, 'time': 1.3253342074200498, 'was': 0.31746314244873386, 'naturally': 6.52058727057061, 'juggles': 9.887883100557085, 'life': 2.1907617832744593, 'that': 0.6614069466714981, 'club': 2.2086316746040255, 'award': 3.2644556968847374, 'released': 2.0078348995855078, 'with': 0.8098745700882427, 'he': 1.06684278829362, 'toronto': 3.3555488083347353, 'combos': 8.278445188122983, '1995': 2.222129668695386, 'canadian': 5.533799870096228, 'up': 1.5635467678501844, 'promoter': 5.6393878585077255, 'edmonds': 7.61919955923872, 'recording': 2.9764678607434605, 'while': 1.8364359481339414, 'crossed': 5.760748715511993, 'many': 1.639964662798746, 'petersen': 7.297615935111258, 'called': 2.0784770664403074, 'and': 0.0052160072863410975, 'seven': 2.7917137507818355, 'classic': 3.635337162794501, 'is': 0.19331375332863993, 'it': 3.9497417504814463, 'covers': 4.5407755698396155, 'states': 1.824400329877006, 'player': 4.26880525149613, 'as': 0.2543390440248236, 'in': 0.006757844450850144, 'graham': 4.536024967081018, 'blues': 36.58353484566659, 'mid1950s': 7.225295273531632, 'united': 1.5679220786705559, 'lemon': 7.035251670643767, 'guitarist': 3.658714850803563, '1': 2.0978765819243166, 'also': 
0.4627270916162349, 'vancouver': 4.219152263959802, 'which': 0.7674309670437692, 'browns': 5.4811638532928315, 'nw': 8.683910296231149, 'album': 2.4512658353228582, 'electric': 4.464402591055042, 'juno': 6.04485296661589, 'who': 0.9098952189804214, 'membership': 4.6783969477156635, 'oregon': 4.721194176487485, 'a': 0.039334291308082026, 'singer': 2.7818235602743835, 'bluesgospel': 10.986495389225194, 'well': 1.5295293417875981, 'surveybrown': 10.986495389225194, 'original': 2.9143400810369444, 'traditional': 3.6283016364921616, 'cascade': 7.808441558877249, 'albumhe': 8.042056410058754, 'the': 0.00121893411088696, 'playing': 2.0910027577735617, 'songs': 2.9508926963066124}, {'all': 1.6431112434912472, 'bauforschung': 10.986495389225194, 'just': 2.7007299687108643, 'kurdlawitzpreis': 10.986495389225194, 'german': 6.505623636898587, 'ending': 4.547145018125096, 'produced': 5.023167180131126, 'including': 1.2272824458461182, 'austrian': 5.114377599749778, 'yet': 4.092839034622559, 'hundredth': 8.278445188122983, 'merkur': 10.293348208665249, '1942': 4.106111307039189, 'producing': 3.830318751744579, 'dick': 9.155933196331391, '28': 3.0552106276993034, 'start': 3.281232914358869, 'lower': 4.515695885442592, 'praised': 4.66933070247791, 'listing': 6.332535039067671, 'stanisaw': 7.851001173296044, 'has': 1.285492619232117, 'silverberg': 9.887883100557085, 'trnaslations': 10.986495389225194, 'jeanpierre': 6.532148092971687, 'klein': 5.930249583876886, 'was': 0.2380973568365504, 'translations': 5.570394987020774, 'matters': 4.606372852325429, 'ones': 4.746219544054425, 'wellss': 9.600201028105303, 'negative': 5.348140719891449, 'heinlein': 9.04058524016988, 'years': 3.2257142982741165, 'him': 1.5755843267871936, 'association': 2.1546374540272892, 'brought': 6.700451571691642, 'olaf': 7.115294378317303, 'january': 1.885412003185961, 'university': 0.8473430048211847, 'journalism': 3.7712554104950966, 'herbert': 5.227593615347914, 'book': 4.005609415785018, '1998': 
2.0687826320938068, 'served': 1.5362723499305253, 'european': 2.590340526186013, 'zelazny': 10.293348208665249, 'view': 4.2214564124446525, 'reference': 4.67113738770286, 'describedroger': 10.986495389225194, 'series': 1.877080935838972, 'some': 6.592334027112441, 'rottensteiner': 32.95948616767558, 'born': 0.268196273764765, 'books': 4.497764704832687, 'fifty': 5.11719847609142, 'are': 1.7719638126305435, 'die': 4.504918259948763, 'phantastische': 10.986495389225194, 'close': 3.5416621153330006, 'special': 2.7022435916032777, 'phantastischen': 10.986495389225194, 'what': 2.819143402169124, 'total': 3.2767385247710297, 'abe': 6.943444121390644, 'cordwainer': 9.887883100557085, 'strugatski': 10.986495389225194, 'robert': 2.802818806604537, 'since': 1.455376717308041, 'nonwriter': 10.986495389225194, '18': 2.726778428203672, 'critical': 3.480453210707072, 'stapledon': 10.986495389225194, 'new': 1.7743065312250548, 'receiving': 3.73515040585298, 'numerous': 2.4220370053418425, 'public': 2.029113653642407, 'achievements': 4.48670834856934, 'edited': 3.7890600351286032, 'comparable': 7.1363477875151355, 'romane': 9.600201028105303, 'banal': 8.213906666985412, 'journal': 6.050947846683648, 'republished': 6.323056295113127, 'however': 2.41879921563585, 'lem': 35.62821539018143, 'betterknown': 7.941972951501771, 'york': 1.701047544762206, 'advisor': 3.6213152632041816, 'franz': 5.561545371743792, 'from': 0.5875106759712689, 'philip': 8.743539578042867, 'fifteen': 4.660345916070095, 'publisher': 4.286995049063516, 'about': 3.8530257976751474, 'works': 6.643049849940285, 'working': 2.074561053063768, 'language': 3.4581636225179473, 'asimov': 8.347438059609935, 'of': 0.08312352398223048, 'k': 9.00677607553599, 'sterreichisches': 10.986495389225194, 'through': 2.028984878933582, 'controversy': 4.290696472166703, 'his': 1.3130572760682866, 'und': 5.3270131734655735, 'w': 7.305637987135021, 'barry': 4.5126046928729195, 'promotion': 3.8914310119380633, 'introduced': 
3.8326615876463515, 'or': 1.9128915408224825, 'fields': 4.109199317727766, 'franke': 8.907053847545358, 'smith': 3.6437161998933485, 'into': 1.6050629424066056, 'number': 6.3690632491627195, 'one': 0.9309307338087167, 'karel': 7.202305755306933, 'fantasy': 4.67113738770286, 'austria': 8.964414431377097, 'kobo': 9.19473591999714, 'another': 2.603062187988481, 'bibliothek': 10.293348208665249, 'aldiss': 9.600201028105303, 'awarded': 2.354189390708452, 'institut': 5.713495830661447, 'illustrated': 4.57961540315588, '1980': 2.5468314003181947, 'to': 0.1408348130453977, 'addition': 2.6415150204546234, 'leading': 5.061504320250359, 'there': 2.3005725945344695, 'three': 1.4915025293575952, 'been': 0.9774773354796025, 'fiction': 25.487913841658276, 'continental': 5.1965352183279405, 'quarber': 10.986495389225194, 'andersonare': 10.986495389225194, 'editor': 5.484848237858817, 'h': 3.9468350393631186, 'hundred': 4.639106179569184, 'assessment': 5.380693322929197, 'until': 1.7591012626831841, 'vienna': 8.899607583267779, 'seriesrottensteiner': 10.986495389225194, 'both': 1.6730570592454443, 'factor': 5.059569363254783, 'andrevonthe': 10.986495389225194, 'authors': 12.962435014328356, 'fantasticrottensteiner': 10.986495389225194, 'highest': 3.43846641929018, 'with': 0.6074059275661821, 'hg': 8.042056410058754, 'he': 1.280211345952344, 'apek': 9.887883100557085, '1995': 2.222129668695386, 'g': 4.262662948403985, 'this': 1.2818856957987381, 'science': 14.068717938668335, 'work': 1.3993637697254548, 'as': 0.8901866540868826, 'us': 1.9319904488071395, 'n': 4.282081034261087, 'lesser': 6.453895896071939, 'suhrkamp': 9.377057476791094, 'travesties': 9.600201028105303, 'doctorate': 3.6866980224670334, 'history': 2.1447579606246094, 'and': 0.009686870674633467, 'nesvadba': 10.986495389225194, 'seven': 2.7917137507818355, 'studied': 2.2933341149871773, 'figure': 3.950346895474658, 'stated': 3.7501560464708503, '1973': 2.808137223619358, 'is': 0.08284875142655997, 'year': 
1.3423616371539895, 'it': 1.3165805834938153, 'brothersin': 8.907053847545358, 'an': 0.5964781781637942, 'critic': 3.842877786521073, 'eighteen': 5.377023594040234, 'at': 0.43063857330825733, 'in': 0.01158487620145739, '19791985': 8.347438059609935, 'saw': 3.4321605654994465, 'united': 1.5679220786705559, '1969': 2.8490995591685433, 'anthology': 4.558390116540598, 'latter': 3.9477118478366533, 'american': 1.1273777844250068, 'waidmannsfeld': 10.986495389225194, '1963': 3.2103799121264522, 'anthologies': 5.030658019760364, 'brian': 3.746562797904725, 'writers': 7.088005332861507, 'recognition': 3.6213152632041816, 'which': 0.7674309670437692, 'provoked': 6.696035948076803, 'out': 3.696806362913271, 'stanislaw': 17.003177478874388, 'malzberg': 9.887883100557085, 'statesrottensteiner': 10.986495389225194, 'fr': 5.154612911941677, '2004': 1.6903443608359008, 'verlags': 10.293348208665249, 'gerard': 5.7770092363837735, 'josef': 6.2329051981188295, 'wells': 5.410546286078878, 'produce': 4.200907744217265, 'two': 1.0988831858473562, 'librarian': 6.323056295113127, 'on': 0.7376510134024015, 'a': 0.039334291308082026, 'englishreading': 10.293348208665249, 'for': 0.14572505868657382, '1978': 2.6743602815767815, '1975': 5.524663753174666, 'well': 3.0590586835751963, 'shore': 5.644161137260383, 'greatest': 3.7530399706037554, 'volumes': 9.813124388259208, 'english': 2.239461125047026, 'occasion': 5.020348650101503, 'the': 0.0022347125366260936, 'sf': 12.782751078181208, 'typical': 5.839000912411741}, {'legendary': 4.280856294365192, 'gangstergenka': 10.986495389225194, 'legendaarne': 32.95948616767558, 'oja': 18.38947183999428, 'paul': 5.548396501990852, 'group': 1.9855189451548496, '23': 2.9691878815366133, 'had': 1.177428412308558, 'better': 3.445873860568042, 'real': 3.3707043171893614, 'his': 0.7878343656409719, 'dj': 8.527731188739493, 'big': 2.940266288149817, 'leegion': 10.986495389225194, 'famous': 9.645019887322736, 'were': 4.565934070063887, 'they': 
1.8993401178193898, 'during': 1.3174651479035495, 'went': 2.0519085188355186, '1996': 2.135691193468776, 'rapper': 5.11719847609142, '50': 3.3613882409862943, '18': 2.726778428203672, 'rapped': 8.907053847545358, 'back': 2.443829401835926, 'joined': 1.9082017981693435, 'born': 0.268196273764765, 'second': 1.6724258314865346, 'palm': 5.272762583715825, 'year': 2.684723274307979, 'schoolmate': 8.278445188122983, 'album': 14.70759501193715, 'genkas': 10.986495389225194, 'estonian': 18.597010939329444, 'for': 0.14572505868657382, 'tallinn': 19.460057156684787, 'new': 0.8871532656125274, 'droopy': 10.293348208665249, 'europe': 2.7615279103106105, 'solo': 2.838628259301248, 'rapping': 7.157853992736099, 'homophobes': 10.293348208665249, 'henry': 3.953871128197188, 'studio': 3.0810537401649083, 'along': 5.0177499458575605, 'by': 0.37455341206197373, 'on': 0.36882550670120073, 'island': 3.4501314508206833, '30': 2.6266580085851636, 'tour': 5.697029869546111, 'arhm': 32.95948616767558, 'first': 0.6956048713993103, 'own': 4.539919313361398, 'armchair': 8.09612363132903, 'tugitooli': 10.986495389225194, 'producergenka': 10.986495389225194, 'another': 2.603062187988481, 'bankruptcy': 5.760748715511993, 'from': 0.5875106759712689, 'vacation': 6.411784410721811, 'cent': 6.174311033852777, 'next': 2.670439668860552, 'their': 6.080783347972715, 'of': 0.004156176199111524, 'was': 0.31746314244873386, 'toe': 29.840539458436133, 'started': 10.553246057432982, 'company': 2.195009362476192, 'released': 6.023504698756524, 'kuhnja': 10.986495389225194, 'known': 1.4457727152652031, 'with': 0.40493728504412135, 'pankrot': 10.986495389225194, 'warmup': 7.202305755306933, 'performers': 4.844457983637838, 'this': 1.2818856957987381, '1998': 2.0687826320938068, 'gangster': 6.629786562535602, 'record': 4.3057738130686065, 'genka': 54.93247694612597, 'making': 2.7223896254962385, 'called': 2.0784770664403074, 'and': 0.007451438980487283, 'december': 2.001425829579395, 'is': 0.02761625047551999, 
'an': 0.2982390890818971, 'curtis': 5.299520032885375, 'tag': 23.042994862047973, 'palmisaar': 10.986495389225194, 'rap': 5.279385124476319, 'in': 0.006757844450850144, 'hit': 2.9751402800639086, 'revo': 19.77576620111417, 'same': 2.2492032766809724, 'also': 0.9254541832324698, 'which': 2.302292901131308, 'promo': 7.157853992736099, 'who': 0.9098952189804214, 'most': 4.255861328695191, 'records': 2.678542843954175, 'a': 0.011238368945166293, 'stagename': 9.377057476791094, 'band': 2.605580216101585, '1974': 2.77067759739274, 'together': 5.684793851773341, '2001': 1.9280249665871378, '2006': 1.520737905384506, '2007': 1.4879730697555795, '2004': 1.6903443608359008, 'the': 0.0008126227405913067, 'songs': 14.754463481533062, 'krvits': 10.293348208665249, 'came': 2.4364478609380096, 'consisted': 5.477107052597217}, {'now': 1.96695239252401, 'currently': 1.637088969126014, 'less': 3.9574078250755322, 'being': 1.7938099524877322, 'nominated': 2.8896779186528754, 'known': 1.4457727152652031, 'xeroxed': 10.293348208665249, 'niblit': 10.986495389225194, 'alternative': 4.208848795590078, 'captain': 3.6799639902856893, 'weekly': 3.608736480997322, 'toilets': 8.588600116426823, 'awardshis': 8.042056410058754, 'but': 1.313994565464302, 'earned': 2.3840423538581335, 'writer': 2.511166401907656, 'character': 3.5492890223539018, 'woodstock': 12.995718038986109, 'writing': 2.61409678271219, 'expert': 3.781602879020521, 'program': 2.393082171897548, '4': 2.437803530749586, 'has': 0.856995079488078, '2010': 1.5928339601219734, 'henderson': 16.819794107054207, 'his': 0.7878343656409719, 'march': 1.9573172463231197, 'penises': 10.986495389225194, 'scratchings': 10.986495389225194, 'scene': 3.553420040326614, 'whistle': 26.7297211840841, 'simmons': 6.111298066024043, 'every': 3.306319748788604, 'not': 1.5880170751336171, 'hendersons': 24.12616923017626, 'him': 1.5755843267871936, 'bigger': 12.268930250611154, 'school': 2.4455745584347035, 'storyboard': 7.37557747658097, 'magic': 
17.995245482962336, 'bobby': 4.460000529654404, 'wordless': 8.347438059609935, '2010in': 5.602000326436105, 'nickelodeon': 6.250296940830698, 'red': 3.274498882177525, 'squarepants': 7.941972951501771, 'where': 1.089076212090673, 'wrote': 2.545535503808546, 'spongebob': 7.941972951501771, 'series': 3.754161871677944, 'directing': 4.3544936118295645, 'born': 0.53639254752953, 'college': 3.122732540635111, 'are': 1.7719638126305435, 'year': 1.3423616371539895, 'weiss': 6.555678590381881, 'best': 1.5336010724149785, 'out': 3.696806362913271, 'graduating': 3.3840940535593758, 'humor': 5.266183612617783, 'for': 0.29145011737314763, 'harvey': 4.990043300606173, 'review': 3.227734845067532, 'since': 1.455376717308041, 'historyhenderson': 10.986495389225194, 'ended': 3.3430124821479934, 'contained': 5.369724291558623, 'crude': 7.808441558877249, 'new': 3.5486130624501095, '5009': 32.95948616767558, 'bestknown': 5.497557663068508, 'attended': 2.2859811406819186, 'yorks': 5.094851177399423, 'city': 1.7402088904755424, 'journal': 3.025473923341824, 'arts': 2.183972890940773, 'drew': 8.839645918843907, 'busy': 5.744748374165552, 'awards': 2.283985132035204, 'york': 3.402095089524412, 'starring': 3.6295771468691735, 'graduated': 2.2579073935292953, 'earliest': 5.211943843680786, 'by': 0.37455341206197373, 'on': 0.36882550670120073, 'of': 0.02078088099555762, 'larger': 5.078412451056264, 'programming': 4.613175599648182, 'blobby': 10.986495389225194, 'american': 2.2547555688500136, 'published': 5.775009341956282, 'yorksince': 9.600201028105303, 'hometown': 4.3414044197195505, 'number': 2.12302108305424, 'one': 0.9309307338087167, 'magazines': 3.475517637211099, '1988': 2.4491074905234376, 'strip': 10.552136743700649, 'comic': 21.99472687200199, '1987': 2.4836039825198166, 'little': 3.363342320748292, 'from': 0.29375533798563447, '1980': 2.5468314003181947, 'expressnews': 8.907053847545358, 'boiceville': 10.293348208665249, 'due': 2.7324867986603505, 'been': 1.954954670959205, 
'spaz': 10.293348208665249, 'their': 1.5201958369931787, 'asses': 9.600201028105303, 'longestrunning': 6.843360662833661, 'featuring': 3.5737313717986314, 'was': 0.3968289280609173, 'friend': 3.6606878866294212, 'that': 0.6614069466714981, 'some': 2.1974446757041473, 'award': 1.6322278484423687, 'visual': 3.988899406243268, 'lives': 2.532029027717262, 'midtolate': 7.990763115671204, 'with': 0.20246864252206068, 'than': 2.0650378102773113, 'he': 1.493579903611068, '1980s': 2.9688582293167167, '1993': 4.6065588279690095, 'october': 1.9182947844101343, 'hour': 4.467348101284799, 'comics': 10.25712446948347, '18': 2.726778428203672, 'work': 1.3993637697254548, 'cartoonist': 5.673289410183407, 'onteora': 10.986495389225194, '1991henderson': 10.986495389225194, 'animated': 4.556775911186057, 'called': 2.0784770664403074, 'and': 0.0052160072863410975, 'emmy': 4.226080698141766, 'san': 2.591243868614199, 'sam': 4.313197421457541, 'is': 0.05523250095103998, 'turned': 3.394129260705398, 'it': 1.3165805834938153, 'an': 0.2982390890818971, 'high': 1.906719387270128, 'heard': 4.072758038565509, 'as': 0.2543390440248236, 'minicomics': 19.200402056210606, 'in': 0.015446501601943188, 'monroe': 5.787998357959369, 'pink': 5.3200687011127625, '1969': 2.8490995591685433, 'began': 3.454920191420506, 'magazine': 2.456583425401182, 'sic': 6.812108119329557, 'also': 0.4627270916162349, '55th': 7.157853992736099, '19992004in': 10.293348208665249, 'which': 0.7674309670437692, 'comedy': 3.4937350883028153, 'junk': 7.489987827758714, 'collection': 3.2771870558393252, 'bear': 5.14005661416747, 'nomination': 3.7697859025157365, 'man': 2.863827365878787, 'a': 0.039334291308082026, 'special': 2.7022435916032777, 'antonio': 4.383907497035858, '2003': 1.8013702663900752, 'funniest': 7.272923322520886, 'title': 2.5500785078362447, 'the': 0.0010157784257391334, 'selfpublishing': 15.883945903003543, 'reprinted': 6.103693466638823}, {'exclusive': 10.455187230695827, 'producer': 2.6157162162644934, 
'tribe': 5.905091024240731, 'being': 1.7938099524877322, 'developed': 3.092923315720292, 'rascals': 7.990763115671204, 'designer': 7.809573606239239, 'produced': 7.534750770196689, 'including': 1.2272824458461182, 'crack': 6.497859019493054, 'classic': 3.635337162794501, 'kanye': 7.054669756500869, 'specialedition': 10.293348208665249, 'york': 1.701047544762206, 'jacobs': 5.799109583384439, 'based': 2.299897032948229, 'jay': 4.577966598165696, 'paris': 3.190026146139137, 'mcs': 7.272923322520886, 'observe': 6.752388884627935, 'show': 4.3378027058988025, 'credited': 7.642003828328697, 'young': 2.4099018545275093, 'created': 2.7752840274321713, 'to': 0.1877797507271969, 'aaron': 24.86390116591196, 'vital': 5.570394987020774, 'black': 3.1224597301527437, 'fade': 7.248825770941826, 'prison': 3.8094766233152946, 'experimenting': 6.543844132734878, 'maryland': 4.251903729252247, 'has': 2.142487698720195, 'local': 2.5829189245959245, 'over': 1.4878231559557336, 'first': 4.869234099795172, 'started': 2.1106492114865967, 'his': 1.3130572760682866, 'dj': 12.791596783109238, 'popularizing': 7.322933743095548, 'wide': 3.5647195955805473, 'nightclubs': 6.629786562535602, 'soulja': 8.683910296231149, 'successful': 2.679282762596886, 'early': 1.929422753652229, 'records': 5.35708568790835, 'verb': 8.153282045168979, 'spank': 9.377057476791094, 'basement': 6.313666554763288, 'report': 3.7697859025157365, 'world': 1.370623244696304, 'cool': 5.6158573610975315, 'clan': 6.207371896113665, 'knock': 6.709829270209139, 'cable': 4.92971137599657, 'success': 2.881489851677948, 'artist': 2.4318137533979653, 'university': 0.8473430048211847, 'mndr': 10.293348208665249, 'jayz': 7.115294378317303, 'this': 1.2818856957987381, 'quest': 5.292763250422495, 'diy': 6.926052378678775, 'reasonable': 6.464706812176154, 'rock': 2.943474503926912, 'cashmere': 8.421546031763658, 'x': 4.523465932304524, 'athletics': 12.47334995275053, 'amanda': 5.683190481166118, 'em': 6.0738405034891425, 'entourage': 
7.489987827758714, 'lucien': 7.348909229498808, 'series': 3.754161871677944, 'artists': 2.725227238647546, 'colette': 7.585298007563039, 'born': 0.268196273764765, 'doolittleat': 10.986495389225194, 'productions': 3.482654642526243, 'doubt': 5.76613956414687, 'recently': 2.6417526348076485, 'cocacola': 6.543844132734878, 'album': 2.4512658353228582, 'wire': 17.29841869244061, 'ricks': 8.09612363132903, 'for': 0.43717517605972145, 'eliza': 14.497651541883652, 'boy': 4.012016478200149, 'rascal': 15.702002346592089, 'slick': 7.202305755306933, 'label': 6.603422891404819, 'collaborated': 7.523483966914446, 'performanceslacrate': 10.986495389225194, 'jayzs': 8.347438059609935, 'lacrate': 131.8379446707023, 'milkcrate': 65.91897233535116, 'new': 0.8871532656125274, 'ever': 2.956736868784372, 'sold': 3.3511915029657793, 'shows': 2.8354504435401706, 'videogame': 7.654290879049991, 'hero': 4.915757661222704, 'outkast': 8.907053847545358, 'syracuse': 5.114377599749778, 'tour': 2.8485149347730556, 'interning': 8.421546031763658, 'dazed': 7.552508184740048, 'ferrariwhich': 10.986495389225194, 'highlandtown': 10.986495389225194, 'youngest': 3.825649482560895, 'attending': 4.0347232248262825, 'tracks': 4.181880869162571, 'studio': 3.0810537401649083, 'vinyl': 5.816011394187043, 'starring': 3.6295771468691735, 'tshirtin': 10.986495389225194, 'baltimores': 7.808441558877249, 'by': 0.37455341206197373, 'rocafella': 8.789270811888976, 'on': 0.5532382600518011, 'hbo': 20.8852171457614, 'kids': 4.521907085535233, 'launch': 4.439709978464671, 'of': 0.016624704796446097, 'rogenrecently': 10.986495389225194, 'lacrates': 10.986495389225194, 'nightclub': 6.00976164680462, 'prior': 2.956736868784372, 'american': 1.1273777844250068, 'doolittle': 8.213906666985412, 'soundtrack': 4.507985747016625, 'recordings': 3.6580580363300323, 'where': 2.178152424181346, 'djing': 6.843360662833661, 'streamz': 10.986495389225194, 'road': 3.567114806306502, 'king': 3.452801679376562, 'and': 
0.01564802185902329, 'major': 4.116240258743127, 'via': 4.163209266869507, 'features': 3.7830898681420995, 'scene': 3.553420040326614, 'pop': 3.5913878426627086, 'highly': 3.573128053529954, 'gorillaz': 8.347438059609935, 'featured': 2.542518260140216, 'called': 2.0784770664403074, 'delicious': 7.61919955923872, 'blank': 6.682430296021025, 'custom': 5.688178022677158, 'platinum': 4.906562194129604, 'city': 1.7402088904755424, 'from': 0.5875106759712689, 'working': 2.074561053063768, 'marc': 4.696779818316197, 'bmore': 32.95948616767558, 'west': 2.602605045123377, 'create': 3.7071765538105743, 'support': 3.0010110324913715, 'been': 0.9774773354796025, 'mark': 3.0388168179236272, 'parties': 4.495771854722687, 'live': 2.586285553294776, 'jam': 5.384376568345493, 'music': 12.549548608814243, 'house': 2.325548234164261, 'ronson': 7.690658523220865, 'featuring': 3.5737313717986314, 'grew': 2.9691878815366133, 'was': 0.3968289280609173, 'tell': 4.804410482508563, '1999': 1.9968016883646342, 'vans': 7.690658523220865, 'hard': 4.057957571060529, 'that': 0.6614069466714981, 'club': 4.417263349208051, 'ragehbo': 10.986495389225194, 'took': 2.2103282894939196, 'released': 6.023504698756524, 'remixed': 5.982549083279735, 'commissioned': 7.927254606285105, 'level': 3.2406271594329263, 'with': 1.0123432126103034, 'he': 0.853474230634896, '10': 2.3157231098806563, 'television': 2.00180169878134, 'east': 2.8400760661271907, 'mixtapes': 7.225295273531632, 'mc': 5.782488702148399, 'culture': 6.265781152254715, 'collaborations': 4.808551275174594, 'ultramagnetic': 10.986495389225194, 'official': 6.300251257360141, 'up': 1.5635467678501844, 'recording': 2.9764678607434605, 'record': 4.3057738130686065, 'historical': 3.80718741972116, 'mr': 3.4165677339825424, 'making': 2.7223896254962385, 'z': 5.989283115461079, 'dizzee': 27.12175572050964, 'def': 6.475635882708344, 'mz': 9.377057476791094, 'showlacrate': 10.986495389225194, 'fashion': 21.165287353137074, 'pellatfinet': 
10.986495389225194, 'played': 1.5454402004384837, 'is': 0.11046500190207996, 'in': 0.008688657151093043, 'confused': 5.739471317064708, 'an': 0.2982390890818971, 'graffiti': 6.391375539090604, 'at': 0.43063857330825733, 'allen': 13.119333512536905, 'madonna': 5.96921555241027, 'wutang': 7.808441558877249, 'eminems': 8.789270811888976, 'clothing': 10.82868271409486, 'campus': 4.687546142369252, 'film': 2.033113917057952, 'summers': 5.771559631616209, 'payday': 7.941972951501771, 'began': 3.454920191420506, 'song': 2.869182927623221, 'brandaaron': 10.986495389225194, 'range': 3.6626648230228773, 'epmd': 9.377057476791094, 'also': 0.9254541832324698, 'other': 1.4424007566948476, 'role': 2.20355903287593, 'which': 0.7674309670437692, 'toured': 3.527156494004899, 'nyc': 10.41150374686573, 'many': 1.639964662798746, 'lily': 19.08456772782277, 'life': 4.3815235665489185, 'even': 3.282585179608883, 'used': 2.7818235602743835, 'star': 2.9854754279015427, 'tshirts': 14.230588756634607, 'collaboration': 3.7154868509442025, 'time': 1.3253342074200498, 'upon': 3.3017114457024097, 'helping': 3.9398481113764383, 'most': 1.4186204428983973, 'director': 1.6150570969066835, 'throughout': 2.8924222411558422, 'sponsored': 4.439709978464671, 'streetwear': 19.200402056210606, 'a': 0.03371510683549888, 'rakim': 27.12175572050964, 'gutter': 15.790905871733758, 'tramps': 8.907053847545358, 'age': 2.138848033513307, 'later': 1.4294496043477696, 'seth': 6.422147197757358, 'spent': 2.418609083493439, 'baltimoreaaron': 10.986495389225194, 'vegas': 4.536024967081018, 'ferrari': 6.371374872383935, 'baltimore': 33.93948962331078, 'writing': 2.61409678271219, '2005': 1.6425861253275964, 'the': 0.0015236676386087002, '2009': 1.5644364836042695, 'original': 2.9143400810369444}, {'taxi': 6.0520214560945025, 'salon': 6.134465125305577, 'gangs': 6.9975113426609195, 'being': 1.7938099524877322, 'text': 4.656774483702498, 'agreed': 4.663930149297911, 'authors': 4.320811671442786, 'thomas': 
3.3202734635624696, 'nominated': 2.8896779186528754, 'years': 1.0752380994247055, 'four': 1.922106072733316, 'chair': 2.9874881759812397, 'enjoyed': 4.373111170845635, 'before': 1.4967823726683713, 'one': 3.723722935234867, '1': 2.0978765819243166, 'du': 4.286995049063516, 'also': 1.8509083664649395, 'chosen': 3.80186623650788, 'regarded': 4.1226919977722405, 'april': 3.9784489659834934, 'fergusons': 16.69487611921987, 'writing': 5.22819356542438, 'to': 0.46944937681799226, 'critical': 3.480453210707072, 'kinkajou': 10.986495389225194, 'developed': 3.092923315720292, 'alberta': 4.90885314587616, 'vancouver': 4.219152263959802, 'equally': 5.793538538334984, 'has': 2.570985238464234, '2011': 1.7023470901042919, 'lake': 4.0270968770912186, '2013': 1.9545642372230505, '2012': 1.7938099524877322, '11': 2.5966811271387873, 'twentieth': 5.574849337370154, 'his': 2.1008916417092585, 'march': 1.9573172463231197, 'returned': 2.224475435663605, 'returning': 3.587709113805246, 'very': 2.811792506755733, 'pages': 5.276068371850324, 'wood': 4.835892620778915, 'commemorate': 6.381325203237103, 'timekeeper': 9.19473591999714, 'five': 2.2137301792754096, 'ferguson': 11.713193348604241, 'canadian': 8.300699805144342, 'fall': 3.6340542889816114, 'press': 2.722647257856288, 'penname': 7.728398851203712, 'burns': 5.67822769182399, 'burnt': 7.402976450769084, 'day': 2.661431695593997, 'worlds': 3.5737313717986314, 'awaited': 7.585298007563039, 'indeed': 5.469042492760487, 'prize': 2.7453192387302345, 'countries': 3.3746529896447774, '54': 5.11719847609142, 'university': 1.6946860096423695, 'book': 4.005609415785018, 'history': 4.289515921249219, 'die': 4.504918259948763, 'served': 1.5362723499305253, 'raised': 3.059531844362216, 'river': 12.315252256747977, 'where': 2.178152424181346, 'dir': 15.105016369480095, 'county': 2.8412358227083288, 'declared': 4.504918259948763, 'work': 1.3993637697254548, 'often': 2.862641126119281, 'fair': 4.432561985199383, 'people': 2.5151367341527564, 
'house': 2.325548234164261, 'some': 2.1974446757041473, 'play': 11.351437090366709, 'literature': 3.391108110371222, 'past': 3.272264244376109, 'born': 0.268196273764765, 'second': 5.017277494459604, 'been': 4.887386677398013, 'quebec': 9.100690041711534, 'miniseries': 5.306322780208127, 'novelist': 8.89381886721505, 'tenth': 4.932056042955824, 'rooke': 9.600201028105303, 'special': 2.7022435916032777, 'canada': 8.05715528512122, 'fiction': 10.923391646424976, 'said': 2.808417925375587, 'plays': 2.869182927623221, 'appear': 4.067800170204723, 'banff': 6.332535039067671, 'livres': 9.377057476791094, 'arguably': 5.782488702148399, 'novels': 12.019258345990293, 'martins': 5.384376568345493, 'anniversary': 4.099963746694684, 'ice': 8.183650699583426, 'creative': 3.464636137023565, 'written': 2.3444332157630874, 'won': 1.3836400683164753, 'produced': 7.534750770196689, '1947': 7.513313222147888, 'new': 3.5486130624501095, 'ever': 2.956736868784372, 'sold': 3.3511915029657793, '2015in': 8.042056410058754, 'europe': 2.7615279103106105, 'be': 4.218744013624684, 'sixteen': 4.96062941539988, 'who': 0.9098952189804214, 'run': 2.7873060301471266, 'lee': 3.5895467866041804, 'journal': 3.025473923341824, 'night': 6.558865467709443, 'sprung': 16.556890376245967, 'french': 2.9278516770095764, 'water': 4.0270968770912186, 'york': 5.103142634286618, 'ontario': 3.738702807457348, 'studio': 6.162107480329817, 'no': 2.396423552396384, 'become': 4.992935731763453, 'genre': 4.5632484256916745, 'november': 3.9453132752336004, 'teaches': 3.7712554104950966, 'by': 2.6218738844338163, 'starborn': 10.986495389225194, 'stage': 2.894255982500984, 'received': 3.0793219631333275, 'dunne': 7.431147327735781, 'would': 2.136551161989598, 'language': 3.4581636225179473, 'invit': 9.19473591999714, 'of': 0.09559205257956506, 'invited': 6.8601348395698825, 'infinitheatre': 32.95948616767558, 'americas': 4.168664817771044, 'paris': 6.380052292278274, 'farrow': 26.05173088869345, 'place': 
2.5222811225998436, 'published': 5.775009341956282, 'aka': 4.654993539331503, 'qubcoise': 9.19473591999714, 'settled': 4.687546142369252, 'seaforth': 9.887883100557085, 'montreal': 23.576750445163817, 'first': 2.782419485597241, 'among': 4.599794065896458, 'major': 2.0581201293715634, 'hudson': 10.746734565674249, 'bunkhousesin': 10.986495389225194, 'canadas': 9.32069183214019, 'claim': 4.70049729471633, 'into': 3.2101258848132113, 'paperback': 5.989283115461079, 'leon': 5.341048491581956, 'highly': 7.146256107059908, 'sienna': 8.153282045168979, 'guests': 5.477107052597217, 'village': 3.9268777609338112, 'union': 2.8368936534890397, 'ninth': 4.656774483702498, 'crime': 8.052295320247774, 'bestseller': 5.137170609278335, 'city': 5.220626671426627, 'arts': 4.367945781881546, '1985': 2.5324550128142254, 'from': 0.29375533798563447, 'publication': 7.9148156501510645, 'working': 2.074561053063768, 'hes': 4.589565734009048, 'festival': 2.5815754402917426, 'west': 2.602605045123377, 'trilogy': 5.131423467022767, 'three': 2.9830050587151904, 'long': 8.34711136995346, 'next': 2.670439668860552, 'few': 3.0010110324913715, 'barnacle': 9.19473591999714, 'twenties': 6.475635882708344, 'storm': 4.980142229623461, 'which': 0.7674309670437692, 'travelled': 4.744272123770029, 'john': 7.965321595656177, 'was': 0.8730236417340181, 'until': 1.7591012626831841, 'opens': 6.509158574746988, 'more': 1.673778950632145, 'ziet': 10.293348208665249, 'option': 5.905091024240731, 'both': 1.6730570592454443, 'successful': 2.679282762596886, 'masque': 8.789270811888976, 'under': 1.8347379746813095, 'onyx': 8.278445188122983, 'award': 1.6322278484423687, '2014city': 10.986495389225194, 'guy': 9.202601980454938, 'zarathustra': 9.19473591999714, 'library': 4.04144432549936, 'booklist': 7.520759486425468, '20000': 5.804711838933109, 'maclennan': 8.683910296231149, 'sun': 4.156701651712769, 'worked': 3.107783706724218, 'highest': 3.43846641929018, 'with': 0.40493728504412135, 'than': 
2.0650378102773113, 'today': 3.4098856222521565, 'he': 2.560422691904688, 'toronto': 3.3555488083347353, 'high': 1.906719387270128, 'novel': 19.91883113487065, '17': 2.7543211528412543, '1999': 1.9968016883646342, 'lacadmie': 8.09612363132903, 'des': 8.255860708867658, 'will': 5.359552692045008, 'books': 2.2488823524163437, 'nine': 3.2624907325491286, 'time': 1.3253342074200498, 'praise': 5.012685777355933, 'midteens': 8.501588739437194, 'newspaper': 3.4624739740190695, 'murders': 5.658619220435613, 'called': 8.31390826576123, 'dennis': 4.640859028396599, 'and': 0.009686870674633467, 'bridge': 4.810628119119433, 'acclaim': 4.279633052622447, 'preeminent': 6.629786562535602, 'century': 3.516271253325228, 'best': 10.73520750690485, 'is': 0.13808125237759994, 'it': 5.266322333975261, 'an': 0.2982390890818971, 'states': 1.824400329877006, 'as': 0.3815085660372354, 'concordia': 6.250296940830698, 'caused': 4.346619555398658, 'at': 1.7225542932330293, 'in': 0.03958166035497942, 'seen': 3.4810031144877702, 'huron': 7.728398851203712, 'cited': 4.149162574539603, 'film': 2.033113917057952, 'booksst': 10.986495389225194, 'dhonneur': 6.812108119329557, 'united': 1.5679220786705559, 'author': 2.2935018580052677, 'things': 4.001779069106929, 'began': 1.727460095710253, 'that': 0.6614069466714981, 'hugh': 5.16938422926199, 'for': 0.36431264671643454, 'write': 7.725645208041174, 'chants': 7.654290879049991, 'writers': 10.632007999292261, 'writersextraordinary': 10.986495389225194, 'northwest': 4.958216868994496, 'fourth': 3.078843794514105, 'beach': 4.012016478200149, 'company': 2.195009362476192, 'all': 1.6431112434912472, 'towards': 4.038558320610226, 'simon': 3.916621260766622, 'writerinresidence': 6.709829270209139, 'theatre': 8.980652525721265, 'centre': 3.0261717600763545, 'may': 1.7899497282712007, 'english': 2.239461125047026, 'upon': 3.3017114457024097, 'schuster': 6.134465125305577, 'coproduced': 4.86181199833099, 'france': 6.015683320284927, 'germany': 
3.0033964485143017, 'faculty': 3.0247765732438294, 'throughout': 2.8924222411558422, 'frequently': 3.7653902910426984, 'on': 0.36882550670120073, 'railway': 5.708380729994677, 'a': 0.0730493981435809, 'early': 1.929422753652229, 'short': 2.711119014388788, 'driving': 4.407244177215094, 'third': 7.051892004274375, 'gravitated': 8.421546031763658, '1977': 2.6962027980008787, 'age': 2.138848033513307, 'later': 1.4294496043477696, 'well': 1.5295293417875981, 'cultural': 3.3132722681034856, 'trevor': 16.383127350280233, '2002': 7.501250355128921, '2006': 1.520737905384506, 'series': 5.631242807516916, '2004': 1.6903443608359008, '2005': 1.6425861253275964, 'lives': 2.532029027717262, 'the': 0.005383625656417407, '1000': 4.534446434787968}, {'houston': 3.935505942157149, 'frankie': 6.037735498847026, 'labels': 4.808551275174594, 'hardcores': 10.986495389225194, 'produced': 2.511583590065563, 'roy': 4.520350664987575, 'london': 4.439578782949358, 'teamed': 4.91345085512479, 'asylum': 5.850696952174933, 'godfathers': 8.683910296231149, '27': 3.110616229728885, 'lutricia': 9.887883100557085, 'also': 0.9254541832324698, 'producing': 3.830318751744579, 'mcneal': 9.377057476791094, 'including': 2.4545648916922365, 'to': 0.23472468840899613, 'present': 3.679293074460456, 'under': 1.8347379746813095, 'heavies': 8.907053847545358, 'has': 0.856995079488078, 'gave': 3.274946409596048, 'do': 3.162449378368902, 'his': 1.3130572760682866, 'dj': 8.527731188739493, 'producernelson': 10.986495389225194, 'continues': 3.3444509863519367, 'early': 3.858845507304458, 'records': 8.035628531862525, 'birth': 4.309411927978059, 'breakbeat': 7.585298007563039, 'using': 3.288012601344248, 'name': 2.433549028103139, 'level': 3.2406271594329263, 'james': 5.524663753174666, 'roll': 4.581266931194353, 'garage': 6.0738405034891425, 'vibes': 15.790905871733758, 'bebel': 8.683910296231149, 'always': 3.859604580326386, 'x': 4.523465932304524, 'guy': 4.601300990227469, 'dodger': 7.157853992736099, 'tei': 
8.683910296231149, 'bump': 15.535239128713988, 'house': 4.651096468328522, '19901993': 7.460134864609033, 'some': 4.3948893514082945, 'thelma': 7.348909229498808, 'gilberto': 6.843360662833661, 'born': 0.268196273764765, 'n': 8.564162068522174, 'juliet': 5.771559631616209, 'delivered': 9.840774598242893, 'jodeci': 10.293348208665249, 'knight': 4.8002867653247, 'since': 1.455376717308041, 'liberty': 5.279385124476319, 'label': 9.90513433710723, 'consistently': 5.266183612617783, 'bass': 3.857197840295821, 'wishdokta': 32.95948616767558, 'then': 1.4309354361561304, 'new': 0.8871532656125274, 'champagne': 7.035251670643767, 'numerous': 2.4220370053418425, 'red': 3.274498882177525, 'rosie': 6.642689967371511, 'evolved': 5.402999080443495, 'kelis': 8.683910296231149, 'aaliyah': 9.19473591999714, 'others': 2.8003093949991116, 'along': 2.5088749729287803, 'by': 0.37455341206197373, 'on': 0.5532382600518011, 'hits': 3.7189699613970224, 'brown': 7.309560838997457, 'of': 0.02078088099555762, 'legendary': 4.280856294365192, 'days': 3.0398778259807213, 'goldie': 7.61919955923872, 'april': 1.9892244829917467, 'negrocan': 10.986495389225194, 'gaines': 7.489987827758714, 'into': 1.6050629424066056, 'scene': 3.553420040326614, 'one': 0.9309307338087167, 'misteeq': 9.887883100557085, 'simply': 4.736520146965711, 'another': 2.603062187988481, 'artists': 2.725227238647546, '1990s': 3.1109960967799863, 'flex': 15.616883117754497, 'city': 1.7402088904755424, 'knuckles': 8.042056410058754, 'from': 0.29375533798563447, 'alterego': 7.851001173296044, '2step': 9.887883100557085, 'top': 2.2740644157484557, 'they': 1.8993401178193898, 'due': 2.7324867986603505, 'ayers': 7.690658523220865, 'few': 3.0010110324913715, 'music': 1.7927926584020348, 'evelyn': 6.401527910554623, 'biggest': 8.2245937875438, 'heralded': 6.961143698490045, 'was': 0.15873157122436693, 'hardcoredrum': 10.986495389225194, 'happy': 9.680332263112593, 'head': 2.464115671121657, 'remixer': 7.1363477875151355, 'that': 
1.3228138933429963, 'club': 2.2086316746040255, 'brand': 4.497290457899877, 'remixed': 5.982549083279735, 'known': 2.8915454305304062, 'worked': 1.553891853362109, 'with': 0.8098745700882427, 'funk': 5.526909875081035, 'he': 0.426737115317448, 'sound': 3.5265806229840893, 'king': 3.452801679376562, '1993': 2.3032794139845048, 'include': 2.174545211685214, '1997': 2.1298344522079455, 'musicsome': 9.600201028105303, 'scenes': 4.932056042955824, 'towa': 9.19473591999714, 'roberts': 5.151684652162589, 'record': 4.3057738130686065, 'uk': 8.371829993731193, 'artful': 7.941972951501771, 'and': 0.006706295082438554, 'ripe': 17.003177478874388, 'remained': 3.3202734635624696, 'nng': 10.986495389225194, 'ah': 7.402976450769084, 'is': 0.05523250095103998, 'delivers': 6.766987684049088, 'it': 1.3165805834938153, 'an': 0.2982390890818971, 'drawer': 8.347438059609935, 'as': 0.3815085660372354, '1971': 2.857025624440964, 'in': 0.0028962190503643475, 'beverley': 7.851001173296044, 'inc': 3.612493529875034, 'grant': 3.978894775273341, 'began': 1.727460095710253, 'when': 1.3806055739282235, 'traxin': 10.986495389225194, 'started': 2.1106492114865967, 'sunday': 3.909841573781243, 'other': 1.4424007566948476, 'which': 0.7674309670437692, 'anthems': 7.489987827758714, 'garagehe': 10.986495389225194, 'nice': 11.20400065287221, 'swing': 5.41434135704743, 'produce': 4.200907744217265, 'a': 0.011238368945166293, 'for': 0.07286252934328691, 'together': 2.8423969258866704, 'up': 1.5635467678501844, 'faithless': 8.501588739437194, 'so': 2.661916544088344, 'nelson': 18.563436113586395, 'jamiroquai': 8.153282045168979, 'english': 2.239461125047026, 'the': 0.0010157784257391334, 'kickin': 8.421546031763658, 'agnes': 6.5797761419609415}, {'phenomenon': 5.750053426395245, 'deborash': 10.986495389225194, 'innovative': 4.48821323974876, 'still': 2.700225936442129, 'jay': 4.577966598165696, 'cornell': 4.897450513778348, 'writing': 2.61409678271219, 'to': 0.04694493768179923, 'treatment': 
4.3571321357877455, '2014': 2.2073995783446634, '2013': 1.9545642372230505, 'explorations': 6.655762048938863, 'good': 3.0884556984605758, 'conversations': 6.002888767516858, '1955': 3.556974546438732, 'listening': 5.5883326877074415, 'press': 5.445294515712576, 'h': 3.9468350393631186, 'university': 4.236715024105924, 'truths': 7.157853992736099, 'shoshana': 8.789270811888976, 'ashes': 6.391375539090604, 'roger': 4.257866776140492, 't': 4.286995049063516, 'yale': 7.968678869643146, 'where': 1.089076212090673, 'felman': 10.293348208665249, 'leaders': 3.8741679445142827, 'reference': 4.67113738770286, 'catastrophic': 7.348909229498808, 'empirical': 6.323056295113127, 'born': 0.268196273764765, 'see': 3.465177409025954, 'taught': 2.8485149347730556, 'our': 3.573128053529954, 'humane': 6.351766400995558, 'forthcoming': 5.535456935659494, 'what': 2.819143402169124, 'for': 0.07286252934328691, 'ways': 4.390714875263883, 'coeditor': 4.977682203782599, 'robert': 2.802818806604537, 'critical': 6.960906421414144, 'trauma': 37.6079791075806, 'encounters': 6.207371896113665, 'she': 6.327948952685501, 'we': 3.3825974207033136, 'theory': 7.054312988009798, 'unconscious': 6.908957945319475, 'frank': 3.6866980224670334, 'harvard': 3.3170001382175003, 'twentieth': 5.574849337370154, 'both': 1.6730570592454443, 'traumas': 8.588600116426823, 'hopkins': 24.555746790682548, 'of': 0.045717938190226765, 'experience': 6.325698916780484, 'cathy': 6.21581076475953, '173182': 10.986495389225194, 'fictions': 7.585298007563039, 'letters': 4.370430204092377, 'previously': 2.7704072905928783, 'johns': 21.52193395887693, 'one': 0.9309307338087167, 'lifton': 9.600201028105303, 'pp': 6.555678590381881, 'from': 0.29375533798563447, 'her': 3.100430757265601, 'emory': 6.2329051981188295, 'century': 3.516271253325228, 'question': 4.850930498143455, '1988': 2.4491074905234376, 'scholars': 4.790051261430674, 'describes': 4.564873121418676, 'call': 4.110230777334428, 'editor': 2.7424241189294083, 
'memory': 4.763919121153825, 'mysterious': 6.0738405034891425, 'that': 0.6614069466714981, 'caruth': 10.986495389225194, 'with': 0.40493728504412135, 'appointed': 2.2003443343554556, '1991': 2.3750835225699753, 'md': 4.75012579902149, '1995': 4.444259337390772, '1996': 2.135691193468776, 'also': 0.4627270916162349, 'work': 1.3993637697254548, 'up': 7.817733839250922, 'freud': 7.115294378317303, 'rutgers': 5.296135934901134, 'rhodes': 5.683190481166118, 'history': 4.289515921249219, 'and': 0.010432014572682195, 'locke': 7.157853992736099, 'esch': 9.19473591999714, 'is': 0.11046500190207996, 'received': 1.5396609815666638, 'helped': 2.9399460319421156, 'deconstructive': 9.887883100557085, 'as': 0.1271695220124118, 'at': 0.645957859962386, 'in': 0.006757844450850144, 'comparative': 9.629589583628558, 'kant': 7.225295273531632, 'author': 2.2935018580052677, 'trials': 4.726913925160272, 'discussion': 5.154612911941677, 'wordsworth': 8.09612363132903, 'caruths': 10.986495389225194, 'build': 4.568130453288983, 'department': 2.3398541306220704, 'luckhurst': 10.293348208665249, 'literature': 10.173324331113665, 'narrative': 4.965472039875668, 'phd': 2.5603215961961254, 'most': 1.4186204428983973, 'unclaimed': 9.600201028105303, 'on': 0.5532382600518011, 'juridical': 8.09612363132903, 'a': 0.005619184472583146, 'conceptualizing': 8.907053847545358, 'professor': 2.010865204934687, 'departments': 4.850930498143455, '2002': 1.8753125887822302, 'perceiving': 9.19473591999714, 'responsibility': 4.451254118211536, 'english': 2.239461125047026, 'the': 0.0009142005831652201, 'n3': 10.293348208665249}]
people['tfidf'] = tfidf
people.head()
URI | name | text | word_count | tfidf |
---|---|---|---|---|
<http://dbpedia.org/resource/Digby_Morrell> ... | Digby Morrell | digby morrell born 10 october 1979 is a former ... | {'since': 1, 'carltons': 1, 'being': 1, '2005' ... | {'since': 1.455376717308041, ... |
<http://dbpedia.org/resource/Alfred_J._Lewy> ... | Alfred J. Lewy | alfred j lewy aka sandy lewy graduated from ... | {'precise': 1, 'thomas': 1, 'closely': 1, ... | {'precise': 6.44320060695519, ... |
<http://dbpedia.org/resource/Harpdog_Brown> ... | Harpdog Brown | harpdog brown is a singer and harmonica player who ... | {'just': 1, 'issued': 1, 'mainly': 1, 'nominat ... | {'just': 2.7007299687108643, ... |
<http://dbpedia.org/resource/Franz_Rottensteiner> ... | Franz Rottensteiner | franz rottensteiner born in waidmannsfeld lower ... | {'all': 1, 'bauforschung': 1, ... | {'all': 1.6431112434912472, ... |
<http://dbpedia.org/resource/G-Enka> ... | G-Enka | henry krvits born 30 december 1974 in tallinn ... | {'legendary': 1, 'gangstergenka': 1, ... | {'legendary': 4.280856294365192, ... |
<http://dbpedia.org/resource/Sam_Henderson> ... | Sam Henderson | sam henderson born october 18 1969 is an ... | {'now': 1, 'currently': 1, 'less': 1, 'being' ... | {'now': 1.96695239252401, 'currently': ... |
<http://dbpedia.org/resource/Aaron_LaCrate> ... | Aaron LaCrate | aaron lacrate is an american music producer ... | {'exclusive': 2, 'producer': 1, 'tribe': ... | {'exclusive': 10.455187230695827, ... |
<http://dbpedia.org/resource/Trevor_Ferguson> ... | Trevor Ferguson | trevor ferguson aka john farrow born 11 november ... | {'taxi': 1, 'salon': 1, 'gangs': 1, 'being': 1, ... | {'taxi': 6.0520214560945025, ... |
<http://dbpedia.org/resource/Grant_Nelson> ... | Grant Nelson | grant nelson born 27 april 1971 in london ... | {'houston': 1, 'frankie': 1, 'labels': 1, ... | {'houston': 3.935505942157149, ... |
<http://dbpedia.org/resource/Cathy_Caruth> ... | Cathy Caruth | cathy caruth born 1955 is frank h t rhodes ... | {'phenomenon': 1, 'deborash': 1, ... | {'phenomenon': 5.750053426395245, ... |
Examine the TF-IDF for the Obama article
obama = people[people['name'] == 'Barack Obama']
obama[['tfidf']].stack('tfidf',new_column_name=['word','tfidf']).sort('tfidf',ascending=False)
word | tfidf |
---|---|
obama | 43.2956530721 |
act | 27.678222623 |
iraq | 17.747378588 |
control | 14.8870608452 |
law | 14.7229357618 |
ordered | 14.5333739509 |
military | 13.1159327785 |
involvement | 12.7843852412 |
response | 12.7843852412 |
democratic | 12.4106886973 |
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
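The scores above can be read as a term's count in the article multiplied by a factor that shrinks as the term appears in more articles. The following is a minimal sketch of that idea in plain Python, assuming the common variant count * log(N / document frequency); the notebook does not state which formula GraphLab uses, and the helper `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: score} dict per document.

    Assumes score = count_in_doc * log(N / docs_containing_word),
    a common TF-IDF variant; the library's exact formula may differ.
    """
    word_counts = [Counter(doc.split()) for doc in docs]
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    doc_freq = Counter()
    for counts in word_counts:
        doc_freq.update(counts.keys())
    return [
        {w: c * math.log(float(n_docs) / doc_freq[w]) for w, c in counts.items()}
        for counts in word_counts
    ]

# Toy corpus, made up for illustration (not taken from the dataset).
toy_docs = [
    "the president signed the act",
    "the singer released the album",
    "the footballer joined the club",
]
toy_scores = tf_idf(toy_docs)
```

In this toy corpus "the" occurs in every document and scores 0, while a word unique to one document, such as "act", gets the highest weight. That is the same pattern as the near-zero scores for "the", "in" and "and" in the dictionaries above.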
Manually compute distances between a few people
clinton = people[people['name'] == 'Bill Clinton']
clinton
URI | name | text | word_count | tfidf |
---|---|---|---|---|
<http://dbpedia.org/resource/Bill_Clinton> ... | Bill Clinton | william jefferson bill clinton born william ... | {'rating': 1, 'serving': 1, 'surplus': 1, ... | {'rating': 5.377023594040234, ... |
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.
beckham = people[people['name'] == 'David Beckham']
beckham
URI | name | text | word_count | tfidf |
---|---|---|---|---|
<http://dbpedia.org/resource/David_Beckham> ... | David Beckham | david robert joseph beckham obe bkm born 2 ... | {'fifa': 3, 'bending': 1, 'six': 2, 'beckham': 8, ... | {'fifa': 14.135200103949765, ... |
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.
Is Obama closer to Clinton than to Beckham?
graphlab.distances.cosine(obama['tfidf'][0],clinton['tfidf'][0])
0.8339854936884276
graphlab.distances.cosine(obama['tfidf'][0],beckham['tfidf'][0])
0.9791305844747478
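Cosine distance is 1 minus the cosine similarity of the two TF-IDF vectors, so a smaller value means the articles share more heavily weighted words. Here Obama is closer to Clinton (about 0.834) than to Beckham (about 0.979). As a rough sketch of what such a distance computes on dictionary-valued features, the helper `cosine_distance` below is a hypothetical plain-Python version, not GraphLab's implementation, and the toy vectors are made up.

```python
import math

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two sparse
    vectors stored as {key: weight} dicts."""
    # Only keys present in both dicts contribute to the dot product.
    dot = sum(weight * b[key] for key, weight in a.items() if key in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    return 1.0 - dot / (norm_a * norm_b)

# Toy TF-IDF-style vectors with made-up weights.
doc_a = {"senate": 3.0, "law": 4.0}
doc_b = {"senate": 3.0}
doc_c = {"football": 5.0}

dist_ab = cosine_distance(doc_a, doc_b)  # some shared vocabulary
dist_ac = cosine_distance(doc_a, doc_c)  # no shared vocabulary
```

Two documents with no words in common have a distance of 1, and a document compared with itself has a distance of 0 (up to floating-point rounding).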
Build a nearest neighbor model for document retrieval
knn_model = graphlab.nearest_neighbors.create(people,features=['tfidf'],label='name')
PROGRESS: Starting brute force nearest neighbors model training.
Applying the nearest neighbor model for retrieval
Who is closest to Obama?
knn_model.query(obama)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 19.001ms |
PROGRESS: | Done | | 100 | 613.793ms |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Barack Obama | 0.0 | 1 |
0 | Joe Biden | 0.794117647059 | 2 |
0 | Joe Lieberman | 0.794685990338 | 3 |
0 | Kelly Ayotte | 0.811989100817 | 4 |
0 | Bill Clinton | 0.813852813853 | 5 |
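The log line "Starting brute force nearest neighbors model training" suggests that a query is compared against every reference article and the closest ones are kept. The sketch below illustrates that idea only. The names `brute_force_query` and `euclidean` are hypothetical, the reference set is made up, and the choice of Euclidean distance is an assumption made to keep the example short.

```python
import heapq
import math

def euclidean(a, b):
    """Euclidean distance between two sparse {key: weight} dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def brute_force_query(query_vec, reference, distance, k=5):
    """Return the k closest (label, distance) pairs, nearest first.

    reference maps label -> feature dict; distance is any function
    that takes two feature dicts and returns a number.
    """
    # Score every reference row against the query, then keep the k smallest.
    scored = ((distance(query_vec, vec), label) for label, vec in reference.items())
    return [(label, dist) for dist, label in heapq.nsmallest(k, scored)]

# Made-up reference set for illustration.
toy_reference = {
    "politician A": {"senate": 2.0, "law": 1.0},
    "politician B": {"senate": 1.0, "law": 1.0},
    "athlete C": {"football": 3.0},
}
toy_result = brute_force_query(
    toy_reference["politician A"], toy_reference, euclidean, k=2
)
```

As in the query output above, the query article itself comes back at rank 1 with distance 0 because it is also part of the reference set.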
Other examples of document retrieval
swift = people[people['name'] == 'Taylor Swift']
knn_model.query(swift)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 5.37ms |
PROGRESS: | Done | | 100 | 618.717ms |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Taylor Swift | 0.0 | 1 |
0 | Carrie Underwood | 0.76231884058 | 2 |
0 | Alicia Keys | 0.764705882353 | 3 |
0 | Jordin Sparks | 0.769633507853 | 4 |
0 | Leona Lewis | 0.776119402985 | 5 |
jolie = people[people['name'] == 'Angelina Jolie']
knn_model.query(jolie)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 5.051ms |
PROGRESS: | Done | | 100 | 599.313ms |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Angelina Jolie | 0.0 | 1 |
0 | Brad Pitt | 0.784023668639 | 2 |
0 | Julianne Moore | 0.795857988166 | 3 |
0 | Billy Bob Thornton | 0.803069053708 | 4 |
0 | George Clooney | 0.8046875 | 5 |
arnold = people[people['name'] == 'Arnold Schwarzenegger']
knn_model.query(arnold)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 5.336ms |
PROGRESS: | Done | | 100 | 605.323ms |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Arnold Schwarzenegger | 0.0 | 1 |
0 | Jesse Ventura | 0.818918918919 | 2 |
0 | John Kitzhaber | 0.824615384615 | 3 |
0 | Lincoln Chafee | 0.833876221498 | 4 |
0 | Anthony Foxx | 0.833910034602 | 5 |
Compare the top words by word count with the top words by TF-IDF:
john = people[people['name'] == 'Elton John']
print john
+-------------------------------+------------+-------------------------------+
| URI | name | text |
+-------------------------------+------------+-------------------------------+
| <http://dbpedia.org/resour... | Elton John | sir elton hercules john cb... |
+-------------------------------+------------+-------------------------------+
+-------------------------------+-------------------------------+
| word_count | tfidf |
+-------------------------------+-------------------------------+
| {'all': 1, 'six': 1, 'prod... | {'all': 1.6431112434912472... |
+-------------------------------+-------------------------------+
[? rows x 5 columns]
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.
john['word_count']
dtype: dict
Rows: ?
[{'all': 1, 'six': 1, 'producer': 1, 'heavily': 1, 'over': 2, 'named': 1, 'fifty': 1, 'four': 1, 'openly': 1, 'including': 1, 'highestprofile': 1, 'years': 1, 'its': 2, 'impact': 1, 'westminster': 1, '27': 1, '21': 2, 'wed': 1, 'had': 1, '1947': 1, 'abbey': 1, 'winning': 1, 'late': 1, 'to': 4, 'commander': 1, 'about': 1, 'born': 1, '2014': 1, 'as': 2, 'has': 9, '2013': 1, 'his': 4, 'march': 1, 'than': 3, 'song': 1, 'songwriter': 2, 'continues': 1, 'records': 1, 'five': 1, 'occasional': 1, 'they': 1, 'inception': 1, 'world': 1, 'brit': 1, 'him': 3, 'datein': 1, 'hall': 2, 'fivedecade': 1, 'knighthood': 1, 'bestselling': 2, 'artist': 1, 'be': 1, '1996': 1, 'list': 1, 'roll': 2, 'hercules': 1, 'announced': 1, 'rock': 2, 'become': 1, 'bernie': 1, 'outstanding': 1, 'england': 1, 'composer': 1, 'queens': 1, 'foundation': 2, 'diana': 1, 'globe': 1, 'artists': 2, 'culture': 1, 'been': 3, '49': 1, 'year': 1, 'billboard': 4, 'aids': 2, 'empire': 1, 'honors': 1, 'oscar': 1, 'elizabeth': 1, 'composers': 1, 'established': 1, 'elton': 3, 'for': 5, 'record': 1, '58': 1, 'since': 5, 'legal': 1, 'collaborated': 1, 'outside': 1, 'consecutive': 2, 'funeral': 1, 'disney': 1, 'solo': 1, 'marriage': 2, 'who': 1, '25': 1, '300': 1, 'sold': 2, 'million': 3, 'lgbt': 1, 'ranked': 2, 'awards': 3, 'alltime': 1, '100': 3, 'overallelton': 1, 'legend': 1, 'received': 2, 'hits': 1, 'english': 1, '33': 1, 'involved': 1, 'industry': 1, '30': 1, 'against': 1, 'david': 1, 'became': 1, 'tonightcandle': 1, 'social': 1, 'samesex': 1, 'contribution': 1, 'lasting': 1, 'dwight': 1, 'first': 1, 'golden': 1, 'raised': 1, 'grammy': 1, 'civil': 1, 'taupin': 1, 'into': 3, 'lyricist': 1, 'number': 2, 'one': 3, 'services': 2, 'ii': 1, 'kennedy': 1, 'least': 1, 'inducted': 1, 'parties': 1, 'tony': 1, '19702000': 1, 'concert': 1, 'jubilee': 1, 'from': 1, 'kenneth': 1, 'top': 4, 'cbe': 1, 'copies': 1, '1988': 1, 'fight': 1, '2': 1, 'music': 3, 'way': 1, 'bisexual': 1, 'hollywood': 1, 'john': 7, 'was': 2, 
'songwriters': 2, 'more': 3, 'brits': 1, 'british': 3, 'diamond': 1, 'champion': 1, 'gay': 2, 'on': 6, 'successful': 1, 'academy': 3, 'stone': 1, 'award': 5, 'buckingham': 1, 'authors': 1, 'worked': 1, 'fellow': 1, 'with': 2, 'entered': 1, 'he': 7, '10': 1, '1992': 1, '1994': 1, '1997': 2, '40': 2, '1998': 1, 'hosting': 1, 'us': 1, 'career': 1, 'nine': 1, 'era': 1, 'royal': 1, 'of': 13, 'making': 1, 'male': 1, '31': 1, 'something': 1, 'and': 15, 'seven': 1, 'annual': 1, 'palace': 2, 'look': 1, 'december': 2, 'is': 4, 'partnership': 1, 'an': 3, '1980s': 1, 'single': 2, 'performed': 1, 'have': 1, 'in': 18, 'partner': 1, 'fame': 2, 'film': 1, 'movements': 1, 'sir': 1, 'no': 3, 'began': 1, '1967': 1, 'inductee': 1, 'actor': 1, '1': 2, 'hot': 2, 'musicians': 1, 'which': 1, 'influential': 1, 'party': 2, 'you': 1, 'pianist': 1, 'events': 1, 'worldwide': 2, '200': 1, 'princess': 1, 'time': 1, 'singles': 1, 'albums': 2, 'after': 1, 'most': 1, 'two': 1, 'rolling': 1, 'such': 1, '2008': 1, 'icon': 1, 'a': 10, 'singer': 1, 'center': 1, 'third': 1, '2012he': 1, '1976': 1, 'later': 1, 'reginald': 1, 'having': 1, '2002': 1, 'charitable': 1, 'wind': 1, '2004': 2, '2005': 1, 'at': 4, 'the': 27, 'order': 1, 'furnish': 2}, ... ]
john_word_count_table = john[['word_count']].stack('word_count',new_column_name = ['word','count'])
john_word_count_table.sort('count', ascending=False)
word | count |
---|---|
the | 27 |
in | 18 |
and | 15 |
of | 13 |
a | 10 |
has | 9 |
he | 7 |
john | 7 |
on | 6 |
since | 5 |
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
john_tfidf_table = john[['tfidf']].stack('tfidf', new_column_name=['word', 'tfidf'])
john_tfidf_table
word | tfidf |
---|---|
movements | 5.03065801976 |
social | 2.62268650471 |
champion | 3.17654830275 |
wed | 6.90895794532 |
legal | 3.424333758 |
became | 1.33005993305 |
after | 0.944333442001 |
2005 | 1.64258612533 |
december | 4.00285165916 |
furnish | 18.38947184 |
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
john_tfidf_table.sort('tfidf', ascending=False)
word | tfidf |
---|---|
furnish | 18.38947184 |
elton | 17.48232027 |
billboard | 17.3036809575 |
john | 13.9393127924 |
songwriters | 11.250406447 |
overallelton | 10.9864953892 |
tonightcandle | 10.9864953892 |
19702000 | 10.2933482087 |
fivedecade | 10.2933482087 |
aids | 10.262846934 |
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.
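Stacking a dictionary column and sorting by value amounts to ranking the dictionary's entries. The plain-Python sketch below shows the same comparison on a handful of entries copied from the Elton John output above. The helper `top_items` is hypothetical and is not part of GraphLab.

```python
def top_items(weights, n=3):
    """Return the n (key, value) pairs with the largest values, largest first."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:n]

# A few entries copied from the Elton John word_count and tfidf output.
elton_counts = {
    "the": 27, "in": 18, "and": 15, "john": 7,
    "billboard": 4, "elton": 3, "furnish": 2,
}
elton_tfidf = {
    "furnish": 18.38947184, "elton": 17.48232027,
    "billboard": 17.3036809575, "john": 13.9393127924,
    "after": 0.944333442001,
}

top_by_count = top_items(elton_counts)
top_by_tfidf = top_items(elton_tfidf)
```

Ranking by raw count surfaces function words such as "the" and "in", while ranking by TF-IDF surfaces words that are specific to the article, such as "furnish" and "elton".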
Measuring cosine distance:
victoria = people[people['name'] == 'Victoria Beckham']
paul = people[people['name'] == 'Paul McCartney']
victoria
URI | name | text | word_count | tfidf |
---|---|---|---|---|
<http://dbpedia.org/resource/Victoria_Beckham> ... | Victoria Beckham | victoria caroline beckham ne adams born 17 april ... | {'millionin': 1, 'saying': 1, 'cameo': 1, ... | {'millionin': 7.728398851203712, ... |
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.
paul
URI | name | text | word_count | tfidf |
---|---|---|---|---|
<http://dbpedia.org/resource/Paul_McCartney> ... | Paul McCartney | sir james paul mccartney mbe born 18 june 1942 is ... | {'all': 1, 'gold': 1, 'over': 1, 'kintyre': 1, ... | {'all': 1.6431112434912472, ... |
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.
john
URI | name | text | word_count | tfidf |
---|---|---|---|---|
<http://dbpedia.org/resource/Elton_John> ... | Elton John | sir elton hercules john cbe born reginald ken ... | {'all': 1, 'six': 1, 'producer': 1, 'heavi ... | {'all': 1.6431112434912472, ... |
Note: Only the head of the SFrame is printed. This SFrame is lazily evaluated.
You can use len(sf) to force materialization.
graphlab.distances.cosine(john['tfidf'][0],victoria['tfidf'][0])
0.9567006376655429
graphlab.distances.cosine(john['tfidf'][0],paul['tfidf'][0])
0.8250310029221779
Building nearest neighbor models with different input features and an explicit distance metric:
knn_model_word_count_cosine = graphlab.nearest_neighbors.create(people,features=['word_count'],distance='cosine',label='name')
PROGRESS: Starting brute force nearest neighbors model training.
knn_model_tfidf_cosine = graphlab.nearest_neighbors.create(people,features=['tfidf'],distance='cosine',label='name')
PROGRESS: Starting brute force nearest neighbors model training.
knn_model_word_count_cosine.query(john)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 14.903ms |
PROGRESS: | Done | | 100 | 534.846ms |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Elton John | 2.22044604925e-16 | 1 |
0 | Cliff Richard | 0.16142415259 | 2 |
0 | Sandro Petrone | 0.16822542751 | 3 |
0 | Rod Stewart | 0.168327165587 | 4 |
0 | Malachi O'Doherty | 0.177315545979 | 5 |
knn_model.query(john)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 21.318ms |
PROGRESS: | 0 | 49937 | 84.5373 | 1.02s |
PROGRESS: | Done | | 100 | 1.20s |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Elton John | 0.0 | 1 |
0 | Phil Collins | 0.76399026764 | 2 |
0 | Rod Stewart | 0.773333333333 | 3 |
0 | Annie Lennox | 0.776623376623 | 4 |
0 | Barry Gibb | 0.780952380952 | 5 |
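The default model, knn_model, was created without a distance argument, and its results differ from those of the cosine models. The distances it returns look like simple fractions (0.773333333333 matches 58/75 and 0.780952380952 matches 82/105), which is what a set-overlap measure such as Jaccard distance would produce. That is an inference from the output, not something the notebook states. Under that assumption, a minimal sketch of Jaccard distance on dictionary features follows; the helper `jaccard_distance` and the toy data are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between the key sets of two dicts.

    Computes 1 - |shared keys| / |all keys|. The weights are ignored,
    so only the presence or absence of each word matters.
    """
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0  # convention chosen here for two empty dicts
    return 1.0 - float(len(keys_a & keys_b)) / len(union)

# Made-up feature dicts; the weights play no role in the result.
toy_a = {"singer": 2.5, "album": 1.0, "piano": 4.0}
toy_b = {"singer": 0.3, "album": 9.0, "guitar": 1.2}

toy_jaccard = jaccard_distance(toy_a, toy_b)  # 2 shared keys out of 4
```

If the default model does behave this way, it ranks neighbors by shared vocabulary alone. That would explain why its ordering differs from knn_model_tfidf_cosine, which also takes the weights into account.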
knn_model_tfidf_cosine.query(john)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 12.228ms |
PROGRESS: | Done | | 100 | 923.925ms |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Elton John | -2.22044604925e-16 | 1 |
0 | Rod Stewart | 0.717219667893 | 2 |
0 | George Michael | 0.747600998969 | 3 |
0 | Sting (musician) | 0.747671954431 | 4 |
0 | Phil Collins | 0.75119324879 | 5 |
knn_model_word_count_cosine.query(victoria)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 6.18ms |
PROGRESS: | Done | | 100 | 523.584ms |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Victoria Beckham | -2.22044604925e-16 | 1 |
0 | Mary Fitzgerald (artist) | 0.207307036115 | 2 |
0 | Adrienne Corri | 0.214509782788 | 3 |
0 | Beverly Jane Fry | 0.217466468741 | 4 |
0 | Raman Mundair | 0.217695474992 | 5 |
knn_model_tfidf_cosine.query(victoria)
PROGRESS: Starting pairwise querying.
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | Query points | # Pairs | % Complete. | Elapsed Time |
PROGRESS: +--------------+---------+-------------+--------------+
PROGRESS: | 0 | 1 | 0.00169288 | 11.484ms |
PROGRESS: | Done | | 100 | 921.288ms |
PROGRESS: +--------------+---------+-------------+--------------+
query_label | reference_label | distance | rank |
---|---|---|---|
0 | Victoria Beckham | 1.11022302463e-16 | 1 |
0 | David Beckham | 0.548169610263 | 2 |
0 | Stephen Dow Beckham | 0.784986706828 | 3 |
0 | Mel B | 0.809585523409 | 4 |
0 | Caroline Rush | 0.819826422919 | 5 |