Latest revision as of 18:08, 18 October 2015
Idiomaggio - a language detection bot
"The limits of my language are the limits of my world." (Ludwig Wittgenstein)
Idiomaggio is a Twitter bot that detects the language of tweets and responds automatically in the right tongue. In a multilingual world you can be surrounded by many different languages. With the help of Idiomaggio you can figure out which language is being used.
The bot works with the NLTK library, mainly with its built-in stop word lists. Linking words, conjunctions, articles and pronouns are words that make a language speakable, but they carry little of a text's meaning. Idiomaggio picks out the stop words in a tweet and checks which language they belong to. In a second step, it replies to the tweet in the detected language.
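The stop word idea can be sketched without NLTK. This is only an illustration: the tiny word lists are hand-picked assumptions, not the NLTK lists the bot uses, and `STOP_WORDS` and `guess_language` are hypothetical names.

```python
import re

# Hand-picked sample stop words (an assumption for illustration only;
# the bot itself relies on the much longer lists shipped with NLTK).
STOP_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "you"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "du"},
    "french": {"le", "la", "les", "et", "est", "vous", "je"},
}


def guess_language(text):
    """Return the language whose stop words overlap most with the text."""
    # Lowercase and split into word tokens, dropping punctuation.
    words = set(re.findall(r"\w+", text.lower(), re.UNICODE))
    # Count how many distinct stop words of each language occur in the text.
    scores = {lang: len(words & stops) for lang, stops in STOP_WORDS.items()}
    return max(scores, key=scores.get)


print(guess_language("Sprichst du deutsch und ist das nicht toll?"))
```

The German sentence shares five words with the German sample list and none with the other two, so it is classified as German. Short texts with few stop words give much weaker evidence.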
Idiomaggio understands the European languages Swedish, Danish, Hungarian, Finnish, Portuguese, German, Dutch, French, Spanish, Norwegian, English, Russian, Turkish and Italian. It responds with a single phrase. More sophisticated responses would be possible with the AIML library. AIML is a markup language for scripting a chatbot's phrases and their meaning. However, AIML is not available in all of the 14 languages covered by the NLTK stop word lists. Therefore I decided to create a Twitter bot that doesn't pretend to be human, but responds correctly.
Find Idiomaggio on Twitter: https://twitter.com/idiomaggio
The Code
#!/usr/bin/env python2
# -*- coding: utf-8 -*- #
from twitterbot import TwitterBot
import keys
import nltk
from nltk import wordpunct_tokenize
from nltk.corpus import stopwords
class Idiomaggio(TwitterBot):
    def bot_init(self):
        """
        Use your own consumer keys to bring the bot to life.
        """

        ############################
        # REQUIRED: LOGIN DETAILS! #
        ############################
        self.config['api_key'] = keys.consumer_key
        self.config['api_secret'] = keys.consumer_secret
        self.config['access_key'] = keys.access_token
        self.config['access_secret'] = keys.access_token_secret

        ######################################
        # SEMI-OPTIONAL: OTHER CONFIG STUFF! #
        ######################################

        # how often to tweet, in seconds
        self.config['tweet_interval'] = 1 * 10     # 10 seconds here; the template default is 30 minutes

        # use this to define a (min, max) random range of how often to tweet
        # e.g., self.config['tweet_interval_range'] = (5*60, 10*60) # tweets every 5-10 minutes
        self.config['tweet_interval_range'] = None

        # only reply to tweets that specifically mention the bot
        self.config['reply_direct_mention_only'] = True

        # only include bot followers (and original tweeter) in @-replies
        self.config['reply_followers_only'] = True

        # fav any tweets that mention this bot?
        self.config['autofav_mentions'] = False

        # fav any tweets containing these keywords?
        self.config['autofav_keywords'] = []

        # follow back all followers?
        self.config['autofollow'] = False

        ###########################################
        # CUSTOM: your bot's own state variables! #
        ###########################################

    def on_scheduled_tweet(self):
        # The bot only replies to mentions, so there is nothing to schedule.
        pass
    def on_mention(self, tweet, prefix):
        text = tweet.text
        print(text)
        print(type(text))

        # Lowercase every token so it can be compared with the NLTK stop word lists.
        tokens = wordpunct_tokenize(text)
        words_set = set(word.lower() for word in tokens)

        # Score each language by the number of its stop words found in the tweet.
        # Despite its name, `percentage` holds hit counts, not percentages.
        percentage = {}
        for language in stopwords.fileids():
            stopwords_set = set(stopwords.words(language))
            most_common = words_set.intersection(stopwords_set)
            percentage[language] = len(most_common)

        # The language with the most stop word hits wins.
        most_probable = max(percentage, key=percentage.get)

        # One fixed greeting per supported language.
        responses = {
            "danish": u'Hej! Taler du dansk?',
            "dutch": u'Hi! Groeten uit Holland.',
            "english": u'Hey! I speak some English.',
            "finnish": u'Hei! Terveisiä Suomi.',
            "french": u'Salut! Parlez-vous français?',
            "german": u'Hey! Sprichst du deutsch?',
            "hungarian": u'Hello! Beszélsz magyarul?',
            "italian": u'Ciao! Saluti da Italia.',
            "norwegian": u'Hei! Jeg snakker norsk.',
            "portuguese": u'Olá! Você fala português?',
            "russian": u'Привет! Привет из России.',
            "spanish": u'¡Hola! Saludos desde España.',
            "swedish": u'Hej! Talar du svenska?',
            "turkish": u'Merhaba! Türkçe biliyor musun?',
        }

        # Reply only if there is a greeting for the detected language.
        if most_probable in responses:
            prefixed = prefix + ' ' + responses[most_probable]
            self.post_tweet(prefixed, reply_to=tweet)

        # print(percentage)
        # print(stopwords_set)
        # print(words_set)
        print("The language of your text is %s" % most_probable)
    def on_timeline(self, tweet, prefix):
        pass


if __name__ == '__main__':
    bot = Idiomaggio()
    bot.run()
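One property of the scoring in on_mention is worth knowing: max() returns a language even when no stop word matched at all, so a tweet that contains only a link or a name can still get a greeting. A minimal sketch of a guard follows. The helper `pick_language` and its `min_hits` cutoff are hypothetical and not part of the bot.

```python
def pick_language(scores, min_hits=1):
    """Return the best-scoring language, or None if the evidence is too thin.

    scores maps a language name to its number of stop word hits, like the
    `percentage` dictionary in on_mention. min_hits is a hypothetical cutoff.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    # Refuse to guess when even the best language has too few hits.
    if scores[best] < min_hits:
        return None
    return best


print(pick_language({"english": 0, "german": 0}))
print(pick_language({"english": 1, "german": 4}))
```

With a guard like this, the bot could stay silent when the result is None instead of answering in an arbitrary language.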
Myrobot and Grammar Bot
See my first steps with Python and Processing:
The Grammar Bot creates some random sentences. It is a first step towards a "Language Bot".
Grab the code at pastebin: Grammar Bot (http://pastebin.com/GvgyfDUU)
You can find some more interesting bots on GitHub - all created by Martin Schneider.
Thanks for visiting my website. My name is Christopher Marx and I am studying Media Studies at Bauhaus University Weimar.