banner
You are not using a standards compliant browser. Because of this you may notice minor glitches in the rendering of this page. Please upgrade to a compliant browser for optimal viewing:
Firefox
Internet Explorer 7
Safari (Mac and PC)
Press Release
Research finds regional dialects are alive and well on Twitter
Friday, January 7, 2011


(Photo: Jakub Krechowicz/STOCK.XCHNG)

Microbloggers may think they're interacting in one big Twitterverse, but researchers at Carnegie Mellon University's School of Computer Science find that regional slang and dialects are as evident in tweets as they are in everyday conversations.

Postings on Twitter reflect some well-known regionalisms, such as Southerners' "y'all," and Pittsburghers' "yinz," and the usual regional divides in references to soda, pop and Coke. But Jacob Eisenstein, a post-doctoral fellow in CMU's Machine Learning Department, said the automated method he and his colleagues have developed for analyzing Twitter word use shows that regional dialects appear to be evolving within social media.

In northern California, something that's cool is "koo" in tweets, while in southern California, it's "coo." In many cities, something is "sumthin," but tweets in New York City favor "suttin." While many of us might complain in tweets of being "very" tired, people in northern California tend to be "hella" tired, New Yorkers "deadass" tired and Angelenos are simply tired "af."

The "af" is an acronym that, like many others on Twitter, stands for a vulgarity. LOL is a commonly used acronym for "laughing out loud," but Twitterers in Washington, D.C., seem to have an affinity for the cruder LLS.

Eisenstein said some of this usage clearly is shaped by the 140-character limit of Twitter messages, but geography's influence also is apparent. The statistical model the CMU team used to recognize regional variation in word use and topics could predict the location of a microblogger in the continental United States with a median error of about 300 miles.

Eisenstein will present the study on Jan. 8 at the Linguistic Society of America annual meeting in Pittsburgh. The paper is available online at http://people.csail.mit.edu/jacobe/papers/emnlp2010.pdf.

Studies of regional dialects traditionally have been based primarily on oral interviews, Eisenstein said, noting that written communication often is less reflective of regional influences because writing, even in blogs, tends to be formal and thus homogenized. But Twitter offers a new way of studying regional lexicon, he explained, because tweets are informal and conversational. Furthermore, people who tweet using mobile phones have the option of geotagging their messages with GPS coordinates.

For this study, Eisenstein and his co-authors — Eric P. Xing, associate professor of machine learning, Noah A. Smith, assistant professor in the Language Technologies Institute (LTI), and Brendan O'Connor, machine learning graduate student — collected a week's worth of Twitter messages in March 2010, and selected geotagged messages from Twitter users who wrote at least 20 messages. That yielded a data base of 9,500 users and 380,000 messages.

Though the researchers could pinpoint the users' locations using the geotags, they can only guess as to their profiles. Eisenstein said it's reasonable to assume that people sending lots of tweets from mobile phones are younger than the average Twitter user and the topics discussed by these users seem to reflect that.

Automated analysis of Twitter message streams offers linguists an opportunity to watch regional dialects evolve in real time. "It will be interesting to see what happens. Will 'suttin' remain a word we see primarily in New York City, or will it spread?" Eisenstein asked.

It might be a mistake to assume that the greater interconnectivity afforded by computer networks and sites such as Twitter will necessarily result in more homogeneity in language. The social circles maintained by social networks such as Twitter often are geographically focused, he noted. Also, many people use the Internet to seek out like-minded people with similar interests, rather than expose themselves to a broader range of ideas and experiences.

 

###

Carnegie Mellon University: http://www.cmu.edu



Thanks to Carnegie Mellon University for this article.

This press release was posted to serve as a topic for discussion. Please comment below. We try our best to only post press releases that are associated with peer reviewed scientific literature. Critical discussions of the research are appreciated. If you need help finding a link to the original article, please contact us on twitter or via e-mail.



This press release has been viewed 16557 time(s).

Comments
Marija

Guest Comment
Mon, Sep 26, 2011, 7:46 pm CDT

It's very interesting to see how English is evolving with Internet.  Speling and grammar in English only became standardized with the advent of the printing press, however, people nowadays are beginning to diverge from the norm.

Sarah

Guest Comment
Thu, Sep 29, 2011, 10:43 pm CDT
I wonder if there's anyone out there researching the prevalence of continued intelligence, spelling and grammar on the Internet... Then again, there might not be a big enough sample size to conduct any kind of accurate study.
Ciarán

Guest Comment
Mon, Oct 03, 2011, 6:34 pm CDT

There needs to be more people interested in dialectology especially concerning the prevelant written dialects online. I believe that, with the internet, a lot more people are writing in their own dialects and all linguists would probably agree that when a language (or dialect) becomes a written one, norms can form (which could lead to standardisation, which I am thorougly against :P) and the language can develop into its own. Being from Ireland, having to type in a standard way is somewhat frustating but the internet allows me to communicate in my dialect in ways that just weren't possible before. So I'm really excited about this study and hope more are conducted in the future.

Add Comment?
Comments are closed 2 weeks after initial post.
Friends