A semiotic approach to data mining on social media

Researchers dedicated to social media analysis seldom adopt, on a theoretical or methodological level, a linguistic or semiotic approach to identify and select the material to be analyzed on a large scale (in the realm of big data), whether it consists of images, videos, tweets or Facebook posts. As several publications from established institutions and authors demonstrate (e.g. Scott 2013; Pozzi et al. 2016; Daas & Puts 2014; Majó-Vázquez et al. 2017), articles that use social media publications to analyze political, economic or cultural events usually process and categorize the data gathered from public online platforms with techniques that rely mostly on mathematical and algorithmic patterns. For this purpose, they collect only posts filtered by hashtags or by a few isolated, arbitrarily chosen keywords, even when the topic studied is wide-ranging or abstract (public security, education, corruption, religion or politics) rather than concrete or restricted (a specific event, a brand, a name, an online campaign). To address this issue, we propose a procedure oriented by French Semiotics to classify the textual corpus selected from social media sources. This procedure comprises a methodology for establishing the thematic layers of the intended subject of study and a subset of operations for elaborating search rules (stringlines) that can filter the corresponding data provided by users. It takes into consideration the semiotic understanding that natural languages consist of cultural, social and collective manifestations of concepts and meanings, with words and expressions being graphic representations of verbal planes of expression, which recover semantic concepts. Thus, in order to structure a stringline capable of gathering the different posts and opinions about any given topic, it is necessary to identify the isotopies, and the discursive figures, related to the overall subject.
To do so, we first review, as a conceptual framework, the main theoretical standpoints established by authors such as Greimas (2011, 2012), Hjelmslev (1978), Saussure (2011) and Fontanille (2015) regarding the semiotic object and the textual and social construction of meaning effects. We then introduce the procedures for building a linguistically oriented stringline that employs syntactic Boolean operators to retrieve the largest possible volume of posts, in both quantitative and qualitative terms, that accurately pertain to the theme under study. This also requires understanding the timeframe of the search; the specific vocabulary used on social media (contractions and geographic, cultural or age-related variations); polysemic occurrences; commonly used ironic or idiomatic expressions; and the social and political contexts associated with the chosen thematic corpus: authorities, influencers, institutions, brands and media-covered events. As a case study to demonstrate the proposed methodology, we present the elaboration of a stringline designed to monitor, in real time, the debate in Brazilian Portuguese during September 2018 about Jair Bolsonaro, then a presidential candidate in the Brazilian general election. The stringline follows the Boolean operators and the syntax available through a subscription to the GNIP API, Twitter's official service for gathering and processing data from its platform.
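As a rough illustration of how such a stringline might be assembled, the sketch below joins alternative terms into OR groups, combines the groups with AND, and appends negated terms to discard polysemic noise. It assumes a PowerTrack-style syntax in which whitespace means AND, `OR` is explicit, a leading `-` negates, quoted strings match exact phrases and `lang:` restricts the language; these conventions should be checked against the GNIP documentation. The helper names (`quote`, `or_group`, `build_stringline`) and every term in the lists are hypothetical placeholders, not the rule used in the case study.

```python
def quote(term: str) -> str:
    """Wrap multi-word expressions in quotes so they match as exact phrases."""
    return f'"{term}"' if " " in term else term


def or_group(terms: list[str]) -> str:
    """Join alternative terms into one parenthesized OR group."""
    return "(" + " OR ".join(quote(t) for t in terms) + ")"


def build_stringline(include_groups: list[list[str]],
                     exclude_terms: list[str],
                     lang: str) -> str:
    """AND the OR groups together, then append negations and a language filter."""
    parts = [or_group(group) for group in include_groups]
    parts += [f"-{quote(t)}" for t in exclude_terms]
    parts.append(f"lang:{lang}")
    return " ".join(parts)


# Illustrative placeholders only (assumed, not taken from the study).
figures = ["bolsonaro", "jair bolsonaro"]        # names of the discursive actor
context = ["eleição", "candidato", "presidente"]  # thematic vocabulary
noise = ["termo ambíguo"]                         # stand-in for a polysemic expression

rule = build_stringline([figures, context], noise, "pt")
print(rule)
```

Keeping each thematic layer as its own list makes it easier to revise the vocabulary (contractions, regional variants, idioms) without touching the Boolean structure of the rule.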
Country: 
Brazil
Topics and areas of work: 
Semiotics of doxological discourses (political, religious, journalistic)
Semiotics of mediatization
Institution: 
Fundação Getulio Vargas
Email: 
lucas.calil@fgv.br

Abstract status: 
Accepted