ISLE7 Program: Satellite Event
Australian Text Analytics Platform: New tools for text analysis
The Australian Text Analytics Platform (ATAP) is an open source environment that provides researchers with tools and training for analysing, processing, and exploring text. ATAP delivers its tools mainly as Jupyter notebooks, so this half-day workshop will begin with a brief introduction to notebooks for participants not already familiar with them. The main body of the workshop will consist of hands-on sessions introducing two tools made available by ATAP, and the workshop will end with a short summary of other tools under development.
Discursis is a tool for tracking topics in linguistic interaction (Angus, Smith, and Wiles 2012; Angus and Wiles 2018). It was originally made available as part of a commercial package, but ATAP (in association with our partner the Sydney Informatics Hub [SIH]) has re-engineered the tool as an open source package, accessible as a notebook. This part of the workshop will introduce the analytic ideas which underlie Discursis, demonstrate the functionality of the tool, including visualisation possibilities, and allow participants to gain hands-on experience in using it.
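As a rough illustration of the idea of recurrence between turns in an interaction, the sketch below builds a turn-by-turn similarity matrix. It is not the Discursis implementation: it swaps in plain word-overlap (bag-of-words cosine similarity) for whatever similarity measure the tool itself uses, and the function names `bag_of_words`, `cosine`, and `recurrence_matrix` are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(utterance):
    """Count lower-cased word tokens in one turn (a deliberately crude representation)."""
    return Counter(re.findall(r"[a-z']+", utterance.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recurrence_matrix(turns):
    """Return a square matrix where cell [i][j] scores how much turn j echoes turn i."""
    vectors = [bag_of_words(turn) for turn in turns]
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Illustrative data only, not taken from any corpus used in the workshop.
turns = [
    "I think the budget is too small",
    "The budget was cut last year",
    "Shall we break for lunch",
]
matrix = recurrence_matrix(turns)
for row in matrix:
    print([round(value, 2) for value in row])
```

In this toy example the first two turns share vocabulary and so score above zero, while the third turn shares none with either. A matrix of this kind is the sort of structure that can be plotted to show where topics recur across a conversation.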
The QuotationTool has also been developed in association with SIH based on previous work by Canadian researchers (Asr et al. 2021). This tool can be used to extract quotes from a text. In addition to extracting the quotes, the tool also provides information about who the speakers are, the location of the quotes (and the speakers) within the text, and the identified named entities, all of which can contribute to text analysis. Results of the analysis are stored as a dataframe which can be downloaded and can also be viewed on screen with elements of interest highlighted (using the displaCy library). Again, this part of the workshop will introduce the analytic basis of the tool, demonstrate it, and allow participants to work through the notebook.
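To make the shape of this output concrete, here is a minimal sketch that finds quoted spans and records their character positions as rows, which could then be loaded into a dataframe. It is an assumption-laden stand-in, not the QuotationTool's method: it uses a simple regular expression, does not identify speakers or named entities, and the names `QUOTE_PATTERN` and `extract_quotes` are hypothetical.

```python
import re

# Hypothetical pattern: matches text between straight or curly double quotes.
QUOTE_PATTERN = re.compile(r'"([^"]+)"|“([^”]+)”')


def extract_quotes(text):
    """Return one row per quoted span, with its character offsets in the text."""
    rows = []
    for match in QUOTE_PATTERN.finditer(text):
        quote = match.group(1) or match.group(2)
        rows.append({"quote": quote, "start": match.start(), "end": match.end()})
    return rows


# Illustrative sentence only, not taken from the workshop materials.
sample = 'The minister said, "We will act soon." Critics replied, "That is not enough."'
rows = extract_quotes(sample)
for row in rows:
    print(row)
```

Each row keeps the quote text alongside its start and end offsets, so the quote can be located again in the source text for highlighting or further analysis.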
The last section of the workshop will give a brief overview of the project's work in the first half of 2023. Tool development is ongoing, and it is hard to predict this far in advance what state specific tools will have reached. Through description and demonstration, this section will give participants an idea of what will be available by June 2023 or soon after.
Organisers
- Simon Musgrave (Contact person: s.musgrave@uq.edu.au)
- Ben Foley
- Sam Hames
References
Angus, Daniel, Andrew E. Smith, and Janet Wiles. 2012. “Human Communication as Coupled Time Series: Quantifying Multi-Participant Recurrence.” IEEE Transactions on Audio, Speech, and Language Processing 20 (6): 1795–1807. https://doi.org/10.1109/TASL.2012.2189566.
Angus, Daniel, and Janet Wiles. 2018. “Social Semantic Networks: Measuring Topic Management in Discourse Using a Pyramid of Conceptual Recurrence Metrics.” Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (8): 085723. https://doi.org/10.1063/1.5024809.
Asr, Fatemeh Torabi, Mohammad Mazraeh, Alexandre Lopes, Vasundhara Gautam, Junette Gonzales, Prashanth Rao, and Maite Taboada. 2021. “The Gender Gap Tracker: Using Natural Language Processing to Measure Gender Bias in Media.” PLOS ONE 16 (1): e0245533. https://doi.org/10.1371/journal.pone.0245533.