A large part of contemporary big data consists of free text, which is why natural language processing (NLP) is increasingly being applied in its analysis. Fields rich in free text include, for example, medicine, where doctors' daily work produces a large number of descriptions of patients' treatment and well-being.

The use of NLP technologies for analysing free-text data is widespread worldwide. Applying existing solutions is often difficult, however: most of them are language-specific, the regional lexical resources (dictionaries, thesauri) that could support the analysis are missing, or the tools do not scale well enough to be used effectively in big data analysis.

To address these deficiencies, we created the TEXTA Toolkit, which extracts domain-specific terminology from a text corpus, builds concept-based terminological resources from that terminology, identifies text fragments referring to those concepts in documents, and visualises the results across the datasets of a data system. The TEXTA Toolkit is not restricted to any single field, so it can be used to process data systems in various languages. The software also scales well with data volume: it can analyse millions of text documents in real time.
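The toolkit's internals are not described here, but the first step it performs, extracting terminology typical of a domain, can be illustrated with a common baseline technique: comparing word frequencies in a domain corpus against a general reference corpus. The following sketch is only an illustration with invented toy corpora and an arbitrary threshold, not the TEXTA implementation:

```python
from collections import Counter
import re

def tokenize(text):
    # Lowercase word tokens; a real system would use a proper NLP tokenizer.
    return re.findall(r"[a-z]+", text.lower())

def domain_terms(domain_docs, reference_docs, ratio_threshold=2.0):
    """Return words markedly more frequent in the domain corpus than in a
    general reference corpus (simple frequency-ratio test)."""
    dom = Counter(t for d in domain_docs for t in tokenize(d))
    ref = Counter(t for d in reference_docs for t in tokenize(d))
    dom_total = sum(dom.values()) or 1
    ref_total = sum(ref.values()) or 1
    scores = {}
    for word, count in dom.items():
        dom_freq = count / dom_total
        ref_freq = (ref.get(word, 0) + 1) / ref_total  # add-one smoothing
        if dom_freq / ref_freq >= ratio_threshold:
            scores[word] = dom_freq / ref_freq
    return sorted(scores, key=scores.get, reverse=True)

# Toy corpora, invented for illustration:
medical = ["patient diagnosed with pneumonia",
           "pneumonia treatment started for the patient"]
general = ["the weather was fine",
           "we started the trip in the morning"]
print(domain_terms(medical, general))
```

On these toy inputs, domain-specific words such as "patient" and "pneumonia" score above the threshold while common words such as "the" and "started" do not.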



Software scientists at STACC, in cooperation with Microsoft/Skype, have developed methods for computing shortest paths in giant graphs.
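The specific methods developed with Microsoft/Skype are not detailed here. As a point of reference, shortest-path computation on large graphs usually builds on Dijkstra's algorithm with a priority queue, which giant-graph techniques then accelerate with preprocessing (landmarks, hierarchies, and similar). A minimal baseline sketch:

```python
import heapq

def dijkstra(adj, source):
    """Single-source shortest paths with a binary heap, O((V + E) log V).
    `adj` maps each node to a list of (neighbour, edge_weight) pairs."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy graph, invented for illustration:
graph = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(graph, "a"))  # shortest a -> c path goes via b, cost 3
```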


STACC helped to create the Demograft platform, which gathers and analyses the data-intensive passive positioning events handled by mobile operators' systems.


Based on the results of a STACC research project, Plumbr developed a unique algorithm that automatically detects performance problems in Java applications.


STACC created an anonymisation tool for anonymising various types of documents.
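The tool's actual rules and methods are not described in the source. As a generic illustration of the idea, one simple approach replaces occurrences of known identifier patterns with placeholder tags; the patterns below are invented examples, not the tool's real rule set:

```python
import re

# Illustrative patterns only; a production tool would use far richer rules
# and typically named-entity recognition as well.
PATTERNS = [
    (re.compile(r"\b\d{11}\b"), "<ID_CODE>"),            # e.g. an 11-digit personal code
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),
    (re.compile(r"\+?\d[\d\s-]{6,}\d"), "<PHONE>"),
]

def anonymise(text):
    """Replace matches of the identifier patterns with placeholder tags."""
    for pattern, tag in PATTERNS:
        text = pattern.sub(tag, text)
    return text

print(anonymise("Patient 38001010000, contact jaan@example.com"))
```

Rule-based substitution like this handles well-structured identifiers; free-text names and addresses require more advanced techniques.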