Corpus tools developed by BFSUCRG members (the BFSU corpus team)

GUI tools created using ChatGPT and Python
* Please configure your antivirus software to trust these newly developed tools. If the system displays a pop-up saying the program cannot run, right-click the program file and select "Run as administrator".

- BFSU Bilingual Alignment Keeper beta : adds line-end markers to keep sentence alignment after POS tagging.
- BFSU Chinese Lexical Complexity Analyzer 1.2 Beta : calculates the number of Chinese characters, word tokens, word types, sentence count, average sentence length (ASL), average word length (AWL), type-token ratio (TTR), and standardized type-token ratio (STTR).
- BFSU Detagger 2 : an updated version of the DeTagging Tool originally created by Yunlong Jia. Detagger strips tags in four formats (underscore, forward slash, angle brackets, and square brackets) from annotated texts.
- BFSU EdgeDetector : applies edge detection to images and saves the results.
- BFSU HanLP Chinese Tokenizer and POS Tagger 1.0 : performs word segmentation and part-of-speech tagging on Chinese texts.
- BFSU Inter-rater Agreement Gauge : computes inter-rater agreement measures for two or more raters.
- BFSU Knowledge GraphGenerator V1 : creates knowledge graphs based on written texts.
- BFSU Logistic Regression Tool : performs logistic regression on user-provided datasets.
- BFSU Log-likelihood_Calculator_with_ES : compares frequency counts across two corpora using log-likelihood, with effect size (ES) measures.
- BFSU One Hot Encoder : converts categorical values into binary data for logistic regression.
- BFSU Precision Recall Evaluator : computes performance metrics for binary classification models.
- BFSU PatCount 3.3 : counts linguistic features and outputs matrices for statistical analysis.
- BFSU Readability Analyzer 3 : measures lexical complexity and readability indices of written texts.
- BFSU Spoken Utterances Extractor : extracts spoken utterances from literary works.
- BFSU TAG : enables batch text annotation via API using custom prompts.
- BFSU Texcel : extracts text data from Excel files and saves it as individual text files.
- BFSU Text Merger : merges the selected files into one text file.
- BFSU Text Randomizer 2 : extracts random samples from an uploaded text or multiple texts within a folder.
- BFSU Text Segmenter : splits text files into smaller individual files.
- BFSU TextTile Annotator : identifies topic boundaries in text and generates a visualization.
- BFSU XML2TXT Converter : transforms XML files (e.g. from the Spoken BNC2014) into *.txt files.
- RTF2TXT Converter : batch converts *.rtf documents to *.txt files. The tool was developed by Wanbo Ren from the School of Foreign Languages at Northwest University, China.
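The measures reported by BFSU Chinese Lexical Complexity Analyzer can be illustrated with a minimal Python sketch. It assumes the text has already been word-segmented (words separated by spaces), treats 。！？!? as sentence delimiters, measures word length in characters, and computes STTR as the mean TTR of consecutive 1,000-token chunks. These choices and the function names are hypothetical and may differ from the tool's own settings.

```python
import re
from statistics import mean

PUNCT = re.compile(r"^[\W_]+$")        # tokens made up only of punctuation/symbols
SENT_END = re.compile(r"[。！？!?]+")    # assumed sentence-final punctuation
HAN = re.compile(r"[\u4e00-\u9fff]")    # CJK Unified Ideographs (basic block)


def ttr(tokens):
    """Type-token ratio: distinct words / total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def sttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of chunk_size tokens.

    Falls back to plain TTR when the text is shorter than one chunk
    (an assumption; tools differ on this point).
    """
    starts = range(0, len(tokens) - chunk_size + 1, chunk_size)
    chunks = [tokens[i:i + chunk_size] for i in starts]
    return mean(ttr(c) for c in chunks) if chunks else ttr(tokens)


def lexical_profile(segmented_text, chunk_size=1000):
    """Basic lexical measures for space-separated (pre-segmented) Chinese text."""
    words = [t for t in segmented_text.split() if not PUNCT.match(t)]
    sentences = [s for s in SENT_END.split(segmented_text) if s.strip()]
    return {
        "characters": len(HAN.findall(segmented_text)),
        "tokens": len(words),
        "types": len(set(words)),
        "sentences": len(sentences),
        "ASL": len(words) / len(sentences) if sentences else 0.0,    # words per sentence
        "AWL": sum(map(len, words)) / len(words) if words else 0.0,  # characters per word
        "TTR": ttr(words),
        "STTR": sttr(words, chunk_size),
    }


if __name__ == "__main__":
    print(lexical_profile("我 喜欢 语料库 。 语料库 很 有用 。"))
```

For this two-sentence example, the sketch gives 6 tokens, 5 types, ASL 3.0 and AWL 2.0. STTR only differs from TTR once the text exceeds one chunk.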
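The following is a minimal sketch of stripping tags in the four formats handled by BFSU Detagger 2. The regular expressions are assumptions about what each tag style looks like (word_TAG, word/TAG, <...> markup, and word[TAG]), and the real tool may define tags differently. Note that the slash pattern would also truncate ordinary strings such as "and/or".

```python
import re

# Hypothetical patterns for the four tag styles; the real Detagger may differ.
TAG_PATTERNS = {
    "underscore": re.compile(r"_[A-Za-z0-9$]+(?=\s|$)"),  # The_AT -> The
    "slash": re.compile(r"/[A-Za-z0-9$]+(?=\s|$)"),       # 我/r -> 我
    "angle": re.compile(r"<[^<>]*>"),                     # <w c5="NN1">cat</w> -> cat
    "square": re.compile(r"\[[^\[\]]*\]"),                # cat[NN1] -> cat
}


def detag(text, style):
    """Remove tags of one style, then tidy the spaces they leave behind."""
    cleaned = TAG_PATTERNS[style].sub("", text)
    return "\n".join(re.sub(r"[ \t]+", " ", line).strip()
                     for line in cleaned.split("\n"))


if __name__ == "__main__":
    print(detag("The_AT cat_NN1 sat_VVD", "underscore"))  # The cat sat
```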
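The log-likelihood comparison behind a tool such as BFSU Log-likelihood_Calculator_with_ES can be sketched for a word with frequency a in corpus 1 (c words) and b in corpus 2 (d words). The LL function follows the common two-corpus formulation. For the effect size, we assume Log Ratio (the binary log of the ratio of relative frequencies, with zero counts replaced by 0.5), because the tool's own ES measure is not stated here.

```python
import math


def log_likelihood(a, b, c, d):
    """LL (G2) for frequency a in corpus 1 (c words) vs b in corpus 2 (d words)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:  # a cell with zero observed frequency contributes 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def log_ratio(a, b, c, d, zero_sub=0.5):
    """Assumed effect size: log2 of the ratio of relative frequencies."""
    a = a if a > 0 else zero_sub  # avoid division by zero
    b = b if b > 0 else zero_sub
    return math.log2((a / c) / (b / d))


if __name__ == "__main__":
    print(round(log_likelihood(20, 10, 1000, 1000), 2))  # 3.4
    print(log_ratio(20, 10, 1000, 1000))                  # 1.0
```

Consider a word that occurs 20 times in one 1,000-word corpus and 10 times in another. Its LL is about 3.40, just below the 3.84 critical value for p < 0.05 (1 d.f.), while a Log Ratio of 1.0 means the word is twice as frequent in corpus 1.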
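For the two-rater case, one common agreement measure that a tool like BFSU Inter-rater Agreement Gauge may report is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e). Here p_o is the observed agreement and p_e the agreement expected by chance. The sketch below assumes this measure. It does not show how the tool handles three or more raters.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater1)
    p_o = sum(x == y for x, y in zip(rater1, rater2)) / n  # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)        # chance agreement
    if p_e == 1:
        # Both raters used one identical label throughout; kappa is undefined,
        # so we return 1.0 by convention (an assumption).
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    print(cohens_kappa(["A", "A", "B", "B"], ["A", "B", "B", "B"]))  # 0.5
```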

Tutorial video on developing software using large language models

Please download the tutorial document for personal practice. NB: Python must be installed first.

GitHub BFSUNLP Page : https://github.com/bfsunlp

R and Python scripts and tutorials

Concordancers and query tools

- BFSU PowerConc 1.0 beta25b. Video tutorials: PowerConc video tutorial; user-made video tutorial.

- BFSU CQPweb online concordancer (download the concise illustrated CQPweb tutorial here).
  CQP syntax: guide to advanced searches

- BFSU ParaConc 1.2.1: A freeware parallel concordancer

- Colligator 2.0 : A colligation query and analysis tool (1.4MB)

- SearchSubtitle : A programme for video-based, time-aligned subtitle concordancing (Chinese user interface). The tool was designed by Wenzhong Li and programmed by Zhaoyang Han (533KB).

- PatCount 1.0 : PatCount is short for 'pattern counting'. It is a query tool for counting the frequency of lexical, syntactic, and discoursal features in texts. Results are displayed and can be exported as 'feature(s) × text(s)' matrices, which are well suited to follow-up advanced (inferential) statistical analyses. Regular expressions are fully supported. The Microsoft .NET Framework must be installed before you run the tool. The tool was designed by Maocheng Liang and Wenxin Xiong and programmed by Wenxin Xiong (3.6MB).

- PatCount 2.0 (wxPatCount) : An updated version of PatCount, written by Professor Maocheng Liang.

- BFSU ConcGram Lite : A straightforward, easy-to-use tool for retrieving contiguous and non-contiguous bigrams, with directional variations, based on a search for two target words.
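The kind of matrix PatCount exports can be sketched in Python. Each feature is a regular expression, each row a text, each column a feature, and each cell a raw match count. The feature names, patterns, and CSV output are illustrative assumptions, not PatCount's own format.

```python
import csv
import re
import sys


def feature_matrix(texts, features):
    """Return rows of a text-by-feature table of regex match counts.

    texts: {text_id: text}; features: {feature_name: regex pattern}.
    """
    compiled = {name: re.compile(pattern) for name, pattern in features.items()}
    rows = [["text"] + list(compiled)]
    for text_id, text in texts.items():
        # finditer counts whole matches even when a pattern contains groups
        rows.append([text_id] + [sum(1 for _ in rx.finditer(text))
                                 for rx in compiled.values()])
    return rows


if __name__ == "__main__":
    texts = {"t1": "I think it is fine. I think so.",
             "t2": "It might be fine, perhaps."}
    features = {"I_think": r"\bI think\b",
                "hedge": r"\b(?:might|perhaps|maybe)\b"}
    csv.writer(sys.stdout).writerows(feature_matrix(texts, features))
```

Raw counts are kept here for simplicity. For texts of unequal length, normalising each cell (e.g. per 1,000 words) would usually come before inferential statistics.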
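The following minimal sketch shows the kind of search BFSU ConcGram Lite performs. Given two target words, it counts their co-occurrences in both orders, either adjacent (gap 0) or separated by up to max_gap intervening tokens. The window size, case folding, and output keys are assumptions. Every pairing within the window is counted, not only the nearest one.

```python
from collections import Counter


def concgram_pairs(tokens, w1, w2, max_gap=3):
    """Count co-occurrences of two words in both orders.

    Keys are (order, gap): gap 0 = contiguous bigram, gap k = k tokens between.
    """
    words = [t.lower() for t in tokens]  # case-insensitive matching (assumed)
    w1, w2 = w1.lower(), w2.lower()
    hits = Counter()
    for i, tok in enumerate(words):
        if tok not in (w1, w2):
            continue
        other = w2 if tok == w1 else w1
        # look rightwards; each order is captured from whichever word comes first
        for gap in range(max_gap + 1):
            j = i + gap + 1
            if j >= len(words):
                break
            if words[j] == other:
                hits[(f"{tok} ... {other}", gap)] += 1
    return hits


if __name__ == "__main__":
    toks = "play a key role , the role is key".split()
    for (order, gap), n in sorted(concgram_pairs(toks, "key", "role").items()):
        print(order, "gap", gap, "->", n)
```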

Annotation tools

Statistical tools for corpus analysis

Specialised corpus tools

Data-driven learning tools and resources

- BFSU Sentence Collector is a pedagogically motivated concordancing tool that allows users to refine search results by sentence length and lexical difficulty. Results are displayed as complete sentences rather than in KWIC format. To use your own texts, first segment the English texts on your hard drive into sentences with BFSU Sentence Segmenter 1.0. Then mark up unknown/new words against a base word list with BFSU NewWords Marker 1.0, and save the data as an *.idx file in the index folder of BFSU Sentence Collector.
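The filtering step of BFSU Sentence Collector can be approximated in a few lines. The sketch keeps sentences that contain the query word, do not exceed a length limit, and contain no more than a set number of words outside a base word list. This hypothetical sketch marks new words on the fly instead of reading pre-marked *.idx files, so it only approximates the tool's workflow. The base word list is assumed to be lowercase.

```python
import re


def collect_sentences(sentences, query, base_words, max_len=20, max_new=1):
    """Keep sentences that contain `query`, have at most max_len words,
    and contain at most max_new words missing from base_words."""
    query = query.lower()
    kept = []
    for sentence in sentences:
        words = re.findall(r"[a-z']+", sentence.lower())
        if query not in words or len(words) > max_len:
            continue
        new_words = [w for w in words if w not in base_words]  # lexical difficulty
        if len(new_words) <= max_new:
            kept.append(sentence)
    return kept


if __name__ == "__main__":
    base = {"the", "cat", "sat", "on", "mat", "a", "dog", "barked"}
    sents = ["The cat sat on the mat.",
             "The cat contemplated existential dread.",
             "A dog barked."]
    print(collect_sentences(sents, "cat", base))  # ['The cat sat on the mat.']
```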

Useful tools and resources that were not developed by BFSU FLERIC members