Text Comparison - word frequencies calculation URGENT

In progress Posted: 7 years ago Paid on delivery

Need it within a week...

I need to implement the tool described in CollGram Profile (Bestgen & Granger 2014) [login to view URL]. These measures are the average t-score and MI score for all bigrams in the student’s text, calculated based on a reference text corpus.

I have a set of 300 essays written by students at different levels, and I would like to calculate the CollGram profiles for them, based on the COCA corpus (example here: [login to view URL]), which I already have. The lists of bigrams and word frequencies are available on the website, so calculating the MI and t-score for each bigram will not be difficult in concept.

For each pair of words (bigram), two collocability ratios (MI and t) should be calculated, based on the frequency of the constituent words. The calculation from the formulas is mathematically simple; I am attaching the formulas as a *.docx file.
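The attached *.docx is not reproduced here, so the following is only a minimal sketch that assumes the commonly used corpus-linguistics definitions: expected frequency E = f(w1) * f(w2) / N, MI = log2(O / E), and t = (O - E) / sqrt(O), where O is the observed bigram frequency and N the corpus size. The function and parameter names are hypothetical, and the formulas should be checked against the attachment.

```python
import math


def expected_frequency(freq_w1, freq_w2, corpus_size):
    """Expected bigram frequency if the two words were independent."""
    return freq_w1 * freq_w2 / corpus_size


def mutual_information(freq_bigram, freq_w1, freq_w2, corpus_size):
    """MI score: log2 of observed over expected frequency (assumed formula)."""
    expected = expected_frequency(freq_w1, freq_w2, corpus_size)
    return math.log2(freq_bigram / expected)


def t_score(freq_bigram, freq_w1, freq_w2, corpus_size):
    """t-score: (observed - expected) / sqrt(observed) (assumed formula)."""
    expected = expected_frequency(freq_w1, freq_w2, corpus_size)
    return (freq_bigram - expected) / math.sqrt(freq_bigram)
```

Both functions expect a bigram frequency greater than zero, which holds for any bigram that was actually found in the reference list.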

• I have around 300 texts of 20-600 words each, written by learners of English (the spelling has been corrected). All bigrams must be extracted from each text (punctuation marks act as bigram boundaries, so no bigram spans a punctuation mark).

• The extracted bigrams must be found in the reference list (COCA corpus), which is discussed in point (1). If they are found, their two collocability ratios must be checked.

• For each text, four lists must be produced: the bigrams that were found (as tokens and as types), each with its two collocability ratios, and the bigrams that were not found (as tokens and as types).

• For each text, the following values must be produced: the mean of each of the two ratios, calculated separately for tokens and for types, and the percentage of bigrams not found out of the total number of bigrams (again for tokens and for types).

• The last step should produce a summary table for the whole batch of texts.
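The steps in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not a finished tool: `reference` is a hypothetical dictionary mapping each bigram to a precomputed (MI, t) pair built from the COCA lists, any character that is not a letter, digit, underscore, apostrophe, or whitespace is treated as a bigram boundary (so a hyphen also splits), and text is lowercased before lookup.

```python
import re
from collections import Counter


def extract_bigrams(text):
    """Return all bigram tokens; punctuation acts as a bigram boundary."""
    bigrams = []
    # Split the text into punctuation-free segments, then pair adjacent words.
    for segment in re.split(r"[^\w\s']+", text.lower()):
        words = segment.split()
        bigrams.extend(zip(words, words[1:]))
    return bigrams


def mean(values):
    """Arithmetic mean, or None for an empty sequence."""
    values = list(values)
    return sum(values) / len(values) if values else None


def profile(text, reference):
    """Per-text summary; `reference` maps bigram -> (MI, t) (hypothetical)."""
    tokens = extract_bigrams(text)
    counts = Counter(tokens)  # keys are the bigram types
    found_tokens = [bg for bg in tokens if bg in reference]
    found_types = [bg for bg in counts if bg in reference]
    missing_types = [bg for bg in counts if bg not in reference]
    missing_tokens = len(tokens) - len(found_tokens)
    return {
        "mean_mi_tokens": mean(reference[bg][0] for bg in found_tokens),
        "mean_t_tokens": mean(reference[bg][1] for bg in found_tokens),
        "mean_mi_types": mean(reference[bg][0] for bg in found_types),
        "mean_t_types": mean(reference[bg][1] for bg in found_types),
        "pct_missing_tokens": 100 * missing_tokens / len(tokens) if tokens else None,
        "pct_missing_types": 100 * len(missing_types) / len(counts) if counts else None,
    }
```

Calling `profile` once per essay and collecting the returned dictionaries as rows would give the batch table; the four per-text lists can be derived from the same `found_*` and `missing_*` collections.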

If really necessary, I can provide a [login to view URL] license for POS tagging.

I have a sample of the output for one analysed file:

1        2          3          4               5   6     7  8
bigram   freq_text  freq_COCA  mean freq_COCA  MI  MI>3  t  t>2,54
bigram1
bigram2
bigram3
bigram4
bigram5
...

col 1 - a list of all bi-grams retrieved from a learner text (without punctuation marks)

col 2 - frequency of the bigram in a learner text

col 3 - frequency of the bigram in COCA. If blank, the bigram does not occur in COCA.

col 4 - mean frequency of the bigram in COCA per 1 million. If blank, the bigram does not occur in COCA.

col 5 - MI for the bigram calculated based on COCA. If blank, the bigram does not occur in COCA.

col 6 - "*" if MI>3

col 7 - t for the bigram calculated based on COCA. If blank, the bigram does not occur in COCA.

col 8 - "*" if t>2
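One way to build a single output row from the column descriptions above is sketched here. The function name, the `corpus_size_millions` parameter, and the rounding to two decimals are assumptions for illustration. The thresholds are parameters because the sample header shows "t>2,54" while the description of col 8 says "t>2", so the exact t cut-off would need to be confirmed.

```python
def output_row(bigram, freq_text, freq_coca, mi, t, corpus_size_millions,
               mi_threshold=3.0, t_threshold=2.0):
    """Build the 8 columns for one bigram (a sketch, names are hypothetical).

    Pass freq_coca=None for a bigram that does not occur in COCA;
    columns 3-8 are then left blank.
    """
    if freq_coca is None:
        return [bigram, freq_text, "", "", "", "", "", ""]
    return [
        bigram,                                       # col 1
        freq_text,                                    # col 2
        freq_coca,                                    # col 3
        round(freq_coca / corpus_size_millions, 2),   # col 4: per 1 million
        round(mi, 2),                                 # col 5
        "*" if mi > mi_threshold else "",             # col 6
        round(t, 2),                                  # col 7
        "*" if t > t_threshold else "",               # col 8
    ]
```

The resulting lists can be written out with `csv.writer` from the standard library, one row per bigram.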

Also input files may have <> tags that should be removed.
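A minimal sketch of the tag removal, assuming the tags are plain angle-bracket markup and the essays themselves contain no literal "<" or ">" characters. Each tag is replaced by a space so that the words on either side are not glued together.

```python
import re

# Matches anything between a "<" and the next ">", including an empty "<>".
TAG_PATTERN = re.compile(r"<[^<>]*>")


def strip_tags(text):
    """Replace every <...> tag with a single space."""
    return TAG_PATTERN.sub(" ", text)
```

For example, `strip_tags("I like <b>cats</b>.")` leaves the words "I", "like", "cats" and the final full stop, separated by whitespace.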

I also want to be able to load multiple files. If I load more than one file into the program, I want the per-file analysis as above, plus cumulative results as:

This is only the beginning; I hope the freelancer doing this will be willing to continue developing this tool in follow-up projects.

C# Programming C++ Programming Java Perl Python

Project ID: #10141081

About the project

8 bids Remote project Last active: 7 years ago

Awarded to:

Vlzinch

Hello! I reviewed the "Quantifying the development of phraseological competence" methodology and I want to offer you a Python script that will calculate the MI and t-score. So you will just need to put all documents in a folder, scr...

$631 USD in 10 days
(18 reviews)
7.1

8 freelancers are bidding an average of $480 for this job

hbxfnzwpf

I am very proficient in C and C++. I have 16 years of C++ development experience, and have worked for more than 6 years. My work is online game development, mainly focused on the server side, using C++ under Linux environ...

$250 USD in 7 days
(120 reviews)
6.9
SuiGenSolutions

Hi, we have a team of Data Mining and Web Scraping experts. We have worked on many Data Mining techniques, including Association Rule Mining, Clustering, Outlier Mining, Sentiment Analysis etc., extensively in the pas...

$526 USD in 5 days
(76 reviews)
6.4
ZhangDaLong

Hi, client. I am a C++ programmer and mathematician. Please check my Profile/RecordList and tell me details. Looking forward to your response. Thanks.

$1000 USD in 7 days
(26 reviews)
4.5
antonparfeniuk

A proposal has not yet been provided

$350 USD in 7 days
(11 reviews)
4.2
surbhibohra

Hi, I have over 5 years of experience in Java development; here is my detailed plan. 1. The application will be a GUI application. 2. You will have the facility to select an individual file or a group of files or a...

$500 USD in 10 days
(3 reviews)
2.8
evgn

Hi there! I'm a software engineer with skills in Python. As I can see, you need a strong parsing tool to work with text files. I advise you to make a GUI program for that, and I can help you with it in a few days or l...

$250 USD in 4 days
(8 reviews)
2.1