Infographic for @naturenews Published at: http://www.nature.com/news/the-top-100-papers-1.16224#/interactive
- Get lifecylce data for Radloff, L. S., Rank 52
- Remove link and add DOI if necessary?
- Add comments to javascript comments
- Add data link for status message
Download the Google spreadsheet as Microsoft Excel file (.xlsx).
In Excel remove all formatting.
Create a new sheet to contain the amended data.
Create links to authors, title, journal, volume, page, pub-year, doi, hyperlink and discipline cells.
Convert the volume, page, pub-year values to text as opposed to numbers.
=TEXT(Sheet1!E2,"#")
In the first column, lead_author, combine the lead authors first name with the pub-year to allow cross referencing with the lifecycle data file names ie. "Lowry_1951".
=CONCATENATE(PROPER(LEFT(Sheet1!B2,SEARCH(",",Sheet1!B2)-1)), "_", (Sheet1!G2))
The world is not a perfect place so you'll have to amend some by hand.
Replace all em dashes with —
and en dashes –
Replace minus signs in page ranges with en dashes –
Replace quotation marks:
"Mini-mental state": A practical method for grading cognitive state of patients for clinician.
"
Mini-mental state"
: A practical method for grading cognitive state of patients for clinician.
ö with ö
’ with ’
ä with ä
ü with ü
Add commas:
"Sheldrick_1990" needs a comma
"Lee_1988" needs a comma
"Sanger_1977" needs a comma
"Kirkpatrick_1983" needs a comma.
** Remove hidden characters **
Radloff, L. S. remove hidden character
Amend
Köhler_1975 = Kohler_1975
O’Regan_1991 = Oregan_1991
Yanisch-Perron_1985 = Yanischperron_1985
Brünger_1998 = Brunger_1998
Böyum_1968 = Boyum_1968
O’Farrell_1975 = OFarrell_1975
Altschul_1997 = Atlschul_1997
Altschul_1990 = Atlschul_1990
Scatchard_1949 = Scratchard_1949
Folstein_1975 = Fulstein_1975
Kaplan_1958 = Kaplan_Meier_1958
Blöchl_1994 = Blochl_1994
Bimboim_1979 = Birnboim_1979
Friedewald_1972 = Friedewa_1972
The lower of the two Kresse_1996 = Kresse_1996_2
Finally save out as a csv file.
First use Automator app to batch convert from xls to xlsx
Then use xlsx to csv converter github.com/dilshod/xlsx2csv
Installation:
sudo easy_install xlsx2csv
To convert multiple files:
for f in *xlsx;
do xlsx2csv "$f" "${f%.xlsx}.csv";
done
Then remove all those pesky excel files:
rm *.xlsx
Remove the opening 'N_' from the filenames:
for file in *;
do mv "${file}" "${file/N_/}";
done
Remove '_Annual_Citation_Counts' from the filenames:
for file in *;
do mv "${file}" "${file/_Annual_Citation_Counts/}";
done
Remove leading spaces before numbers:
for file in *;
do
cat "${file}" | tr -d "[:blank:]" > temp && mv temp "${file}";
done
Append file name, excluding file extension as a value for the first column into every line of the file
for file in *;
do
blap="${file%.csv},";
sed -e "s/^/$blap/" $file > temp && mv temp "${file}";
done
Combine all csvs into one new file to rule them all:
cat * > combined.csv
Add a header row of 'lead_author, year, cites' to this new file
echo 'lead_author,year,cites' | cat - combined.csv > temp && mv temp combined.csv;
Find and replace all instances of "Total": to "total". and "total:" to "total".
Use this tool to www.convertcsv.com/csv-to-json.htm convert your excel files to arrays of javascript objects.
Get lead author first name (presuming it is the first word and followed by a comma):
=PROPER(LEFT(Sheet1!A2,SEARCH(",",Sheet1!A2)-1))
Capitalise the first letter of each work in author list:
=PROPER(Sheet1!A2)
Capitalise the first letter the title
=UPPER(LEFT(Sheet1!B2,1))&LOWER(RIGHT(Sheet1!B2,LEN(Sheet1!B2)-1))