Scrape some sort of physical properties database #2

Open
whitequark opened this issue Sep 27, 2016 · 12 comments
Labels
core (For issues related to core Rink functionality), enhancement, units (For issues about the unit definitions)

Comments

@whitequark
Contributor

I tried to do this today, expecting something with densities:

> 3.7 billion l * water -> ton
Conformance error: 7256921/200000, approx. 36.28460 giganewton (force) != 45359237/50000, approx. 907.1847 kilogram (mass)
Suggestions: divide left side by acceleration, multiply right side by acceleration

Then I tried to see what various substance names map to, and it's kind of a mess...

> water
Definition: water = gram force / cm^3 = 9806.65 pascal / meter (kg / m^2 s^2)
> mercury
Definition: mercury = 200.59 g / mol = 0.20059 kilogram / mole (molar_mass; kg / mol)
> milk
Definition: milk = 242 g / uscup = approx. 1022.874 kilogram / meter^3 (density; kg / m^3)
> oil
Definition: oil = 7.5 oz / uscup = approx. 898.6982 kilogram / meter^3 (density; kg / m^3)
> gasoline
Definition: gasoline_HHV = 125000 btu / usgallon = approx. 34.83953 gigapascal (pressure; kg / m s^2)
> air
Definition: air = 78.08 % nitrogen 2 + 20.95 % oxygen 2 + 9340 ppm argon + 400 ppm (carbon + oxygen 2) + 18.18 ppm neon + 5.24 ppm helium + 1.7 ppm (carbon + 4 hydrogen) + 1.14 ppm krypton + 0.55 ppm hydrogen 2 = approx. 0.02896790 kilogram / mole (molar_mass; kg / mol)
@whitequark
Contributor Author

whitequark commented Sep 27, 2016

I'm actually not even sure at all what the heck the water unit refers to; mmH2O? If yes why doesn't mercury do the same...

@whitequark
Contributor Author

I guess what I want is basically the totality of [insert chemical database here], accessible via f(formula | substance_name) where f = density, molar_weight, ...
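
A minimal sketch of what such an `f(formula | substance_name)` lookup could look like. The table layout, the `lookup` helper, and the formula aliases are all hypothetical and are not Rink's actual data model; the mercury molar mass is the value Rink printed above, and the water density is the usual 1 g/cm^3.

```python
# Hypothetical in-memory table: substance name -> formula and properties.
# Each property is stored as (value, unit string) in SI units.
SUBSTANCES = {
    "water": {"formula": "H2O", "density": (1000.0, "kg/m^3")},
    "mercury": {"formula": "Hg", "molar_mass": (0.20059, "kg/mol")},
}

# Reverse index so a formula resolves to the same entry as the name.
BY_FORMULA = {entry["formula"]: name for name, entry in SUBSTANCES.items()}


def lookup(prop, key):
    """Return (value, unit) for property `prop` of a substance.

    `key` may be either a substance name or a chemical formula.
    """
    name = BY_FORMULA.get(key, key)
    entry = SUBSTANCES.get(name)
    if entry is None:
        raise KeyError(f"unknown substance: {key}")
    if prop == "formula" or prop not in entry:
        raise KeyError(f"{name} has no property named {prop}")
    return entry[prop]


print(lookup("molar_mass", "Hg"))
print(lookup("density", "water"))
```

Keeping each property as a separate named entry means a substance can carry a density and a molar mass at the same time, instead of the name standing for only one of them.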

@tiffany352
Owner

Water seems to be the weight per volume of water, and it is used to define mmH2O by multiplying it with length. Confusingly, "mercury" and "Hg" actually refer to different things.
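
As a rough check of that reading, the arithmetic can be reproduced by hand. This is only a sketch in plain Python with SI floats; the variable names are made up for illustration, and the only inputs are standard gravity and the `gram force / cm^3` definition Rink printed above.

```python
# Standard gravity in m/s^2 (exact by definition).
G0 = 9.80665

# One gram-force in newtons, and one cubic centimetre in cubic metres.
GRAM_FORCE_N = 1e-3 * G0
CM3_IN_M3 = 1e-6

# "water" as defined above: weight per unit volume, in N/m^3.
WATER = GRAM_FORCE_N / CM3_IN_M3

# Multiplying by a length gives a pressure: 1 mm of water column, in Pa.
MM_H2O_PA = WATER * 1e-3

# 3.7 billion litres times "water" gives a force in newtons, not a mass.
VOLUME_M3 = 3.7e9 * 1e-3
FORCE_N = VOLUME_M3 * WATER

print(WATER)      # about 9806.65 N/m^3
print(MM_H2O_PA)  # about 9.80665 Pa
print(FORCE_N)    # about 3.6284605e10 N, i.e. roughly 36.28 GN
```

That last number is the 36.28460 giganewton from the conformance error, which is why the conversion to `ton` was rejected: the left side is a force, so it has to be divided by an acceleration before it can be compared with a mass.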

I would like to source data from a chemical database, but I haven't really been able to find one which either lets me download the data or has an API.

However, whether or not I have such a database I should still be able to resolve this problem by introducing an explicit notion of substances with multiple properties.

@whitequark
Contributor Author

"I would like to source data from a chemical database, but I haven't really been able to find one which either lets me download the data or has an API."

Summoning @bofh453

@tiffany352
Owner

I've added substances in a branch. It looks like this:

> density of test
1200 kilogram / meter^3 (density)
> mass of ml test
1.2 gram (mass)

(test being a made up substance)

Next I have to update the definitions file to use them.
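
To illustrate what the two queries above compute, here is a minimal sketch in Python. It is not how the branch is implemented; the `Substance` class and `mass_of` method are hypothetical, and only the density of the made-up substance comes from the output above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substance:
    """A named substance with a density property (SI units)."""
    name: str
    density_kg_per_m3: float

    def mass_of(self, volume_m3: float) -> float:
        """Mass in kilograms of the given volume of this substance."""
        return self.density_kg_per_m3 * volume_m3


# The made-up substance from the example, 1200 kg/m^3.
test = Substance("test", 1200.0)

# "mass of ml test": one millilitre is 1e-6 m^3; convert kg to grams.
mass_g = test.mass_of(1e-6) * 1000.0
print(f"{mass_g:.3f} gram")
```

The point of the design is that `density of test` reads a stored property, while `mass of ml test` derives a new quantity from that property and the amount supplied.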

@bofh453

bofh453 commented Sep 27, 2016

This turns out to be shockingly hard. In general, and especially for "engineering" properties such as the bulk/elastic/shear/Young's moduli, the only solution is scraping papers and patents. This often requires OCRing first, especially for the patents. (Okay, less so now, but you still sometimes need to double-check that Google Patents OCR'd the thing correctly.)

That being said, there are a bunch of stopgaps. That's the good thing. The bad thing is that most require registration at minimum. The two big ones to start with are http://www.chemnetbase.com/ (ChemNetBase) and http://www.chemspider.com/ (RSC ChemSpider). Both have fairly comprehensive web APIs for bulk-fetching of data. I believe the latter's is still open to anyone without login, but I'm not sure.

Edit: NIST has already nicely scraped the CRC handbook into a DB for you. Available here: https://www.nist.gov/pml/productsservices/physical-reference-data

For simple stuff, such as atoms and simple compounds, just scrape all of the CRC Handbook of Chemistry and Physics into a text file. Someone has probably already converted it into something structured, though you can get something almost easily parsable just by grabbing the 2014 copy of it from libgen and running pdftotext -raw CRCHandbook.pdf CRCHandbook.txt.
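
A first filtering pass over that text output could look something like the sketch below. Everything about the row shape is an assumption: the thread does not show what the `pdftotext` output of the handbook looks like, so the regex, the `parse_row` helper, and the sample lines are hypothetical (the two numbers in the sample are mercury's molar mass and density as quoted elsewhere in this issue).

```python
import re

# Assumed row shape: a name made of letters and light punctuation,
# followed by one or more whitespace-separated decimal numbers.
ROW = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z ,()'-]*?)\s+"
    r"(?P<values>-?\d+(?:\.\d+)?(?:\s+-?\d+(?:\.\d+)?)*)\s*$"
)


def parse_row(line):
    """Return (name, [floats]) if the line looks like a table row, else None."""
    match = ROW.match(line)
    if match is None:
        return None
    values = [float(v) for v in match.group("values").split()]
    return match.group("name").strip(), values


# Hypothetical sample lines, not taken from the handbook.
sample = [
    "Section 4: Properties of the Elements",
    "Mercury 200.59 13.5951",
    "",
]
rows = [row for row in map(parse_row, sample) if row is not None]
print(rows)
```

Anything the pattern rejects would still need a manual look, since multi-column tables and footnote markers are likely to break a single regex like this.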

This handbook, btw, is the source of most of the periodic table data you've seen anywhere, though it may have hopped through 3-15 reprints to get there. Turns out both getting and aggregating experimental data is hard.

Other useful things:

  • There's an extremely nice dataset for melting points of... basically bloody everything, courtesy of Alfa Aesar. I used it recently to validate an expression for the approximate melting point of a metal (i.e. Fermi Liquid) given its electrical resistivity and its bulk modulus (which turned out to be quite good... should put that somewhere). Available here: http://usefulchem.blogspot.ca/2011/02/alfa-aesar-melting-point-data-now.html and that blog also has other links that may be of use for other data sources. It's also an interesting read.
  • Matweb has a ton of engineering properties of materials on its site, largely scraped from various places over about a decade: http://www.matweb.com/ Sadly, to do anything more than look up 1-3 things at a time with ratelimits that make twitter's look sane, you need to purchase a subscription. A pity, since otherwise I'd say we should support it in SolveSpace, given it's the major materials info source in both Autodesk Simulation and SolidWorks.

@bofh453

bofh453 commented Sep 27, 2016

Oh, one more thing: NIST has a ton of spectral data easily available: http://webbook.nist.gov/chemistry/name-ser.html

No official API, but seeing as I can basically programmatically fetch things by hand using extremely trivial curl POST requests, and there are no ratelimits, well, yeah.

@tiffany352
Owner

Not quite as convenient as I was hoping for, but I'll definitely have a go at obtaining the data from these sources.

@tiffany352
Owner

I've pushed support for substances to master. The original issue should be resolved now, but I will leave this open for the second part about sourcing data.

> 3.7 billion l * water -> ton
water: volume = 3700000 meter^3; mass = approx. 4078551.8 shortton
> water
water: density = 1 gram -> 1000 millimeter^3; fusion_heat = 8352666/25000, approx. 334.1066 joule -> 1 gram; pressure_column = 98.0665 pascal -> 10 millimeter; pressure_column_0C = approx. 98.05375 pascal -> 10 millimeter; pressure_column_100C = approx. 93.98497 pascal -> 10 millimeter; pressure_column_10C = approx. 98.04002 pascal -> 10 millimeter; pressure_column_15C = approx. 97.98118 pascal -> 10 millimeter; pressure_column_18C = approx. 97.93116 pascal -> 10 millimeter; pressure_column_20C = approx. 97.89292 pascal -> 10 millimeter; pressure_column_25C = approx. 97.77916 pascal -> 10 millimeter; pressure_column_50C = approx. 96.89656 pascal -> 10 millimeter; pressure_column_5C = approx. 98.06551 pascal -> 10 millimeter; specific_heat = 4.1868 kilogray -> 1 kelvin; vaporization_heat = 1.16 kilojoule -> 1 gram
> mercury
mercury: density = 13.5951 gram -> 1000 millimeter^3; molar_mass = 200.59 gram -> 1 mole; pressure_column = approx. 1.333223 kilopascal -> 10 millimeter; pressure_column_10C = approx. 1.330840 kilopascal -> 10 millimeter; pressure_column_20C = approx. 1.328428 kilopascal -> 10 millimeter; pressure_column_23C = approx. 1.327683 kilopascal -> 10 millimeter; pressure_column_30C = approx. 1.326025 kilopascal -> 10 millimeter; pressure_column_40C = approx. 1.323632 kilopascal -> 10 millimeter; pressure_column_60F = approx. 1.329526 kilopascal -> 10 millimeter; specific_heat = 140 gray -> 1 kelvin
> milk
milk: density = 242 gram -> 473176473/2000, approx. 236588.2 millimeter^3
> oil
oil: specific_energy = 41.868 gigajoule -> 45359237/50000, approx. 907.1847 kilogram (Ton oil equivalent.  A conventional value for the energy released by burning one metric ton of oil. [18,E2] Note that energy per mass of petroleum products is fairly constant. Variations in volumetric energy density result from variations in the density (kg/m^3) of different fuels. This definition is given by the IEA/OECD.)
> gasoline
gasoline: energy_density_HHV = approx. 131.8819 megajoule -> 473176473/125, approx. 3785411.7 millimeter^3; energy_density_LHV = approx. 121.3314 megajoule -> 473176473/125, approx. 3785411.7 millimeter^3; specific_heat = 2.22 kilogray -> 1 kelvin
> air
air: Average molecular weight of air. molar_mass = approx. 28.96790 gram -> 1 mole
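
As a sanity check on the new output, two of the figures can be reproduced by hand. This is just a sketch in plain Python with SI floats; the constant names are made up, the short ton is taken as 2000 lb at 0.45359237 kg per lb, and the US cup volume is the 473176473/2000 mm^3 shown in the milk line.

```python
# 2000 lb at exactly 0.45359237 kg per lb.
SHORT_TON_KG = 2000 * 0.45359237

# Water density from the output above: 1 gram per 1000 mm^3 = 1000 kg/m^3.
WATER_DENSITY = 1000.0

# 3.7 billion litres is 3.7e6 m^3.
volume_m3 = 3.7e9 * 1e-3
short_tons = volume_m3 * WATER_DENSITY / SHORT_TON_KG
print(short_tons)  # close to the 4078551.8 shortton reported above

# Milk: 242 gram per US cup, with the cup given in mm^3.
US_CUP_MM3 = 473176473 / 2000
milk_density = (242 * 1e-3) / (US_CUP_MM3 * 1e-9)  # kg/m^3
print(milk_density)  # close to the 1022.874 kg/m^3 in the original definition
```

Both values agree with what Rink prints, so the move from bare units to substances with named properties kept the underlying numbers intact.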

@tiffany352
Owner

@bofh453 I'm having some difficulty with these sources. The NIST data only seems to have a small subset of what the CRC handbook offers: it doesn't seem to have any properties other than stuff like molar mass and ionization energy of the elements. I already have molar masses for all the elements, but the data isn't cited. ChemNetBase wants me to log in with a subscribing organization to view data, and ChemSpider seems to only have predicted properties for the queries I've tried so far. Are these predicted properties accurate?

Unless I'm missing something, I may have to obtain a PDF of the CRC handbook and get the data out like you said.

For reference, here's some of the properties I'm interested in (don't necessarily need or want all of them at the same time):

  • Molar mass
  • Density
  • Melting, boiling (via vaporization heat), and triple points
  • Specific heat
  • Ignition temperature
  • Specific energy (oxidation energy)
  • Surface tension as liquid
  • Electrical properties (resistance, capacitance)
  • Mechanical properties (strength, modulus, etc)

As far as what I'm interested in the properties of, I'd like to get all of the elements (possibly for more than one isotope? e.g. uranium-238 and uranium-235) as well as a number of common materials like stone, wood, glass, steel, oil, gasoline.

Does the CRC handbook even have this data? You did say engineering properties are difficult to come by, and that's pretty much exactly what I'm looking for... I'm not sure where to start with OCRing patents, but that sounds like quite a lot of manual work to extract that for 118 elements. Should I give up on getting this data for elements in general and focus on the materials I mentioned? That way the data set is small enough that I can enter it by hand.

@whitequark changed the title from "Weirdness with substance names" to "Scrap some sort of physical properties database" on Feb 12, 2017
@tiffany352 added the labels core (For issues related to core Rink functionality), enhancement, and units (For issues about the unit definitions) on Aug 7, 2020
@JasperWallace

Dwarf Fortress has been slowly building a list of material properties with help from the players on the forums. I'm not sure about the license on that collection, though:

https://dwarffortresswiki.org/index.php/DF2014:Material_definition_token

@whitequark changed the title from "Scrap some sort of physical properties database" to "Scrape some sort of physical properties database" on Jan 20, 2021
4 participants