- 业务上需要处理从数据库中导出的Excel,所以选择开源的Apache POI处理
- questionnaire_simple文件简要记录了原数据和处理后的数据的格式和字段,在此解释需要处理的核心字段,其余字段均可忽略。
- questionnaire_complete文件包含了全部的问卷调查样本。
field | meaning |
---|---|
orderNum | 用户提交的问卷编号 |
titleKey | 问题序号 |
titleName | 问题 |
answer | 答案序号 |
- 原数据中用户提交问卷的编号对应多条记录,回答每一个问题均对应一条记录,需要将多条记录合并成一条。合并后的表格头由原数据表格基本信息和问题组成。例如:
原数据
orderNum | submitTime | titleKey | titleName | answer | other |
---|---|---|---|---|---|
79 | 9/26/2018 18:26:32 | 1 | 1.资产规模大致范围? | 3 | …… |
79 | 9/26/2018 18:26:32 | 2 | 2.理财规模大致范围? | 2 | …… |
79 | 9/26/2018 18:26:32 | 3 | 3.面临调整压力是? | 4 | …… |
合并后
序号 | 提交时间 | 1.资产规模大致范围? | 2.理财规模大致范围? | 3.面临调整压力是? | other |
---|---|---|---|---|---|
1 | 9/26/2018 18:26:32 | 3 | 2 | 4 | …… |
其中,第三列,第四列,第五列分别对应问题的答案序号。
-
每个用户回答的问题可能并不是所有的问题,因为问卷可能会根据用户回答的不同进而跳跃到不同的选项中,例如一共10题问卷调查,用户一回答了1,3,5,7,8,10题,但用户二可能回答了1,2,4,5,9,10题,也有可能用户回答了所有问题。
-
问卷可能会有多选。
项目提供了 ExcelPoi.jar
- JRE1.8或JDK1.8或以上
java -jar ExcelPoi.jar fileUrl saveUrl saveFileName
- fileUrl表示原数据路径,例如/home/root/question.xlsx或C:\question.xlsx
- saveUrl表示数据处理后保存的路径,例如/home/root/或C:\
- saveFileName表示数据处理后最终保存的文件名,可选,如果不填写,则默认处理后的文件名为"原文件名_处理后.xlsx"
- How to create a new workbook
- How to create a sheet
- How to create cells
- Iterate over rows and cells
- Getting the cell contents
- Reading and writing
- 添加poi依赖,支持xls格式
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.17</version>
</dependency>
- 添加poi-ooxml依赖,支持xlsx格式
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.17</version>
</dependency>
- 如果出现
java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlObject
异常,仍需要添加xmlbeans依赖
<!-- https://mvnrepository.com/artifact/org.apache.xmlbeans/xmlbeans-->
<dependency>
<groupId>org.apache.xmlbeans</groupId>
<artifactId>xmlbeans</artifactId>
<version>3.0.2</version>
</dependency>
如果原文件是.xls格式,则使用HSSFWorkbook创建Workbook,如果是.xlsx格式,则用XSSFWorkbook创建。
- Workbook:Excel工作簿
- Sheet:工作表
- Row:行
- Cell:单元格
如下图所示:
- Since we need to process Excel exported from database, we choose open source Apache POI for processing.
- questionnaire_simple file briefly records the format and fields of original data and processed data. The core fields that need to be processed are explained here, and the remaining fields can be ignored.
- questionnaire_complete file contains a full sample of questionnaires.
field | meaning |
---|---|
orderNum | questionnaire number submitted by user |
titleKey | question number |
titleName | question |
answer | answer number |
- In the original data, the number of the questionnaire submitted by user corresponds to multiple records. Each question corresponds to one record, and multiple records need to be merged into one. The merged table header consists of basic data and problems of the original data table. E.g:
original data
orderNum | submitTime | titleKey | titleName | answer | other |
---|---|---|---|---|---|
79 | 9/26/2018 18:26:32 | 1 | 1.question1? | 3 | …… |
79 | 9/26/2018 18:26:32 | 2 | 2.question2? | 2 | …… |
79 | 9/26/2018 18:26:32 | 3 | 3.question3? | 4 | …… |
after merging
number | submitTime | 1.question1? | 2.question2? | 3.question3? | other |
---|---|---|---|---|---|
1 | 9/26/2018 18:26:32 | 3 | 2 | 4 | …… |
Among them, the third column, the fourth column, and the fifth column respectively correspond to the answers to the question.
-
The questions answered by each user may not be all the questions, because the questionnaire may jump to different options according to the user's answer, for example, a total of 10 questions questionnaire, the first user answers 1, 3, 5, 7, 8 , 10 questions, but the second user may have answered 1, 2, 4, 5, 9, 10 questions, and it is also possible that the user answers all questions.
-
There may be multiple choices in the questionnaire。
ExcelPoi project provides ExcelPoi.jar
- JRE1.8 or JDK1.8 or above
java -jar ExcelPoi.jar fileUrl saveUrl saveFileName
- fileUrl represents the original data path, such as /home/root/question.xlsx or C:\question.xlsx
- saveUrl represents the path saved after data processing, such as /home/root/ or C:\
- saveFileName represents file name that is finally saved. Optional.
- How to create a new workbook
- How to create a sheet
- How to create cells
- Iterate over rows and cells
- Getting the cell contents
- Reading and writing
- Add poi dependencies to support.xls format
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>3.17</version>
</dependency>
- Add poi-ooxml dependency to support .xlsx format
<!-- https://mvnrepository.com/artifact/org.apache.poi/poi-ooxml -->
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>3.17</version>
</dependency>
- If the exception of
java.lang.NoClassDefFoundError: org/apache/xmlbeans/XmlObject
occurs, you still need to add the xmlbeans dependency.
<!-- https://mvnrepository.com/artifact/org.apache.xmlbeans/xmlbeans-->
<dependency>
<groupId>org.apache.xmlbeans</groupId>
<artifactId>xmlbeans</artifactId>
<version>3.0.2</version>
</dependency>
If origin file is an .xls format, use the HSSFWorkbook to create a Workbook, and if it is an .xlsx format, create it with XSSFWorkbook.
- Workbook:Excel workbook
- Sheet:Excel sheet
- Row:Excel row
- Cell:Excel cell
As shown below: